[jira] Commented: (MAPREDUCE-899) When using LinuxTaskController, localized files may become accessible to unintended users if permissions are misconfigured.

2010-01-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805823#action_12805823
 ] 

Hadoop QA commented on MAPREDUCE-899:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12431637/patch-899-4.txt
  against trunk revision 903563.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 12 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/412/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/412/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/412/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/412/console

This message is automatically generated.

> When using LinuxTaskController, localized files may become accessible to 
> unintended users if permissions are misconfigured.
> ---
>
> Key: MAPREDUCE-899
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-899
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Reporter: Vinod K V
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-899-20090828.txt, patch-899-1.txt, 
> patch-899-2.txt, patch-899-3.txt, patch-899-4.txt, patch-899.txt
>
>
> To enforce the accessibility of job files to only the job-owner and the 
> TaskTracker, as per MAPREDUCE-842, it is _trusted_ that the setuid/setgid 
> Linux TaskController binary is group-owned by a _special group_ to which only 
> the TaskTracker belongs, and not just any group to which the TT belongs. If 
> the trust is broken, possibly due to misconfiguration by admins, the local 
> files become accessible to unintended users, giving a false sense of security 
> to the admins.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1421) TestDebugScriptWithLinuxTaskController and TestJobExecutionAsDifferentUser fail on trunk

2010-01-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805822#action_12805822
 ] 

Hadoop QA commented on MAPREDUCE-1421:
--

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12431638/patch-1421.txt
  against trunk revision 903563.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/291/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/291/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/291/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/291/console

This message is automatically generated.

> TestDebugScriptWithLinuxTaskController and TestJobExecutionAsDifferentUser 
> fail on trunk
> 
>
> Key: MAPREDUCE-1421
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1421
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker, test
>Affects Versions: 0.22.0
>Reporter: Amareshwari Sriramadasu
> Fix For: 0.22.0
>
> Attachments: patch-1421.txt, TestJobExecutionAsDifferentUser.patch
>
>
> TestDebugScriptWithLinuxTaskController and TestJobExecutionAsDifferentUser 
> fail on trunk after the commit of MAPREDUCE-1385




[jira] Commented: (MAPREDUCE-899) When using LinuxTaskController, localized files may become accessible to unintended users if permissions are misconfigured.

2010-01-27 Thread Hemanth Yamijala (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805814#action_12805814
 ] 

Hemanth Yamijala commented on MAPREDUCE-899:


I think we can simplify the approach further.

Please note that on other JIRAs, we have assumed that the task-controller 
binary would be set up with permissions and ownership that prevent misuse. 
Specifically, the binary would be a setuid/setgid executable, owned by root and 
by a special group, and, importantly, other users would have no permissions on 
it. I had proposed earlier on this JIRA that we trust administrators to set up 
the special group correctly and specify it in the taskcontroller.cfg file.

Following up on this, if we verify that only root and the admin-specified group 
can execute the file in setuid mode, then I think we are pretty much done. We 
should specifically check that others *cannot* execute the file. Can we change 
the approach in the patch to match this? Any concerns?
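
The check described above can be sketched as a predicate over the binary's ownership, group, and permission bits. This is a minimal illustration, not the actual task-controller verification code; note that the real check must also confirm the setuid/setgid bits themselves, which `java.nio`'s PosixFilePermission cannot express (a native stat() would be needed for that):

```java
import java.nio.file.attribute.PosixFilePermission;
import java.util.Set;

public class TaskControllerPermissionCheck {
    // Hypothetical sketch of the verification: the binary must be owned by
    // root, group-owned by the admin-specified special group from
    // taskcontroller.cfg, and "others" must have no access at all.
    // (The setuid/setgid bits are not modeled here.)
    public static boolean isSecurelyConfigured(String owner,
                                               String group,
                                               String expectedSpecialGroup,
                                               Set<PosixFilePermission> perms) {
        if (!"root".equals(owner)) {
            return false;
        }
        if (!expectedSpecialGroup.equals(group)) {
            return false;
        }
        // Others must have no permissions: this is exactly the
        // misconfiguration this JIRA guards against.
        return !perms.contains(PosixFilePermission.OTHERS_READ)
            && !perms.contains(PosixFilePermission.OTHERS_WRITE)
            && !perms.contains(PosixFilePermission.OTHERS_EXECUTE);
    }
}
```

With such a predicate, the TaskTracker could refuse to start (or fall back to the default controller) whenever the check fails, instead of running with a false sense of security.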

> When using LinuxTaskController, localized files may become accessible to 
> unintended users if permissions are misconfigured.
> ---
>
> Key: MAPREDUCE-899
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-899
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Reporter: Vinod K V
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-899-20090828.txt, patch-899-1.txt, 
> patch-899-2.txt, patch-899-3.txt, patch-899-4.txt, patch-899.txt
>
>
> To enforce the accessibility of job files to only the job-owner and the 
> TaskTracker, as per MAPREDUCE-842, it is _trusted_ that the setuid/setgid 
> Linux TaskController binary is group-owned by a _special group_ to which only 
> the TaskTracker belongs, and not just any group to which the TT belongs. If 
> the trust is broken, possibly due to misconfiguration by admins, the local 
> files become accessible to unintended users, giving a false sense of security 
> to the admins.




[jira] Updated: (MAPREDUCE-1420) TestTTResourceReporting failing in trunk

2010-01-27 Thread Iyappan Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iyappan Srinivasan updated MAPREDUCE-1420:
--

Description: 
TestTTResourceReporting failing in trunk. 

The most specific issue from the logs seems to be : Error executing shell 
command org.apache.hadoop.util.Shell$ExitCodeException: kill: No such process 

Link :
http://hudson.zones.apache.org/hudson/view/Hadoop/job/Hadoop-Mapreduce-trunk/217/

Attaching output in a file.



  was:
TestTTResourceReporting failing in trunk. 

The most specific issue from the logs seems to be : Error executing shell 
command org.apache.hadoop.util.Shell$ExitCodeException: kill: No such process 

Link :
http://hudson.zones.apache.org/hudson/view/Hadoop/job/Hadoop-Mapreduce-trunk/217/

Giving the complete raw output:

[junit] 2010-01-26 14:49:47,885 INFO mapred.JobQueueTaskScheduler (TestTTResourceReporting.java:assignTasks(159)) - expected memory values : (totalVirtualMemoryOnTT, totalPhysicalMemoryOnTT, availableVirtualMemoryOnTT, availablePhysicalMemoryOnTT, mapSlotMemSize, reduceSlotMemorySize, cumulativeCpuTime, cpuFrequency, numProcessors) = (-1, -1,-1, -1,-1,-1,-1,-1,-1,-1.0)
[junit] reported memory values : (totalVirtualMemoryOnTT, totalPhysicalMemoryOnTT, availableVirtualMemoryOnTT, availablePhysicalMemoryOnTT, reportedMapSlotMemorySize, reportedReduceSlotMemorySize, reportedCumulativeCpuTime, reportedCpuFrequency, reportedNumProcessors) = (-1, -1,-1, -1,-1,-1,-1,-1,-1,-1.0)
[junit] 2010-01-26 14:49:47,930 WARN conf.Configuration (Configuration.java:set(601)) - jobclient.output.filter is deprecated. Instead, use mapreduce.client.output.filter
[junit] 2010-01-26 14:49:47,943 WARN conf.Configuration (Configuration.java:set(601)) - mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
[junit] 2010-01-26 14:49:48,013 WARN mapreduce.JobSubmitter (JobSubmitter.java:copyAndConfigureFiles(226)) - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
[junit] 2010-01-26 14:49:48,088 WARN conf.Configuration (Configuration.java:set(601)) - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
[junit] 2010-01-26 14:49:48,088 INFO mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(351)) - number of splits:1
[junit] 2010-01-26 14:49:48,293 WARN conf.Configuration (Configuration.java:handleDeprecation(332)) - mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
[junit] 2010-01-26 14:49:48,327 INFO mapred.JobTracker (JobTracker.java:addJob(3017)) - Job job_20100126144930543_0001 added successfully for user 'hudson' to queue 'default'
[junit] 2010-01-26 14:49:48,328 INFO mapred.JobTracker (JobTracker.java:initJob(3192)) - Initializing job_20100126144930543_0001
[junit] 2010-01-26 14:49:48,330 INFO mapreduce.Job (Job.java:monitorAndPrintJob(999)) - Running job: job_20100126144930543_0001
[junit] 2010-01-26 14:49:48,333 INFO mapred.JobInProgress (JobInProgress.java:initTasks(591)) - Initializing job_20100126144930543_0001
[junit] 2010-01-26 14:49:48,369 INFO jobhistory.JobHistory (JobHistory.java:setupEventWriter(242)) - SetupWriter, creating file file:/grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk/trunk/build/test/logs/history/job_20100126144930543_0001_hudson
[junit] 2010-01-26 14:49:48,549 INFO jobhistory.JobHistory (JobHistory.java:setupEventWriter(256)) - LogDirConfPath is file:/grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk/trunk/build/test/logs/history/job_20100126144930543_0001_conf.xml
[junit] about to write out: token = 1; sec = 0
[junit] 2010-01-26 14:49:48,638 INFO mapred.JobInProgress (JobInProgress.java:generateAndStoreTokens(3567)) - jobToken generated and stored with users keys in /tmp/hadoop-hudson/mapred/system/job_20100126144930543_0001/jobToken
[junit] 2010-01-26 14:49:48,645 INFO mapred.JobInProgress (JobInProgress.java:createMapTasks(722)) - Input size for job job_20100126144930543_0001 = 0. Number of splits = 1
[junit] 2010-01-26 14:49:48,647 INFO mapred.JobInProgress (JobInProgress.java:initTasks(653)) - Job job_20100126144930543_0001 initialized successfully with 1 map tasks and 1 reduce tasks.
[junit] 2010-01-26 14:49:49,335 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1013)) - map 0% reduce 0%
[junit] 2010-01-26 14:49:50,906 INFO mapred.JobTracker (JobTracker.java:createTaskEntry(1770)) - Adding task (JOB_SETUP) 'attempt_20100126144930543_0001_m_02_0' to tip task_20100126144930543_0001_m_02, for tracker 'tracker_host0.foo.com:localhost/127.0.0.1:41432'
[junit] 2010-01-26 14:49:50,915 INFO mapred.TaskTracker (TaskTracker.java:registerTask(2059)) - LaunchTaskAction (registerTask): attempt_20100126144930543_0001_m_02_0 task's state:UNASSIGNED
[junit] 2010-01-26 14:49:50,917 INFO mapr

[jira] Updated: (MAPREDUCE-1420) TestTTResourceReporting failing in trunk

2010-01-27 Thread Iyappan Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iyappan Srinivasan updated MAPREDUCE-1420:
--

Attachment: output.rtf

The output in a readable format.

> TestTTResourceReporting failing in trunk
> 
>
> Key: MAPREDUCE-1420
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1420
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.22.0
>Reporter: Iyappan Srinivasan
>Assignee: Scott Chen
> Attachments: MAPREDUCE-1420-v1.patch, output.rtf
>
>
> TestTTResourceReporting failing in trunk. 
> The most specific issue from the logs seems to be : Error executing shell 
> command org.apache.hadoop.util.Shell$ExitCodeException: kill: No such process 
> Link :
> http://hudson.zones.apache.org/hudson/view/Hadoop/job/Hadoop-Mapreduce-trunk/217/
> Giving the complete raw output:
> [junit] 2010-01-26 14:49:47,885 INFO mapred.JobQueueTaskScheduler (TestTTResourceReporting.java:assignTasks(159)) - expected memory values : (totalVirtualMemoryOnTT, totalPhysicalMemoryOnTT, availableVirtualMemoryOnTT, availablePhysicalMemoryOnTT, mapSlotMemSize, reduceSlotMemorySize, cumulativeCpuTime, cpuFrequency, numProcessors) = (-1, -1,-1, -1,-1,-1,-1,-1,-1,-1.0)
> [junit] reported memory values : (totalVirtualMemoryOnTT, totalPhysicalMemoryOnTT, availableVirtualMemoryOnTT, availablePhysicalMemoryOnTT, reportedMapSlotMemorySize, reportedReduceSlotMemorySize, reportedCumulativeCpuTime, reportedCpuFrequency, reportedNumProcessors) = (-1, -1,-1, -1,-1,-1,-1,-1,-1,-1.0)
> [junit] 2010-01-26 14:49:47,930 WARN conf.Configuration (Configuration.java:set(601)) - jobclient.output.filter is deprecated. Instead, use mapreduce.client.output.filter
> [junit] 2010-01-26 14:49:47,943 WARN conf.Configuration (Configuration.java:set(601)) - mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
> [junit] 2010-01-26 14:49:48,013 WARN mapreduce.JobSubmitter (JobSubmitter.java:copyAndConfigureFiles(226)) - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
> [junit] 2010-01-26 14:49:48,088 WARN conf.Configuration (Configuration.java:set(601)) - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
> [junit] 2010-01-26 14:49:48,088 INFO mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(351)) - number of splits:1
> [junit] 2010-01-26 14:49:48,293 WARN conf.Configuration (Configuration.java:handleDeprecation(332)) - mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
> [junit] 2010-01-26 14:49:48,327 INFO mapred.JobTracker (JobTracker.java:addJob(3017)) - Job job_20100126144930543_0001 added successfully for user 'hudson' to queue 'default'
> [junit] 2010-01-26 14:49:48,328 INFO mapred.JobTracker (JobTracker.java:initJob(3192)) - Initializing job_20100126144930543_0001
> [junit] 2010-01-26 14:49:48,330 INFO mapreduce.Job (Job.java:monitorAndPrintJob(999)) - Running job: job_20100126144930543_0001
> [junit] 2010-01-26 14:49:48,333 INFO mapred.JobInProgress (JobInProgress.java:initTasks(591)) - Initializing job_20100126144930543_0001
> [junit] 2010-01-26 14:49:48,369 INFO jobhistory.JobHistory (JobHistory.java:setupEventWriter(242)) - SetupWriter, creating file file:/grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk/trunk/build/test/logs/history/job_20100126144930543_0001_hudson
> [junit] 2010-01-26 14:49:48,549 INFO jobhistory.JobHistory (JobHistory.java:setupEventWriter(256)) - LogDirConfPath is file:/grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk/trunk/build/test/logs/history/job_20100126144930543_0001_conf.xml
> [junit] about to write out: token = 1; sec = 0
> [junit] 2010-01-26 14:49:48,638 INFO mapred.JobInProgress (JobInProgress.java:generateAndStoreTokens(3567)) - jobToken generated and stored with users keys in /tmp/hadoop-hudson/mapred/system/job_20100126144930543_0001/jobToken
> [junit] 2010-01-26 14:49:48,645 INFO mapred.JobInProgress (JobInProgress.java:createMapTasks(722)) - Input size for job job_20100126144930543_0001 = 0. Number of splits = 1
> [junit] 2010-01-26 14:49:48,647 INFO mapred.JobInProgress (JobInProgress.java:initTasks(653)) - Job job_20100126144930543_0001 initialized successfully with 1 map tasks and 1 reduce tasks.
> [junit] 2010-01-26 14:49:49,335 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1013)) - map 0% reduce 0%
> [junit] 2010-01-26 14:49:50,906 INFO mapred.JobTracker (JobTracker.java:createTaskEntry(1770)) - Adding task (JOB_SETUP) 'attempt_20100126144930543_0001_m_02_0' to tip task_20100126144930543_0001_m_02, for tracker 'tracker_host0

[jira] Commented: (MAPREDUCE-1421) TestDebugScriptWithLinuxTaskController and TestJobExecutionAsDifferentUser fail on trunk

2010-01-27 Thread Vinod K V (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805797#action_12805797
 ] 

Vinod K V commented on MAPREDUCE-1421:
--

bq. We should have an easier way to run all tests that use the 
linux-task-controller rather than having to guess that.
When these tests were originally checked in, we argued fiercely for and against 
having a separate ant target to run just these tests, and eventually didn't add 
one. Isn't it time we got one in, given all the recent history of these tests?

> TestDebugScriptWithLinuxTaskController and TestJobExecutionAsDifferentUser 
> fail on trunk
> 
>
> Key: MAPREDUCE-1421
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1421
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker, test
>Affects Versions: 0.22.0
>Reporter: Amareshwari Sriramadasu
> Fix For: 0.22.0
>
> Attachments: patch-1421.txt, TestJobExecutionAsDifferentUser.patch
>
>
> TestDebugScriptWithLinuxTaskController and TestJobExecutionAsDifferentUser 
> fail on trunk after the commit of MAPREDUCE-1385




[jira] Updated: (MAPREDUCE-1410) Task-Cleanup attempt cannot be given KillTaskAction if the main attempt is killed with a KillTaskAction

2010-01-27 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1410:
---

Status: Open  (was: Patch Available)

> Task-Cleanup attempt cannot be given KillTaskAction if the main attempt is 
> killed with a KillTaskAction
> ---
>
> Key: MAPREDUCE-1410
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1410
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.20.1
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.22.0
>
> Attachments: patch-1410.txt
>
>
> If the main attempt is killed with a KillTaskAction and is added to 
> tasksReportedClosed, then the cleanup-attempt for the task (with the same id) 
> cannot be given a KillTaskAction, since tasksReportedClosed already contains 
> the attemptID.
> The attemptID should be removed from tasksReportedClosed in the 
> incompleteSubTask() method.
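
The bookkeeping fix described above can be sketched as follows (a minimal illustration with simplified types and hypothetical method names, not the actual JobTracker code):

```java
import java.util.HashSet;
import java.util.Set;

public class TaskCloseBookkeeping {
    // Simplified stand-in for the JobTracker's tasksReportedClosed set.
    private final Set<String> tasksReportedClosed = new HashSet<>();

    // Called when a KillTaskAction is to be issued for an attempt. add()
    // returns false if the ID is already present, i.e. a KillTaskAction
    // was already sent for this attempt ID, so none is sent again.
    public boolean tryIssueKillAction(String attemptId) {
        return tasksReportedClosed.add(attemptId);
    }

    // The proposed fix: when the main attempt becomes incomplete, remove
    // its ID so the task-cleanup attempt (which reuses the same ID) can
    // still be given a KillTaskAction later.
    public void incompleteSubTask(String attemptId) {
        tasksReportedClosed.remove(attemptId);
    }
}
```

Without the `remove` in `incompleteSubTask`, the second `tryIssueKillAction` for the cleanup attempt would keep returning false forever, which is exactly the bug reported here.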




[jira] Updated: (MAPREDUCE-1410) Task-Cleanup attempt cannot be given KillTaskAction if the main attempt is killed with a KillTaskAction

2010-01-27 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1410:
---

Status: Patch Available  (was: Open)

Submitting for Hudson.

> Task-Cleanup attempt cannot be given KillTaskAction if the main attempt is 
> killed with a KillTaskAction
> ---
>
> Key: MAPREDUCE-1410
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1410
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.20.1
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.22.0
>
> Attachments: patch-1410.txt
>
>
> If the main attempt is killed with a KillTaskAction and is added to 
> tasksReportedClosed, then the cleanup-attempt for the task (with the same id) 
> cannot be given a KillTaskAction, since tasksReportedClosed already contains 
> the attemptID.
> The attemptID should be removed from tasksReportedClosed in the 
> incompleteSubTask() method.




[jira] Updated: (MAPREDUCE-1421) TestDebugScriptWithLinuxTaskController and TestJobExecutionAsDifferentUser fail on trunk

2010-01-27 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1421:
---

Status: Patch Available  (was: Open)

Submitting for Hudson.

> TestDebugScriptWithLinuxTaskController and TestJobExecutionAsDifferentUser 
> fail on trunk
> 
>
> Key: MAPREDUCE-1421
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1421
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker, test
>Affects Versions: 0.22.0
>Reporter: Amareshwari Sriramadasu
> Fix For: 0.22.0
>
> Attachments: patch-1421.txt, TestJobExecutionAsDifferentUser.patch
>
>
> TestDebugScriptWithLinuxTaskController and TestJobExecutionAsDifferentUser 
> fail on trunk after the commit of MAPREDUCE-1385




[jira] Commented: (MAPREDUCE-1126) shuffle should use serialization to get comparator

2010-01-27 Thread Aaron Kimball (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805779#action_12805779
 ] 

Aaron Kimball commented on MAPREDUCE-1126:
--

Why are proposals now focusing on allowing users to specify different 
serialization *factories?*

If allowing users to specify the use of a particular {{SerializationBase}} via 
a flexible metadata map is considered too obscure, then I feel that the notion 
of having separate {{SerializationFactory}} instances is an unnecessary level 
of abstraction. The current {{SerializationFactory}} implemented in 
hadoop-common allows access to all {{SerializationBase}} instances. If the 
focus is on the user-accessibility of the API, asking users to define a 
{{SerializationFactory}} that will only ever produce a single 
{{SerializationBase}} (e.g., {{WritableSerialization}} or 
{{AvroGenericSerialization}}) requires needless one-off code and clutters the 
class hierarchy.

Instead, I might understand adding an API such as 
{{Job.setSerializationBase(ctxt, SerializationBase)}}, where users directly set 
the {{SerializationBase}} instance to use in a given context and disregard the 
{{SerializationFactory}} entirely.

For what it's worth, the patches that Tom and I have produced all make use of 
the default {{SerializationFactory}} in Hadoop; this API then uses the metadata 
map as defined in HADOOP-6165 to acquire the user's desired 
{{SerializationBase}} instance as appropriate for each of the map output key, 
value, etc.
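
The single-factory model defended above can be sketched roughly like this (all class names and the metadata key are illustrative only, not the actual HADOOP-6165 API): one factory knows every registered serialization and resolves the right one from a metadata map, so users never write per-type factories.

```java
import java.util.HashMap;
import java.util.Map;

public class SerializationLookup {
    // Illustrative stand-in for a serialization implementation.
    public interface SerializationBase<T> {
        byte[] serialize(T obj);
    }

    // A single factory holding all registered serializations; callers
    // describe what they want via a metadata map rather than supplying
    // a one-off factory per serialization type.
    public static class SerializationFactory {
        private final Map<String, SerializationBase<?>> registered = new HashMap<>();

        public void register(String name, SerializationBase<?> s) {
            registered.put(name, s);
        }

        // Resolve from metadata, e.g. {"serialization.name": "writable"};
        // returns null when nothing matches.
        public SerializationBase<?> getSerialization(Map<String, String> metadata) {
            return registered.get(metadata.get("serialization.name"));
        }
    }
}
```

The point of the sketch is only that the lookup lives in one place: adding a new serialization means one `register` call, not a new factory subclass in the class hierarchy.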


> shuffle should use serialization to get comparator
> --
>
> Key: MAPREDUCE-1126
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1126
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Reporter: Doug Cutting
>Assignee: Aaron Kimball
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1126.2.patch, MAPREDUCE-1126.3.patch, 
> MAPREDUCE-1126.4.patch, MAPREDUCE-1126.5.patch, MAPREDUCE-1126.6.patch, 
> MAPREDUCE-1126.patch, MAPREDUCE-1126.patch
>
>
> Currently the key comparator is defined as a Java class.  Instead we should 
> use the Serialization API to create key comparators.  This would permit, 
> e.g., Avro-based comparators to be used, permitting efficient sorting of 
> complex data types without having to write a RawComparator in Java.




[jira] Updated: (MAPREDUCE-1421) TestDebugScriptWithLinuxTaskController and TestJobExecutionAsDifferentUser fail on trunk

2010-01-27 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1421:
---

Attachment: patch-1421.txt

Updated Devaraj's patch for TestDebugScript to look at currentUser's group in 
the verification.

Both the tests passed on my machine.

> TestDebugScriptWithLinuxTaskController and TestJobExecutionAsDifferentUser 
> fail on trunk
> 
>
> Key: MAPREDUCE-1421
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1421
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker, test
>Affects Versions: 0.22.0
>Reporter: Amareshwari Sriramadasu
> Fix For: 0.22.0
>
> Attachments: patch-1421.txt, TestJobExecutionAsDifferentUser.patch
>
>
> TestDebugScriptWithLinuxTaskController and TestJobExecutionAsDifferentUser 
> fail on trunk after the commit of MAPREDUCE-1385




[jira] Updated: (MAPREDUCE-646) distcp should place the file distcp_src_files in distributed cache

2010-01-27 Thread Ravi Gummadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated MAPREDUCE-646:
---

Release Note: The patch increases the replication factor of _distcp_src_files 
to sqrt(min(maxMapsOnCluster, totalMapsInThisJob)) so that many maps won't 
access the same replica of the file _distcp_src_files at the same time.
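
The release note's formula can be illustrated with a small helper (a sketch only; the method name, the rounding up, and the clamping to the filesystem's default replication are assumptions for illustration, not taken from the patch):

```java
public class DistCpReplication {
    // Hypothetical sketch of the replication factor for _distcp_src_files:
    // sqrt(min(maxMapsOnCluster, totalMapsInThisJob)), rounded up, and
    // never below the filesystem's default replication.
    public static int srcFileListReplication(int maxMapsOnCluster,
                                             int totalMapsInThisJob,
                                             int defaultReplication) {
        int maps = Math.min(maxMapsOnCluster, totalMapsInThisJob);
        int r = (int) Math.ceil(Math.sqrt(maps));
        return Math.max(defaultReplication, r);
    }
}
```

For example, with 100 maps actually reading the file, this spreads them across roughly 10 replicas, so about 10 maps contend for each replica instead of all 100 hitting the same few blocks.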

> distcp should place the file distcp_src_files in distributed cache
> --
>
> Key: MAPREDUCE-646
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-646
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: distcp
>Reporter: Ravi Gummadi
>Assignee: Ravi Gummadi
> Fix For: 0.21.0
>
> Attachments: d_replica_srcfilelist.patch, 
> d_replica_srcfilelist_v1.patch, d_replica_srcfilelist_v2.patch
>
>
> When a large number of files is being copied by distcp, accessing 
> distcp_src_files seems to be an issue, as all map tasks would be accessing 
> this file. The error message seen is:
> 09/06/16 10:13:16 INFO mapred.JobClient: Task Id : 
> attempt_200906040559_0110_m_003348_0, Status : FAILED
> java.io.IOException: Could not obtain block: blk_-4229860619941366534_1500174
> file=/mapredsystem/hadoop/mapredsystem/distcp_7fiyvq/_distcp_src_files
> at 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1757)
> at 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1585)
> at 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1712)
> at java.io.DataInputStream.readFully(DataInputStream.java:178)
> at java.io.DataInputStream.readFully(DataInputStream.java:152)
> at 
> org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1450)
> at 
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
> at 
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
> at 
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
> at 
> org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
> at 
> org.apache.hadoop.tools.DistCp$CopyInputFormat.getRecordReader(DistCp.java:299)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:336)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> at org.apache.hadoop.mapred.Child.main(Child.java:170)
> This could be because of HADOOP-6038 and/or HADOOP-4681.
> If distcp places this special file distcp_src_files in the distributed cache, 
> that could solve the problem.




[jira] Updated: (MAPREDUCE-899) When using LinuxTaskController, localized files may become accessible to unintended users if permissions are misconfigured.

2010-01-27 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-899:
--

Status: Patch Available  (was: Open)

> When using LinuxTaskController, localized files may become accessible to 
> unintended users if permissions are misconfigured.
> ---
>
> Key: MAPREDUCE-899
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-899
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Reporter: Vinod K V
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-899-20090828.txt, patch-899-1.txt, 
> patch-899-2.txt, patch-899-3.txt, patch-899-4.txt, patch-899.txt
>
>
> To enforce the accessibility of job files to only the job-owner and the 
> TaskTracker, as per MAPREDUCE-842, it is _trusted_ that the setuid/setgid 
> Linux TaskController binary is group-owned by a _special group_ to which only 
> the TaskTracker belongs, and not just any group to which the TT belongs. If 
> the trust is broken, possibly due to misconfiguration by admins, the local 
> files become accessible to unintended users, giving a false sense of security 
> to the admins.




[jira] Updated: (MAPREDUCE-899) When using LinuxTaskController, localized files may become accessible to unintended users if permissions are misconfigured.

2010-01-27 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-899:
--

Attachment: patch-899-4.txt

Patch updated to trunk.

All LinuxTaskController tests passed except 
TestDebugScriptWithLinuxTaskController and TestJobExecutionAsDifferentUser (due 
to MAPREDUCE-1421) and TestStreamingAsDifferentUser (MAPREDUCE-1322).

> When using LinuxTaskController, localized files may become accessible to 
> unintended users if permissions are misconfigured.
> ---
>
> Key: MAPREDUCE-899
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-899
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Reporter: Vinod K V
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-899-20090828.txt, patch-899-1.txt, 
> patch-899-2.txt, patch-899-3.txt, patch-899-4.txt, patch-899.txt
>
>
> To enforce the accessibility of job files to only the job-owner and the 
> TaskTracker, as per MAPREDUCE-842, it is _trusted_ that the setuid/setgid 
> Linux TaskController binary is group-owned by a _special group_ to which only 
> the TaskTracker belongs, and not just any group to which the TT belongs. If 
> the trust is broken, possibly due to misconfiguration by admins, the local 
> files become accessible to unintended users, giving a false sense of security 
> to the admins.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1421) TestDebugScriptWithLinuxTaskController and TestJobExecutionAsDifferentUser fail on trunk

2010-01-27 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805771#action_12805771
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1421:


Looks like the validation itself is wrong in TestDebugScript:
{code}
-  String ttGroup = UnixUserGroupInformation.login().getGroupNames()[0];
+  Groups groups = new Groups(new Configuration());
+  String ttGroup = groups.getGroups(expectedUser).get(0);
{code}

The above code should look at the group of getCurrentUser(), not of expectedUser.
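A toy illustration of the distinction, with a hard-coded user-to-groups map standing in for Hadoop's Groups service (the user names are assumptions mirroring the -Dtaskcontroller-ugi=nobody,nobody run reported earlier in this thread):

```java
import java.util.*;

public class GroupLookupSketch {
  // Stand-in for Hadoop's Groups service: user -> group list.
  static final Map<String, List<String>> GROUPS = new HashMap<>();
  static {
    GROUPS.put("hudson", Arrays.asList("users"));   // user running the test
    GROUPS.put("nobody", Arrays.asList("nobody"));  // taskcontroller-ugi user
  }

  static String primaryGroup(String user) {
    return GROUPS.get(user).get(0);
  }

  public static void main(String[] args) {
    String currentUser = "hudson";   // what getCurrentUser() would report
    String expectedUser = "nobody";  // what the test wrongly looked up
    // The debugout file ends up group-owned by the *current* user's group,
    // so validating against expectedUser's group produces the reported
    // failure "is group owned not by nobody but by users".
    System.out.println(primaryGroup(currentUser));   // prints users
    System.out.println(primaryGroup(expectedUser));  // prints nobody
  }
}
```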

> TestDebugScriptWithLinuxTaskController and TestJobExecutionAsDifferentUser 
> fail on trunk
> 
>
> Key: MAPREDUCE-1421
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1421
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker, test
>Affects Versions: 0.22.0
>Reporter: Amareshwari Sriramadasu
> Fix For: 0.22.0
>
> Attachments: TestJobExecutionAsDifferentUser.patch
>
>
> TestDebugScriptWithLinuxTaskController and TestJobExecutionAsDifferentUser 
> fail on trunk after the commit of MAPREDUCE-1385

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1421) TestDebugScriptWithLinuxTaskController and TestJobExecutionAsDifferentUser fail on trunk

2010-01-27 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805769#action_12805769
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1421:


There is no exception in the test TestDebugScriptWithLinuxTaskController. It 
fails with the following assertion failure when I run with 
-Dtaskcontroller-ugi=nobody,nobody:
{noformat}
Testcase: testDebugScriptExecutionAsDifferentUser took 18.307 sec
FAILED
Path 
/home/amarsri/workspace/trunk/build/test/logs/userlogs/attempt_20100127150051180_0001_m_00_0/debugout
 is group owned not by nobody but by users
junit.framework.AssertionFailedError: Path 
/home/amarsri/workspace/trunk/build/test/logs/userlogs/attempt_20100127150051180_0001_m_00_0/debugout
 is group owned not by nobody but by users
at 
org.apache.hadoop.mapred.TestTaskTrackerLocalization.checkFilePermissions(TestTaskTrackerLocalization.java:291)
at 
org.apache.hadoop.mapred.TestDebugScript.verifyDebugScriptOutput(TestDebugScript.java:166)
at 
org.apache.hadoop.mapred.TestDebugScriptWithLinuxTaskController.testDebugScriptExecutionAsDifferentUser(TestDebugScriptWithLinuxTaskController.java:61)
{noformat}

> TestDebugScriptWithLinuxTaskController and TestJobExecutionAsDifferentUser 
> fail on trunk
> 
>
> Key: MAPREDUCE-1421
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1421
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker, test
>Affects Versions: 0.22.0
>Reporter: Amareshwari Sriramadasu
> Fix For: 0.22.0
>
> Attachments: TestJobExecutionAsDifferentUser.patch
>
>
> TestDebugScriptWithLinuxTaskController and TestJobExecutionAsDifferentUser 
> fail on trunk after the commit of MAPREDUCE-1385

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1421) TestDebugScriptWithLinuxTaskController and TestJobExecutionAsDifferentUser fail on trunk

2010-01-27 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated MAPREDUCE-1421:
---

Attachment: TestJobExecutionAsDifferentUser.patch

Attaching a patch that fixes TestJobExecutionAsDifferentUser. I had missed 
fixing this testcase in MAPREDUCE-1385. I didn't know that I needed to run it 
using the linux-task-controller. We *should* have an easier way to run all 
tests that use the linux-task-controller rather than having to guess that... 
TestDebugScriptWithLinuxTaskController works on my box. What exception do you 
see?

> TestDebugScriptWithLinuxTaskController and TestJobExecutionAsDifferentUser 
> fail on trunk
> 
>
> Key: MAPREDUCE-1421
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1421
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker, test
>Affects Versions: 0.22.0
>Reporter: Amareshwari Sriramadasu
> Fix For: 0.22.0
>
> Attachments: TestJobExecutionAsDifferentUser.patch
>
>
> TestDebugScriptWithLinuxTaskController and TestJobExecutionAsDifferentUser 
> fail on trunk after the commit of MAPREDUCE-1385

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1421) TestDebugScriptWithLinuxTaskController and TestJobExecutionAsDifferentUser fail on trunk

2010-01-27 Thread Amareshwari Sriramadasu (JIRA)
TestDebugScriptWithLinuxTaskController and TestJobExecutionAsDifferentUser fail 
on trunk


 Key: MAPREDUCE-1421
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1421
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker, test
Affects Versions: 0.22.0
Reporter: Amareshwari Sriramadasu
 Fix For: 0.22.0


TestDebugScriptWithLinuxTaskController and TestJobExecutionAsDifferentUser fail 
on trunk after the commit of MAPREDUCE-1385

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1126) shuffle should use serialization to get comparator

2010-01-27 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805760#action_12805760
 ] 

Arun C Murthy commented on MAPREDUCE-1126:
--

bq. Great points, Chris. Yahoo! has stated that a significant majority of their 
MapReduce jobs are written in Pig, and Facebook says the same of Hive. Among 
our many customers at Cloudera, it's far more common to target the MapReduce 
execution engine with a higher level language rather than the Java API. What 
you propose as the common case, then, appears to be uncommon in practice.

Uh, no. That is precisely the point - making it slightly harder on _framework_ 
authors is better than making it harder for the average users of the Map-Reduce 
API. Only the framework authors pay the cost...

 

Along similar lines, I'd like to restate:

{quote}
1. We should use the current global serializer factory for all contexts of a 
job.
4. Only the default comparator should come from the serializer. The user has to 
be able to override it in the framework (not change the serializer factory).
{quote}

I'm not convinced we need to allow multiple serialization mechanisms for the 
same job. I'm even less convinced that we need to allow a serializer per 
map-in-key, map-in-value, map-out-key, map-out-value, reduce-out-key, 
reduce-out-value etc.

I can see that we might have some phase of transition where people might move 
from Writables to Avro as the preferred serialization mechanism. For e.g. they 
might have SequenceFiles with Writables as input-records and might produce 
SequenceFiles with Avro output-records. However, even with a single 
serializer-factory for all contexts of a job it is trivial to write wrappers, 
provide bridges in libraries or other frameworks etc. to cross the chasm.



At a later point, *iff* we get to a world where we need to reconcile multiple 
serialization mechanisms for the same job on a regular basis, e.g. a world where 
we have a lot of data in Writables *and* Avro *and* Thrift etc., I'd like to 
propose a slightly less involved version of Chris's proposal.

The simplification is that we view 4 separate 'record contexts':
# INPUT (map-in-key, map-in-value)
# INTERMEDIATE (map-out-key, map-out-value)
# OUTPUT (reduce-out-key, reduce-out-value)
# JOB_DEFINITION (currently only InputSplit, possibly more in future via 
MAPREDUCE-1183)

Then we have Chris's proposal:

{noformat}
enum Context {
  INPUT,
  INTERMEDIATE,
  OUTPUT,
  JOB_DEFINITION
}

Job::setSerializationFactory(Context context, SerializationFactory...)
{noformat}

Thus we allow serializers to be specified for the 'records' flowing through the 
Map-Reduce framework... allowing map-in-key and map-in-value to have different 
serialization mechanisms seems like overkill. Do we have use-cases for such 
requirements? 

> shuffle should use serialization to get comparator
> --
>
> Key: MAPREDUCE-1126
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1126
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Reporter: Doug Cutting
>Assignee: Aaron Kimball
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1126.2.patch, MAPREDUCE-1126.3.patch, 
> MAPREDUCE-1126.4.patch, MAPREDUCE-1126.5.patch, MAPREDUCE-1126.6.patch, 
> MAPREDUCE-1126.patch, MAPREDUCE-1126.patch
>
>
> Currently the key comparator is defined as a Java class.  Instead we should 
> use the Serialization API to create key comparators.  This would permit, 
> e.g., Avro-based comparators to be used, permitting efficient sorting of 
> complex data types without having to write a RawComparator in Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1383) Allow storage and caching of delegation token.

2010-01-27 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805757#action_12805757
 ] 

Devaraj Das commented on MAPREDUCE-1383:


And yes, as per the offline discussion with Kan you need to have host:port 
instead of URIs for the service field of the delegation tokens.

> Allow storage and caching of delegation token.
> --
>
> Key: MAPREDUCE-1383
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1383
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Boris Shkolnik
> Attachments: MAPREDUCE-1383-1.patch, MAPREDUCE-1383-2.patch, 
> MAPREDUCE-1383-5.patch, MAPREDUCE-1383-6.patch
>
>
> Client needs to obtain delegation tokens from all the NameNodes it is going 
> to work with and pass them to the application.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1383) Allow storage and caching of delegation token.

2010-01-27 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805754#action_12805754
 ] 

Devaraj Das commented on MAPREDUCE-1383:


Some comments:
1) Remove the LOG.info statements from TokenCache
2) Please add a method in TokenCache to load tokens from a file specified in 
the argument (rather than going indirectly through the conf). Then in 
Child.java, you can call that API directly rather than setting the file in conf.
3) Does it make sense to have a 
TokenCache.addPathsForGettingDelegationToken(Path[]) that's called from the 
places where you currently have TokenCache.obtainTokensForNamenodes? Then just 
before job submission you make one call to TokenCache.obtainTokensForNamenodes. 
That way we will be sure that there is not more than one call per namenode to 
get delegation tokens.
4) You define getDelegationTokens in TrackerDistributedCacheManager but don't 
invoke it. Also, you just need to pass the paths to the 
TokenCache.obtainTokensForNamenodes because that's already checking for 
duplicate entries.
We discussed offline that you need to fix the build so that the hdfs jars are 
in the classpath for the aspects compilation.

> Allow storage and caching of delegation token.
> --
>
> Key: MAPREDUCE-1383
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1383
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Boris Shkolnik
> Attachments: MAPREDUCE-1383-1.patch, MAPREDUCE-1383-2.patch, 
> MAPREDUCE-1383-5.patch, MAPREDUCE-1383-6.patch
>
>
> Client needs to obtain delegation tokens from all the NameNodes it is going 
> to work with and pass them to the application.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1403) Save file-sizes of each of the artifacts in DistributedCache in the JobConf

2010-01-27 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805753#action_12805753
 ] 

Arun C Murthy commented on MAPREDUCE-1403:
--

I propose we save it in the job-conf at the client side as we are setting up 
the distributed-cache for the job with a key like mapred.cache.files.sizes 
(akin to mapred.cache.files.timestamps etc.).
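A self-contained sketch of that encoding, with `java.util.Properties` standing in for JobConf; note the key name is the one proposed here, not a committed configuration property:

```java
import java.util.Properties;

public class CacheFileSizes {
  // Proposed key name, mirroring mapred.cache.files.timestamps;
  // treated as an assumption until the patch lands.
  static final String SIZES_KEY = "mapred.cache.files.sizes";

  // Encode the sizes as a comma-separated list, the way the existing
  // timestamp key encodes its values.
  static void setSizes(Properties conf, long[] sizes) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < sizes.length; i++) {
      if (i > 0) sb.append(',');
      sb.append(sizes[i]);
    }
    conf.setProperty(SIZES_KEY, sb.toString());
  }

  static long[] getSizes(Properties conf) {
    String[] parts = conf.getProperty(SIZES_KEY, "").split(",");
    long[] out = new long[parts.length];
    for (int i = 0; i < parts.length; i++) {
      out[i] = Long.parseLong(parts[i]);
    }
    return out;
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    setSizes(conf, new long[] { 1024L, 2048L });
    System.out.println(conf.getProperty(SIZES_KEY)); // prints 1024,2048
  }
}
```

A consumer such as GridMix could then read the list back at the same index as the corresponding cache-file URI.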

> Save file-sizes of each of the artifacts in DistributedCache in the JobConf
> ---
>
> Key: MAPREDUCE-1403
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1403
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client
>Reporter: Arun C Murthy
>Assignee: Hong Tang
> Fix For: 0.22.0
>
>
> It would be a useful metric to collect... potentially GridMix could use it to 
> emulate jobs which use the DistributedCache.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

2010-01-27 Thread Scott Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Chen updated MAPREDUCE-1221:
--

Status: Open  (was: Patch Available)

> Kill tasks on a node if the free physical memory on that machine falls below 
> a configured threshold
> ---
>
> Key: MAPREDUCE-1221
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: tasktracker
>Affects Versions: 0.22.0
>Reporter: dhruba borthakur
>Assignee: Scott Chen
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a 
> task exceeds a set of configured thresholds. I would like to extend this 
> feature to enable killing tasks if the physical memory used by that task 
> exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using 
> lots of memory, the machine hangs and dies quickly. This means that we would 
> like to prevent map-reduce jobs from triggering this condition. From my 
> understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were 
> designed to address this problem. This works well when most map-reduce jobs 
> are Java jobs and have well-defined -Xmx parameters that specify the max 
> virtual memory for each task. On the other hand, if each task forks off 
> mappers/reducers written in other languages (python/php, etc), the total 
> virtual memory usage of the process-subtree varies greatly. In these cases, 
> it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

2010-01-27 Thread Scott Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Chen updated MAPREDUCE-1221:
--

Status: Patch Available  (was: Open)

> Kill tasks on a node if the free physical memory on that machine falls below 
> a configured threshold
> ---
>
> Key: MAPREDUCE-1221
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: tasktracker
>Affects Versions: 0.22.0
>Reporter: dhruba borthakur
>Assignee: Scott Chen
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a 
> task exceeds a set of configured thresholds. I would like to extend this 
> feature to enable killing tasks if the physical memory used by that task 
> exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using 
> lots of memory, the machine hangs and dies quickly. This means that we would 
> like to prevent map-reduce jobs from triggering this condition. From my 
> understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were 
> designed to address this problem. This works well when most map-reduce jobs 
> are Java jobs and have well-defined -Xmx parameters that specify the max 
> virtual memory for each task. On the other hand, if each task forks off 
> mappers/reducers written in other languages (python/php, etc), the total 
> virtual memory usage of the process-subtree varies greatly. In these cases, 
> it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1420) TestTTResourceReporting failing in trunk

2010-01-27 Thread Scott Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Chen updated MAPREDUCE-1420:
--

Status: Patch Available  (was: Open)

> TestTTResourceReporting failing in trunk
> 
>
> Key: MAPREDUCE-1420
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1420
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.22.0
>Reporter: Iyappan Srinivasan
>Assignee: Scott Chen
> Attachments: MAPREDUCE-1420-v1.patch
>
>
> TestTTResourceReporting failing in trunk. 
> The most specific issue from the logs seems to be : Error executing shell 
> command org.apache.hadoop.util.Shell$ExitCodeException: kill: No such process 
> Link :
> http://hudson.zones.apache.org/hudson/view/Hadoop/job/Hadoop-Mapreduce-trunk/217/
> Giving the complete raw output:
> [junit] 2010-01-26 14:49:47,885 INFO mapred.JobQueueTaskScheduler 
> (TestTTResourceReporting.java:assignTasks(159)) - expected memory values : 
> (totalVirtualMemoryOnTT, totalPhysicalMemoryOnTT, availableVirtualMemoryOnTT, 
> availablePhysicalMemoryOnTT, mapSlotMemSize, reduceSlotMemorySize, 
> cumulativeCpuTime, cpuFrequency, numProcessors) = (-1, -1,-1, 
> -1,-1,-1,-1,-1,-1,-1.0) [junit] reported memory values : 
> (totalVirtualMemoryOnTT, totalPhysicalMemoryOnTT, availableVirtualMemoryOnTT, 
> availablePhysicalMemoryOnTT, reportedMapSlotMemorySize, 
> reportedReduceSlotMemorySize, reportedCumulativeCpuTime, 
> reportedCpuFrequency, reportedNumProcessors) = (-1, -1,-1, 
> -1,-1,-1,-1,-1,-1,-1.0) [junit] 2010-01-26 14:49:47,930 WARN 
> conf.Configuration (Configuration.java:set(601)) - jobclient.output.filter is 
> deprecated. Instead, use mapreduce.client.output.filter [junit] 2010-01-26 
> 14:49:47,943 WARN conf.Configuration (Configuration.java:set(601)) - 
> mapred.used.genericoptionsparser is deprecated. Instead, use 
> mapreduce.client.genericoptionsparser.used [junit] 2010-01-26 14:49:48,013 
> WARN mapreduce.JobSubmitter (JobSubmitter.java:copyAndConfigureFiles(226)) - 
> No job jar file set. User classes may not be found. See Job or 
> Job#setJar(String). [junit] 2010-01-26 14:49:48,088 WARN conf.Configuration 
> (Configuration.java:set(601)) - mapred.map.tasks is deprecated. Instead, use 
> mapreduce.job.maps [junit] 2010-01-26 14:49:48,088 INFO 
> mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(351)) - number of 
> splits:1 [junit] 2010-01-26 14:49:48,293 WARN conf.Configuration 
> (Configuration.java:handleDeprecation(332)) - 
> mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use 
> mapreduce.job.committer.setup.cleanup.needed [junit] 2010-01-26 14:49:48,327 
> INFO mapred.JobTracker (JobTracker.java:addJob(3017)) - Job 
> job_20100126144930543_0001 added successfully for user 'hudson' to queue 
> 'default' [junit] 2010-01-26 14:49:48,328 INFO mapred.JobTracker 
> (JobTracker.java:initJob(3192)) - Initializing job_20100126144930543_0001 
> [junit] 2010-01-26 14:49:48,330 INFO mapreduce.Job 
> (Job.java:monitorAndPrintJob(999)) - Running job: job_20100126144930543_0001 
> [junit] 2010-01-26 14:49:48,333 INFO mapred.JobInProgress 
> (JobInProgress.java:initTasks(591)) - Initializing job_20100126144930543_0001 
> [junit] 2010-01-26 14:49:48,369 INFO jobhistory.JobHistory 
> (JobHistory.java:setupEventWriter(242)) - SetupWriter, creating file 
> file:/grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk/trunk/build/test/logs/history/job_20100126144930543_0001_hudson
>  [junit] 2010-01-26 14:49:48,549 INFO jobhistory.JobHistory 
> (JobHistory.java:setupEventWriter(256)) - LogDirConfPath is 
> file:/grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk/trunk/build/test/logs/history/job_20100126144930543_0001_conf.xml
>  [junit] about to write out: token = 1; sec = 0 [junit] 2010-01-26 
> 14:49:48,638 INFO mapred.JobInProgress 
> (JobInProgress.java:generateAndStoreTokens(3567)) - jobToken generated and 
> stored with users keys in 
> /tmp/hadoop-hudson/mapred/system/job_20100126144930543_0001/jobToken [junit] 
> 2010-01-26 14:49:48,645 INFO mapred.JobInProgress 
> (JobInProgress.java:createMapTasks(722)) - Input size for job 
> job_20100126144930543_0001 = 0. Number of splits = 1 [junit] 2010-01-26 
> 14:49:48,647 INFO mapred.JobInProgress (JobInProgress.java:initTasks(653)) - 
> Job job_20100126144930543_0001 initialized successfully with 1 map tasks and 
> 1 reduce tasks. [junit] 2010-01-26 14:49:49,335 INFO mapreduce.Job 
> (Job.java:monitorAndPrintJob(1013)) - map 0% reduce 0% [junit] 2010-01-26 
> 14:49:50,906 INFO mapred.JobTracker (JobTracker.java:createTaskEntry(1770)) - 
> Adding task (JOB_SETUP) 'attempt_20100126144930543_0001_m_02_0' to tip 
> task_20100126144930543_0001_m_02, for tracker 
> 'tracker_host0.foo.com:localhost/127.0.0.1:41432' [junit] 201

[jira] Commented: (MAPREDUCE-1420) TestTTResourceReporting failing in trunk

2010-01-27 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805737#action_12805737
 ] 

dhruba borthakur commented on MAPREDUCE-1420:
-

+1, change looks good.

> TestTTResourceReporting failing in trunk
> 
>
> Key: MAPREDUCE-1420
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1420
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.22.0
>Reporter: Iyappan Srinivasan
>Assignee: Scott Chen
> Attachments: MAPREDUCE-1420-v1.patch
>
>
> TestTTResourceReporting failing in trunk. 
> The most specific issue from the logs seems to be : Error executing shell 
> command org.apache.hadoop.util.Shell$ExitCodeException: kill: No such process 
> Link :
> http://hudson.zones.apache.org/hudson/view/Hadoop/job/Hadoop-Mapreduce-trunk/217/
> Giving the complete raw output:
> [junit] 2010-01-26 14:49:47,885 INFO mapred.JobQueueTaskScheduler 
> (TestTTResourceReporting.java:assignTasks(159)) - expected memory values : 
> (totalVirtualMemoryOnTT, totalPhysicalMemoryOnTT, availableVirtualMemoryOnTT, 
> availablePhysicalMemoryOnTT, mapSlotMemSize, reduceSlotMemorySize, 
> cumulativeCpuTime, cpuFrequency, numProcessors) = (-1, -1,-1, 
> -1,-1,-1,-1,-1,-1,-1.0) [junit] reported memory values : 
> (totalVirtualMemoryOnTT, totalPhysicalMemoryOnTT, availableVirtualMemoryOnTT, 
> availablePhysicalMemoryOnTT, reportedMapSlotMemorySize, 
> reportedReduceSlotMemorySize, reportedCumulativeCpuTime, 
> reportedCpuFrequency, reportedNumProcessors) = (-1, -1,-1, 
> -1,-1,-1,-1,-1,-1,-1.0) [junit] 2010-01-26 14:49:47,930 WARN 
> conf.Configuration (Configuration.java:set(601)) - jobclient.output.filter is 
> deprecated. Instead, use mapreduce.client.output.filter [junit] 2010-01-26 
> 14:49:47,943 WARN conf.Configuration (Configuration.java:set(601)) - 
> mapred.used.genericoptionsparser is deprecated. Instead, use 
> mapreduce.client.genericoptionsparser.used [junit] 2010-01-26 14:49:48,013 
> WARN mapreduce.JobSubmitter (JobSubmitter.java:copyAndConfigureFiles(226)) - 
> No job jar file set. User classes may not be found. See Job or 
> Job#setJar(String). [junit] 2010-01-26 14:49:48,088 WARN conf.Configuration 
> (Configuration.java:set(601)) - mapred.map.tasks is deprecated. Instead, use 
> mapreduce.job.maps [junit] 2010-01-26 14:49:48,088 INFO 
> mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(351)) - number of 
> splits:1 [junit] 2010-01-26 14:49:48,293 WARN conf.Configuration 
> (Configuration.java:handleDeprecation(332)) - 
> mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use 
> mapreduce.job.committer.setup.cleanup.needed [junit] 2010-01-26 14:49:48,327 
> INFO mapred.JobTracker (JobTracker.java:addJob(3017)) - Job 
> job_20100126144930543_0001 added successfully for user 'hudson' to queue 
> 'default' [junit] 2010-01-26 14:49:48,328 INFO mapred.JobTracker 
> (JobTracker.java:initJob(3192)) - Initializing job_20100126144930543_0001 
> [junit] 2010-01-26 14:49:48,330 INFO mapreduce.Job 
> (Job.java:monitorAndPrintJob(999)) - Running job: job_20100126144930543_0001 
> [junit] 2010-01-26 14:49:48,333 INFO mapred.JobInProgress 
> (JobInProgress.java:initTasks(591)) - Initializing job_20100126144930543_0001 
> [junit] 2010-01-26 14:49:48,369 INFO jobhistory.JobHistory 
> (JobHistory.java:setupEventWriter(242)) - SetupWriter, creating file 
> file:/grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk/trunk/build/test/logs/history/job_20100126144930543_0001_hudson
>  [junit] 2010-01-26 14:49:48,549 INFO jobhistory.JobHistory 
> (JobHistory.java:setupEventWriter(256)) - LogDirConfPath is 
> file:/grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk/trunk/build/test/logs/history/job_20100126144930543_0001_conf.xml
>  [junit] about to write out: token = 1; sec = 0 [junit] 2010-01-26 
> 14:49:48,638 INFO mapred.JobInProgress 
> (JobInProgress.java:generateAndStoreTokens(3567)) - jobToken generated and 
> stored with users keys in 
> /tmp/hadoop-hudson/mapred/system/job_20100126144930543_0001/jobToken [junit] 
> 2010-01-26 14:49:48,645 INFO mapred.JobInProgress 
> (JobInProgress.java:createMapTasks(722)) - Input size for job 
> job_20100126144930543_0001 = 0. Number of splits = 1 [junit] 2010-01-26 
> 14:49:48,647 INFO mapred.JobInProgress (JobInProgress.java:initTasks(653)) - 
> Job job_20100126144930543_0001 initialized successfully with 1 map tasks and 
> 1 reduce tasks. [junit] 2010-01-26 14:49:49,335 INFO mapreduce.Job 
> (Job.java:monitorAndPrintJob(1013)) - map 0% reduce 0% [junit] 2010-01-26 
> 14:49:50,906 INFO mapred.JobTracker (JobTracker.java:createTaskEntry(1770)) - 
> Adding task (JOB_SETUP) 'attempt_20100126144930543_0001_m_02_0' to tip 
> task_20100126144930543_0001_m_02, for tracker 
> 'tracker_ho

[jira] Commented: (MAPREDUCE-1126) shuffle should use serialization to get comparator

2010-01-27 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805735#action_12805735
 ] 

Scott Carey commented on MAPREDUCE-1126:


I am neck deep in building stuff on Avro.  I've also got a custom Pig reader 
that reads only my Avro record types as a prototyping stopgap until there is a 
general solution. 

Ted's idea of a middle ground sounds useful.  Special casing Writables is OK as 
long as they don't have to be used.  Making all the new stuff harder to use 
sounds like a bad idea.  Ideally, a schema system means as a user I never have 
to write a Writable again.  Those are annoying enough.

bq. For union types, it is a trivial restriction to limit map/reduce users to 
wrap records around the unions 

Wrappers are trivial to deal with in a schema declaration.  They are not 
trivial on the other side. First you have Foo and then Bar, and neither needs a 
wrapper.  But then they might because sometimes you want to serialize Bar in 
columnar order and sometimes you don't.  Now you have Bar, and ColumnarBar.  
Then you realize you need to have a union, so you make FooBarUnion.  Then in 
another use case you need BarFooUnion (different order, same Equals and 
HashCode -- fun with permutations when the union is large).
Then you have composite types: FooBarStringNullUnion and FooStringLongUnion.  
 
Mapping serialization options to classes is not fun.  Wrappers are a useful 
design pattern for many purposes, but not for encapsulating one-to-many 
relationships.


In general, users are moving away from writing directly to the MapReduce API 
and using various frameworks.  Making these frameworks high performance, 
powerful, and expressive is more important IMO than preventing the low level 
MapReduce API from getting a bit more complicated.
As for end-users writing MapReduce, the current situation is not all that 
pretty anyway.  4 generic type arguments that must be perfectly aligned with 
several type setting calls on a job configuration to avoid a runtime error?  
The Map and Reduce classes have compile-time checking on a few types, and 
that's it.


Choosing a serialization is a declarative task, not a procedural one.  
Annotations are what Java has right now for declarative metadata.  
Unfortunately, very few people are experts at building annotation-based 
frameworks or using tools like ASM to enrich the capabilities.  Have there been 
any proposals to allow Annotations to handle this in a way cleaner than these 
declarative setter methods?

I haven't thought through it that far, but here's a quick annotation based idea 
that can bolt on to the work above or the current API.   This is just a point 
of contrast for this discussion, not a proposal to change -- perhaps it sparks 
some ideas to simplify the user experience while also making the framework more 
powerful.  With the right tools those can go hand in hand.

WordCount with configurable Input/Output formats via annotations:
{code}
public class WordCount {

  // if missing, each of these has a default, or can be set with the
  // traditional setters
  @Input(TextInputFormat.class)
  @Output(AvroOutputFormat.class)
  @InputWritables(key = LongWritable.class, val = Text.class)
  @SchemaConfig(SchemaBasedJobData.class)
  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    public void map(LongWritable key, Text value,
        OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      . . .
    }
  }

  @Input(AvroInputFormat.class)
  @Output(AvroOutputFormat.class)
  @SchemaConfig(SchemaBasedJobData.class)
  public static class Reduce extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
        OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      . . .
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");

    // infer inputs and outputs from annotations on the Map, Reduce, and
    // Combiner setters; throw an error if these are not compatible.
    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);

    // the "old way" can still work, setting individual input and output
    // classes.  But it is not necessary when annotated, and incompatible
    // with schema based systems.

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}
{code}

There are some things missing above.  
For frameworks, configuration by method call isn't a big deal, but for 
hand-written classes it is valuable to keep the generic type declaration 
and the key/value class/type/schema declaration in the same place -- JobConf 
won't tell you at compile time that you have misaligned the types in 
your Map.  And generic t
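As a point of contrast, here is a minimal, self-contained sketch of the reflection side of this idea. It is plain Java with no Hadoop dependencies: the {{@Input}} annotation, the stand-in {{TextInputFormat}}, and the {{configure()}} helper are all hypothetical illustrations, not proposed API.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Properties;

public class AnnotationConfigSketch {

  // Hypothetical declarative annotation: names the input format for a mapper class.
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.TYPE)
  public @interface Input {
    Class<?> value();
  }

  // Stand-in for a real input format class.
  static class TextInputFormat {}

  // A mapper declaring its input format declaratively.
  @Input(TextInputFormat.class)
  static class Map {}

  // The submitter reflects over the mapper class and copies its declared
  // input format into the job configuration, instead of a setter call.
  static void configure(Properties conf, Class<?> mapperClass) {
    Input in = mapperClass.getAnnotation(Input.class);
    if (in != null) {
      conf.setProperty("mapred.input.format.class", in.value().getName());
    }
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    configure(conf, Map.class);
    System.out.println(conf.getProperty("mapred.input.format.class"));
  }
}
```

A real framework would also cross-check the annotated formats against the mapper's generic signature at submit time, which is exactly the compatibility check mentioned above.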

[jira] Commented: (MAPREDUCE-1420) TestTTResourceReporting failing in trunk

2010-01-27 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805734#action_12805734
 ] 

Scott Chen commented on MAPREDUCE-1420:
---

It seems that the problem is that the reported CPU frequency doesn't match the 
expected one.
This is because the CPU frequency in /proc/cpuinfo can actually change in some 
environments.
See http://lwn.net/Articles/162548/
The test works on my dev box but not on Hudson this time.

The patch removes CPU frequency from the static value checking.
It will still be verified in the dynamic value checking.
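For context, the value being compared comes from the "cpu MHz" line of /proc/cpuinfo. A toy parser (hypothetical; not the actual Hadoop resource-calculator code) shows why an exact-equality check on it is fragile when the kernel scales the frequency between reads:

```java
public class CpuInfoSketch {

  // Extract the first "cpu MHz" value from /proc/cpuinfo-style text;
  // returns -1 if the field is absent.
  static double parseCpuMhz(String cpuinfo) {
    for (String line : cpuinfo.split("\n")) {
      if (line.startsWith("cpu MHz")) {
        return Double.parseDouble(line.split(":")[1].trim());
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    // Two reads of the same machine under frequency scaling can disagree,
    // so asserting equality against a value captured earlier is unreliable.
    String read1 = "processor : 0\ncpu MHz : 2327.507\n";
    String read2 = "processor : 0\ncpu MHz : 1600.000\n";
    System.out.println(parseCpuMhz(read1) == parseCpuMhz(read2)); // prints "false"
  }
}
```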

> TestTTResourceReporting failing in trunk
> 
>
> Key: MAPREDUCE-1420
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1420
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.22.0
>Reporter: Iyappan Srinivasan
>Assignee: Scott Chen
> Attachments: MAPREDUCE-1420-v1.patch
>
>
> TestTTResourceReporting failing in trunk. 
> The most specific issue from the logs seems to be : Error executing shell 
> command org.apache.hadoop.util.Shell$ExitCodeException: kill: No such process 
> Link :
> http://hudson.zones.apache.org/hudson/view/Hadoop/job/Hadoop-Mapreduce-trunk/217/
> Giving the complete raw output:
> [junit] 2010-01-26 14:49:47,885 INFO mapred.JobQueueTaskScheduler 
> (TestTTResourceReporting.java:assignTasks(159)) - expected memory values : 
> (totalVirtualMemoryOnTT, totalPhysicalMemoryOnTT, availableVirtualMemoryOnTT, 
> availablePhysicalMemoryOnTT, mapSlotMemSize, reduceSlotMemorySize, 
> cumulativeCpuTime, cpuFrequency, numProcessors) = (-1, -1,-1, 
> -1,-1,-1,-1,-1,-1,-1.0) [junit] reported memory values : 
> (totalVirtualMemoryOnTT, totalPhysicalMemoryOnTT, availableVirtualMemoryOnTT, 
> availablePhysicalMemoryOnTT, reportedMapSlotMemorySize, 
> reportedReduceSlotMemorySize, reportedCumulativeCpuTime, 
> reportedCpuFrequency, reportedNumProcessors) = (-1, -1,-1, 
> -1,-1,-1,-1,-1,-1,-1.0) [junit] 2010-01-26 14:49:47,930 WARN 
> conf.Configuration (Configuration.java:set(601)) - jobclient.output.filter is 
> deprecated. Instead, use mapreduce.client.output.filter [junit] 2010-01-26 
> 14:49:47,943 WARN conf.Configuration (Configuration.java:set(601)) - 
> mapred.used.genericoptionsparser is deprecated. Instead, use 
> mapreduce.client.genericoptionsparser.used [junit] 2010-01-26 14:49:48,013 
> WARN mapreduce.JobSubmitter (JobSubmitter.java:copyAndConfigureFiles(226)) - 
> No job jar file set. User classes may not be found. See Job or 
> Job#setJar(String). [junit] 2010-01-26 14:49:48,088 WARN conf.Configuration 
> (Configuration.java:set(601)) - mapred.map.tasks is deprecated. Instead, use 
> mapreduce.job.maps [junit] 2010-01-26 14:49:48,088 INFO 
> mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(351)) - number of 
> splits:1 [junit] 2010-01-26 14:49:48,293 WARN conf.Configuration 
> (Configuration.java:handleDeprecation(332)) - 
> mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use 
> mapreduce.job.committer.setup.cleanup.needed [junit] 2010-01-26 14:49:48,327 
> INFO mapred.JobTracker (JobTracker.java:addJob(3017)) - Job 
> job_20100126144930543_0001 added successfully for user 'hudson' to queue 
> 'default' [junit] 2010-01-26 14:49:48,328 INFO mapred.JobTracker 
> (JobTracker.java:initJob(3192)) - Initializing job_20100126144930543_0001 
> [junit] 2010-01-26 14:49:48,330 INFO mapreduce.Job 
> (Job.java:monitorAndPrintJob(999)) - Running job: job_20100126144930543_0001 
> [junit] 2010-01-26 14:49:48,333 INFO mapred.JobInProgress 
> (JobInProgress.java:initTasks(591)) - Initializing job_20100126144930543_0001 
> [junit] 2010-01-26 14:49:48,369 INFO jobhistory.JobHistory 
> (JobHistory.java:setupEventWriter(242)) - SetupWriter, creating file 
> file:/grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk/trunk/build/test/logs/history/job_20100126144930543_0001_hudson
>  [junit] 2010-01-26 14:49:48,549 INFO jobhistory.JobHistory 
> (JobHistory.java:setupEventWriter(256)) - LogDirConfPath is 
> file:/grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk/trunk/build/test/logs/history/job_20100126144930543_0001_conf.xml
>  [junit] about to write out: token = 1; sec = 0 [junit] 2010-01-26 
> 14:49:48,638 INFO mapred.JobInProgress 
> (JobInProgress.java:generateAndStoreTokens(3567)) - jobToken generated and 
> stored with users keys in 
> /tmp/hadoop-hudson/mapred/system/job_20100126144930543_0001/jobToken [junit] 
> 2010-01-26 14:49:48,645 INFO mapred.JobInProgress 
> (JobInProgress.java:createMapTasks(722)) - Input size for job 
> job_20100126144930543_0001 = 0. Number of splits = 1 [junit] 2010-01-26 
> 14:49:48,647 INFO mapred.JobInProgress (JobInProgress.java:initTasks(653)) - 
> Job job_20100126144930543_0001 initialized successfully with 1 map tasks and 
> 1 reduce tas

[jira] Updated: (MAPREDUCE-1420) TestTTResourceReporting failing in trunk

2010-01-27 Thread Scott Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Chen updated MAPREDUCE-1420:
--

Attachment: MAPREDUCE-1420-v1.patch

> TestTTResourceReporting failing in trunk
> 
>
> Key: MAPREDUCE-1420
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1420
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.22.0
>Reporter: Iyappan Srinivasan
>Assignee: Scott Chen
> Attachments: MAPREDUCE-1420-v1.patch
>
>
> TestTTResourceReporting failing in trunk. 
> The most specific issue from the logs seems to be : Error executing shell 
> command org.apache.hadoop.util.Shell$ExitCodeException: kill: No such process 
> Link :
> http://hudson.zones.apache.org/hudson/view/Hadoop/job/Hadoop-Mapreduce-trunk/217/

[jira] Commented: (MAPREDUCE-1126) shuffle should use serialization to get comparator

2010-01-27 Thread Jeff Hammerbacher (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805732#action_12805732
 ] 

Jeff Hammerbacher commented on MAPREDUCE-1126:
--

bq. Especially for frameworks written on top of MapReduce, less restrictive 
interfaces here would surely be fertile ground for performance improvements.

bq. Writing wrappers can be irritating, but for the MR API, I'd rather make it 
easier on common cases and users than on advanced uses and framework authors.

Great points, Chris. Yahoo! has stated that a significant majority of their 
MapReduce jobs are written in Pig, and Facebook says the same of Hive. Among 
our many customers at Cloudera, it's far more common to target the MapReduce 
execution engine with a higher level language rather than the Java API. What 
you propose as the common case, then, appears to be uncommon in practice. 
Perhaps we should adjust our design criteria to match the usage data reported 
by the users of the project?

Thanks,
Jeff

> shuffle should use serialization to get comparator
> --
>
> Key: MAPREDUCE-1126
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1126
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Reporter: Doug Cutting
>Assignee: Aaron Kimball
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1126.2.patch, MAPREDUCE-1126.3.patch, 
> MAPREDUCE-1126.4.patch, MAPREDUCE-1126.5.patch, MAPREDUCE-1126.6.patch, 
> MAPREDUCE-1126.patch, MAPREDUCE-1126.patch
>
>
> Currently the key comparator is defined as a Java class.  Instead we should 
> use the Serialization API to create key comparators.  This would permit, 
> e.g., Avro-based comparators to be used, permitting efficient sorting of 
> complex data types without having to write a RawComparator in Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1399) The archive command shows a null error message

2010-01-27 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated MAPREDUCE-1399:
-

Attachment: MAPREDUCE-1399.patch

This patch includes the doc changes that went missing because of the project 
split, along with docs that were removed from common but should have been moved 
to mapreduce.

> The archive command shows a null error message
> --
>
> Key: MAPREDUCE-1399
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1399
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: harchive
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Mahadev konar
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1399.patch
>
>
> {noformat}
> bash-3.1$ hadoop archive -archiveName foo.har -p . foo .
> Exception in archives
> null
> {noformat}




[jira] Updated: (MAPREDUCE-1399) The archive command shows a null error message

2010-01-27 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated MAPREDUCE-1399:
-

Affects Version/s: (was: 0.22.0)
Fix Version/s: 0.22.0

> The archive command shows a null error message
> --
>
> Key: MAPREDUCE-1399
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1399
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: harchive
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Mahadev konar
> Fix For: 0.22.0
>
>
> {noformat}
> bash-3.1$ hadoop archive -archiveName foo.har -p . foo .
> Exception in archives
> null
> {noformat}




[jira] Updated: (MAPREDUCE-1399) The archive command shows a null error message

2010-01-27 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated MAPREDUCE-1399:
-

Affects Version/s: 0.22.0

> The archive command shows a null error message
> --
>
> Key: MAPREDUCE-1399
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1399
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: harchive
>Affects Versions: 0.22.0
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Mahadev konar
>
> {noformat}
> bash-3.1$ hadoop archive -archiveName foo.har -p . foo .
> Exception in archives
> null
> {noformat}




[jira] Resolved: (MAPREDUCE-1010) Adding tests for changes in archives.

2010-01-27 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar resolved MAPREDUCE-1010.
--

Resolution: Fixed

This was committed as part of HADOOP-6097.

> Adding tests for changes in archives.
> -
>
> Key: MAPREDUCE-1010
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1010
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: harchive
>Affects Versions: 0.20.1
>Reporter: Mahadev konar
>Assignee: Mahadev konar
>Priority: Minor
> Fix For: 0.20.2
>
> Attachments: MAPREDUCE-1010.patch, MAPREDUCE-1010.patch
>
>
> Created this jira so that the tests can be added for HADOOP-6047. The test 
> cases for hadoop archives are in mapreduce.




[jira] Commented: (MAPREDUCE-1420) TestTTResourceReporting failing in trunk

2010-01-27 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805720#action_12805720
 ] 

Scott Chen commented on MAPREDUCE-1420:
---

Thanks for the report. I think I am the last person who touched this code. I 
will investigate and fix it.

> TestTTResourceReporting failing in trunk
> 
>
> Key: MAPREDUCE-1420
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1420
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.22.0
>Reporter: Iyappan Srinivasan
>Assignee: Scott Chen
>
> TestTTResourceReporting failing in trunk. 
> The most specific issue from the logs seems to be : Error executing shell 
> command org.apache.hadoop.util.Shell$ExitCodeException: kill: No such process 
> Link :
> http://hudson.zones.apache.org/hudson/view/Hadoop/job/Hadoop-Mapreduce-trunk/217/

[jira] Assigned: (MAPREDUCE-1420) TestTTResourceReporting failing in trunk

2010-01-27 Thread Scott Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Chen reassigned MAPREDUCE-1420:
-

Assignee: Scott Chen

> TestTTResourceReporting failing in trunk
> 
>
> Key: MAPREDUCE-1420
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1420
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.22.0
>Reporter: Iyappan Srinivasan
>Assignee: Scott Chen
>
> TestTTResourceReporting failing in trunk. 
> The most specific issue from the logs seems to be : Error executing shell 
> command org.apache.hadoop.util.Shell$ExitCodeException: kill: No such process 
> Link :
> http://hudson.zones.apache.org/hudson/view/Hadoop/job/Hadoop-Mapreduce-trunk/217/

[jira] Commented: (MAPREDUCE-1126) shuffle should use serialization to get comparator

2010-01-27 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805709#action_12805709
 ] 

Chris Douglas commented on MAPREDUCE-1126:
--

Replacing the type-driven serialization with an explicitly specified, 
context-sensitive factory is 1) throwing away all Java type hierarchies, 2) 
asserting that the serialization defines the user types, and 3) implying that 
these types, and the relationships between them, should remain opaque to the 
MapReduce framework.

It's making a tradeoff discussed in HADOOP-6323: all the type checks are 
removed from the framework and enforced by the serializer instead. So 
{{WritableSerialization}}, appropriately, requires an exact match for the 
configured class, but other serializers may not. The MapReduce framework can't 
do any checks of its own (nor, notably, can Java) to verify properties of 
the types users supply; their semantics are _defined by_ the serialization. For 
example, a job using related {{Writable}} types may pass a compile-time type 
check and work with explicit Avro serialization in the intermediate data, but fail 
if it were run with implicit Writable serialization.

This is a *huge* shift. It means the generic Java types for the Mapper, 
Reducer, collector, etc. literally don't matter; they're effectively all 
{{Object}} (relying on autoboxing to collect primitive types). It means that 
every serialization has its own type semantics, which need not look anything 
like what Java can enforce, inspect, or interpret. Given this, it is not 
entirely surprising that the patch makes the serialization the most prominent 
interface to MapReduce.

It's also powerful functionality. By allowing any user type to be 
serialized/deserialized per context, the long-term elimination of the key/value 
distinction doesn't change {{collect(K,V)}} to {{collect(Object)}} as proposed, 
but rather to {{collect(Object...)}}: the serializer transforms the record into 
bytes, and the comparator works on that byte range, determining which bytes are 
relevant per the serialization contract. Especially for frameworks written on 
top of MapReduce, less restrictive interfaces here would surely be fertile 
ground for performance improvements.

That said: I hate this API for users. Someone writing a MapReduce job is 
writing a transform of data; how those data are encoded in different contexts 
is usually irrelevant to their task. Forcing the user to pick a serialization 
to declare their types to, rather than offering their types to MapReduce, is 
backwards for the vast majority of cases. Consider the Writable subtype example 
above: one is tying the correctness of the {{Mapper}} to the intermediate 
serialization declared in the submitter code, whose semantics are inscrutable. 
That's just odd.

If one's map is going to emit data without a common type, then doesn't it make 
sense to declare that instead of leaving the signature as {{Object}}? That is, 
particularly given MAPREDUCE-1411, wouldn't the equivalent of 
{{Mapper}} be a more apt signature than 
{{Mapper}} for an implementation emitting {{int}} and 
{{String}} as value types?

I much prefer the semantics of the global serializer, but wouldn't object to 
adding an inconspicuous knob in support of context-sensitive serialization. 
Would a {{Job::setSerializationFactory(CTXT, SerializationFactory...)}} method, 
such that {{CTXT}} is an enumerated type of framework hooks (i.e. {{DEFAULT}}, 
{{MAP_OUTPUT_KEY}}, {{MAP_OUTPUT_VALUE}}, etc.), be satisfactory? This way, one 
can instruct the framework to use/prefer a particular serialization in one 
context without requiring most users to change their jobs. It also permits 
continued use of largely type-based serialization which, as Tom notes, is a 
very common case. Writing wrappers can be irritating, but for the MR API, I'd 
rather make it easier on common cases and users than on advanced uses and 
framework authors.
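For illustration, the context-keyed knob described above could be wired roughly like this. It is a sketch under assumed names: {{Ctxt}}, the one-method {{SerializationFactory}}, and the registry class are illustrative stand-ins, not Hadoop API.

```java
import java.util.EnumMap;

public class SerializationRegistrySketch {

  // Enumerated framework hook points, per the proposal above.
  enum Ctxt { DEFAULT, MAP_OUTPUT_KEY, MAP_OUTPUT_VALUE }

  // Stand-in for a serialization factory.
  interface SerializationFactory {
    String name();
  }

  private final EnumMap<Ctxt, SerializationFactory> factories =
      new EnumMap<>(Ctxt.class);

  // Analogue of Job::setSerializationFactory(CTXT, factory).
  void setSerializationFactory(Ctxt ctxt, SerializationFactory f) {
    factories.put(ctxt, f);
  }

  // Lookup prefers the context-specific factory and falls back to DEFAULT,
  // so jobs that never set a per-context factory behave exactly as today.
  SerializationFactory factoryFor(Ctxt ctxt) {
    return factories.getOrDefault(ctxt, factories.get(Ctxt.DEFAULT));
  }

  public static void main(String[] args) {
    SerializationRegistrySketch job = new SerializationRegistrySketch();
    job.setSerializationFactory(Ctxt.DEFAULT, () -> "writable");
    job.setSerializationFactory(Ctxt.MAP_OUTPUT_KEY, () -> "avro");
    System.out.println(job.factoryFor(Ctxt.MAP_OUTPUT_KEY).name());   // prints "avro"
    System.out.println(job.factoryFor(Ctxt.MAP_OUTPUT_VALUE).name()); // prints "writable"
  }
}
```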

> shuffle should use serialization to get comparator
> --
>
> Key: MAPREDUCE-1126
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1126
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Reporter: Doug Cutting
>Assignee: Aaron Kimball
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1126.2.patch, MAPREDUCE-1126.3.patch, 
> MAPREDUCE-1126.4.patch, MAPREDUCE-1126.5.patch, MAPREDUCE-1126.6.patch, 
> MAPREDUCE-1126.patch, MAPREDUCE-1126.patch
>
>
> Currently the key comparator is defined as a Java class.  Instead we should 
> use the Serialization API to create key comparators.  This would permit, 
> e.g., Avro-based comparators to be used, permitting efficient sorting of 
> complex data types without having to write a RawComparator in Java.


[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

2010-01-27 Thread Scott Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Chen updated MAPREDUCE-1221:
--

Attachment: MAPREDUCE-1221-v2.patch

Made some changes based on Zheng's comments:
1. In TaskMemoryManagerThread line 222: skip killed tasks when checking the 
memory of each task.
2. In TestTaskTrackerMemoryManager line 545: add assertFalse to fail the test 
immediately if the job finishes successfully.

> Kill tasks on a node if the free physical memory on that machine falls below 
> a configured threshold
> ---
>
> Key: MAPREDUCE-1221
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: tasktracker
>Affects Versions: 0.22.0
>Reporter: dhruba borthakur
>Assignee: Scott Chen
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a 
> task exceeds a set of configured thresholds. I would like to extend this 
> feature to enable killing tasks if the physical memory used by that task 
> exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using 
> lots of memory, the machine hangs and dies quickly. This means that we would 
> like to prevent map-reduce jobs from triggering this condition. From my 
> understanding, killing based on virtual-memory limits (HADOOP-5883) was 
> designed to address this problem. This works well when most map-reduce jobs 
> are Java jobs and have well-defined -Xmx parameters that specify the max 
> virtual memory for each task. On the other hand, if each task forks off 
> mappers/reducers written in other languages (python/php, etc), the total 
> virtual memory usage of the process-subtree varies greatly. In these cases, 
> it is better to use kill-tasks-using-physical-memory-limits.
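The proposed check could be as simple as comparing a task subtree's resident set size against a configured limit. A minimal sketch, with hypothetical names and a "-1 disables" convention borrowed from the existing virtual-memory limits (not the actual TaskTracker code):

```java
// Illustrative sketch of a per-task physical-memory (RSS) kill decision.
public class PhysicalMemoryCheck {

    // A task is a kill candidate when its process subtree's physical memory
    // exceeds the configured limit; a non-positive limit disables the check.
    static boolean overPhysicalLimit(long subtreeRssBytes, long limitBytes) {
        return limitBytes > 0 && subtreeRssBytes > limitBytes;
    }

    public static void main(String[] args) {
        long limit = 2L * 1024 * 1024 * 1024;                                  // 2 GB per task
        System.out.println(overPhysicalLimit(3L * 1024 * 1024 * 1024, limit)); // over the limit
        System.out.println(overPhysicalLimit(1L * 1024 * 1024 * 1024, limit)); // under the limit
        System.out.println(overPhysicalLimit(3L * 1024 * 1024 * 1024, -1));    // check disabled
    }
}
```

Unlike the -Xmx-bounded virtual-memory case, RSS must be sampled from the running subtree (e.g. via procfs), which is exactly why this helps jobs that fork python/php children.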

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1126) shuffle should use serialization to get comparator

2010-01-27 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-1126:
-

Attachment: MAPREDUCE-1126.patch

Here's a much-simplified patch. To show how it works with nested types I've 
added an example {{Mapper}} that uses the generic Avro serialization for the 
intermediate key and value. It is configured by calling:

{code}
Schema keySchema = Schema.create(Schema.Type.STRING);
Schema valSchema = Schema.parse("{\"type\":\"map\", \"values\":\"long\"}");
AvroGenericData.setMapOutputKeySchema(job, keySchema);
AvroGenericData.setMapOutputValueSchema(job, valSchema);
{code}

This replaces the calls to job.setMapOutputKeyClass() and 
job.setMapOutputValueClass().

I'm interested in hearing people's thoughts about this.

> shuffle should use serialization to get comparator
> --
>
> Key: MAPREDUCE-1126
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1126
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Reporter: Doug Cutting
>Assignee: Aaron Kimball
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1126.2.patch, MAPREDUCE-1126.3.patch, 
> MAPREDUCE-1126.4.patch, MAPREDUCE-1126.5.patch, MAPREDUCE-1126.6.patch, 
> MAPREDUCE-1126.patch, MAPREDUCE-1126.patch
>
>
> Currently the key comparator is defined as a Java class.  Instead we should 
> use the Serialization API to create key comparators.  This would permit, 
> e.g., Avro-based comparators to be used, permitting efficient sorting of 
> complex data types without having to write a RawComparator in Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1126) shuffle should use serialization to get comparator

2010-01-27 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805679#action_12805679
 ] 

Doug Cutting commented on MAPREDUCE-1126:
-

Which Avro serializer? Avro includes three different mappings from in-memory to 
binary representations, and applications can add more. In generic, a 
java.lang.String represents an enum symbol, while in reflect it represents a 
string.

And do we really want to privilege Avro here? It should be possible to use 
Thrift too, and to intermix the two within a single job. A Long in the input 
might be part of a Thrift union, while a Long in the output may use Avro.


> shuffle should use serialization to get comparator
> --
>
> Key: MAPREDUCE-1126
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1126
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Reporter: Doug Cutting
>Assignee: Aaron Kimball
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1126.2.patch, MAPREDUCE-1126.3.patch, 
> MAPREDUCE-1126.4.patch, MAPREDUCE-1126.5.patch, MAPREDUCE-1126.6.patch, 
> MAPREDUCE-1126.patch
>
>
> Currently the key comparator is defined as a Java class.  Instead we should 
> use the Serialization API to create key comparators.  This would permit, 
> e.g., Avro-based comparators to be used, permitting efficient sorting of 
> complex data types without having to write a RawComparator in Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1126) shuffle should use serialization to get comparator

2010-01-27 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805663#action_12805663
 ] 

Owen O'Malley commented on MAPREDUCE-1126:
--

{quote}
For scalar types Pig uses Java String, Long, Integer, etc. But default Java 
serialization is slow
{quote}

I think the default configuration should use a WritableSerializer for Writables 
and an AvroSerializer for everything else. Java serialization was a great 
experiment, but it was never performant enough for serious use. So the question 
is not whether you want different serializers, but rather whether a job needs 
different serializers for the same class.
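Owen's proposed default can be sketched as a lookup keyed on whether the class implements Writable. The interfaces below are inline stand-ins so the sketch is self-contained; they are not the real org.apache.hadoop.io types:

```java
// Sketch of the suggested default: Writable classes keep WritableSerialization,
// everything else falls through to Avro.
public class DefaultSerializerChoice {

    interface Writable { }                       // stand-in marker interface

    static String chooseSerializer(Class<?> c) {
        return Writable.class.isAssignableFrom(c) ? "WritableSerialization"
                                                  : "AvroSerialization";
    }

    static class IntWritable implements Writable { }   // stand-in Writable type

    public static void main(String[] args) {
        System.out.println(chooseSerializer(IntWritable.class));  // a Writable
        System.out.println(chooseSerializer(String.class));       // a plain Java type
    }
}
```

Note this is still purely type-based dispatch; the open question in the thread is what to do when one job needs two serializers for the same class.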

> shuffle should use serialization to get comparator
> --
>
> Key: MAPREDUCE-1126
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1126
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Reporter: Doug Cutting
>Assignee: Aaron Kimball
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1126.2.patch, MAPREDUCE-1126.3.patch, 
> MAPREDUCE-1126.4.patch, MAPREDUCE-1126.5.patch, MAPREDUCE-1126.6.patch, 
> MAPREDUCE-1126.patch
>
>
> Currently the key comparator is defined as a Java class.  Instead we should 
> use the Serialization API to create key comparators.  This would permit, 
> e.g., Avro-based comparators to be used, permitting efficient sorting of 
> complex data types without having to write a RawComparator in Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (MAPREDUCE-498) Machine List generated by machines.jsp should be sorted

2010-01-27 Thread Scott Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Chen reassigned MAPREDUCE-498:


Assignee: (was: Scott Chen)

> Machine List generated by machines.jsp should be sorted
> ---
>
> Key: MAPREDUCE-498
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-498
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Tim Williamson
>Priority: Minor
> Attachments: HADOOP-5586.patch
>
>
> The listing of machines shown by machine.jsp is arbitrarily ordered.  It 
> would be more useful to sort them.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1126) shuffle should use serialization to get comparator

2010-01-27 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805625#action_12805625
 ] 

Ted Dunning commented on MAPREDUCE-1126:


{quote}
> [From Owen] My assertion is that leaving the type as the primary instrument 
> of the user in defining the job is correct.
> I haven't talked to any users that care about using a non-default serializer 
> for a given type.

Pig would like to. For scalar types Pig uses Java String, Long, Integer, etc. 
But default Java serialization is slow. So currently we convert these to and 
from Writables as we go across the Map and Reduce boundaries to get the faster 
Writable serialization. If we could instead define an alternate serializer and 
avoid these conversions it would make our code simpler and should perform 
better.
{quote}

I would like to.  I would like to start using Avro for greater expressive power 
as soon as possible.  I also can't change all of my legacy code right away, so I 
will have some code that implements both Writable and Avro serialization.  I 
need to be able to use Writable for old code and Avro for new code.



> shuffle should use serialization to get comparator
> --
>
> Key: MAPREDUCE-1126
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1126
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Reporter: Doug Cutting
>Assignee: Aaron Kimball
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1126.2.patch, MAPREDUCE-1126.3.patch, 
> MAPREDUCE-1126.4.patch, MAPREDUCE-1126.5.patch, MAPREDUCE-1126.6.patch, 
> MAPREDUCE-1126.patch
>
>
> Currently the key comparator is defined as a Java class.  Instead we should 
> use the Serialization API to create key comparators.  This would permit, 
> e.g., Avro-based comparators to be used, permitting efficient sorting of 
> complex data types without having to write a RawComparator in Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (MAPREDUCE-498) Machine List generated by machines.jsp should be sorted

2010-01-27 Thread Scott Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Chen reassigned MAPREDUCE-498:


Assignee: Scott Chen

> Machine List generated by machines.jsp should be sorted
> ---
>
> Key: MAPREDUCE-498
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-498
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Tim Williamson
>Assignee: Scott Chen
>Priority: Minor
> Attachments: HADOOP-5586.patch
>
>
> The listing of machines shown by machine.jsp is arbitrarily ordered.  It 
> would be more useful to sort them.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1126) shuffle should use serialization to get comparator

2010-01-27 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805618#action_12805618
 ] 

Alan Gates commented on MAPREDUCE-1126:
---

bq. [From Owen] My assertion is that leaving the type as the primary instrument 
of the user in defining the job is correct. I haven't talked to any users that 
care about using a non-default serializer for a given type.

Pig would like to.  For scalar types Pig uses Java String, Long, Integer, etc.  
But default Java serialization is slow.  So currently we convert these to and 
from Writables as we go across the Map and Reduce boundaries to get the faster 
Writable serialization.  If we could instead define an alternate serializer and 
avoid these conversions it would make our code simpler and should perform 
better.
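The conversion overhead Alan describes is roughly this round trip at the Map and Reduce boundaries (with a stand-in LongWritable; illustrative only):

```java
// Sketch of the boxing Pig currently does purely to reach the faster
// Writable serialization; a Long-aware serializer would remove both hops.
public class BoundaryConversionSketch {

    static class LongWritable {                  // stand-in for Hadoop's LongWritable
        long value;
        LongWritable(long v) { value = v; }
    }

    // Map side: wrap the scalar in a Writable before it is serialized.
    static LongWritable toWritable(Long l) { return new LongWritable(l); }

    // Reduce side: unwrap back to the scalar type Pig actually computes with.
    static Long fromWritable(LongWritable w) { return w.value; }

    public static void main(String[] args) {
        Long pigValue = 42L;
        // The round trip is pure overhead per record crossing the boundary.
        System.out.println(fromWritable(toWritable(pigValue)));
    }
}
```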

> shuffle should use serialization to get comparator
> --
>
> Key: MAPREDUCE-1126
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1126
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Reporter: Doug Cutting
>Assignee: Aaron Kimball
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1126.2.patch, MAPREDUCE-1126.3.patch, 
> MAPREDUCE-1126.4.patch, MAPREDUCE-1126.5.patch, MAPREDUCE-1126.6.patch, 
> MAPREDUCE-1126.patch
>
>
> Currently the key comparator is defined as a Java class.  Instead we should 
> use the Serialization API to create key comparators.  This would permit, 
> e.g., Avro-based comparators to be used, permitting efficient sorting of 
> complex data types without having to write a RawComparator in Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1416) New JIRA components for Map/Reduce project

2010-01-27 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805573#action_12805573
 ] 

Steve Loughran commented on MAPREDUCE-1416:
---

OK, the new entries are in and show up when you file new bugs. I will add any 
more that people suggest. For each one, can you list:

- the string you want for the JIRA entry
- any text to describe it
- the JIRA account ID of anyone you want to lead a component (and receive its 
bug reports; this is handy for contrib/ sections)

I will leave it to anyone to move existing defects into the components, and 
will leave this issue open until we are happy with the set.

Also, are the current components with explicit leads up to date? We can alter 
them:

* contrib/capacity-sched (Lead: Hemanth Yamijala)
* contrib/eclipse-plugin (Lead: Christophe Taton)
* contrib/fair-share (Lead: Matei Zaharia)
* contrib/gridmix (Lead: Chris Douglas)
* contrib/mumak (Lead: Hong Tang)
* contrib/sqoop (Lead: Aaron Kimball)
* contrib/vertica (Lead: Omer Trajman)
* distcp (Lead: Tsz Wo (Nicholas), SZE)
* harchive (Lead: Mahadev konar)
* test (Lead: Nigel Daley)

> New JIRA components for Map/Reduce project
> --
>
> Key: MAPREDUCE-1416
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1416
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Vinod K V
>Assignee: Steve Loughran
>
> We need more JIRA components for the Map/Reduce project for better tracking. 
> Some missing ones: DistributedCache, TaskController, contrib/vaidya, 
> contrib/mruit, contrib/dynamic-scheduler, contrib/data_join.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1186) While localizing a DistributedCache file, TT sets permissions recursively on the whole base-dir

2010-01-27 Thread Hemanth Yamijala (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hemanth Yamijala updated MAPREDUCE-1186:


Attachment: 1186.20S-6.patch

An updated version of the patch for an earlier version of Hadoop. Not for 
commit here.

> While localizing a DistributedCache file, TT sets permissions recursively on 
> the whole base-dir
> ---
>
> Key: MAPREDUCE-1186
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1186
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Affects Versions: 0.21.0
>Reporter: Vinod K V
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.22.0
>
> Attachments: 1186.20S-6.patch, patch-1186-1.txt, patch-1186-2.txt, 
> patch-1186-3-ydist.txt, patch-1186-3-ydist.txt, patch-1186-3.txt, 
> patch-1186-4.txt, patch-1186-5.txt, patch-1186-ydist.txt, 
> patch-1186-ydist.txt, patch-1186.txt
>
>
> This is a performance problem.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1420) TestTTResourceReporting failing in trunk

2010-01-27 Thread Iyappan Srinivasan (JIRA)
TestTTResourceReporting failing in trunk


 Key: MAPREDUCE-1420
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1420
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Iyappan Srinivasan


TestTTResourceReporting is failing in trunk.

The most specific issue from the logs seems to be: Error executing shell 
command org.apache.hadoop.util.Shell$ExitCodeException: kill: No such process

Link:
http://hudson.zones.apache.org/hudson/view/Hadoop/job/Hadoop-Mapreduce-trunk/217/

The complete raw output:

[junit] 2010-01-26 14:49:47,885 INFO mapred.JobQueueTaskScheduler (TestTTResourceReporting.java:assignTasks(159)) - expected memory values : (totalVirtualMemoryOnTT, totalPhysicalMemoryOnTT, availableVirtualMemoryOnTT, availablePhysicalMemoryOnTT, mapSlotMemSize, reduceSlotMemorySize, cumulativeCpuTime, cpuFrequency, numProcessors) = (-1, -1,-1, -1,-1,-1,-1,-1,-1,-1.0)
[junit] reported memory values : (totalVirtualMemoryOnTT, totalPhysicalMemoryOnTT, availableVirtualMemoryOnTT, availablePhysicalMemoryOnTT, reportedMapSlotMemorySize, reportedReduceSlotMemorySize, reportedCumulativeCpuTime, reportedCpuFrequency, reportedNumProcessors) = (-1, -1,-1, -1,-1,-1,-1,-1,-1,-1.0)
[junit] 2010-01-26 14:49:47,930 WARN conf.Configuration (Configuration.java:set(601)) - jobclient.output.filter is deprecated. Instead, use mapreduce.client.output.filter
[junit] 2010-01-26 14:49:47,943 WARN conf.Configuration (Configuration.java:set(601)) - mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
[junit] 2010-01-26 14:49:48,013 WARN mapreduce.JobSubmitter (JobSubmitter.java:copyAndConfigureFiles(226)) - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
[junit] 2010-01-26 14:49:48,088 WARN conf.Configuration (Configuration.java:set(601)) - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
[junit] 2010-01-26 14:49:48,088 INFO mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(351)) - number of splits:1
[junit] 2010-01-26 14:49:48,293 WARN conf.Configuration (Configuration.java:handleDeprecation(332)) - mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
[junit] 2010-01-26 14:49:48,327 INFO mapred.JobTracker (JobTracker.java:addJob(3017)) - Job job_20100126144930543_0001 added successfully for user 'hudson' to queue 'default'
[junit] 2010-01-26 14:49:48,328 INFO mapred.JobTracker (JobTracker.java:initJob(3192)) - Initializing job_20100126144930543_0001
[junit] 2010-01-26 14:49:48,330 INFO mapreduce.Job (Job.java:monitorAndPrintJob(999)) - Running job: job_20100126144930543_0001
[junit] 2010-01-26 14:49:48,333 INFO mapred.JobInProgress (JobInProgress.java:initTasks(591)) - Initializing job_20100126144930543_0001
[junit] 2010-01-26 14:49:48,369 INFO jobhistory.JobHistory (JobHistory.java:setupEventWriter(242)) - SetupWriter, creating file file:/grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk/trunk/build/test/logs/history/job_20100126144930543_0001_hudson
[junit] 2010-01-26 14:49:48,549 INFO jobhistory.JobHistory (JobHistory.java:setupEventWriter(256)) - LogDirConfPath is file:/grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk/trunk/build/test/logs/history/job_20100126144930543_0001_conf.xml
[junit] about to write out: token = 1; sec = 0
[junit] 2010-01-26 14:49:48,638 INFO mapred.JobInProgress (JobInProgress.java:generateAndStoreTokens(3567)) - jobToken generated and stored with users keys in /tmp/hadoop-hudson/mapred/system/job_20100126144930543_0001/jobToken
[junit] 2010-01-26 14:49:48,645 INFO mapred.JobInProgress (JobInProgress.java:createMapTasks(722)) - Input size for job job_20100126144930543_0001 = 0. Number of splits = 1
[junit] 2010-01-26 14:49:48,647 INFO mapred.JobInProgress (JobInProgress.java:initTasks(653)) - Job job_20100126144930543_0001 initialized successfully with 1 map tasks and 1 reduce tasks.
[junit] 2010-01-26 14:49:49,335 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1013)) - map 0% reduce 0%
[junit] 2010-01-26 14:49:50,906 INFO mapred.JobTracker (JobTracker.java:createTaskEntry(1770)) - Adding task (JOB_SETUP) 'attempt_20100126144930543_0001_m_02_0' to tip task_20100126144930543_0001_m_02, for tracker 'tracker_host0.foo.com:localhost/127.0.0.1:41432'
[junit] 2010-01-26 14:49:50,915 INFO mapred.TaskTracker (TaskTracker.java:registerTask(2059)) - LaunchTaskAction (registerTask): attempt_20100126144930543_0001_m_02_0 task's state:UNASSIGNED
[junit] 2010-01-26 14:49:50,917 INFO mapred.TaskTracker (TaskTracker.java:run(2017)) - Trying to launch : attempt_20100126144930543_0001_m_02_0 which needs 1 slots
[junit] 2010-01-26 14:49:50,917 INFO mapred.TaskTracker (TaskTracker.java:run(2028)) - In TaskLaunche

[jira] Updated: (MAPREDUCE-899) When using LinuxTaskController, localized files may become accessible to unintended users if permissions are misconfigured.

2010-01-27 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-899:
--

Status: Open  (was: Patch Available)

> When using LinuxTaskController, localized files may become accessible to 
> unintended users if permissions are misconfigured.
> ---
>
> Key: MAPREDUCE-899
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-899
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Reporter: Vinod K V
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-899-20090828.txt, patch-899-1.txt, 
> patch-899-2.txt, patch-899-3.txt, patch-899.txt
>
>
> To enforce the accessibility of job files to only the job-owner and the 
> TaskTracker, as per MAPREDUCE-842, it is _trusted_ that the  setuid/setgid 
> linux TaskController binary is group owned by a _special group_ to which only 
> TaskTracker belongs and not just any group to which TT belongs. If the trust 
> is broken, possibly due to misconfiguration by admins, the local files become 
> accessible to unintended users, while giving the admins a false sense of 
> security.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1270) Hadoop C++ Extention

2010-01-27 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805457#action_12805457
 ] 

He Yongqiang commented on MAPREDUCE-1270:
-

Hi Dong / Shouyan,
Are you going to open source this? If yes, can you post an update on the recent 
work? That would help others understand it better.

> Hadoop C++ Extention
> 
>
> Key: MAPREDUCE-1270
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1270
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Affects Versions: 0.20.1
> Environment:  hadoop linux
>Reporter: Wang Shouyan
>
>   Hadoop C++ extension is an internal project at Baidu. We started it for 
> these reasons:
>1  To provide a C++ API. We mostly used Streaming before, and we also tried 
> PIPES, but we did not find PIPES more efficient than Streaming. So we think a 
> new C++ extension is needed for us.
>2  Even using PIPES or Streaming, it is hard to control the memory of the 
> Hadoop map/reduce Child JVM.
>3  It costs too much to read/write/sort TB/PB-scale data in Java. When using 
> PIPES or Streaming, a pipe or socket is not efficient enough to carry such 
> huge data.
>What we want to do: 
>1 We do not use the map/reduce Child JVM for any data processing; it just 
> prepares the environment, starts the C++ mapper, tells the mapper which split 
> it should deal with, and reads reports from the mapper until it finishes. The 
> mapper reads records, invokes the user-defined map, does the partitioning, 
> writes spills, and combines and merges into file.out. We think these 
> operations can be done in C++ code.
>2 The reducer is similar to the mapper; it is started after the sort 
> finishes, reads from the sorted files, invokes the user-defined reduce, and 
> writes to the user-defined record writer.
>3 We also intend to rewrite shuffle and sort in C++, for efficiency and 
> memory control.
>At first, 1 and 2, then 3.
>What's the difference from PIPES:
>1 Yes, we will reuse most of the PIPES code.
>2 And we will do it more completely: nothing changes in scheduling and 
> management, but everything does in execution.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (MAPREDUCE-1416) New JIRA components for Map/Reduce project

2010-01-27 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reassigned MAPREDUCE-1416:
-

Assignee: Steve Loughran

> New JIRA components for Map/Reduce project
> --
>
> Key: MAPREDUCE-1416
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1416
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Vinod K V
>Assignee: Steve Loughran
>
> We need more JIRA components for the Map/Reduce project for better tracking. 
> Some missing ones: DistributedCache, TaskController, contrib/vaidya, 
> contrib/mruit, contrib/dynamic-scheduler, contrib/data_join.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1416) New JIRA components for Map/Reduce project

2010-01-27 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805441#action_12805441
 ] 

Steve Loughran commented on MAPREDUCE-1416:
---

I think I have the rights. Let me see

> New JIRA components for Map/Reduce project
> --
>
> Key: MAPREDUCE-1416
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1416
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Vinod K V
>
> We need more JIRA components for the Map/Reduce project for better tracking. 
> Some missing ones: DistributedCache, TaskController, contrib/vaidya, 
> contrib/mruit, contrib/dynamic-scheduler, contrib/data_join.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1415) With streaming jobs and LinuxTaskController, the localized streaming binary has 571 permissions instead of 570

2010-01-27 Thread Vinod K V (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805437#action_12805437
 ] 

Vinod K V commented on MAPREDUCE-1415:
--

Note that this still isn't a major security concern as all the parent 
directories are secure already. This fix is needed so we are consistent with 
permissions overall.

> With streaming jobs and LinuxTaskController, the localized streaming binary 
> has 571 permissions instead of 570
> --
>
> Key: MAPREDUCE-1415
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1415
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming, security
>Reporter: Vinod K V
>
> After MAPREDUCE-856, all localized files are expected to have **0 permissions 
> for the sake of security.
> This was found by Karam while testing LinuxTaskController functionality after 
> MAPREDUCE-856.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1419) Enhance tasktracker's localization tests after MAPREDUCE-181

2010-01-27 Thread Vinod K V (JIRA)
Enhance tasktracker's localization tests after MAPREDUCE-181


 Key: MAPREDUCE-1419
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1419
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: security, tasktracker, test
Reporter: Vinod K V
 Fix For: 0.22.0


The following tests are missing:
 - Verifying the secure permissions and ownership of the localized token-file
 - Making the tests future-proof against missing permissions/ownership 
checks of newly added files/directories.
 - JobContext.JOB_TOKEN_FILE property setting in the localized 
job-configuration.
 - Failure of localization if the token-file is not present in the JT 
file-system

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1383) Allow storage and caching of delegation token.

2010-01-27 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805425#action_12805425
 ] 

Devaraj Das commented on MAPREDUCE-1383:


[exec] compile-aspects:
 [exec]  [echo] 1.6
 [exec]  [echo] Start weaving aspects in place
 [exec]  [iajc] 
/grid/0/hudson/hudson-slave/workspace/Mapreduce-Patch-h6.grid.sp2.yahoo.net/trunk/src/java/org/apache/hadoop/mapreduce/JobSubmitter.java:39
 [error] The import org.apache.hadoop.hdfs cannot be resolved
 [exec]  [iajc] import org.apache.hadoop.hdfs.DistributedFileSystem;
 [exec]  [iajc]^
 [exec]  [iajc] 
/grid/0/hudson/hudson-slave/workspace/Mapreduce-Patch-h6.grid.sp2.yahoo.net/trunk/src/java/org/apache/hadoop/mapreduce/JobSubmitter.java:40
 [error] The import org.apache.hadoop.hdfs cannot be resolved
 [exec]  [iajc] import 
org.apache.hadoop.hdfs.security.token.DelegationTokenIdentifier;
 [exec]  [iajc]^
 [exec]  [iajc] 
/grid/0/hudson/hudson-slave/workspace/Mapreduce-Patch-h6.grid.sp2.yahoo.net/trunk/src/java/org/apache/hadoop/mapreduce/JobSubmitter.java:41
 [error] The import org.apache.hadoop.hdfs cannot be resolved
 [exec]  [iajc] import org.apache.hadoop.hdfs.server.namenode.NameNode;
 [exec]  [iajc]^
 [exec]  [iajc] 
/grid/0/hudson/hudson-slave/workspace/Mapreduce-Patch-h6.grid.sp2.yahoo.net/trunk/src/java/org/apache/hadoop/mapreduce/security/TokenCache.java:32
 [error] The import org.apache.hadoop.hdfs cannot be resolved
 [exec]  [iajc] import org.apache.hadoop.hdfs.DistributedFileSystem;
 [exec]  [iajc]^
 [exec]  [iajc] 
/grid/0/hudson/hudson-slave/workspace/Mapreduce-Patch-h6.grid.sp2.yahoo.net/trunk/src/java/org/apache/hadoop/mapreduce/security/TokenCache.java:33
 [error] The import org.apache.hadoop.hdfs cannot be resolved
 [exec]  [iajc] import org.apache.hadoop.hdfs.security.token.DelegationTokenIdentifier;
/grid/0/hudson/hudson-slave/workspace/Mapreduce-Patch-h6.grid.sp2.yahoo.net/trunk/src/java/org/apache/hadoop/mapreduce/security/TokenCache.java:99
 [error] DelegationTokenIdentifier cannot be resolved to a type
 [exec]  [iajc] public static Token getDelegationToken(String namenode) {
/grid/0/hudson/hudson-slave/workspace/Mapreduce-Patch-h6.grid.sp2.yahoo.net/trunk/src/java/org/apache/hadoop/mapreduce/security/TokenCache.java:181
 [error] DistributedFileSystem cannot be resolved to a type
 [exec]  [iajc] if(fs instanceof DistributedFileSystem) {
/grid/0/hudson/hudson-slave/workspace/Mapreduce-Patch-h6.grid.sp2.yahoo.net/trunk/src/java/org/apache/hadoop/mapreduce/security/TokenCache.java:182
 [error] DistributedFileSystem cannot be resolved to a type
 [exec]  [iajc] DistributedFileSystem dfs = (DistributedFileSystem)fs;
/grid/0/hudson/hudson-slave/workspace/Mapreduce-Patch-h6.grid.sp2.yahoo.net/trunk/src/java/org/apache/hadoop/mapreduce/security/TokenCache.java:182
 [error] DistributedFileSystem cannot be resolved to a type
 [exec]  [iajc] DistributedFileSystem dfs = (DistributedFileSystem)fs;
/grid/0/hudson/hudson-slave/workspace/Mapreduce-Patch-h6.grid.sp2.yahoo.net/trunk/src/java/org/apache/hadoop/mapreduce/security/TokenCache.java:186
 [error] DelegationTokenIdentifier cannot be resolved to a type
 [exec]  [iajc] Token token =
/grid/0/hudson/hudson-slave/workspace/Mapreduce-Patch-h6.grid.sp2.yahoo.net/trunk/src/java/org/apache/hadoop/mapreduce/security/TokenCache.java:187
 [error] The method getDelegationToken(String) is undefined for the type TokenCache
 [exec]  [iajc] TokenCache.getDelegationToken(fs_uri);
 [exec]  [iajc] 11 errors
 [exec]
 [exec] BUILD FAILED

This is the log of the build. The reason for the failure is that the hdfs jar 
is not present on the classpath when compile-aspects runs. Is there a way to 
avoid having to import hdfs.* in the patch?
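One way to sidestep the compile-time dependency would be to probe for the hdfs classes reflectively instead of importing them. The sketch below is a hypothetical illustration of that idea, not the committed fix; the class name is taken from the error log above, and the graceful fallback is an assumption about how the caller might behave.

```java
// Hypothetical sketch: detect the hdfs classes at runtime via reflection,
// so the code compiles without the hdfs jar on the classpath.
public class ReflectiveTokenFetch {
    // Returns true only if the hdfs classes are present at runtime.
    public static boolean hdfsAvailable() {
        try {
            Class.forName("org.apache.hadoop.hdfs.DistributedFileSystem");
            return true;
        } catch (ClassNotFoundException e) {
            // No hdfs jar: callers would skip delegation-token handling.
            return false;
        }
    }

    public static void main(String[] args) {
        // On a classpath without the hdfs jar this prints false, which is
        // exactly the situation the compile-aspects target ran into.
        System.out.println(hdfsAvailable());
    }
}
```

Calls resolved reflectively this way fail at runtime only when the feature is actually used, rather than breaking every build that lacks the jar.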

> Allow storage and caching of delegation token.
> --
>
> Key: MAPREDUCE-1383
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1383
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Boris Shkolnik
> 

[jira] Updated: (MAPREDUCE-1376) Support for varied user submission in Gridmix

2010-01-27 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-1376:
-

Status: Open  (was: Patch Available)

The strategy effecting this is invalid after HADOOP-6299

> Support for varied user submission in Gridmix
> -
>
> Key: MAPREDUCE-1376
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1376
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/gridmix
>Reporter: Chris Douglas
>Assignee: Chris Douglas
> Attachments: M1376-0.patch, M1376-1.patch, M1376-2.patch, 
> M1376-3.patch, M1376-4.patch
>
>
> Gridmix currently submits all synthetic jobs as the client user. It should be 
> possible to map users in the trace to a set of users appropriate for the 
> target cluster.
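Such a trace-user-to-cluster-user mapping could be sketched as a small resolver. The class below and its round-robin policy are illustrative assumptions, not Gridmix's actual API: each distinct trace user is pinned to one user from a configured pool, and the pinning is stable across repeated lookups.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of mapping trace users onto a pool of cluster users.
public class UserResolverSketch {
    private final List<String> pool;
    private final Map<String, String> mapping = new LinkedHashMap<>();
    private int next = 0;

    public UserResolverSketch(List<String> pool) {
        this.pool = pool;
    }

    // Assign each distinct trace user a pool user, round-robin; repeated
    // lookups for the same trace user return the same cluster user.
    public String resolve(String traceUser) {
        return mapping.computeIfAbsent(traceUser,
                u -> pool.get(next++ % pool.size()));
    }

    public static void main(String[] args) {
        UserResolverSketch r =
                new UserResolverSketch(Arrays.asList("u0", "u1"));
        System.out.println(r.resolve("alice")); // u0
        System.out.println(r.resolve("bob"));   // u1
        System.out.println(r.resolve("alice")); // u0 again: mapping is stable
    }
}
```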

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (MAPREDUCE-1385) Make changes to MapReduce for the new UserGroupInformation APIs (HADOOP-6299)

2010-01-27 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved MAPREDUCE-1385.
--

  Resolution: Fixed
Hadoop Flags: [Incompatible change, Reviewed]

I just committed this. Thanks, Devaraj!

> Make changes to MapReduce for the new UserGroupInformation APIs (HADOOP-6299)
> -
>
> Key: MAPREDUCE-1385
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1385
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Devaraj Das
>Assignee: Devaraj Das
> Fix For: 0.22.0
>
> Attachments: mr-6299.3.patch, mr-6299.7.patch, mr-6299.8.patch, 
> mr-6299.patch
>
>
> This is about moving the MapReduce code to use the new UserGroupInformation 
> API as described in HADOOP-6299.




[jira] Commented: (MAPREDUCE-899) When using LinuxTaskController, localized files may become accessible to unintended users if permissions are misconfigured.

2010-01-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805418#action_12805418
 ] 

Hadoop QA commented on MAPREDUCE-899:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12431525/patch-899-3.txt
  against trunk revision 903544.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 15 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/410/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/410/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/410/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/410/console

This message is automatically generated.

> When using LinuxTaskController, localized files may become accessible to 
> unintended users if permissions are misconfigured.
> ---
>
> Key: MAPREDUCE-899
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-899
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Reporter: Vinod K V
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-899-20090828.txt, patch-899-1.txt, 
> patch-899-2.txt, patch-899-3.txt, patch-899.txt
>
>
> To enforce the accessibility of job files to only the job-owner and the 
> TaskTracker, as per MAPREDUCE-842, it is _trusted_ that the setuid/setgid 
> linux TaskController binary is group-owned by a _special group_ to which only 
> the TaskTracker belongs, and not just any group to which the TT belongs. If 
> this trust is broken, possibly due to misconfiguration by admins, the local 
> files become accessible to unintended users, while giving the admins a false 
> sense of security.




[jira] Updated: (MAPREDUCE-1410) Task-Cleanup attempt cannot be given KillTaskAction if the main attempt is killed with a KillTaskAction

2010-01-27 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1410:
---

Fix Version/s: 0.22.0
   Status: Patch Available  (was: Open)

> Task-Cleanup attempt cannot be given KillTaskAction if the main attempt is 
> killed with a KillTaskAction
> ---
>
> Key: MAPREDUCE-1410
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1410
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.20.1
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.22.0
>
> Attachments: patch-1410.txt
>
>
> If the main attempt is killed with a KillTaskAction and is added to 
> tasksReportedClosed, then the cleanup-attempt for the task (with the same id) 
> cannot be given a KillTaskAction, since tasksReportedClosed already contains 
> the attemptID.
> The attemptID should be removed from tasksReportedClosed in the 
> incompleteSubTask() method.
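A minimal sketch of the proposed fix, assuming tasksReportedClosed is a set of attempt-ID strings (the real JobTracker bookkeeping is more involved; class and method names here are illustrative except for incompleteSubTask, which is named in the description):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the tasksReportedClosed bookkeeping.
public class CleanupKillSketch {
    // Attempt IDs whose close has already been reported.
    private final Set<String> tasksReportedClosed = new HashSet<>();

    // A KillTaskAction is handed out only if the id was not already closed.
    public boolean addKillAction(String attemptId) {
        return tasksReportedClosed.add(attemptId);
    }

    // The proposed fix: when the task becomes incomplete (so a cleanup
    // attempt with the same id may follow), forget the earlier close.
    public void incompleteSubTask(String attemptId) {
        tasksReportedClosed.remove(attemptId);
    }

    public static void main(String[] args) {
        CleanupKillSketch jt = new CleanupKillSketch();
        System.out.println(jt.addKillAction("attempt_0")); // true: main attempt killed
        System.out.println(jt.addKillAction("attempt_0")); // false: cleanup blocked (the bug)
        jt.incompleteSubTask("attempt_0");
        System.out.println(jt.addKillAction("attempt_0")); // true: cleanup can be killed
    }
}
```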




[jira] Updated: (MAPREDUCE-1410) Task-Cleanup attempt cannot be given KillTaskAction if the main attempt is killed with a KillTaskAction

2010-01-27 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1410:
---

Attachment: patch-1410.txt

Patch fixing the bug.
Added unit tests for all possible paths for cleanup attempt getting 
KillTaskAction, when its main attempt is killed with KillTaskAction.

> Task-Cleanup attempt cannot be given KillTaskAction if the main attempt is 
> killed with a KillTaskAction
> ---
>
> Key: MAPREDUCE-1410
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1410
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.20.1
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-1410.txt
>
>
> If the main attempt is killed with a KillTaskAction and is added to 
> tasksReportedClosed, then the cleanup-attempt for the task (with the same id) 
> cannot be given a KillTaskAction, since tasksReportedClosed already contains 
> the attemptID.
> The attemptID should be removed from tasksReportedClosed in the 
> incompleteSubTask() method.




[jira] Created: (MAPREDUCE-1418) LinuxTaskController binary misses validation of arguments passed for relative components in some cases.

2010-01-27 Thread Vinod K V (JIRA)
LinuxTaskController binary misses validation of arguments passed for relative 
components in some cases.
---

 Key: MAPREDUCE-1418
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1418
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: security, tasktracker
Reporter: Vinod K V


The function {{int check_path_for_relative_components(char * path)}} should be 
used to validate the absence of relative components in a path before any 
operation is performed on it. This check is missing in all the 
{{initialize*()}} functions, as Hemanth pointed out offline.
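For illustration, here is a rough Java analogue of that check (the real helper is the C function named above; this sketch and its method name are assumptions, shown only to make the validation rule concrete): a path is rejected if any component is "." or "..", since such components could redirect a setuid binary outside its intended directory tree.

```java
// Hypothetical Java analogue of check_path_for_relative_components().
public class PathCheckSketch {
    // Returns false if any slash-separated component is "." or "..".
    public static boolean isFreeOfRelativeComponents(String path) {
        for (String part : path.split("/")) {
            if (part.equals(".") || part.equals("..")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isFreeOfRelativeComponents("/var/local/tt/jobcache"));   // true
        System.out.println(isFreeOfRelativeComponents("/var/local/../etc/passwd")); // false
    }
}
```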
