[jira] Created: (MAPREDUCE-1314) Some logs have wrong configuration names.

2009-12-18 Thread Amareshwari Sriramadasu (JIRA)
Some logs have wrong configuration names.
-

 Key: MAPREDUCE-1314
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1314
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.21.0
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Fix For: 0.21.0


After MAPREDUCE-849, some of the logs have wrong configuration names.
For example:
09/12/16 20:30:58 INFO mapred.MapTask: mapreduce.task.mapreduce.task.io.sort.mb 
= 10
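
The doubled prefix suggests that a fully qualified key constant is being concatenated with its prefix a second time when the log message is built. A minimal illustrative sketch of that pattern (hypothetical; not the actual MapTask code):

{code:java}
import org.apache.hadoop.conf.Configuration;

// Hypothetical illustration of how a doubled prefix can appear in a log line.
public class SortConfigLogSketch {
  // Already the fully qualified key after MAPREDUCE-849.
  static final String IO_SORT_MB = "mapreduce.task.io.sort.mb";

  static void logSortBuffer(Configuration conf) {
    int sortMb = conf.getInt(IO_SORT_MB, 100);
    // Buggy: prepends the prefix again -> "mapreduce.task.mapreduce.task.io.sort.mb = 10"
    System.out.println("mapreduce.task." + IO_SORT_MB + " = " + sortMb);
    // Intended: log the constant as-is -> "mapreduce.task.io.sort.mb = 10"
    System.out.println(IO_SORT_MB + " = " + sortMb);
  }
}
{code}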

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1067) Default state of queues is undefined when unspecified

2009-12-18 Thread V.V.Chaitanya Krishna (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792383#action_12792383
 ] 

V.V.Chaitanya Krishna commented on MAPREDUCE-1067:
--

bq. -1 contrib tests. The patch failed contrib unit tests.

The test failures are unrelated to this issue (ref. MAPREDUCE-1311)

 Default state of queues is undefined when unspecified
 -

 Key: MAPREDUCE-1067
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1067
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.21.0
Reporter: V.V.Chaitanya Krishna
Assignee: V.V.Chaitanya Krishna
Priority: Blocker
 Fix For: 0.21.0

 Attachments: MAPREDUCE-1067-1.patch, MAPREDUCE-1067-2.patch, 
 MAPREDUCE-1067-3.patch, MAPREDUCE-1067-4.patch, MAPREDUCE-1067-5.patch, 
 MAPREDUCE-1067-6.patch


 Currently, if the state of a queue is not specified, it is set to the 
 undefined state instead of the running state.
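
A minimal sketch of the intended default (the enum and parsing helper are assumptions for illustration, not the actual 0.21 queue configuration code):

{code:java}
// Hypothetical sketch: treat a missing/empty state as RUNNING instead of undefined.
enum QueueState { RUNNING, STOPPED, UNDEFINED }

class QueueStateDefaultSketch {
  static QueueState parseState(String configured) {
    if (configured == null || configured.trim().isEmpty()) {
      return QueueState.RUNNING;   // default when the state is not specified
    }
    try {
      return QueueState.valueOf(configured.trim().toUpperCase());
    } catch (IllegalArgumentException e) {
      return QueueState.UNDEFINED; // unrecognized values remain undefined
    }
  }
}
{code}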

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1143) runningMapTasks counter is not properly decremented in case of failed Tasks.

2009-12-18 Thread Hemanth Yamijala (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792384#action_12792384
 ] 

Hemanth Yamijala commented on MAPREDUCE-1143:
-

+1 for the 21 patch. I will commit this.

 runningMapTasks counter is not properly decremented in case of failed Tasks.
 

 Key: MAPREDUCE-1143
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1143
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1
Reporter: rahul k singh
Assignee: rahul k singh
Priority: Blocker
 Fix For: 0.21.0

 Attachments: MAPRED-1143-1.patch, MAPRED-1143-2.patch, 
 MAPRED-1143-2.patch, MAPRED-1143-3.patch, MAPRED-1143-4.patch, 
 MAPRED-1143-5.patch.txt, MAPRED-1143-6.patch, MAPRED-1143-7.patch, 
 MAPRED-1143-v21.patch, MAPRED-1143-ydist-1.patch, MAPRED-1143-ydist-2.patch, 
 MAPRED-1143-ydist-3.patch, MAPRED-1143-ydist-4.patch, 
 MAPRED-1143-ydist-5.patch, MAPRED-1143-ydist-6.patch, 
 MAPRED-1143-ydist-7.patch, MAPRED-1143-ydist-8.patch.txt, 
 MAPRED-1143-ydist-9.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1143) runningMapTasks counter is not properly decremented in case of failed Tasks.

2009-12-18 Thread Hemanth Yamijala (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hemanth Yamijala updated MAPREDUCE-1143:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

I committed this to trunk and branch 0.21. Thanks, Rahul !

 runningMapTasks counter is not properly decremented in case of failed Tasks.
 

 Key: MAPREDUCE-1143
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1143
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1
Reporter: rahul k singh
Assignee: rahul k singh
Priority: Blocker
 Fix For: 0.21.0

 Attachments: MAPRED-1143-1.patch, MAPRED-1143-2.patch, 
 MAPRED-1143-2.patch, MAPRED-1143-3.patch, MAPRED-1143-4.patch, 
 MAPRED-1143-5.patch.txt, MAPRED-1143-6.patch, MAPRED-1143-7.patch, 
 MAPRED-1143-v21.patch, MAPRED-1143-ydist-1.patch, MAPRED-1143-ydist-2.patch, 
 MAPRED-1143-ydist-3.patch, MAPRED-1143-ydist-4.patch, 
 MAPRED-1143-ydist-5.patch, MAPRED-1143-ydist-6.patch, 
 MAPRED-1143-ydist-7.patch, MAPRED-1143-ydist-8.patch.txt, 
 MAPRED-1143-ydist-9.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1315) taskdetails.jsp and jobfailures.jsp should have consistent convention for machine names in case of lost task tracker

2009-12-18 Thread Ramya R (JIRA)
taskdetails.jsp and jobfailures.jsp should have consistent convention for 
machine names in case of lost task tracker


 Key: MAPREDUCE-1315
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1315
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobtracker
Affects Versions: 0.20.1
Reporter: Ramya R
Priority: Minor
 Fix For: 0.20.2


Machine names displayed in taskdetails.jsp and jobfailures.jsp are 
inconsistent in convention for a lost TT: the lost TT's machine name is 
displayed as tracker_hostname:localhost/127.0.0.1:port (not a hyperlink), 
whereas for other TTs the name displayed is the hostname (a hyperlink). 
Ideally the machine names should follow a single convention.
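
A small hypothetical helper (not the actual JSP code) showing one way to normalize both forms to a plain host name:

{code:java}
// Hypothetical sketch: reduce "tracker_hostname:localhost/127.0.0.1:port" to "hostname"
// so lost trackers are displayed with the same convention as live ones.
class TrackerDisplayNameSketch {
  static String displayName(String trackerName) {
    String name = trackerName;
    if (name.startsWith("tracker_")) {
      name = name.substring("tracker_".length());
    }
    int colon = name.indexOf(':');
    return colon >= 0 ? name.substring(0, colon) : name;
  }

  public static void main(String[] args) {
    // prints "host42.example.com"
    System.out.println(displayName("tracker_host42.example.com:localhost/127.0.0.1:50060"));
  }
}
{code}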

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1315) taskdetails.jsp and jobfailures.jsp should have consistent convention for machine names in case of lost task tracker

2009-12-18 Thread Ramya R (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramya R updated MAPREDUCE-1315:
---

Attachment: TaskDetails.png

Attaching the snapshot of taskdetails.jsp showing the inconsistency.

 taskdetails.jsp and jobfailures.jsp should have consistent convention for 
 machine names in case of lost task tracker
 

 Key: MAPREDUCE-1315
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1315
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobtracker
Affects Versions: 0.20.1
Reporter: Ramya R
Priority: Minor
 Fix For: 0.20.2

 Attachments: TaskDetails.png


 Machine names displayed in taskdetails.jsp and jobfailures.jsp are 
 inconsistent in convention for a lost TT: the lost TT's machine name is 
 displayed as tracker_hostname:localhost/127.0.0.1:port (not a hyperlink), 
 whereas for other TTs the name displayed is the hostname (a hyperlink). 
 Ideally the machine names should follow a single convention.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1317) Reducing memory consumption of rumen objects

2009-12-18 Thread Hong Tang (JIRA)
Reducing memory consumption of rumen objects


 Key: MAPREDUCE-1317
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1317
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Hong Tang


We have encountered OutOfMemoryErrors in mumak and gridmix when dealing with 
very large jobs. The purpose of this jira is to optimize memory consumption of 
rumen-produced job objects.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1317) Reducing memory consumption of rumen objects

2009-12-18 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792441#action_12792441
 ] 

Hong Tang commented on MAPREDUCE-1317:
--

Through YourKit profiling, we found two places where we could save memory:
- LoggedLocation - we should share references to the same LoggedLocation for 
the same preferred location for different map tasks.
- LoggedTaskAttempt.hostName - we should keep a cache of all host names for the 
cluster and share the references.
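
A minimal sketch of the interning idea (class and method names are assumptions, not the rumen code): keep one canonical instance per distinct host name so the many LoggedTaskAttempt objects share references instead of holding duplicates.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical interning cache: return a canonical instance per host name.
class HostNameInternerSketch {
  private final Map<String, String> cache = new HashMap<String, String>();

  synchronized String intern(String hostName) {
    String canonical = cache.get(hostName);
    if (canonical == null) {
      canonical = hostName;
      cache.put(hostName, canonical);
    }
    return canonical;
  }
}
{code}

The same keyed-cache approach would apply to sharing a single LoggedLocation instance per distinct preferred location.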

 Reducing memory consumption of rumen objects
 

 Key: MAPREDUCE-1317
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1317
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Hong Tang

 We have encountered OutOfMemoryErrors in mumak and gridmix when dealing with 
 very large jobs. The purpose of this jira is to optimize memory consumption of 
 rumen-produced job objects.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1317) Reducing memory consumption of rumen objects

2009-12-18 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-1317:
-

Attachment: mapreduce-1317-20091218.patch

Straightforward patch. No additional test is added because it does not change 
the semantics of the modified classes, and existing unit tests should provide 
enough coverage.

 Reducing memory consumption of rumen objects
 

 Key: MAPREDUCE-1317
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1317
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Hong Tang
 Attachments: mapreduce-1317-20091218.patch


 We have encountered OutOfMemoryErrors in mumak and gridmix when dealing with 
 very large jobs. The purpose of this jira is to optimize memory consumption of 
 rumen-produced job objects.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1317) Reducing memory consumption of rumen objects

2009-12-18 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-1317:
-

Fix Version/s: 0.22.0
   0.21.0
 Assignee: Hong Tang
Affects Version/s: 0.22.0
   0.21.0
   Status: Patch Available  (was: Open)

 Reducing memory consumption of rumen objects
 

 Key: MAPREDUCE-1317
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1317
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.21.0, 0.22.0
Reporter: Hong Tang
Assignee: Hong Tang
 Fix For: 0.21.0, 0.22.0

 Attachments: mapreduce-1317-20091218.patch


 We have encountered OutOfMemoryErrors in mumak and gridmix when dealing with 
 very large jobs. The purpose of this jira is to optimize memory consumption of 
 rumen-produced job objects.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1235) java.io.IOException: Cannot convert value '0000-00-00 00:00:00' from column 6 to TIMESTAMP.

2009-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792490#action_12792490
 ] 

Hadoop QA commented on MAPREDUCE-1235:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12428350/MAPREDUCE-1235.patch
  against trunk revision 892178.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/217/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/217/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/217/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/217/console

This message is automatically generated.

 java.io.IOException: Cannot convert value '0000-00-00 00:00:00' from column 6 
 to TIMESTAMP. 
 

 Key: MAPREDUCE-1235
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1235
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/sqoop
Affects Versions: 0.20.1
 Environment: hadoop 0.20.1
 sqoop
 ubuntu karmic
 mysql 4
Reporter: valentina kroshilina
Assignee: Aaron Kimball
Priority: Minor
 Attachments: MAPREDUCE-1235.patch

   Original Estimate: 4h
  Remaining Estimate: 4h

 *Description*: java.io.IOException is thrown when trying to import a table to 
 HDFS using Sqoop. Table has 0 value in a field of type datetime. 
 *Full Exception*: java.io.IOException: Cannot convert value '0000-00-00 
 00:00:00' from column 6 to TIMESTAMP. 
 *Original question*: 
 http://getsatisfaction.com/cloudera/topics/cant_import_table?utm_content=reply_linkutm_medium=emailutm_source=reply_notification

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1316) JobTracker holds stale references to retired jobs via speculated tips

2009-12-18 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792496#action_12792496
 ] 

Arun C Murthy commented on MAPREDUCE-1316:
--

Good one Amar!

 JobTracker holds stale references to retired jobs via speculated tips 
 --

 Key: MAPREDUCE-1316
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1316
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Reporter: Amar Kamat

 JobTracker fails to remove speculative tasks' mapping from _taskToTIPMap_ if 
 the job finishes and retires before the tracker (running the speculative 
 tasks) reports back. In such cases a stale reference is held to 
 TaskInProgress (and thus JobInProgress) long after the job is gone, leading to 
 a memory leak.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (MAPREDUCE-1316) JobTracker holds stale references to retired jobs via speculated tips

2009-12-18 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat reassigned MAPREDUCE-1316:
-

Assignee: Amar Kamat

 JobTracker holds stale references to retired jobs via speculated tips 
 --

 Key: MAPREDUCE-1316
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1316
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Reporter: Amar Kamat
Assignee: Amar Kamat

 JobTracker fails to remove speculative tasks' mapping from _taskToTIPMap_ if 
 the job finishes and retires before the tracker (running the speculative 
 tasks) reports back. In such cases a stale reference is held to 
 TaskInProgress (and thus JobInProgress) long after the job is gone, leading to 
 a memory leak.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (MAPREDUCE-1291) JobTracker fails to remove setup tip mapping from taskidToTIPMap if the job gets killed before the setup returns

2009-12-18 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat reassigned MAPREDUCE-1291:
-

Assignee: Amar Kamat

 JobTracker fails to remove setup tip mapping from taskidToTIPMap if the job 
 gets killed before the setup returns
 

 Key: MAPREDUCE-1291
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1291
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Reporter: Amar Kamat
Assignee: Amar Kamat
Priority: Critical

 Here is the scenario:
 1) job inits
 2) setup task is launched on tt1 and an entry is made in taskidToTIPMap
 3) job is killed
 4) cleanup gets launched on tt2
 5) cleanup returns, KILLING the job and removing all the *completed* 
 setup/map/reduce task mappings from taskidToTIPMap. At this point the setup is 
 still in RUNNING state.
 6) job retires and all the map/reduce mappings from taskidToTIPMap are removed
  
 In the end the setup tip still lingers in the taskidToTIPMap map. Because of 
 the backreference from the tip to jip, the whole job stays in memory forever.
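
A rough sketch of the kind of retirement-time cleanup this scenario calls for (interface and field names are approximations, not the actual JobTracker code or patch): remove every task-to-TIP mapping of the retiring job, whether or not the task completed.

{code:java}
import java.util.Iterator;
import java.util.Map;

// Hypothetical cleanup: drop all mappings for a retiring job so a still-RUNNING
// setup attempt cannot keep its TIP (and through it the JobInProgress) reachable.
class RetireCleanupSketch {
  interface Tip { String getJobId(); }

  static void removeJobMappings(Map<String, Tip> taskidToTIPMap, String retiringJobId) {
    Iterator<Map.Entry<String, Tip>> it = taskidToTIPMap.entrySet().iterator();
    while (it.hasNext()) {
      if (it.next().getValue().getJobId().equals(retiringJobId)) {
        it.remove();
      }
    }
  }
}
{code}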

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1235) java.io.IOException: Cannot convert value '0000-00-00 00:00:00' from column 6 to TIMESTAMP.

2009-12-18 Thread Aaron Kimball (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792549#action_12792549
 ] 

Aaron Kimball commented on MAPREDUCE-1235:
--

These test failures are unrelated

 java.io.IOException: Cannot convert value '0000-00-00 00:00:00' from column 6 
 to TIMESTAMP. 
 

 Key: MAPREDUCE-1235
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1235
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/sqoop
Affects Versions: 0.20.1
 Environment: hadoop 0.20.1
 sqoop
 ubuntu karmic
 mysql 4
Reporter: valentina kroshilina
Assignee: Aaron Kimball
Priority: Minor
 Attachments: MAPREDUCE-1235.patch

   Original Estimate: 4h
  Remaining Estimate: 4h

 *Description*: java.io.IOException is thrown when trying to import a table to 
 HDFS using Sqoop. Table has 0 value in a field of type datetime. 
 *Full Exception*: java.io.IOException: Cannot convert value '0000-00-00 
 00:00:00' from column 6 to TIMESTAMP. 
 *Original question*: 
 http://getsatisfaction.com/cloudera/topics/cant_import_table?utm_content=reply_linkutm_medium=emailutm_source=reply_notification

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-181) Secure job submission

2009-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792570#action_12792570
 ] 

Hadoop QA commented on MAPREDUCE-181:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12428376/181-5.1.patch
  against trunk revision 892178.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 78 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/218/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/218/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/218/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/218/console

This message is automatically generated.

 Secure job submission 
 --

 Key: MAPREDUCE-181
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-181
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Amar Kamat
Assignee: Devaraj Das
 Fix For: 0.22.0

 Attachments: 181-1.patch, 181-2.patch, 181-3.patch, 181-3.patch, 
 181-4.patch, 181-5.1.patch, 181-5.1.patch, 
 hadoop-3578-branch-20-example-2.patch, hadoop-3578-branch-20-example.patch, 
 HADOOP-3578-v2.6.patch, HADOOP-3578-v2.7.patch, MAPRED-181-v3.32.patch, 
 MAPRED-181-v3.8.patch


 Currently the jobclient accesses the {{mapred.system.dir}} to add job 
 details. Hence the {{mapred.system.dir}} has the permissions of 
 {{rwx-wx-wx}}. This could be a security loophole where the job files might 
 get overwritten/tampered after the job submission. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1258) Fair scheduler event log not logging job info

2009-12-18 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792595#action_12792595
 ] 

Scott Chen commented on MAPREDUCE-1258:
---

+1, this patch looks good to me.

 Fair scheduler event log not logging job info
 -

 Key: MAPREDUCE-1258
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1258
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/fair-share
Affects Versions: 0.21.0
Reporter: Matei Zaharia
Assignee: Matei Zaharia
Priority: Minor
 Attachments: mapreduce-1258-1.patch


 The MAPREDUCE-706 patch seems to have left an unfinished TODO in the Fair 
 Scheduler - namely, in the dump() function for periodically dumping scheduler 
 state to the event log, the part that dumps information about jobs is 
 commented out. This makes the event log less useful than it was before.
 It should be fairly easy to update this part to use the new scheduler data 
 structures (Schedulable etc) and print the data.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously

2009-12-18 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792632#action_12792632
 ] 

Todd Lipcon commented on MAPREDUCE-1213:


Hi Zheng. Do you have this patch against 0.20 as well? We're considering 
backporting. Thanks

 TaskTrackers restart is very slow because it deletes distributed cache 
 directory synchronously
 --

 Key: MAPREDUCE-1213
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1213
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1
Reporter: dhruba borthakur
Assignee: Zheng Shao
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1213.1.patch, MAPREDUCE-1213.2.patch, 
 MAPREDUCE-1213.3.patch, MAPREDUCE-1213.4.patch


 We are seeing that when we restart a tasktracker, it tries to recursively 
 delete all the files in the distributed cache. It invokes 
 FileUtil.fullyDelete(), which is very, very slow. This means that the 
 TaskTracker cannot join the cluster for an extended period of time (up to 2 
 hours for us). The problem is acute if the number of files in the distributed 
 cache is a few thousand.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1310) CREATE TABLE statements for Hive do not correctly specify delimiters

2009-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792651#action_12792651
 ] 

Hadoop QA commented on MAPREDUCE-1310:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12428386/MAPREDUCE-1310.patch
  against trunk revision 892178.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 20 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/219/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/219/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/219/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/219/console

This message is automatically generated.

 CREATE TABLE statements for Hive do not correctly specify delimiters
 

 Key: MAPREDUCE-1310
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1310
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/sqoop
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Attachments: MAPREDUCE-1310.patch


 Imports to HDFS via Sqoop that also inject metadata into Hive do not 
 correctly specify delimiters; using Hive to access the data results in rows 
 being parsed as NULL characters. See 
 http://getsatisfaction.com/cloudera/topics/sqoop_hive_import_giving_null_query_values
  for an example bug report

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching

2009-12-18 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792667#action_12792667
 ] 

Todd Lipcon commented on MAPREDUCE-1114:


The issue is that the build ends up spawning a lot of subants, which each 
re-resolve everything. I get a total of 22 ivy-resolves even if I skip contrib! 
Part of this is that skip.contrib=1 still resolves all of the contrib stuff 
(MAPREDUCE-1113)

{quote}
t...@todd-laptop:~/git/hadoop-mapreduce$ ant test -Dskip.contrib=1 
-Dtestcase=xxx 2>&1 | grep 'ivy-resolve' | wc -l
22
t...@todd-laptop:~/git/hadoop-mapreduce$ ant test -Dskip.contrib=1 
-Dtestcase=xxx 2>&1 | grep 'ivy-resolve-common' | wc -l
19
{quote}

Some of these are within the same ant run, so they get cached. But 16 of them 
actually do some non-cached work:
{quote}
t...@todd-laptop:~/git/hadoop-mapreduce$ ant test -Dskip.contrib=1 
-Dtestcase=xxx 2>&1 | grep 'resolving dependencies' | wc -l
16
{quote}

 Speed up ivy resolution in builds with clever caching
 -

 Key: MAPREDUCE-1114
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1114
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: build
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor
 Attachments: mapreduce-1114.txt, mapreduce-1114.txt, 
 mapreduce-1114.txt


 An awful lot of time is spent in the ivy:resolve parts of the build, even 
 when all of the dependencies have been fetched and cached. Profiling showed 
 this was in XML parsing. I have a sort-of-ugly hack which speeds up 
 incremental compiles (and more importantly ant test) significantly using 
 some ant macros to cache the resolved classpaths.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously

2009-12-18 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated MAPREDUCE-1302:
--

Status: Patch Available  (was: Open)

Transient errors in hudson. (user1 not found)
Submitting again.

 TrackerDistributedCacheManager can delete file asynchronously
 -

 Key: MAPREDUCE-1302
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1302
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Affects Versions: 0.20.2, 0.21.0, 0.22.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: MAPREDUCE-1302.0.patch, MAPREDUCE-1302.1.patch


 With the help of AsyncDiskService from MAPREDUCE-1213, we should be able to 
 delete files from distributed cache asynchronously.
 That will help make task initialization faster, because task initialization 
 calls the code that localizes files into the cache and may delete some other 
 files.
 The deletion can slow down the task initialization speed.
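
A bare-bones sketch of the asynchronous-deletion idea (class and method names are assumptions; the actual patch builds on AsyncDiskService from MAPREDUCE-1213): hand the recursive delete to a background thread so localization does not block on it.

{code:java}
import java.io.File;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: recursive cache-directory deletes run off the critical path.
class AsyncCacheCleanupSketch {
  private final ExecutorService deleter = Executors.newSingleThreadExecutor();

  void deleteAsync(final File dir) {
    deleter.submit(new Runnable() {
      public void run() {
        fullyDelete(dir);
      }
    });
  }

  private static void fullyDelete(File f) {
    File[] children = f.listFiles();
    if (children != null) {
      for (File c : children) {
        fullyDelete(c);
      }
    }
    f.delete();
  }
}
{code}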

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously

2009-12-18 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated MAPREDUCE-1302:
--

Status: Open  (was: Patch Available)

 TrackerDistributedCacheManager can delete file asynchronously
 -

 Key: MAPREDUCE-1302
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1302
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Affects Versions: 0.20.2, 0.21.0, 0.22.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: MAPREDUCE-1302.0.patch, MAPREDUCE-1302.1.patch


 With the help of AsyncDiskService from MAPREDUCE-1213, we should be able to 
 delete files from distributed cache asynchronously.
 That will help make task initialization faster, because task initialization 
 calls the code that localizes files into the cache and may delete some other 
 files.
 The deletion can slow down the task initialization speed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1310) CREATE TABLE statements for Hive do not correctly specify delimiters

2009-12-18 Thread Aaron Kimball (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792685#action_12792685
 ] 

Aaron Kimball commented on MAPREDUCE-1310:
--

These test failures are unrelated

 CREATE TABLE statements for Hive do not correctly specify delimiters
 

 Key: MAPREDUCE-1310
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1310
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/sqoop
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Attachments: MAPREDUCE-1310.patch


 Imports to HDFS via Sqoop that also inject metadata into Hive do not 
 correctly specify delimiters; using Hive to access the data results in rows 
 being parsed as NULL characters. See 
 http://getsatisfaction.com/cloudera/topics/sqoop_hive_import_giving_null_query_values
  for an example bug report

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching

2009-12-18 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792687#action_12792687
 ] 

Konstantin Boudnik commented on MAPREDUCE-1114:
---

Well, build.xml has 7 'retrieves' in it. If you add contrib on top of this it's 
gonna be a total mess (e.g. 22 re-resolutions). IMO fixing ivy doesn't make much 
sense - we'd be better off focusing on moving towards Maven.

 Speed up ivy resolution in builds with clever caching
 -

 Key: MAPREDUCE-1114
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1114
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: build
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor
 Attachments: mapreduce-1114.txt, mapreduce-1114.txt, 
 mapreduce-1114.txt


 An awful lot of time is spent in the ivy:resolve parts of the build, even 
 when all of the dependencies have been fetched and cached. Profiling showed 
 this was in XML parsing. I have a sort-of-ugly hack which speeds up 
 incremental compiles (and more importantly ant test) significantly using 
 some ant macros to cache the resolved classpaths.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1083) Use the user-to-groups mapping service in the JobTracker

2009-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792711#action_12792711
 ] 

Hadoop QA commented on MAPREDUCE-1083:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12428377/MAPREDUCE-1083-3.patch
  against trunk revision 892178.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 5 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/220/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/220/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/220/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/220/console

This message is automatically generated.

  Use the user-to-groups mapping service in the JobTracker
 -

 Key: MAPREDUCE-1083
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1083
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobtracker
Reporter: Arun C Murthy
Assignee: Boris Shkolnik
 Fix For: 0.22.0

 Attachments: HADOOP-4656_mr.patch, MAPREDUCE-1083-2.patch, 
 MAPREDUCE-1083-3.patch


 HADOOP-4656 introduces a user-to-groups mapping service on the server-side. 
 The JobTracker should use this to map users to their groups rather than 
 relying on the information passed by the client.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1287) HashPartitioner calls hashCode() when there is only 1 reducer

2009-12-18 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-1287:
-

Status: Open  (was: Patch Available)

 HashPartitioner calls hashCode() when there is only 1 reducer
 -

 Key: MAPREDUCE-1287
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1287
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.22.0
Reporter: Ed Mazur
Assignee: Ed Mazur
Priority: Minor
 Fix For: 0.22.0

 Attachments: M1287-4.patch, MAPREDUCE-1287.2.patch, 
 MAPREDUCE-1287.3.patch, MAPREDUCE-1287.patch


 HashPartitioner could be optimized to not call the key's hashCode() if there 
 is only 1 reducer.
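
A sketch of the proposed change (not necessarily the committed patch): short-circuit before touching the key when only one partition exists.

{code:java}
import org.apache.hadoop.mapreduce.Partitioner;

// Hypothetical sketch of the optimization: skip hashCode() entirely for a single reducer.
public class SingleReducerAwareHashPartitioner<K, V> extends Partitioner<K, V> {
  @Override
  public int getPartition(K key, V value, int numReduceTasks) {
    if (numReduceTasks == 1) {
      return 0;                                        // only one partition exists
    }
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }
}
{code}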

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1287) HashPartitioner calls hashCode() when there is only 1 reducer

2009-12-18 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-1287:
-

Status: Patch Available  (was: Open)

 HashPartitioner calls hashCode() when there is only 1 reducer
 -

 Key: MAPREDUCE-1287
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1287
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.22.0
Reporter: Ed Mazur
Assignee: Ed Mazur
Priority: Minor
 Fix For: 0.22.0

 Attachments: M1287-4.patch, MAPREDUCE-1287.2.patch, 
 MAPREDUCE-1287.3.patch, MAPREDUCE-1287.patch


 HashPartitioner could be optimized to not call the key's hashCode() if there 
 is only 1 reducer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-181) Secure job submission

2009-12-18 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated MAPREDUCE-181:
--

Status: Open  (was: Patch Available)

 Secure job submission 
 --

 Key: MAPREDUCE-181
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-181
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Amar Kamat
Assignee: Devaraj Das
 Fix For: 0.22.0

 Attachments: 181-1.patch, 181-2.patch, 181-3.patch, 181-3.patch, 
 181-4.patch, 181-5.1.patch, 181-5.1.patch, 
 hadoop-3578-branch-20-example-2.patch, hadoop-3578-branch-20-example.patch, 
 HADOOP-3578-v2.6.patch, HADOOP-3578-v2.7.patch, MAPRED-181-v3.32.patch, 
 MAPRED-181-v3.8.patch


 Currently the jobclient accesses the {{mapred.system.dir}} to add job 
 details. Hence the {{mapred.system.dir}} has the permissions of 
 {{rwx-wx-wx}}. This could be a security loophole where the job files might 
 get overwritten/tampered after the job submission. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1083) Use the user-to-groups mapping service in the JobTracker

2009-12-18 Thread Boris Shkolnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792724#action_12792724
 ] 

Boris Shkolnik commented on MAPREDUCE-1083:
---

2 contrib tests failed:
reran them manually:

[junit] Running org.apache.hadoop.streaming.TestStreamingExitStatus
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 8.194 sec


[junit] Running org.apache.hadoop.streaming.TestStreamingKeyValue
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.185 sec


  Use the user-to-groups mapping service in the JobTracker
 -

 Key: MAPREDUCE-1083
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1083
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobtracker
Reporter: Arun C Murthy
Assignee: Boris Shkolnik
 Fix For: 0.22.0

 Attachments: HADOOP-4656_mr.patch, MAPREDUCE-1083-2.patch, 
 MAPREDUCE-1083-3.patch


 HADOOP-4656 introduces a user-to-groups mapping service on the server-side. 
 The JobTracker should use this to map users to their groups rather than 
 relying on the information passed by the client.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-181) Secure job submission

2009-12-18 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated MAPREDUCE-181:
--

Attachment: 181-6.patch

In my local tests, I discovered that I had to make a bunch of changes to work 
around the extra checks that I introduced in the last patch. One of them is 
that the check for ownership of the staging dir now includes a check for the 
UGI of the submitting user (otherwise tests that fake the UGI were failing 
during job submission). I also introduced a method for getting the staging 
area location from the JobTracker (so that the user's home dir doesn't get 
clobbered with files in the .staging dir when tests are run).
I am still testing this patch. With the server-side groups patch in, I might 
need to make some minor changes in the testcases for them to work in the new 
model of job submission. But this should mostly be good overall. Up for 
review.

 Secure job submission 
 --

 Key: MAPREDUCE-181
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-181
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Amar Kamat
Assignee: Devaraj Das
 Fix For: 0.22.0

 Attachments: 181-1.patch, 181-2.patch, 181-3.patch, 181-3.patch, 
 181-4.patch, 181-5.1.patch, 181-5.1.patch, 181-6.patch, 
 hadoop-3578-branch-20-example-2.patch, hadoop-3578-branch-20-example.patch, 
 HADOOP-3578-v2.6.patch, HADOOP-3578-v2.7.patch, MAPRED-181-v3.32.patch, 
 MAPRED-181-v3.8.patch


 Currently the jobclient accesses the {{mapred.system.dir}} to add job 
 details. Hence the {{mapred.system.dir}} has the permissions of 
 {{rwx-wx-wx}}. This could be a security loophole where the job files might 
 get overwritten/tampered after the job submission. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

2009-12-18 Thread Scott Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Chen updated MAPREDUCE-1221:
--

Attachment: MAPREDUCE-1221-v1.patch

The patch allows us to set an amount of memory that will not be used to run 
tasks. If this limit is violated, the task using the highest amount of memory 
will be killed.

Ex: Configure mapreduce.tasktracker.reserved.physicalmemory.mb=3072
If there's a TaskTracker with 16GB of memory and currently the tasks are using 
14GB of memory, then we have 16GB - 14GB < 3GB. In this case, 
TaskMemoryManagerThread will kill the task using the highest amount of memory. 
Note that if the value is not configured, this policy will not be triggered.
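
A hedged sketch of that check (interface and names are assumptions for illustration, not the patch itself): if free physical memory falls below the configured reserve, kill the task with the largest physical footprint.

{code:java}
import java.util.List;

// Hypothetical enforcement sketch for the reserved-physical-memory policy.
class ReservedPhysicalMemorySketch {
  interface RunningTask {
    long getPhysicalMemoryBytes();
    void kill(String reason);
  }

  static void enforce(long totalPhysicalBytes, long usedByTasksBytes,
                      long reservedBytes, List<RunningTask> tasks) {
    if (reservedBytes <= 0 || tasks.isEmpty()) {
      return;                                          // policy not configured, or nothing running
    }
    if (totalPhysicalBytes - usedByTasksBytes >= reservedBytes) {
      return;                                          // enough free physical memory remains
    }
    RunningTask biggest = tasks.get(0);
    for (RunningTask t : tasks) {
      if (t.getPhysicalMemoryBytes() > biggest.getPhysicalMemoryBytes()) {
        biggest = t;
      }
    }
    biggest.kill("free physical memory below reserved threshold");
  }
}
{code}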

Killing tasks will slow down the job because the task has to be scheduled 
again. But if the task is not killed in this case, it is very likely that it 
will fail on this node because the node has no memory to run it. Also, the 
node might crash because of this. We choose the highest memory-consuming task 
to kill because it is likely to belong to the bad job that's causing the problem.

A part of this patch was done by Dhruba. He sent me his half-done patch and 
I continued from there.

 Kill tasks on a node if the free physical memory on that machine falls below 
 a configured threshold
 ---

 Key: MAPREDUCE-1221
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Reporter: dhruba borthakur
Assignee: Scott Chen
 Attachments: MAPREDUCE-1221-v1.patch


 The TaskTracker currently supports killing tasks if the virtual memory of a 
 task exceeds a set of configured thresholds. I would like to extend this 
 feature to enable killing tasks if the physical memory used by that task 
 exceeds a certain threshold.
 On a certain operating system (guess?), if user space processes start using 
 lots of memory, the machine hangs and dies quickly. This means that we would 
 like to prevent map-reduce jobs from triggering this condition. From my 
 understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were 
 designed to address this problem. This works well when most map-reduce jobs 
 are Java jobs and have well-defined -Xmx parameters that specify the max 
 virtual memory for each task. On the other hand, if each task forks off 
 mappers/reducers written in other languages (python/php, etc), the total 
 virtual memory usage of the process-subtree varies greatly. In these cases, 
 it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1083) Use the user-to-groups mapping service in the JobTracker

2009-12-18 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated MAPREDUCE-1083:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

+1. I just committed this. Thanks, Boris!

  Use the user-to-groups mapping service in the JobTracker
 -

 Key: MAPREDUCE-1083
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1083
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobtracker
Reporter: Arun C Murthy
Assignee: Boris Shkolnik
 Fix For: 0.22.0

 Attachments: HADOOP-4656_mr.patch, MAPREDUCE-1083-2.patch, 
 MAPREDUCE-1083-3.patch


 HADOOP-4656 introduces a user-to-groups mapping service on the server-side. 
 The JobTracker should use this to map users to their groups rather than 
 relying on the information passed by the client.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

2009-12-18 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792736#action_12792736
 ] 

Scott Chen commented on MAPREDUCE-1221:
---

About the virtual memory limiting, we have tried to use it in our cluster. Our 
experience is that even if we set the total memory threshold to a high enough 
value, the TaskTracker would still kill a considerable number of tasks when there 
is nothing wrong with the RSS memory. So we decided to extend the virtual 
memory limiting to this physical one. Anyway, it doesn't hurt to have more 
options. It will not be turned on if the configuration is not set.

 Kill tasks on a node if the free physical memory on that machine falls below 
 a configured threshold
 ---

 Key: MAPREDUCE-1221
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Reporter: dhruba borthakur
Assignee: Scott Chen
 Attachments: MAPREDUCE-1221-v1.patch


 The TaskTracker currently supports killing tasks if the virtual memory of a 
 task exceeds a set of configured thresholds. I would like to extend this 
 feature to enable killing tasks if the physical memory used by that task 
 exceeds a certain threshold.
 On a certain operating system (guess?), if user space processes start using 
 lots of memory, the machine hangs and dies quickly. This means that we would 
 like to prevent map-reduce jobs from triggering this condition. From my 
 understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were 
 designed to address this problem. This works well when most map-reduce jobs 
 are Java jobs and have well-defined -Xmx parameters that specify the max 
 virtual memory for each task. On the other hand, if each task forks off 
 mappers/reducers written in other languages (python/php, etc), the total 
 virtual memory usage of the process-subtree varies greatly. In these cases, 
 it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1317) Reducing memory consumption of rumen objects

2009-12-18 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-1317:
-

Status: Open  (was: Patch Available)

I spoke too soon. The cache needs to be properly synchronized: although we 
expect LoggedLocation objects to be created through the JSON library and to be 
read-only afterwards, the cache itself may be accessed concurrently.

Also found a few other minor improvements that I should incorporate.

With these, I think we also need to add a unit test to ensure the code runs 
properly with multiple threads.

 Reducing memory consumption of rumen objects
 

 Key: MAPREDUCE-1317
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1317
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.21.0, 0.22.0
Reporter: Hong Tang
Assignee: Hong Tang
 Fix For: 0.21.0, 0.22.0

 Attachments: mapreduce-1317-20091218.patch


 We have encountered OutOfMemoryErrors in mumak and gridmix when dealing with 
 very large jobs. The purpose of this jira is to optimize memory consumption of 
 rumen-produced job objects.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1258) Fair scheduler event log not logging job info

2009-12-18 Thread Matei Zaharia (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792768#action_12792768
 ] 

Matei Zaharia commented on MAPREDUCE-1258:
--

Thanks for the review, Scott. I'll wait to see if there are any checkstyle 
warnings etc and commit it if there aren't.

 Fair scheduler event log not logging job info
 -

 Key: MAPREDUCE-1258
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1258
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/fair-share
Affects Versions: 0.21.0
Reporter: Matei Zaharia
Assignee: Matei Zaharia
Priority: Minor
 Attachments: mapreduce-1258-1.patch


 The MAPREDUCE-706 patch seems to have left an unfinished TODO in the Fair 
 Scheduler - namely, in the dump() function for periodically dumping scheduler 
 state to the event log, the part that dumps information about jobs is 
 commented out. This makes the event log less useful than it was before.
 It should be fairly easy to update this part to use the new scheduler data 
 structures (Schedulable etc) and print the data.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1250) Refactor job token to use a common token interface

2009-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792775#action_12792775
 ] 

Hadoop QA commented on MAPREDUCE-1250:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12428336/m1250-12.patch
  against trunk revision 892411.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/221/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/221/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/221/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/221/console

This message is automatically generated.

 Refactor job token to use a common token interface
 --

 Key: MAPREDUCE-1250
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1250
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: security
Reporter: Kan Zhang
Assignee: Kan Zhang
 Attachments: m1250-09.patch, m1250-12.patch


 The idea is to use a common token interface for both job token and delegation 
 token (HADOOP-6373) so that the RPC layer that uses them doesn't have to 
 differentiate them.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Moved: (MAPREDUCE-1318) Document exit codes and their meanings used by linux task controller

2009-12-18 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik moved HADOOP-5912 to MAPREDUCE-1318:
---

  Component/s: (was: documentation)
   documentation
Fix Version/s: (was: 0.21.0)
   0.21.0
  Key: MAPREDUCE-1318  (was: HADOOP-5912)
  Project: Hadoop Map/Reduce  (was: Hadoop Common)

 Document exit codes and their meanings used by linux task controller
 

 Key: MAPREDUCE-1318
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1318
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: documentation
Reporter: Sreekanth Ramakrishnan
Assignee: Anatoli Fomenko
Priority: Blocker
 Fix For: 0.21.0

 Attachments: HADOOP-5912.1.patch


 Currently, the linux task controller binary uses a set of exit codes which 
 are not documented. These should be documented.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1318) Document exit codes and their meanings used by linux task controller

2009-12-18 Thread Anatoli Fomenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anatoli Fomenko updated MAPREDUCE-1318:
---

Attachment: MAPREDUCE-1318.1.patch

The patch is fixed per the comments (thank you). 
The patch file name has been changed to MAPREDUCE-1318.1.patch to reflect the 
issue key change.

 Document exit codes and their meanings used by linux task controller
 

 Key: MAPREDUCE-1318
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1318
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: documentation
Reporter: Sreekanth Ramakrishnan
Assignee: Anatoli Fomenko
Priority: Blocker
 Fix For: 0.21.0

 Attachments: HADOOP-5912.1.patch, MAPREDUCE-1318.1.patch


 Currently, the linux task controller binary uses a set of exit codes which 
 are not documented. These should be documented.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1318) Document exit codes and their meanings used by linux task controller

2009-12-18 Thread Anatoli Fomenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anatoli Fomenko updated MAPREDUCE-1318:
---

Status: Patch Available  (was: Open)

Please review the submitted patch.

 Document exit codes and their meanings used by linux task controller
 

 Key: MAPREDUCE-1318
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1318
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: documentation
Reporter: Sreekanth Ramakrishnan
Assignee: Anatoli Fomenko
Priority: Blocker
 Fix For: 0.21.0

 Attachments: HADOOP-5912.1.patch, MAPREDUCE-1318.1.patch


 Currently, the linux task controller binary uses a set of exit codes which 
 are not documented. These should be documented.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-181) Secure job submission

2009-12-18 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated MAPREDUCE-181:
--

Attachment: 181-8.patch

This fixes Owen's offline comments about having a finite limit on the split 
meta info that the JobTracker reads. The other comment was about a typo in 
writJobSplitMetaInfo. 
I also fixed the testcases. To be specific, w.r.t. the earlier patch, the 
differences in the testcases are in
1) TestSubmitJob.java / TestSeveral.java / ClusterWithLinuxTaskController.java, 
where I set up the staging area root directory with proper permissions so that 
job clients can create the .staging directories there.
Other than that, a javadoc warning is fixed.

I ran test-patch locally and it passed. ant test is in progress.

 Secure job submission 
 --

 Key: MAPREDUCE-181
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-181
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Amar Kamat
Assignee: Devaraj Das
 Fix For: 0.22.0

 Attachments: 181-1.patch, 181-2.patch, 181-3.patch, 181-3.patch, 
 181-4.patch, 181-5.1.patch, 181-5.1.patch, 181-6.patch, 181-8.patch, 
 hadoop-3578-branch-20-example-2.patch, hadoop-3578-branch-20-example.patch, 
 HADOOP-3578-v2.6.patch, HADOOP-3578-v2.7.patch, MAPRED-181-v3.32.patch, 
 MAPRED-181-v3.8.patch


 Currently the jobclient accesses the {{mapred.system.dir}} to add job 
 details. Hence the {{mapred.system.dir}} has the permissions of 
 {{rwx-wx-wx}}. This could be a security loophole where the job files might 
 get overwritten/tampered after the job submission. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.