[jira] Commented: (MAPREDUCE-1308) reduce tasks stall and are eventually killed
[ https://issues.apache.org/jira/browse/MAPREDUCE-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791842#action_12791842 ] Arun C Murthy commented on MAPREDUCE-1308: -- bq. This is problematic because our scheduled Hadoop jobs now take an extra hour-and-a-half to run (6000 seconds). First up, this is a per-job config... why is mapred.task.timeout set to 6000s? The default value is 600s. Could you please check the reducer's syslog file to see if there are issues? reduce tasks stall and are eventually killed Key: MAPREDUCE-1308 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1308 Project: Hadoop Map/Reduce Issue Type: Bug Components: tasktracker Environment: 20-node cluster, 8 cores per machine, 32GB memory, Fedora Linux Reporter: Brian Karlak We recently migrated our 0.19.2 cluster from Gentoo Linux to Fedora Linux. Everything was running smoothly before, but now about 5%-10% of our jobs have at least one reduce task that stalls out and is eventually killed with the message: Task attempt_200912102211_1648_r_09_0 failed to report status for 6003 seconds. Killing! The task is then re-launched and completes successfully, usually in a couple of minutes. This is problematic because our scheduled Hadoop jobs now take an extra hour-and-a-half to run (6000 seconds). There are no indications in the logs that anything is amiss. The task starts, a small amount of the copy/shuffle runs, and then nothing else is heard from the task until it is killed. I will attach the relevant parts of the TaskTracker logs in the comments. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
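For readers hitting the same symptom: the knob Arun refers to is set per job (or cluster-wide in mapred-site.xml), and the value is in milliseconds, so the 600s default is 600000. A sketch of the relevant property as it might appear in a configuration file of that era:

```xml
<property>
  <name>mapred.task.timeout</name>
  <!-- Milliseconds a task may go without reporting status before being
       killed. 600000 ms = 600 s is the default; the cluster in this report
       had it at 6000000 ms (6000 s), hence the hour-and-a-half stalls. -->
  <value>600000</value>
</property>
```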
[jira] Updated: (MAPREDUCE-1067) Default state of queues is undefined when unspecified
[ https://issues.apache.org/jira/browse/MAPREDUCE-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] V.V.Chaitanya Krishna updated MAPREDUCE-1067: - Attachment: MAPREDUCE-1067-6.patch Uploading new patch with the above comments implemented. Default state of queues is undefined when unspecified - Key: MAPREDUCE-1067 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1067 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.21.0 Reporter: V.V.Chaitanya Krishna Assignee: V.V.Chaitanya Krishna Priority: Blocker Fix For: 0.21.0 Attachments: MAPREDUCE-1067-1.patch, MAPREDUCE-1067-2.patch, MAPREDUCE-1067-3.patch, MAPREDUCE-1067-4.patch, MAPREDUCE-1067-5.patch, MAPREDUCE-1067-6.patch Currently, if the state of a queue is not specified, it is being set to undefined state instead of running state. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1067) Default state of queues is undefined when unspecified
[ https://issues.apache.org/jira/browse/MAPREDUCE-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] V.V.Chaitanya Krishna updated MAPREDUCE-1067: - Status: Patch Available (was: Open)
[jira] Commented: (MAPREDUCE-1174) Sqoop improperly handles table/column names which are reserved sql words
[ https://issues.apache.org/jira/browse/MAPREDUCE-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791852#action_12791852 ] Hadoop QA commented on MAPREDUCE-1174: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12428067/MAPREDUCE-1174.4.patch against trunk revision 891524. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/207/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/207/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/207/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/207/console This message is automatically generated. 
Sqoop improperly handles table/column names which are reserved sql words Key: MAPREDUCE-1174 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1174 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/sqoop Reporter: Aaron Kimball Assignee: Aaron Kimball Attachments: MAPREDUCE-1174.2.patch, MAPREDUCE-1174.3.patch, MAPREDUCE-1174.4.patch, MAPREDUCE-1174.patch In some databases it is legal to name tables and columns with terms that overlap SQL reserved keywords (e.g., {{CREATE}}, {{table}}, etc.). In such cases, the database allows you to escape the table and column names. We should always escape table and column names when possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
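A minimal sketch of the escaping idea, not Sqoop's actual code: wrap every table or column identifier in ANSI-SQL double quotes, doubling any embedded quote, so reserved words like CREATE or TABLE become safe to use in generated SQL. The class and method names here are illustrative.

```java
// Hypothetical sketch of identifier escaping for generated SQL; not taken
// from the Sqoop patch itself.
public class IdentifierEscaper {
    /** Quote an identifier ANSI-style, doubling any embedded double quote. */
    public static String escape(String identifier) {
        return "\"" + identifier.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(escape("CREATE")); // reserved word, now quoted
        System.out.println(escape("table"));
    }
}
```

Note that the quote character is database-specific: ANSI SQL and PostgreSQL use double quotes, while MySQL uses backticks, so a real implementation would pick the delimiter per connection type.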
[jira] Commented: (MAPREDUCE-1305) Massive performance problem with DistCp and -delete
[ https://issues.apache.org/jira/browse/MAPREDUCE-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791855#action_12791855 ] Hadoop QA commented on MAPREDUCE-1305: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12428234/MAPRED-1305.patch against trunk revision 891524. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/208/console This message is automatically generated. Massive performance problem with DistCp and -delete --- Key: MAPREDUCE-1305 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1305 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distcp Affects Versions: 0.20.1 Reporter: Peter Romianowski Assignee: Peter Romianowski Attachments: MAPRED-1305.patch *First problem* In org.apache.hadoop.tools.DistCp#deleteNonexisting we serialize FileStatus objects when the path is all we need. The performance problem comes from org.apache.hadoop.fs.RawLocalFileSystem.RawLocalFileStatus#write, which tries to retrieve file permissions by issuing an ls -ld on the path, which is painfully slow. Changed that to just serialize Path and not FileStatus. *Second problem* To delete the files we invoke the hadoop command line tool with option -rmr path -- again, once for each file. Changed that to dstfs.delete(path, true) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
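An illustrative sketch of the pattern behind the second fix, using java.nio rather than the Hadoop FileSystem API: delete through an in-process library call (the patch's dstfs.delete(path, true)) instead of forking one CLI process per file, which is what dominates the runtime. Plain "rm" stands in for the hadoop command-line tool here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Not DistCp code: a stand-alone demonstration of "API call vs. one
// fork/exec per file", the core of the MAPREDUCE-1305 fix.
public class DeleteSketch {

    // Slow variant: spawn a process per path, as the old code effectively did.
    public static void deleteViaShell(Path p) throws IOException, InterruptedException {
        new ProcessBuilder("rm", "-rf", p.toString()).start().waitFor();
    }

    // Fast variant: a single in-process call, no process creation at all.
    public static void deleteViaApi(Path p) throws IOException {
        Files.deleteIfExists(p);
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("distcp-sketch", ".tmp");
        deleteViaApi(tmp);
        System.out.println(Files.exists(tmp)); // the file is gone
    }
}
```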
[jira] Commented: (MAPREDUCE-1067) Default state of queues is undefined when unspecified
[ https://issues.apache.org/jira/browse/MAPREDUCE-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791868#action_12791868 ] rahul k singh commented on MAPREDUCE-1067: -- +1 with patch
[jira] Commented: (MAPREDUCE-118) Job.getJobID() will always return null
[ https://issues.apache.org/jira/browse/MAPREDUCE-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791873#action_12791873 ] Amareshwari Sriramadasu commented on MAPREDUCE-118: --- The proposal looks fine, but I found a small issue implementing it. In 0.21, ClientProtocol.getNewJobID() throws InterruptedException. The new Job constructors (introduced in 0.21) can be changed to throw InterruptedException, but the deprecated constructors cannot be changed. After discussing with Arun, one solution we could think of is to add a deprecated setJobID in JobContextImpl, which can be called from the deprecated constructors. We will remove the newly added method when we remove the deprecated constructors. Job.getJobID() will always return null -- Key: MAPREDUCE-118 URL: https://issues.apache.org/jira/browse/MAPREDUCE-118 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.1 Reporter: Amar Kamat Assignee: Amareshwari Sriramadasu Priority: Blocker Fix For: 0.20.2 Attachments: patch-118-0.20.txt, patch-118-0.21.txt, patch-118.txt JobContext is used for a read-only view of a job's info. Hence all the read-only fields in JobContext are set in the constructor. Job extends JobContext. When a Job is created, the jobid is not known, and hence there is no way to set the JobID once the Job is created. The JobID is obtained only when the JobClient queries the JobTracker for a job-id, which happens later, i.e. upon job submission. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
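A simplified sketch of the proposal in this comment; the names mirror the discussion, but the real classes carry far more state. The idea: keep JobContext effectively read-only, yet give the deprecated Job constructors, which cannot be changed to throw InterruptedException, a deprecated setter to fill in the id once it is known.

```java
// Illustrative stand-in for JobContextImpl, not the actual Hadoop class.
public class JobContextSketch {
    private String jobID; // null until the JobTracker assigns an id

    /** Only for the deprecated constructors; removed when they are. */
    @Deprecated
    public void setJobID(String id) { this.jobID = id; }

    public String getJobID() { return jobID; }
}
```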
[jira] Updated: (MAPREDUCE-896) Users can set non-writable permissions on temporary files for TT and can abuse disk usage.
[ https://issues.apache.org/jira/browse/MAPREDUCE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Gummadi updated MAPREDUCE-896: --- Attachment: MR-896.v3.patch Attaching patch for trunk. Incorporated review comments. Fixed the issue of launching the task-controller when the path does not exist, similar to y896.2.1.fix.v2.patch. Added more testcases for the cases of (a) needCleanup being false and (b) jvmReuse. Users can set non-writable permissions on temporary files for TT and can abuse disk usage. -- Key: MAPREDUCE-896 URL: https://issues.apache.org/jira/browse/MAPREDUCE-896 Project: Hadoop Map/Reduce Issue Type: Bug Components: tasktracker Affects Versions: 0.21.0 Reporter: Vinod K V Assignee: Ravi Gummadi Fix For: 0.21.0 Attachments: MR-896.patch, MR-896.v1.patch, MR-896.v2.patch, MR-896.v3.patch, y896.v1.patch, y896.v2.1.fix.patch, y896.v2.1.fix.v1.patch, y896.v2.1.fix.v2.patch, y896.v2.1.patch, y896.v2.patch As of now, irrespective of the TaskController in use, the TT itself does a full delete on local files created by itself or by job tasks. This step, depending upon the TT's umask and the permissions set on files by the user (e.g. in job-work/task-work or child.tmp directories), may not complete successfully. This leaves an opportunity for disk-space abuse, either accidental or intentional, by the TT or users. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-913) TaskRunner crashes with NPE resulting in held up slots, UNINITIALIZED tasks and hung TaskTracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-913: -- Assignee: Amareshwari Sriramadasu Status: Patch Available (was: Open) TaskRunner crashes with NPE resulting in held up slots, UNINITIALIZED tasks and hung TaskTracker Key: MAPREDUCE-913 URL: https://issues.apache.org/jira/browse/MAPREDUCE-913 Project: Hadoop Map/Reduce Issue Type: Bug Components: tasktracker Affects Versions: 0.20.1 Reporter: Vinod K V Assignee: Amareshwari Sriramadasu Priority: Blocker Fix For: 0.21.0 Attachments: mapreduce-913-1.patch, MAPREDUCE-913-20091119.1.txt, MAPREDUCE-913-20091119.2.txt, MAPREDUCE-913-20091120.1.txt, patch-913.txt -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-913) TaskRunner crashes with NPE resulting in held up slots, UNINITIALIZED tasks and hung TaskTracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-913: -- Attachment: patch-913.txt Patch does the following: 1. Changed reportTaskFinished code to ensure the slot is always released, by calling releaseSlot in a finally block. 2. Undid the changes related to throwing an exception when arguments to the debug-script could not be constructed, since they were already initialized to empty Strings. 3. Modified the testcase to use the new API. bq. In test case can we verify the correct number of the map slot is actually reported back to JobTracker after the failing job completes, this would test the actual slot management. 4. Added asserts for slot management. Verified the test passes with the patch and fails without it. bq. Can we check if the workDir is non-null in the run-debug script and throw an exception if the same is null? Would prevent launch of task-controller code. If workDir is null or does not exist, the current code already throws an IOException. bq. Wouldn't it be much better that we add a check to figure out if the taskJVM was launched or not and then run debug script accordingly. This may need more discussion, since it changes the feature so that the debug script is launched only when the task JVM was launched properly.
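A stripped-down sketch of item 1 in the patch description: whatever reportTaskFinished does, the slot must come back, so the release happens in a finally block. The classes below are stand-ins, not the real TaskTracker/TaskRunner code.

```java
// Demonstrates the try/finally slot-release pattern from the patch notes.
public class SlotRelease {
    private int freeSlots;

    public void reportTaskFinished(Runnable reporter) {
        try {
            reporter.run(); // may throw, e.g. the NPE from this issue
        } finally {
            freeSlots++;    // slot released on both normal and error paths
        }
    }

    public int getFreeSlots() { return freeSlots; }
}
```

Without the finally block, an exception in the reporting path would leave the slot held forever, which is exactly the hung-TaskTracker symptom in the issue title.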
[jira] Commented: (MAPREDUCE-1277) Streaming job should support other characterset in user's stderr log, not only utf8
[ https://issues.apache.org/jira/browse/MAPREDUCE-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791951#action_12791951 ] Hadoop QA commented on MAPREDUCE-1277: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12428009/streaming-1277-new.patch against trunk revision 891524. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/210/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/210/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/210/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/210/console This message is automatically generated. 
Streaming job should support other characterset in user's stderr log, not only utf8 --- Key: MAPREDUCE-1277 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1277 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Affects Versions: 0.21.0 Reporter: ZhuGuanyin Assignee: ZhuGuanyin Fix For: 0.21.0 Attachments: streaming-1277-new.patch, streaming-1277.patch The current implementation in streaming only supports utf8-encoded user stderr logs; it should be encoding-agnostic so that other charactersets are supported. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
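An illustrative sketch of the general approach (not the streaming patch itself): treat the child process's stderr as raw bytes and decode with a configurable charset instead of hard-wiring UTF-8, so logs written in e.g. GBK or ISO-8859-1 survive the round trip. The class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Decodes captured stderr bytes with a caller-chosen charset rather than
// assuming UTF-8, the core idea behind MAPREDUCE-1277.
public class StderrDecoder {
    public static String decode(byte[] rawStderr, String charsetName) {
        return new String(rawStderr, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // 0xE9 is 'e' with acute accent in ISO-8859-1; decoding it as UTF-8
        // would mangle it, decoding with the right charset does not.
        byte[] raw = {(byte) 0xE9};
        System.out.println(decode(raw, "ISO-8859-1"));
    }
}
```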
[jira] Commented: (MAPREDUCE-1308) reduce tasks stall and are eventually killed
[ https://issues.apache.org/jira/browse/MAPREDUCE-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791965#action_12791965 ] Brian Karlak commented on MAPREDUCE-1308: - Arun -- Thanks for the pointers. I'm not quite sure how mapred.task.timeout got set incorrectly -- I went through our local SVN repo, and it seems to have been set that way at our site since we were using 0.16.4 back in July 2008. Since it was never an issue until now, we never noticed, I guess. ;-) Parameter has been modified. I'll check the syslogs and report back in the next comment. Brian
[jira] Commented: (MAPREDUCE-1308) reduce tasks stall and are eventually killed
[ https://issues.apache.org/jira/browse/MAPREDUCE-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791969#action_12791969 ] Brian Karlak commented on MAPREDUCE-1308: - I can find no indication of error in the syslog files. /var/log/messages has only syslog-ng and ntpd messages between 03:50 and 05:40.
[jira] Commented: (MAPREDUCE-1308) reduce tasks stall and are eventually killed
[ https://issues.apache.org/jira/browse/MAPREDUCE-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791973#action_12791973 ] Brian Karlak commented on MAPREDUCE-1308: - Outside of the time period in question, I do see a few other worrisome log messages in the hadoop (not syslog) files. In the datanode logs, I see 5 messages (in a 24-hour period) like:

2009-12-17 10:14:13,565 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(172.29.2.67:50010, storageID=DS-739735928-172.29.2.67-50010-1259798617913, infoPort=50075, ipcPort=50020):Got exception while serving blk_786885296716083440_1313776 to /172.29.2.67:
java.io.IOException: Block blk_786885296716083440_1313776 is not valid.
    at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:731)
    at org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:719)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.init(BlockSender.java:92)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
    at java.lang.Thread.run(Thread.java:619)

2009-12-17 10:14:13,565 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(172.29.2.67:50010, storageID=DS-739735928-172.29.2.67-50010-1259798617913, infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: Block blk_786885296716083440_1313776 is not valid.
    at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:731)
    at org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:719)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.init(BlockSender.java:92)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
    at java.lang.Thread.run(Thread.java:619)
[jira] Commented: (MAPREDUCE-1308) reduce tasks stall and are eventually killed
[ https://issues.apache.org/jira/browse/MAPREDUCE-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791974#action_12791974 ] Brian Karlak commented on MAPREDUCE-1308: - And I see one message for a SocketTimeout:

2009-12-17 06:26:20,082 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(172.29.2.67:50010, storageID=DS-739735928-172.29.2.67-50010-1259798617913, infoPort=50075, ipcPort=50020):Got exception while serving blk_-7712543153225807619_1300911 to /172.29.2.67:
java.net.SocketTimeoutException: 48 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/172.29.2.67:50010 remote=/172.29.2.67:41058]
    at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:185)
    at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
    at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
    at java.lang.Thread.run(Thread.java:619)

2009-12-17 06:26:20,082 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(172.29.2.67:50010, storageID=DS-739735928-172.29.2.67-50010-1259798617913, infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketTimeoutException: 48 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/172.29.2.67:50010 remote=/172.29.2.67:41058]
    at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:185)
    at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
    at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
    at java.lang.Thread.run(Thread.java:619)
[jira] Commented: (MAPREDUCE-1308) reduce tasks stall and are eventually killed
[ https://issues.apache.org/jira/browse/MAPREDUCE-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791978#action_12791978 ] Brian Karlak commented on MAPREDUCE-1308: - And in the tasktracker logs, around the same time but for a different task, I get errors like the one below. There are about 55 of these over a 24-hour period.

2009-12-17 03:56:14,843 WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(attempt_200912102211_1648_m_82_0,46) failed :
java.net.SocketException: Connection reset
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    at org.mortbay.http.ChunkingOutputStream.bypassWrite(ChunkingOutputStream.java:151)
    at org.mortbay.http.BufferedOutputStream.write(BufferedOutputStream.java:139)
    at org.mortbay.http.HttpOutputStream.write(HttpOutputStream.java:423)
    at org.mortbay.jetty.servlet.ServletOut.write(ServletOut.java:54)
    at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2919)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
    at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
    at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
    at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
    at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
    at org.mortbay.http.HttpServer.service(HttpServer.java:954)
    at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
    at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
    at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
    at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
    at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
    at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)

2009-12-17 03:56:14,844 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 172.29.2.67:50060, dest: 172.29.2.61:17116, bytes: 589824, op: MAPRED_SHUFFLE, cliID: attempt_200912102211_1648_m_82_0

2009-12-17 03:56:14,844 WARN /: /mapOutput?job=job_200912102211_1648map=attempt_200912102211_1648_m_82_0reduce=46:
java.lang.IllegalStateException: Committed
    at org.mortbay.jetty.servlet.ServletHttpResponse.resetBuffer(ServletHttpResponse.java:212)
    at org.mortbay.jetty.servlet.ServletHttpResponse.sendError(ServletHttpResponse.java:375)
    at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2945)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
    at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
    at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
    at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
    at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
    at org.mortbay.http.HttpServer.service(HttpServer.java:954)
    at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
    at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
    at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
    at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
    at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
    at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
[jira] Commented: (MAPREDUCE-698) Per-pool task limits for the fair scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792013#action_12792013 ] Matei Zaharia commented on MAPREDUCE-698: - I've looked at the patch more carefully, and it all looks good, except there seems to be a loop doing nothing in PoolManager:
{noformat}
+for(String pool : poolNamesInAllocFile) {
+}
{noformat}
I can remove this myself and commit the patch, unless there was a reason you had it there (and forgot to put in some code). Per-pool task limits for the fair scheduler --- Key: MAPREDUCE-698 URL: https://issues.apache.org/jira/browse/MAPREDUCE-698 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/fair-share Reporter: Matei Zaharia Assignee: Kevin Peterson Fix For: 0.21.0 Attachments: MAPREDUCE-698-prelim.patch, mapreduce-698-trunk-3.patch, mapreduce-698-trunk-4.patch, mapreduce-698-trunk.patch, mapreduce-698-trunk.patch The fair scheduler could use a way to cap the share of a given pool similar to MAPREDUCE-532. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-698) Per-pool task limits for the fair scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792018#action_12792018 ] Kevin Peterson commented on MAPREDUCE-698: -- That loop was for checking that the allocations were consistent (min <= max). I moved the check into the loop where they are read, but it looks like I missed this bit. Per-pool task limits for the fair scheduler --- Key: MAPREDUCE-698 URL: https://issues.apache.org/jira/browse/MAPREDUCE-698 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/fair-share Reporter: Matei Zaharia Assignee: Kevin Peterson Fix For: 0.21.0 Attachments: MAPREDUCE-698-prelim.patch, mapreduce-698-trunk-3.patch, mapreduce-698-trunk-4.patch, mapreduce-698-trunk.patch, mapreduce-698-trunk.patch The fair scheduler could use a way to cap the share of a given pool similar to MAPREDUCE-532. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
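For reference, the consistency check Kevin describes can be folded into the loop that reads the allocations. This is a hedged, self-contained sketch with hypothetical names (the actual PoolManager code and its parsed structures differ):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PoolAllocCheck {
    // Hypothetical consistency rule: a pool's minimum allocation
    // must not exceed its maximum (either may be unset).
    static boolean isConsistent(Integer min, Integer max) {
        return min == null || max == null || min <= max;
    }

    public static void main(String[] args) {
        // Stand-ins for per-pool allocations parsed from the alloc file.
        Map<String, Integer> minTasks = new LinkedHashMap<>();
        Map<String, Integer> maxTasks = new LinkedHashMap<>();
        minTasks.put("research", 10);
        maxTasks.put("research", 5);

        // Validate each pool as it is read, instead of in a separate
        // (previously empty) second loop over the pool names.
        for (String pool : minTasks.keySet()) {
            if (!isConsistent(minTasks.get(pool), maxTasks.get(pool))) {
                System.out.println("inconsistent pool: " + pool);
            }
        }
    }
}
```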
[jira] Updated: (MAPREDUCE-1305) Massive performance problem with DistCp and -delete
[ https://issues.apache.org/jira/browse/MAPREDUCE-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Romianowski updated MAPREDUCE-1305: - Attachment: (was: MAPRED-1305.patch) Massive performance problem with DistCp and -delete --- Key: MAPREDUCE-1305 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1305 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distcp Affects Versions: 0.20.1 Reporter: Peter Romianowski Assignee: Peter Romianowski *First problem* In org.apache.hadoop.tools.DistCp#deleteNonexisting we serialize FileStatus objects when the path is all we need. The performance problem comes from org.apache.hadoop.fs.RawLocalFileSystem.RawLocalFileStatus#write, which tries to retrieve file permissions by issuing an ls -ld on the path, which is painfully slow. Changed that to just serialize Path and not FileStatus. *Second problem* To delete the files we invoke the hadoop command line tool with option -rmr path. Again, once for each file. Changed that to dstfs.delete(path, true) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1305) Massive performance problem with DistCp and -delete
[ https://issues.apache.org/jira/browse/MAPREDUCE-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Romianowski updated MAPREDUCE-1305: - Attachment: MAPREDUCE-1305.patch We do not even need the absolute path serialized; using NullWritable now. Patch is against trunk, rev 891812 Massive performance problem with DistCp and -delete --- Key: MAPREDUCE-1305 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1305 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distcp Affects Versions: 0.20.1 Reporter: Peter Romianowski Assignee: Peter Romianowski Attachments: MAPREDUCE-1305.patch *First problem* In org.apache.hadoop.tools.DistCp#deleteNonexisting we serialize FileStatus objects when the path is all we need. The performance problem comes from org.apache.hadoop.fs.RawLocalFileSystem.RawLocalFileStatus#write, which tries to retrieve file permissions by issuing an ls -ld on the path, which is painfully slow. Changed that to just serialize Path and not FileStatus. *Second problem* To delete the files we invoke the hadoop command line tool with option -rmr path. Again, once for each file. Changed that to dstfs.delete(path, true) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
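The second fix above replaces one forked `hadoop fs -rmr` process per file with a single in-process recursive delete (`dstfs.delete(path, true)` on Hadoop's FileSystem). As a stand-in illustration of the same idea using only `java.nio.file` (the real code talks to HDFS, not the local filesystem):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RecursiveDelete {
    // One in-process recursive delete, children before parents --
    // the local-filesystem analogue of dstfs.delete(path, true),
    // instead of forking a CLI tool once per file.
    static void deleteRecursively(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            List<Path> paths = walk.sorted(Comparator.reverseOrder())
                                   .collect(Collectors.toList());
            for (Path p : paths) {
                Files.delete(p);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("distcp-demo");
        Files.createFile(root.resolve("a"));
        Files.createDirectories(root.resolve("sub"));
        Files.createFile(root.resolve("sub").resolve("b"));
        deleteRecursively(root);
        System.out.println(Files.exists(root)); // false: everything removed
    }
}
```

The win is the same in both worlds: process-spawn (or RPC) overhead is paid once per tree rather than once per file.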
[jira] Updated: (MAPREDUCE-698) Per-pool task limits for the fair scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated MAPREDUCE-698: Status: Open (was: Patch Available) Per-pool task limits for the fair scheduler --- Key: MAPREDUCE-698 URL: https://issues.apache.org/jira/browse/MAPREDUCE-698 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/fair-share Reporter: Matei Zaharia Assignee: Kevin Peterson Fix For: 0.21.0 Attachments: MAPREDUCE-698-prelim.patch, mapreduce-698-trunk-3.patch, mapreduce-698-trunk-4.patch, mapreduce-698-trunk-5.patch, mapreduce-698-trunk.patch, mapreduce-698-trunk.patch The fair scheduler could use a way to cap the share of a given pool similar to MAPREDUCE-532. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-698) Per-pool task limits for the fair scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated MAPREDUCE-698: Attachment: mapreduce-698-trunk-5.patch Here's the patch with the for loop removed. I'm going to run it through Hudson for good measure, but it seems to be working fine from my point of view, and the test failures in the previous run were unrelated. I'll commit it unless Hudson complains. Per-pool task limits for the fair scheduler --- Key: MAPREDUCE-698 URL: https://issues.apache.org/jira/browse/MAPREDUCE-698 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/fair-share Reporter: Matei Zaharia Assignee: Kevin Peterson Fix For: 0.21.0 Attachments: MAPREDUCE-698-prelim.patch, mapreduce-698-trunk-3.patch, mapreduce-698-trunk-4.patch, mapreduce-698-trunk-5.patch, mapreduce-698-trunk.patch, mapreduce-698-trunk.patch The fair scheduler could use a way to cap the share of a given pool similar to MAPREDUCE-532. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-698) Per-pool task limits for the fair scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated MAPREDUCE-698: Fix Version/s: (was: 0.21.0) 0.22.0 Hadoop Flags: [Reviewed] Status: Patch Available (was: Open) Per-pool task limits for the fair scheduler --- Key: MAPREDUCE-698 URL: https://issues.apache.org/jira/browse/MAPREDUCE-698 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/fair-share Reporter: Matei Zaharia Assignee: Kevin Peterson Fix For: 0.22.0 Attachments: MAPREDUCE-698-prelim.patch, mapreduce-698-trunk-3.patch, mapreduce-698-trunk-4.patch, mapreduce-698-trunk-5.patch, mapreduce-698-trunk.patch, mapreduce-698-trunk.patch The fair scheduler could use a way to cap the share of a given pool similar to MAPREDUCE-532. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1146) Sqoop dependencies break Eclipse build on Linux
[ https://issues.apache.org/jira/browse/MAPREDUCE-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom White updated MAPREDUCE-1146: - Resolution: Fixed Fix Version/s: 0.22.0 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) I've just committed this. Thanks Aaron! (There were actually no release audit warnings introduced by this patch. Also, the test failures were unrelated.) Sqoop dependencies break Eclipse build on Linux --- Key: MAPREDUCE-1146 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1146 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/sqoop Environment: Linux, Sun JDK6 Reporter: Konstantin Boudnik Assignee: Aaron Kimball Fix For: 0.22.0 Attachments: MAPREDUCE-1146.2.patch, MAPREDUCE-1146.3.patch, MAPREDUCE-1146.4.patch, MAPREDUCE-1146.patch Under Linux there's the error in the Eclipse Problems view: {noformat} - com.sun.tools cannot be resolved at line 166 of org.apache.hadoop.sqoop.orm.CompilationManager {noformat} The problem doesn't appear on MacOS though -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1174) Sqoop improperly handles table/column names which are reserved sql words
[ https://issues.apache.org/jira/browse/MAPREDUCE-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792094#action_12792094 ] Aaron Kimball commented on MAPREDUCE-1174: -- The only test failures are unrelated (streaming). Sqoop improperly handles table/column names which are reserved sql words Key: MAPREDUCE-1174 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1174 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/sqoop Reporter: Aaron Kimball Assignee: Aaron Kimball Attachments: MAPREDUCE-1174.2.patch, MAPREDUCE-1174.3.patch, MAPREDUCE-1174.4.patch, MAPREDUCE-1174.patch In some databases it is legal to name tables and columns with terms that overlap SQL reserved keywords (e.g., {{CREATE}}, {{table}}, etc.). In such cases, the database allows you to escape the table and column names. We should always escape table and column names when possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1250) Refactor job token to use a common token interface
[ https://issues.apache.org/jira/browse/MAPREDUCE-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792096#action_12792096 ] Kan Zhang commented on MAPREDUCE-1250: -- I think it may make more sense to store a Token in the Task, especially since it is Writable and can be easily serialized as part of the Task's write method. Currently, it only serves as a temporary in-memory cache for the SecretKey (to avoid converting from tokenPassword to SecretKey each time the token is used for Shuffle). The token itself is not intended to be serialized and sent along with the Task object. The passing of credentials for a Task is handled by way of the credential cache. If we're going to pass credentials along with Task objects, we need to make sure Task objects are handled properly. Since this is a re-factoring patch, I suggest we evaluate it as part of the credential cache work Boris is doing. Attaching a patch that addresses your other comments. Refactor job token to use a common token interface -- Key: MAPREDUCE-1250 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1250 Project: Hadoop Map/Reduce Issue Type: Improvement Components: security Reporter: Kan Zhang Assignee: Kan Zhang Attachments: m1250-09.patch The idea is to use a common token interface for both job token and delegation token (HADOOP-6373) so that the RPC layer that uses them doesn't have to differentiate them. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
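The point that a Token "is Writable and can be easily serialized as part of the Task's write method" comes down to the usual Hadoop Writable pattern: length-prefixed fields written to a DataOutput and read back symmetrically. A minimal, self-contained sketch (hypothetical field names; the real Token class lives in Hadoop's security code and carries more fields):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class TokenSketch {
    byte[] identifier;
    byte[] password;

    TokenSketch(byte[] id, byte[] pw) { identifier = id; password = pw; }
    TokenSketch() {}

    // Writable-style serialization: a Task's own write() could
    // delegate here to carry the token along with the Task object.
    void write(DataOutput out) throws IOException {
        out.writeInt(identifier.length);
        out.write(identifier);
        out.writeInt(password.length);
        out.write(password);
    }

    void readFields(DataInput in) throws IOException {
        identifier = new byte[in.readInt()];
        in.readFully(identifier);
        password = new byte[in.readInt()];
        in.readFully(password);
    }

    public static void main(String[] args) throws IOException {
        TokenSketch t = new TokenSketch("job_42".getBytes(), "secret".getBytes());
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        t.write(new DataOutputStream(bos));
        TokenSketch copy = new TokenSketch();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bos.toByteArray())));
        System.out.println(new String(copy.identifier)); // round-trips to "job_42"
    }
}
```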
[jira] Updated: (MAPREDUCE-1250) Refactor job token to use a common token interface
[ https://issues.apache.org/jira/browse/MAPREDUCE-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kan Zhang updated MAPREDUCE-1250: - Attachment: m1250-12.patch Refactor job token to use a common token interface -- Key: MAPREDUCE-1250 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1250 Project: Hadoop Map/Reduce Issue Type: Improvement Components: security Reporter: Kan Zhang Assignee: Kan Zhang Attachments: m1250-09.patch, m1250-12.patch The idea is to use a common token interface for both job token and delegation token (HADOOP-6373) so that the RPC layer that uses them don't have to differentiate them. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1143) runningMapTasks counter is not properly decremented in case of failed Tasks.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792118#action_12792118 ] Hadoop QA commented on MAPREDUCE-1143: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12427899/MAPRED-1143-7.patch against trunk revision 891524. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/211/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/211/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/211/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/211/console This message is automatically generated. runningMapTasks counter is not properly decremented in case of failed Tasks. 
Key: MAPREDUCE-1143 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1143 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.1 Reporter: rahul k singh Assignee: rahul k singh Priority: Blocker Fix For: 0.21.0 Attachments: MAPRED-1143-1.patch, MAPRED-1143-2.patch, MAPRED-1143-2.patch, MAPRED-1143-3.patch, MAPRED-1143-4.patch, MAPRED-1143-5.patch.txt, MAPRED-1143-6.patch, MAPRED-1143-7.patch, MAPRED-1143-ydist-1.patch, MAPRED-1143-ydist-2.patch, MAPRED-1143-ydist-3.patch, MAPRED-1143-ydist-4.patch, MAPRED-1143-ydist-5.patch, MAPRED-1143-ydist-6.patch, MAPRED-1143-ydist-7.patch, MAPRED-1143-ydist-8.patch.txt, MAPRED-1143-ydist-9.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-961) ResourceAwareLoadManager to dynamically decide new tasks based on current CPU/memory load on TaskTracker(s)
[ https://issues.apache.org/jira/browse/MAPREDUCE-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Chen updated MAPREDUCE-961: - Status: Open (was: Patch Available) ResourceAwareLoadManager to dynamically decide new tasks based on current CPU/memory load on TaskTracker(s) --- Key: MAPREDUCE-961 URL: https://issues.apache.org/jira/browse/MAPREDUCE-961 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/fair-share Affects Versions: 0.22.0 Reporter: dhruba borthakur Assignee: Scott Chen Fix For: 0.22.0 Attachments: HIVE-961.patch, MAPREDUCE-961-v2.patch, MAPREDUCE-961-v3.patch, MAPREDUCE-961-v4.patch, ResourceScheduling.pdf Design and develop a ResourceAwareLoadManager for the FairShare scheduler that dynamically decides how many maps/reduces to run on a particular machine based on the CPU/Memory/diskIO/network usage in that machine. The amount of resources currently used on each task tracker is being fed into the ResourceAwareLoadManager in real-time via an entity that is external to Hadoop. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
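The core decision such a load manager makes is simple to state: given the externally-reported utilization of a tracker, should another task be launched there? A purely hypothetical sketch of that decision shape (the real policy is described in the attached ResourceScheduling.pdf; thresholds and inputs here are invented for illustration):

```java
public class ResourceAwareSketch {
    // Hypothetical thresholds -- not from MAPREDUCE-961 itself.
    static final double CPU_LIMIT = 0.8;
    static final double MEM_LIMIT = 0.9;

    // Decide whether a tracker can take another task, given its
    // externally-reported CPU and memory utilization in [0.0, 1.0].
    static boolean canLaunchTask(double cpuUsage, double memUsage) {
        return cpuUsage < CPU_LIMIT && memUsage < MEM_LIMIT;
    }

    public static void main(String[] args) {
        System.out.println(canLaunchTask(0.5, 0.6));  // true: both under limit
        System.out.println(canLaunchTask(0.95, 0.6)); // false: CPU saturated
    }
}
```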
[jira] Updated: (MAPREDUCE-1174) Sqoop improperly handles table/column names which are reserved sql words
[ https://issues.apache.org/jira/browse/MAPREDUCE-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom White updated MAPREDUCE-1174: - Resolution: Fixed Fix Version/s: 0.22.0 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) I've just committed this. Thanks Aaron! Sqoop improperly handles table/column names which are reserved sql words Key: MAPREDUCE-1174 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1174 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/sqoop Reporter: Aaron Kimball Assignee: Aaron Kimball Fix For: 0.22.0 Attachments: MAPREDUCE-1174.2.patch, MAPREDUCE-1174.3.patch, MAPREDUCE-1174.4.patch, MAPREDUCE-1174.patch In some databases it is legal to name tables and columns with terms that overlap SQL reserved keywords (e.g., {{CREATE}}, {{table}}, etc.). In such cases, the database allows you to escape the table and column names. We should always escape table and column names when possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1302: -- Status: Patch Available (was: Open) TrackerDistributedCacheManager can delete file asynchronously - Key: MAPREDUCE-1302 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1302 Project: Hadoop Map/Reduce Issue Type: Improvement Components: tasktracker Affects Versions: 0.20.2, 0.21.0, 0.22.0 Reporter: Zheng Shao Assignee: Zheng Shao Attachments: MAPREDUCE-1302.0.patch, MAPREDUCE-1302.1.patch With the help of AsyncDiskService from MAPREDUCE-1213, we should be able to delete files from distributed cache asynchronously. That will help make task initialization faster, because task initialization calls the code that localizes files into the cache and may delete some other files. The deletion can slow down the task initialization speed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1302: -- Attachment: MAPREDUCE-1302.1.patch This patch is on top of MAPREDUCE-1213 which is already committed. TrackerDistributedCacheManager can delete file asynchronously - Key: MAPREDUCE-1302 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1302 Project: Hadoop Map/Reduce Issue Type: Improvement Components: tasktracker Affects Versions: 0.20.2, 0.21.0, 0.22.0 Reporter: Zheng Shao Assignee: Zheng Shao Attachments: MAPREDUCE-1302.0.patch, MAPREDUCE-1302.1.patch With the help of AsyncDiskService from MAPREDUCE-1213, we should be able to delete files from distributed cache asynchronously. That will help make task initialization faster, because task initialization calls the code that localizes files into the cache and may delete some other files. The deletion can slow down the task initialization speed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
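The speedup described above comes from moving the blocking delete off the task-initialization path. One common pattern for this (assumed here for illustration; AsyncDiskService from MAPREDUCE-1213 is the actual mechanism) is to rename the path immediately so it vanishes from the cache's namespace, then delete it on a background thread:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncDeleteSketch {
    // Single background deletion thread, standing in for the
    // per-volume thread pools of AsyncDiskService.
    static final ExecutorService deleter = Executors.newSingleThreadExecutor();

    // Rename first so the path disappears immediately, then delete
    // in the background: the caller (task init) no longer waits on I/O.
    static Future<?> deleteAsync(Path p) throws IOException {
        Path tombstone = p.resolveSibling(p.getFileName() + ".toDelete");
        Files.move(p, tombstone);
        return deleter.submit(() -> {
            try {
                Files.delete(tombstone);
            } catch (IOException ignored) {
                // Best-effort cleanup; a real service would log this.
            }
        });
    }

    public static void main(String[] args) throws Exception {
        Path f = Files.createTempFile("cache-entry", ".dat");
        Future<?> done = deleteAsync(f);
        System.out.println(Files.exists(f)); // false right after the rename
        done.get();
        deleter.shutdown();
    }
}
```

The rename is cheap and synchronous, so callers observe the file as gone at once even if the physical delete is still queued.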
[jira] Updated: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1302: -- Status: Open (was: Patch Available) TrackerDistributedCacheManager can delete file asynchronously - Key: MAPREDUCE-1302 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1302 Project: Hadoop Map/Reduce Issue Type: Improvement Components: tasktracker Affects Versions: 0.20.2, 0.21.0, 0.22.0 Reporter: Zheng Shao Assignee: Zheng Shao Attachments: MAPREDUCE-1302.0.patch, MAPREDUCE-1302.1.patch With the help of AsyncDiskService from MAPREDUCE-1213, we should be able to delete files from distributed cache asynchronously. That will help make task initialization faster, because task initialization calls the code that localizes files into the cache and may delete some other files. The deletion can slow down the task initialization speed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (MAPREDUCE-1235) java.io.IOException: Cannot convert value '0000-00-00 00:00:00' from column 6 to TIMESTAMP.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Kimball reassigned MAPREDUCE-1235: Assignee: Aaron Kimball java.io.IOException: Cannot convert value '0000-00-00 00:00:00' from column 6 to TIMESTAMP. Key: MAPREDUCE-1235 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1235 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/sqoop Affects Versions: 0.20.1 Environment: hadoop 0.20.1 sqoop ubuntu karmic mysql 4 Reporter: valentina kroshilina Assignee: Aaron Kimball Priority: Minor Original Estimate: 4h Remaining Estimate: 4h *Description*: java.io.IOException is thrown when trying to import a table to HDFS using Sqoop. Table has 0 value in a field of type datetime. *Full Exception*: java.io.IOException: Cannot convert value '0000-00-00 00:00:00' from column 6 to TIMESTAMP. *Original question*: http://getsatisfaction.com/cloudera/topics/cant_import_table?utm_content=reply_link&utm_medium=email&utm_source=reply_notification -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1295) We need a job trace manipulator to build gridmix runs.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dick King updated MAPREDUCE-1295: - Status: Patch Available (was: Open) This patch fixes an applicability issue. We need a job trace manipulator to build gridmix runs. -- Key: MAPREDUCE-1295 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1295 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Dick King Assignee: Dick King Attachments: mapreduce-1295--2009-12-17.patch, mapreduce-1297--2009-12-14.patch Rumen produces job traces, which are JSON format files describing important aspects of all jobs that are run [successfully or not] on a hadoop map/reduce cluster. There are two packages under development that will consume these trace files and produce actions in that cluster or another cluster: gridmix3 [see jira MAPREDUCE-1124 ] and Mumak [a simulator -- see MAPREDUCE-728 ]. It would be useful to be able to do two things with job traces, so we can run experiments using these two tools: change the duration, and change the density. I would like to provide a folder, a tool that can wrap a long-duration execution trace to redistribute its jobs over a shorter interval, and also change the density by duplicating or culling away jobs from the folded combined job trace. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
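The "folder" described above, which wraps a long trace's jobs onto a shorter interval, can be sketched as a modulo fold of each job's submit time. This is only an illustration of the wrapping step; the real tool also adjusts density by duplicating or culling jobs:

```java
import java.util.Arrays;

public class TraceFoldSketch {
    // Wrap each job's submit time into [0, interval): a job from
    // hour 2 of a 6-hour trace lands inside a 1-hour folded run.
    static long[] fold(long[] submitTimes, long interval) {
        long[] folded = new long[submitTimes.length];
        for (int i = 0; i < submitTimes.length; i++) {
            folded[i] = submitTimes[i] % interval;
        }
        Arrays.sort(folded); // replay order within the folded run
        return folded;
    }

    public static void main(String[] args) {
        long[] t = {100, 3700, 7300}; // seconds into the original trace
        System.out.println(Arrays.toString(fold(t, 3600))); // [100, 100, 100]
    }
}
```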
[jira] Updated: (MAPREDUCE-1235) java.io.IOException: Cannot convert value '0000-00-00 00:00:00' from column 6 to TIMESTAMP.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Kimball updated MAPREDUCE-1235: - Status: Patch Available (was: Open) java.io.IOException: Cannot convert value '0000-00-00 00:00:00' from column 6 to TIMESTAMP. Key: MAPREDUCE-1235 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1235 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/sqoop Affects Versions: 0.20.1 Environment: hadoop 0.20.1 sqoop ubuntu karmic mysql 4 Reporter: valentina kroshilina Assignee: Aaron Kimball Priority: Minor Attachments: MAPREDUCE-1235.patch Original Estimate: 4h Remaining Estimate: 4h *Description*: java.io.IOException is thrown when trying to import a table to HDFS using Sqoop. Table has 0 value in a field of type datetime. *Full Exception*: java.io.IOException: Cannot convert value '0000-00-00 00:00:00' from column 6 to TIMESTAMP. *Original question*: http://getsatisfaction.com/cloudera/topics/cant_import_table?utm_content=reply_link&utm_medium=email&utm_source=reply_notification -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1235) java.io.IOException: Cannot convert value '0000-00-00 00:00:00' from column 6 to TIMESTAMP.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Kimball updated MAPREDUCE-1235: - Attachment: MAPREDUCE-1235.patch Attaching patch to fix this issue. MySQL supports TIMESTAMP values of '0000-00-00 00:00:00', which is out-of-range for java.sql.Timestamp. MySQL allows various behaviors for handling this; the default used to be to convert this value to null; since MySQL 5 it now throws IOException when such a timestamp is retrieved. Sqoop now sets the default behavior to convert these values to null, since this is a reasonable data conversion given the imprecision available. Users can override this default by passing the {{zeroDateTimeBehavior=exception}} parameter in the connect string. java.io.IOException: Cannot convert value '0000-00-00 00:00:00' from column 6 to TIMESTAMP. Key: MAPREDUCE-1235 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1235 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/sqoop Affects Versions: 0.20.1 Environment: hadoop 0.20.1 sqoop ubuntu karmic mysql 4 Reporter: valentina kroshilina Assignee: Aaron Kimball Priority: Minor Attachments: MAPREDUCE-1235.patch Original Estimate: 4h Remaining Estimate: 4h *Description*: java.io.IOException is thrown when trying to import a table to HDFS using Sqoop. Table has 0 value in a field of type datetime. *Full Exception*: java.io.IOException: Cannot convert value '0000-00-00 00:00:00' from column 6 to TIMESTAMP. *Original question*: http://getsatisfaction.com/cloudera/topics/cant_import_table?utm_content=reply_link&utm_medium=email&utm_source=reply_notification -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
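The override mentioned above is an ordinary key=value property on the MySQL JDBC connect string. A small sketch of appending it while preserving any properties already present (how Sqoop itself assembles the URL may differ):

```java
public class ZeroDateSketch {
    // Append zeroDateTimeBehavior to a JDBC connect string, using
    // '?' for the first property and '&' for subsequent ones.
    static String withZeroDateBehavior(String url, String behavior) {
        String sep = url.contains("?") ? "&" : "?";
        return url + sep + "zeroDateTimeBehavior=" + behavior;
    }

    public static void main(String[] args) {
        // Restore the throw-on-zero-date behavior instead of Sqoop's
        // convert-to-null default (host/db names are placeholders).
        System.out.println(withZeroDateBehavior(
            "jdbc:mysql://db.example.com/sales", "exception"));
    }
}
```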
[jira] Updated: (MAPREDUCE-1295) We need a job trace manipulator to build gridmix runs.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dick King updated MAPREDUCE-1295: - Attachment: mapreduce-1295--2009-12-17.patch This patch applies on a direct download of Trunk, and replaces the previous patch We need a job trace manipulator to build gridmix runs. -- Key: MAPREDUCE-1295 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1295 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Dick King Assignee: Dick King Attachments: mapreduce-1295--2009-12-17.patch, mapreduce-1297--2009-12-14.patch Rumen produces job traces, which are JSON format files describing important aspects of all jobs that are run [successfully or not] on a hadoop map/reduce cluster. There are two packages under development that will consume these trace files and produce actions in that cluster or another cluster: gridmix3 [see jira MAPREDUCE-1124 ] and Mumak [a simulator -- see MAPREDUCE-728 ]. It would be useful to be able to do two things with job traces, so we can run experiments using these two tools: change the duration, and change the density. I would like to provide a folder, a tool that can wrap a long-duration execution trace to redistribute its jobs over a shorter interval, and also change the density by duplicating or culling away jobs from the folded combined job trace. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1235) java.io.IOException: Cannot convert value '0000-00-00 00:00:00' from column 6 to TIMESTAMP.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792176#action_12792176 ] Todd Lipcon commented on MAPREDUCE-1235: patch looks good to me java.io.IOException: Cannot convert value '-00-00 00:00:00' from column 6 to TIMESTAMP. Key: MAPREDUCE-1235 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1235 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/sqoop Affects Versions: 0.20.1 Environment: hadoop 0.20.1 sqoop ubuntu karmic mysql 4 Reporter: valentina kroshilina Assignee: Aaron Kimball Priority: Minor Attachments: MAPREDUCE-1235.patch Original Estimate: 4h Remaining Estimate: 4h *Description*: java.io.IOException is thrown when trying to import a table to HDFS using Sqoop. Table has 0 value in a field of type datetime. *Full Exception*: java.io.IOException: Cannot convert value '-00-00 00:00:00' from column 6 to TIMESTAMP. *Original question*: http://getsatisfaction.com/cloudera/topics/cant_import_table?utm_content=reply_linkutm_medium=emailutm_source=reply_notification -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1298) better access/organization of userlogs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Meng Mao updated MAPREDUCE-1298: Attachment: fido.py Attached is a script that illustrates a typical debugging approach. The script goes out to all the worker nodes and grabs any userlogs for attempts for a given job. If there were a page that brought all these userlogs together for a given job, this script wouldn't be necessary. better access/organization of userlogs -- Key: MAPREDUCE-1298 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1298 Project: Hadoop Map/Reduce Issue Type: Improvement Components: tasktracker Reporter: Meng Mao Priority: Minor Attachments: fido.py Right now, it is quite a chore to browse to all userlogs generated during a given map or reduce phase. It is quite easy to browse to a job and look at either the map or reduce tasks, like so: /jobtasks.jsp?jobid=job_myid&type=map&pagenum=1 /jobtasks.jsp?jobid=job_myid&type=reduce&pagenum=1 However, it is not easy to look at the stderr output across all the attempts. Currently, the best technique I know of is to browse into each task: /taskdetails.jsp?jobid=job_myid&tipid=task_taskid And from there, jump to the slave node's task log for that taskid: slavenode/tasklog?taskid=<attempt for the taskid>&all=true I'm not suggesting that there needs to be really sophisticated way to present all the task userlogs in one place, especially with the expected size of the logs. However, it would be nice to be presented with a list of URLs (that are clickable) to all the log files. From here, it would be easy to copy/paste that elsewhere, where I could wget the set of log files and grep through them. What has prevented me from scripting it is a foolproof way to branch down from a job id to all the constituent task ids and logs. One more thing -- the task detail page: /taskdetails.jsp?jobid=job_myid&tipid=task_taskid gives links to see 4kb, 8kb, and all logs. 
I think it'd be nice to be able to get a link to just the stdout, stderr, and syslog portions. Most of our debugging is done by examining all of the stderr logs. Maybe it's possible to request that via URL, but I haven't found out how to in the documentation. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
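The URL-listing idea from the report can be sketched in a few lines. This is an illustrative sketch, not the attached fido.py: the tasktracker port (50060) and the `filter` parameter for selecting just the stderr portion are assumptions about the tasklog servlet, and the host names and attempt IDs are made up.

```python
# Build one clickable tasklog URL per attempt, as the reporter requests,
# so the list can be fed to wget and the results grepped.
# Assumptions: tasktracker HTTP port 50060; a "filter" query parameter
# that limits output to one log section (stdout/stderr/syslog).

def tasklog_urls(attempts, log_filter="stderr"):
    """attempts: iterable of (tracker_host, attempt_id) pairs."""
    urls = []
    for host, attempt_id in attempts:
        urls.append(
            "http://%s:50060/tasklog?taskid=%s&filter=%s&all=true"
            % (host, attempt_id, log_filter)
        )
    return urls

for url in tasklog_urls([
    ("worker01.example.com", "attempt_200912102211_0001_m_000000_0"),
    ("worker02.example.com", "attempt_200912102211_0001_r_000000_0"),
]):
    print(url)
```

The hard part, as the reporter notes, is enumerating the (host, attempt) pairs for a job in the first place; this only covers the formatting step.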
[jira] Updated: (MAPREDUCE-1083) Use the user-to-groups mapping service in the JobTracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boris Shkolnik updated MAPREDUCE-1083: -- Attachment: MAPREDUCE-1083-2.patch added and fixed tests to support common changes Use the user-to-groups mapping service in the JobTracker - Key: MAPREDUCE-1083 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1083 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker Reporter: Arun C Murthy Assignee: Boris Shkolnik Fix For: 0.22.0 Attachments: HADOOP-4656_mr.patch, MAPREDUCE-1083-2.patch HADOOP-4656 introduces a user-to-groups mapping service on the server-side. The JobTracker should use this to map users to their groups rather than relying on the information passed by the client. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats - Key: MAPREDUCE-1309 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Dick King There are two orthogonal questions to answer when processing a job tracker log: how will the logs and the xml configuration files be packaged, and in which release of Hadoop Map/Reduce were the logs generated? The existing rumen only has a couple of answers to these questions. The new engine will handle three answers to the version question: 0.18, 0.20, and current, and two answers to the packaging question: separate files with names derived from the job ID, and concatenated files with a header between sections [used for easier file interchange]. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
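The "two orthogonal questions" structure can be sketched as two independent registries combined at parser-selection time. This is an illustrative sketch of the proposed modularity, not rumen's actual code; the registry names and stub parsers are made up.

```python
# Version and packaging are orthogonal, so each gets its own registry.
# Adding a new log-format version (or a new packaging) means adding one
# entry, without touching the other axis.

VERSION_PARSERS = {
    "0.18": lambda line: ("v0.18", line),
    "0.20": lambda line: ("v0.20", line),
    "current": lambda line: ("current", line),
}

PACKAGINGS = ("separate-files", "concatenated-with-headers")

def pick_parser(version, packaging):
    # Validate the packaging axis first; a real engine would return a
    # reader object here instead of just checking membership.
    if packaging not in PACKAGINGS:
        raise ValueError("unknown packaging: %s" % packaging)
    try:
        return VERSION_PARSERS[version]
    except KeyError:
        raise ValueError("unknown log version: %s" % version)
```

With this shape, the 3 versions x 2 packagings the issue describes come from 5 registry entries rather than 6 hand-written combinations.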
[jira] Updated: (MAPREDUCE-181) Secure job submission
[ https://issues.apache.org/jira/browse/MAPREDUCE-181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated MAPREDUCE-181: -- Status: Open (was: Patch Available) Secure job submission -- Key: MAPREDUCE-181 URL: https://issues.apache.org/jira/browse/MAPREDUCE-181 Project: Hadoop Map/Reduce Issue Type: Sub-task Reporter: Amar Kamat Assignee: Devaraj Das Fix For: 0.22.0 Attachments: 181-1.patch, 181-2.patch, 181-3.patch, 181-3.patch, 181-4.patch, 181-5.1.patch, hadoop-3578-branch-20-example-2.patch, hadoop-3578-branch-20-example.patch, HADOOP-3578-v2.6.patch, HADOOP-3578-v2.7.patch, MAPRED-181-v3.32.patch, MAPRED-181-v3.8.patch Currently the jobclient accesses the {{mapred.system.dir}} to add job details. Hence the {{mapred.system.dir}} has the permissions of {{rwx-wx-wx}}. This could be a security loophole where the job files might get overwritten/tampered after the job submission. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-181) Secure job submission
[ https://issues.apache.org/jira/browse/MAPREDUCE-181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated MAPREDUCE-181: -- Attachment: 181-5.1.patch Thanks for the review, Owen. This patch addresses the concerns. I also did one more change - the JobInProgress constructor now checks whether the username in the submitted jobconf is the same as the one obtained from the UGI, and if not, fails the job submission. Ideally, we should not use conf.getUser anywhere, but since it is used even in the TaskTracker code, I left it as it is and instead fail the job submission if the user strings from the two sources don't match. Secure job submission -- Key: MAPREDUCE-181 URL: https://issues.apache.org/jira/browse/MAPREDUCE-181 Project: Hadoop Map/Reduce Issue Type: Sub-task Reporter: Amar Kamat Assignee: Devaraj Das Fix For: 0.22.0 Attachments: 181-1.patch, 181-2.patch, 181-3.patch, 181-3.patch, 181-4.patch, 181-5.1.patch, hadoop-3578-branch-20-example-2.patch, hadoop-3578-branch-20-example.patch, HADOOP-3578-v2.6.patch, HADOOP-3578-v2.7.patch, MAPRED-181-v3.32.patch, MAPRED-181-v3.8.patch Currently the jobclient accesses the {{mapred.system.dir}} to add job details. Hence the {{mapred.system.dir}} has the permissions of {{rwx-wx-wx}}. This could be a security loophole where the job files might get overwritten/tampered after the job submission. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
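The consistency check described in the comment - reject the job when the conf-recorded user differs from the authenticated one - can be sketched as follows. This is a hedged illustration of the idea, not the actual JobInProgress API; `submit_job`, the conf dict, and the key name are made up.

```python
# Fail job submission when the username in the submitted job conf does not
# match the user obtained from the trusted authentication source (the UGI
# in Hadoop's case). The conf value is client-supplied and therefore
# untrusted; the authenticated identity wins.

def submit_job(conf, authenticated_user):
    conf_user = conf.get("user.name")
    if conf_user != authenticated_user:
        raise PermissionError(
            "job conf user %r does not match authenticated user %r"
            % (conf_user, authenticated_user)
        )
    return "submitted as %s" % authenticated_user
```

The design point mirrors the comment: rather than purging every conf.getUser call site at once, keep reading the conf value but verify it once, up front, against the authenticated identity.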
[jira] Updated: (MAPREDUCE-181) Secure job submission
[ https://issues.apache.org/jira/browse/MAPREDUCE-181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated MAPREDUCE-181: -- Status: Patch Available (was: Open) Secure job submission -- Key: MAPREDUCE-181 URL: https://issues.apache.org/jira/browse/MAPREDUCE-181 Project: Hadoop Map/Reduce Issue Type: Sub-task Reporter: Amar Kamat Assignee: Devaraj Das Fix For: 0.22.0 Attachments: 181-1.patch, 181-2.patch, 181-3.patch, 181-3.patch, 181-4.patch, 181-5.1.patch, hadoop-3578-branch-20-example-2.patch, hadoop-3578-branch-20-example.patch, HADOOP-3578-v2.6.patch, HADOOP-3578-v2.7.patch, MAPRED-181-v3.32.patch, MAPRED-181-v3.8.patch Currently the jobclient accesses the {{mapred.system.dir}} to add job details. Hence the {{mapred.system.dir}} has the permissions of {{rwx-wx-wx}}. This could be a security loophole where the job files might get overwritten/tampered after the job submission. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1067) Default state of queues is undefined when unspecified
[ https://issues.apache.org/jira/browse/MAPREDUCE-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792242#action_12792242 ] Hadoop QA commented on MAPREDUCE-1067: --
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12428275/MAPREDUCE-1067-6.patch against trunk revision 891823.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/212/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/212/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/212/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/212/console
This message is automatically generated. 
Default state of queues is undefined when unspecified - Key: MAPREDUCE-1067 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1067 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.21.0 Reporter: V.V.Chaitanya Krishna Assignee: V.V.Chaitanya Krishna Priority: Blocker Fix For: 0.21.0 Attachments: MAPREDUCE-1067-1.patch, MAPREDUCE-1067-2.patch, MAPREDUCE-1067-3.patch, MAPREDUCE-1067-4.patch, MAPREDUCE-1067-5.patch, MAPREDUCE-1067-6.patch Currently, if the state of a queue is not specified, it is being set to undefined state instead of running state. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-181) Secure job submission
[ https://issues.apache.org/jira/browse/MAPREDUCE-181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated MAPREDUCE-181: -- Attachment: 181-5.1.patch Sorry, the last patch had a silly bug in the new checks i introduced. Secure job submission -- Key: MAPREDUCE-181 URL: https://issues.apache.org/jira/browse/MAPREDUCE-181 Project: Hadoop Map/Reduce Issue Type: Sub-task Reporter: Amar Kamat Assignee: Devaraj Das Fix For: 0.22.0 Attachments: 181-1.patch, 181-2.patch, 181-3.patch, 181-3.patch, 181-4.patch, 181-5.1.patch, 181-5.1.patch, hadoop-3578-branch-20-example-2.patch, hadoop-3578-branch-20-example.patch, HADOOP-3578-v2.6.patch, HADOOP-3578-v2.7.patch, MAPRED-181-v3.32.patch, MAPRED-181-v3.8.patch Currently the jobclient accesses the {{mapred.system.dir}} to add job details. Hence the {{mapred.system.dir}} has the permissions of {{rwx-wx-wx}}. This could be a security loophole where the job files might get overwritten/tampered after the job submission. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1083) Use the user-to-groups mapping service in the JobTracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated MAPREDUCE-1083: --- Status: Patch Available (was: Open) Submitting the patch on behalf of Boris. Use the user-to-groups mapping service in the JobTracker - Key: MAPREDUCE-1083 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1083 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker Reporter: Arun C Murthy Assignee: Boris Shkolnik Fix For: 0.22.0 Attachments: HADOOP-4656_mr.patch, MAPREDUCE-1083-2.patch, MAPREDUCE-1083-3.patch HADOOP-4656 introduces a user-to-groups mapping service on the server-side. The JobTracker should use this to map users to their groups rather than relying on the information passed by the client. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1311) TestStreamingExitStatus fails on hudson patch builds
[ https://issues.apache.org/jira/browse/MAPREDUCE-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792330#action_12792330 ] Amareshwari Sriramadasu commented on MAPREDUCE-1311: The failure log for one of the builds is @ http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/203/testReport/org.apache.hadoop.streaming/TestStreamingExitStatus/testMapFailOk/ TestStreamingExitStatus fails on hudson patch builds Key: MAPREDUCE-1311 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1311 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Amareshwari Sriramadasu TestStreamingExitStatus fails on hudson patch builds. The logs have the following error:
{noformat}
09/12/16 20:30:58 INFO fs.FSInputChecker: Found checksum error: b[0, 6]=68656c6c6f0a
org.apache.hadoop.fs.ChecksumException: Checksum error: file:/grid/0/hudson/hudson-slave/workspace/Mapreduce-Patch-h3.grid.sp2.yahoo.net/trunk/build/contrib/streaming/test/data/input.txt at 0
    at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:278)
    at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:242)
    at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:190)
    at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
    at java.io.DataInputStream.read(DataInputStream.java:83)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:180)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:45)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:206)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:191)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:376)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:257)
09/12/16 20:30:58 INFO streaming.PipeMapRed: MRErrorThread done
{noformat}
The same test passes on my local machine. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1312) TestStreamingKeyValue fails on hudson patch builds
[ https://issues.apache.org/jira/browse/MAPREDUCE-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792336#action_12792336 ] Amareshwari Sriramadasu commented on MAPREDUCE-1312: The same passes on my local machine. TestStreamingKeyValue fails on hudson patch builds -- Key: MAPREDUCE-1312 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1312 Project: Hadoop Map/Reduce Issue Type: Bug Components: build, test Reporter: Amareshwari Sriramadasu TestStreamingKeyValue fails on hudson patch builds with FileNotFoundException. The failure log from one of the builds is @ http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/203/testReport/org.apache.hadoop.streaming/TestStreamingKeyValue/testCommandLine/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1009) Forrest documentation needs to be updated to describe features provided for supporting hierarchical queues
[ https://issues.apache.org/jira/browse/MAPREDUCE-1009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hemanth Yamijala updated MAPREDUCE-1009: Attachment: MAPREDUCE-1009-20091217.txt I am attaching a new patch that makes some modifications:
- Added a new file build-utils.xml that moves the java5.check and forrest.check targets. Imported this into the main build.xml and build-contrib.xml, thereby removing the duplication of these targets in the earlier patch.
- Reorganized and edited the section on mapred-queues.xml in the cluster-setup documentation. Primarily, I tried to make the connection between queues and schedulers more explicit. I also tried to classify the various queue configurations a little more clearly - single queue setup, multiple single-level queue setup, and hierarchical queue setup - giving descriptions of each.
- Some other editorial changes, like scrubbing the example of hierarchical queue setup in mapred-queues.xml.template.
Vinod, can you quickly glance at these differences and see if you are comfortable with them? Forrest documentation needs to be updated to describe features provided for supporting hierarchical queues --- Key: MAPREDUCE-1009 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1009 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Affects Versions: 0.21.0 Reporter: Hemanth Yamijala Assignee: Vinod K V Priority: Blocker Fix For: 0.21.0 Attachments: MAPREDUCE-1009-20091008.txt, MAPREDUCE-1009-20091116.txt, MAPREDUCE-1009-20091124.txt, MAPREDUCE-1009-20091211.txt, MAPREDUCE-1009-20091217.txt Forrest documentation must be updated to describe how to set up and use hierarchical queues in the framework and the capacity scheduler. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1312) TestStreamingKeyValue fails on hudson patch builds
TestStreamingKeyValue fails on hudson patch builds -- Key: MAPREDUCE-1312 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1312 Project: Hadoop Map/Reduce Issue Type: Bug Components: build, test Reporter: Amareshwari Sriramadasu TestStreamingKeyValue fails on hudson patch builds with FileNotFoundException. The failure log from one of the builds is @ http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/203/testReport/org.apache.hadoop.streaming/TestStreamingKeyValue/testCommandLine/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1313) NPE in FieldFormatter if escape character is set and field is null
[ https://issues.apache.org/jira/browse/MAPREDUCE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Kimball updated MAPREDUCE-1313: - Attachment: MAPREDUCE-1313.patch Patch to fix this issue. NPE in FieldFormatter if escape character is set and field is null -- Key: MAPREDUCE-1313 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1313 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/sqoop Reporter: Aaron Kimball Assignee: Aaron Kimball Attachments: MAPREDUCE-1313.patch Performing an import with the {{\-\-escaped-by}} character set on a table with a null field will cause a NullPointerException in FieldFormatter -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1313) NPE in FieldFormatter if escape character is set and field is null
NPE in FieldFormatter if escape character is set and field is null -- Key: MAPREDUCE-1313 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1313 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/sqoop Reporter: Aaron Kimball Assignee: Aaron Kimball Attachments: MAPREDUCE-1313.patch Performing an import with the {{\-\-escaped-by}} character set on a table with a null field will cause a NullPointerException in FieldFormatter -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1313) NPE in FieldFormatter if escape character is set and field is null
[ https://issues.apache.org/jira/browse/MAPREDUCE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Kimball updated MAPREDUCE-1313: - Status: Patch Available (was: Open) NPE in FieldFormatter if escape character is set and field is null -- Key: MAPREDUCE-1313 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1313 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/sqoop Reporter: Aaron Kimball Assignee: Aaron Kimball Attachments: MAPREDUCE-1313.patch Performing an import with the {{\-\-escaped-by}} character set on a table with a null field will cause a NullPointerException in FieldFormatter -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
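The class of bug MAPREDUCE-1313 describes - an escaping formatter that assumes the field is non-null - is easy to illustrate. The sketch below is a Python analogue of the failure mode and the null-guarded fix, not sqoop's actual FieldFormatter; the function name and parameters are made up.

```python
# An escaping formatter: iterating over the field's characters blows up when
# the field is a SQL NULL (None here; in Java, the same access pattern throws
# a NullPointerException). The guard at the top is the fix: handle null
# before any character-level work.

def escape_field(field, escaped_by="\\", needs_escape=(",", "\\")):
    if field is None:
        # Render SQL NULL as a fixed token instead of touching None.
        return "null"
    out = []
    for ch in field:
        if ch in needs_escape:
            out.append(escaped_by)  # prefix special chars with the escape
        out.append(ch)
    return "".join(out)
```

Without the `if field is None` guard, the `for ch in field` loop is exactly the kind of unconditional access that turns a nullable column into a crash during import.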
[jira] Commented: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792364#action_12792364 ] Hadoop QA commented on MAPREDUCE-1302: --
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12428349/MAPREDUCE-1302.1.patch against trunk revision 891920.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/215/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/215/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/215/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/215/console
This message is automatically generated. 
TrackerDistributedCacheManager can delete file asynchronously - Key: MAPREDUCE-1302 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1302 Project: Hadoop Map/Reduce Issue Type: Improvement Components: tasktracker Affects Versions: 0.20.2, 0.21.0, 0.22.0 Reporter: Zheng Shao Assignee: Zheng Shao Attachments: MAPREDUCE-1302.0.patch, MAPREDUCE-1302.1.patch With the help of AsyncDiskService from MAPREDUCE-1213, we should be able to delete files from distributed cache asynchronously. That will help make task initialization faster, because task initialization calls the code that localizes files into the cache and may delete some other files. The deletion can slow down the task initialization speed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
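The async-deletion idea in MAPREDUCE-1302 - move filesystem cleanup off the task-initialization path onto a background disk-service thread - can be sketched with a single-worker executor. This loosely mirrors the AsyncDiskService notion from MAPREDUCE-1213; the class and method names below are illustrative, not Hadoop's API.

```python
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor

class AsyncDeleter:
    """Hand deletions to a background thread so the caller (here, the
    analogue of cache localization during task init) returns immediately
    instead of blocking on a possibly large recursive delete."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)

    def delete_async(self, path):
        # Submit and return at once; rmtree runs on the worker thread.
        return self._pool.submit(shutil.rmtree, path, ignore_errors=True)

    def shutdown(self):
        # Drain pending deletions before exiting.
        self._pool.shutdown(wait=True)

# Usage: delete a scratch directory without blocking the caller.
scratch = tempfile.mkdtemp()
deleter = AsyncDeleter()
deleter.delete_async(scratch)
deleter.shutdown()
removed = not os.path.exists(scratch)
```

A real cache manager would also need to rename the directory out of the cache's namespace before queueing the delete, so a concurrent localization never sees a half-deleted entry; that step is omitted here.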
[jira] Updated: (MAPREDUCE-1258) Fair scheduler event log not logging job info
[ https://issues.apache.org/jira/browse/MAPREDUCE-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated MAPREDUCE-1258: - Status: Patch Available (was: Open) Fair scheduler event log not logging job info - Key: MAPREDUCE-1258 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1258 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/fair-share Affects Versions: 0.21.0 Reporter: Matei Zaharia Assignee: Matei Zaharia Priority: Minor Attachments: mapreduce-1258-1.patch The MAPREDUCE-706 patch seems to have left an unfinished TODO in the Fair Scheduler - namely, in the dump() function for periodically dumping scheduler state to the event log, the part that dumps information about jobs is commented out. This makes the event log less useful than it was before. It should be fairly easy to update this part to use the new scheduler data structures (Schedulable etc) and print the data. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1258) Fair scheduler event log not logging job info
[ https://issues.apache.org/jira/browse/MAPREDUCE-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated MAPREDUCE-1258: - Attachment: mapreduce-1258-1.patch Here's a patch for this issue. I didn't include a unit test because it's a very simple fix. I'd appreciate it if someone could review it! Note that the code in the patch does not print deficits, unlike the previous code, because deficits were removed as part of MAPREDUCE-706. Fair scheduler event log not logging job info - Key: MAPREDUCE-1258 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1258 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/fair-share Affects Versions: 0.21.0 Reporter: Matei Zaharia Assignee: Matei Zaharia Priority: Minor Attachments: mapreduce-1258-1.patch The MAPREDUCE-706 patch seems to have left an unfinished TODO in the Fair Scheduler - namely, in the dump() function for periodically dumping scheduler state to the event log, the part that dumps information about jobs is commented out. This makes the event log less useful than it was before. It should be fairly easy to update this part to use the new scheduler data structures (Schedulable etc) and print the data. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
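The dump() fix described above amounts to: iterate the scheduler-side objects and emit one event-log line per job. The sketch below illustrates that shape only; the field names (`name`, `demand`, `running`) stand in for whatever the real Schedulable objects expose, and the dict/list types are placeholders for the scheduler's data structures and event log.

```python
# Emit one line per job from the scheduler's per-job objects. A real
# implementation would pull these values from Schedulable accessors and
# write to the fair scheduler's event log rather than a list.

def dump_jobs(schedulables, event_log):
    for s in schedulables:
        event_log.append(
            "JOB %s demand=%d running=%d" % (s["name"], s["demand"], s["running"])
        )

event_log = []
dump_jobs([{"name": "job_0001", "demand": 10, "running": 4}], event_log)
```

As the comment notes, a deficit field is deliberately absent: deficits were removed from the scheduler in MAPREDUCE-706, so the dump has nothing to print for them.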
[jira] Commented: (MAPREDUCE-1284) TestLocalizationWithLinuxTaskController fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792366#action_12792366 ] Hemanth Yamijala commented on MAPREDUCE-1284: - +1 for the patch. Given the nature of the fix, and the fact that it fixes a broken test (which I verified by running manually), I think there is no need for additional tests. I will commit this. TestLocalizationWithLinuxTaskController fails - Key: MAPREDUCE-1284 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1284 Project: Hadoop Map/Reduce Issue Type: Bug Components: tasktracker, test Affects Versions: 0.22.0 Reporter: Ravi Gummadi Assignee: Ravi Gummadi Fix For: 0.22.0 Attachments: MR-1284.patch With current trunk, the testcase TestLocalizationWithLinuxTaskController fails with an exit code of 139 from task-controller when doing INITIALIZE_USER -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1284) TestLocalizationWithLinuxTaskController fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hemanth Yamijala updated MAPREDUCE-1284: Resolution: Fixed Fix Version/s: (was: 0.22.0) 0.21.0 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) I committed this to trunk and branch 0.21. Thanks, Ravi ! TestLocalizationWithLinuxTaskController fails - Key: MAPREDUCE-1284 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1284 Project: Hadoop Map/Reduce Issue Type: Bug Components: tasktracker, test Affects Versions: 0.22.0 Reporter: Ravi Gummadi Assignee: Ravi Gummadi Fix For: 0.21.0 Attachments: MR-1284.patch With current trunk, the testcase TestLocalizationWithLinuxTaskController fails with an exit code of 139 from task-controller when doing INITIALIZE_USER -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1143) runningMapTasks counter is not properly decremented in case of failed Tasks.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rahul k singh updated MAPREDUCE-1143: - Attachment: MAPRED-1143-v21.patch patch for 21 runningMapTasks counter is not properly decremented in case of failed Tasks. Key: MAPREDUCE-1143 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1143 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.1 Reporter: rahul k singh Assignee: rahul k singh Priority: Blocker Fix For: 0.21.0 Attachments: MAPRED-1143-1.patch, MAPRED-1143-2.patch, MAPRED-1143-2.patch, MAPRED-1143-3.patch, MAPRED-1143-4.patch, MAPRED-1143-5.patch.txt, MAPRED-1143-6.patch, MAPRED-1143-7.patch, MAPRED-1143-v21.patch, MAPRED-1143-ydist-1.patch, MAPRED-1143-ydist-2.patch, MAPRED-1143-ydist-3.patch, MAPRED-1143-ydist-4.patch, MAPRED-1143-ydist-5.patch, MAPRED-1143-ydist-6.patch, MAPRED-1143-ydist-7.patch, MAPRED-1143-ydist-8.patch.txt, MAPRED-1143-ydist-9.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.