[jira] Updated: (MAPREDUCE-1082) Command line UI for queues' information is broken with hierarchical queues.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] V.V.Chaitanya Krishna updated MAPREDUCE-1082: - Attachment: MAPREDUCE-1082-3.patch Uploading patch with the above ideas implemented. > Command line UI for queues' information is broken with hierarchical queues. > --- > > Key: MAPREDUCE-1082 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1082 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client, jobtracker >Affects Versions: 0.21.0 >Reporter: Vinod K V >Assignee: V.V.Chaitanya Krishna >Priority: Blocker > Fix For: 0.21.0 > > Attachments: MAPREDUCE-1082-1.txt, MAPREDUCE-1082-2.patch, > MAPREDUCE-1082-3.patch > > > When the command "./bin/mapred --config ~/tmp/conf/ queue -list" is run, it > just hangs. I can see the following in the JT logs: > {code} > 2009-10-08 13:19:26,762 INFO org.apache.hadoop.ipc.Server: IPC Server handler > 1 on 5 caught: java.lang.NullPointerException > at org.apache.hadoop.mapreduce.QueueInfo.write(QueueInfo.java:217) > at org.apache.hadoop.mapreduce.QueueInfo.write(QueueInfo.java:223) > at > org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:159) > at > org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:126) > at org.apache.hadoop.io.ObjectWritable.write(ObjectWritable.java:70) > at org.apache.hadoop.ipc.Server.setupResponse(Server.java:1074) > at org.apache.hadoop.ipc.Server.access$2400(Server.java:77) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:983) > {code} > Same is the case with "./bin/mapred --config ~/tmp/conf/ queue -info > " -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
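For context on the stack trace above: an NPE thrown from inside a Writable's write() is the classic symptom of serializing an object whose field is still null (here, presumably something unset for hierarchical queues). The class and field below are hypothetical, not Hadoop's actual QueueInfo; the sketch only illustrates the presence-flag pattern Writable implementations typically use to make null fields serializable:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical Writable-style record; NOT Hadoop's real QueueInfo. */
class QueueRecord {
    String schedulingInfo; // may legitimately be null

    // Unsafe variant: DataOutputStream.writeUTF(null) throws NullPointerException.
    void writeUnsafe(DataOutputStream out) throws IOException {
        out.writeUTF(schedulingInfo);
    }

    // Safe variant: precede the nullable field with a presence flag.
    void write(DataOutputStream out) throws IOException {
        out.writeBoolean(schedulingInfo != null);
        if (schedulingInfo != null) {
            out.writeUTF(schedulingInfo);
        }
    }

    void readFields(DataInputStream in) throws IOException {
        schedulingInfo = in.readBoolean() ? in.readUTF() : null;
    }

    /** Serializes and deserializes a record, returning the copy. */
    static QueueRecord roundTrip(QueueRecord rec) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        rec.write(new DataOutputStream(buf));
        QueueRecord copy = new QueueRecord();
        copy.readFields(new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray())));
        return copy;
    }
}
```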
[jira] Commented: (MAPREDUCE-1249) mapreduce.reduce.shuffle.read.timeout's default value should be 3 minutes, in mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784103#action_12784103 ] Hadoop QA commented on MAPREDUCE-1249: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12426497/patch-1249-1.txt against trunk revision 885530. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/155/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/155/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/155/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/155/console This message is automatically generated. 
> mapreduce.reduce.shuffle.read.timeout's default value should be 3 minutes, in > mapred-default.xml > > > Key: MAPREDUCE-1249 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1249 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: task >Affects Versions: 0.21.0 >Reporter: Amareshwari Sriramadasu >Assignee: Amareshwari Sriramadasu >Priority: Blocker > Fix For: 0.21.0 > > Attachments: patch-1249-1.txt, patch-1249.txt > > > mapreduce.reduce.shuffle.read.timeout has a value of 30,000 (30 seconds) in > mapred-default.xml, whereas the default value in Fetcher code is 3 minutes. > It should be 3 minutes by default, as it was in pre MAPREDUCE-353. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
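The fix described amounts to aligning mapred-default.xml with the Fetcher's default of 3 minutes, i.e. 180000 ms instead of 30000. A sketch of the corrected property entry (the description text here is illustrative, not quoted from the patch):

```xml
<property>
  <name>mapreduce.reduce.shuffle.read.timeout</name>
  <value>180000</value>
  <description>Expert: The maximum amount of time (in milliseconds) a
  reduce task spends reading map output during shuffle before the
  fetch is declared failed. Matches the 3-minute default in the
  Fetcher code.</description>
</property>
```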
[jira] Updated: (MAPREDUCE-1082) Command line UI for queues' information is broken with hierarchical queues.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] V.V.Chaitanya Krishna updated MAPREDUCE-1082: - Status: Open (was: Patch Available) > Command line UI for queues' information is broken with hierarchical queues. > --- > > Key: MAPREDUCE-1082 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1082 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client, jobtracker >Affects Versions: 0.21.0 >Reporter: Vinod K V >Assignee: V.V.Chaitanya Krishna >Priority: Blocker > Fix For: 0.21.0 > > Attachments: MAPREDUCE-1082-1.txt, MAPREDUCE-1082-2.patch > > > When the command "./bin/mapred --config ~/tmp/conf/ queue -list" is run, it > just hangs. I can see the following in the JT logs: > {code} > 2009-10-08 13:19:26,762 INFO org.apache.hadoop.ipc.Server: IPC Server handler > 1 on 5 caught: java.lang.NullPointerException > at org.apache.hadoop.mapreduce.QueueInfo.write(QueueInfo.java:217) > at org.apache.hadoop.mapreduce.QueueInfo.write(QueueInfo.java:223) > at > org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:159) > at > org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:126) > at org.apache.hadoop.io.ObjectWritable.write(ObjectWritable.java:70) > at org.apache.hadoop.ipc.Server.setupResponse(Server.java:1074) > at org.apache.hadoop.ipc.Server.access$2400(Server.java:77) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:983) > {code} > Same is the case with "./bin/mapred --config ~/tmp/conf/ queue -info > " -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1248) Redundant memory copying in StreamKeyValUtil
[ https://issues.apache.org/jira/browse/MAPREDUCE-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruibang He updated MAPREDUCE-1248: -- Attachment: MAPREDUCE-1248-v1.0.patch An early solution > Redundant memory copying in StreamKeyValUtil > > > Key: MAPREDUCE-1248 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1248 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/streaming >Reporter: Ruibang He >Priority: Minor > Attachments: MAPREDUCE-1248-v1.0.patch > > > I found that when MROutputThread collects the output of the Reducer, it calls > StreamKeyValUtil.splitKeyVal(), and two local byte arrays are allocated there > for each line of output. Later these two byte arrays are passed to the variables > key and val. There are two memory copies here: one is the > System.arraycopy() call, the other is inside key.set() / val.set(). > This doubles the memory copying for the whole output (which may lead to > higher CPU consumption) and causes frequent temporary object allocation. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1248) Redundant memory copying in StreamKeyValUtil
[ https://issues.apache.org/jira/browse/MAPREDUCE-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784098#action_12784098 ] Ruibang He commented on MAPREDUCE-1248: --- Thanks, Guanyin. The latest trunk has fixed the problem in KeyValueLineRecordReader.java, but the problem still exists in StreamKeyValUtil.java. Patch is attached for an early solution. > Redundant memory copying in StreamKeyValUtil > > > Key: MAPREDUCE-1248 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1248 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/streaming >Reporter: Ruibang He >Priority: Minor > > > I found that when MROutputThread collects the output of the Reducer, it calls > StreamKeyValUtil.splitKeyVal(), and two local byte arrays are allocated there > for each line of output. Later these two byte arrays are passed to the variables > key and val. There are two memory copies here: one is the > System.arraycopy() call, the other is inside key.set() / val.set(). > This doubles the memory copying for the whole output (which may lead to > higher CPU consumption) and causes frequent temporary object allocation. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
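The redundancy being described (copy into a temporary array, then copy again inside set()) can be removed by pointing the target's set() directly at the source line buffer. The classes below are simplified, hypothetical stand-ins for StreamKeyValUtil.splitKeyVal() and Hadoop's Text, sketching the one-copy pattern rather than reproducing the actual patch:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Minimal stand-in for org.apache.hadoop.io.Text: one internal copy per set(). */
class ByteHolder {
    byte[] bytes = new byte[0];

    void set(byte[] src, int off, int len) {
        bytes = Arrays.copyOfRange(src, off, off + len); // the single copy
    }

    @Override public String toString() {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}

class KeyValSplitter {
    /** Splits line[0..len) at the first tab directly into key/val, one copy each. */
    static void splitKeyVal(byte[] line, int len, ByteHolder key, ByteHolder val) {
        int sep = -1;
        for (int i = 0; i < len; i++) {
            if (line[i] == '\t') { sep = i; break; }
        }
        if (sep < 0) {            // no separator: the whole line is the key
            key.set(line, 0, len);
            val.set(line, 0, 0);
        } else {
            // Old pattern: System.arraycopy into temp arrays, then key.set(temp)
            // -> two copies per field. New pattern: set() straight from the
            // line buffer with an offset and length -> one copy per field.
            key.set(line, 0, sep);
            val.set(line, sep + 1, len - sep - 1);
        }
    }
}
```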
[jira] Commented: (MAPREDUCE-1256) org.apache.hadoop.mapred.TestFairScheduler.testPoolAssignment (from TestFairScheduler) is failing in trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784084#action_12784084 ] Jothi Padmanabhan commented on MAPREDUCE-1256: -- Duplicate of MAPREDUCE-1245? > org.apache.hadoop.mapred.TestFairScheduler.testPoolAssignment (from > TestFairScheduler) is failing in trunk > -- > > Key: MAPREDUCE-1256 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1256 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 0.22.0 >Reporter: Iyappan Srinivasan > Fix For: 0.22.0 > > > Trunk build is failing. The unit testcase that fails is: > org.apache.hadoop.mapred.TestFairScheduler.testPoolAssignment (from > TestFairScheduler) > http://hudson.zones.apache.org/hudson/view/Hadoop/job/Hadoop-Mapreduce-trunk/160/testReport/org.apache.hadoop.mapred/TestFairScheduler/testPoolAssignment/ > Error Message > Timeout occurred. Please note the time in the report does not reflect the > time until the timeout. > Stacktrace > junit.framework.AssertionFailedError: Timeout occurred. Please note the time > in the report does not reflect the time until the timeout -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1256) org.apache.hadoop.mapred.TestFairScheduler.testPoolAssignment (from TestFairScheduler) is failing in trunk
org.apache.hadoop.mapred.TestFairScheduler.testPoolAssignment (from TestFairScheduler) is failing in trunk -- Key: MAPREDUCE-1256 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1256 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.22.0 Reporter: Iyappan Srinivasan Fix For: 0.22.0 Trunk build is failing. The unit testcase that fails is: org.apache.hadoop.mapred.TestFairScheduler.testPoolAssignment (from TestFairScheduler) http://hudson.zones.apache.org/hudson/view/Hadoop/job/Hadoop-Mapreduce-trunk/160/testReport/org.apache.hadoop.mapred/TestFairScheduler/testPoolAssignment/ Error Message Timeout occurred. Please note the time in the report does not reflect the time until the timeout. Stacktrace junit.framework.AssertionFailedError: Timeout occurred. Please note the time in the report does not reflect the time until the timeout -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAPREDUCE-1255) How to write a custom input format and record reader to read multiple lines of text from files
[ https://issues.apache.org/jira/browse/MAPREDUCE-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved MAPREDUCE-1255. Resolution: Invalid Hi Kunal, JIRA is meant for issue tracking, not questions. Please email the common-user or mapreduce-user mailing list with your question. Thanks. > How to write a custom input format and record reader to read multiple lines > of text from files > -- > > Key: MAPREDUCE-1255 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1255 > Project: Hadoop Map/Reduce > Issue Type: Task >Affects Versions: 0.20.1 > Environment: Ubuntu, 32 bit system. Apache hadoop 0.20.1 >Reporter: Kunal Gupta >Priority: Minor > > Can someone explain how to override the "FileInputFormat" and "RecordReader" > in order to be able to read multiple lines of text from input files in a > single map task? > Here the key will be the offset of the first line of text and value will be > the N lines of text. > I have overridden the class FileInputFormat: > public class MultiLineFileInputFormat > extends FileInputFormat{ > ... 
> } > and implemented the abstract method: > public RecordReader<LongWritable, Text> createRecordReader(InputSplit split, > TaskAttemptContext context) > throws IOException, InterruptedException {...} > I have also overridden the recordreader class: > public class MultiLineFileRecordReader extends RecordReader<LongWritable, Text> > {...} > and in the job configuration, specified this new InputFormat class: > job.setInputFormatClass(MultiLineFileInputFormat.class); > When I run this new map/reduce program, I get the following Java error: > Exception in thread "main" java.lang.RuntimeException: > java.lang.NoSuchMethodException: > CustomRecordReader$MultiLineFileInputFormat.<init>() > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115) > at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:882) > at > org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779) > at org.apache.hadoop.mapreduce.Job.submit(Job.java:432) > at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447) > at CustomRecordReader.main(CustomRecordReader.java:257) > Caused by: java.lang.NoSuchMethodException: > CustomRecordReader$MultiLineFileInputFormat.<init>() > at java.lang.Class.getConstructor0(Class.java:2706) > at java.lang.Class.getDeclaredConstructor(Class.java:1985) > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:109) > ... 5 more -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
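The NoSuchMethodException above is the usual signature of declaring the input format as a non-static inner class: a non-static inner class has no true no-argument constructor (the compiler adds a hidden outer-instance parameter), so reflective instantiation via ReflectionUtils fails. Declaring the nested class static, or moving it to its own top-level file, fixes it. A self-contained illustration of the constructor lookup (class names here are illustrative, not the reporter's code):

```java
class Outer {
    class Inner {}            // non-static: its constructor takes a hidden Outer
    static class Nested {}    // static: a genuine no-arg constructor exists

    /** Mimics the lookup ReflectionUtils.newInstance performs. */
    static boolean hasNoArgConstructor(Class<?> cls) {
        try {
            cls.getDeclaredConstructor(); // throws if no zero-arg constructor
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }
}
```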
[jira] Commented: (MAPREDUCE-1143) runningMapTasks counter is not properly decremented in case of failed Tasks.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784077#action_12784077 ] Hemanth Yamijala commented on MAPREDUCE-1143: - After a discussion with Arun, I felt I should clarify a little more on what I am proposing. Some details: - In TaskInProgress.java, introduce: {code} boolean isRunning(TaskAttemptID taskId) { return activeTasks.containsKey(taskId); } {code} - Modify JobInProgress.failedTask private API to have an additional parameter wasAttemptRunning, which would be initialized in JIP.updateTaskStatus to tip.isRunning(status.getTaskID()) - Use wasAttemptRunning only to update the running* counters I originally thought we could modify wasRunning to indicate if the attempt was running (rather than if the TIP was running). But after speaking with Arun, I feel we want to localize the changes as much as possible. > runningMapTasks counter is not properly decremented in case of failed Tasks. > > > Key: MAPREDUCE-1143 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1143 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: rahul k singh >Priority: Blocker > Attachments: MAPRED-1143-1.patch, MAPRED-1143-2.patch, > MAPRED-1143-2.patch, MAPRED-1143-3.patch, MAPRED-1143-4.patch, > MAPRED-1143-ydist-1.patch, MAPRED-1143-ydist-2.patch, > MAPRED-1143-ydist-3.patch, MAPRED-1143-ydist-4.patch, > MAPRED-1143-ydist-5.patch, MAPRED-1143-ydist-6.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1255) How to write a custom input format and record reader to read multiple lines of text from files
How to write a custom input format and record reader to read multiple lines of text from files -- Key: MAPREDUCE-1255 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1255 Project: Hadoop Map/Reduce Issue Type: Task Affects Versions: 0.20.1 Environment: Ubuntu, 32 bit system. Apache hadoop 0.20.1 Reporter: Kunal Gupta Priority: Minor Can someone explain how to override the "FileInputFormat" and "RecordReader" in order to be able to read multiple lines of text from input files in a single map task? Here the key will be the offset of the first line of text and value will be the N lines of text. I have overridden the class FileInputFormat: public class MultiLineFileInputFormat extends FileInputFormat<LongWritable, Text> { ... } and implemented the abstract method: public RecordReader<LongWritable, Text> createRecordReader(InputSplit split, TaskAttemptContext context) throws IOException, InterruptedException {...} I have also overridden the recordreader class: public class MultiLineFileRecordReader extends RecordReader<LongWritable, Text> {...} and in the job configuration, specified this new InputFormat class: job.setInputFormatClass(MultiLineFileInputFormat.class); When I run this new map/reduce program, I get the following Java error: Exception in thread "main" java.lang.RuntimeException: java.lang.NoSuchMethodException: CustomRecordReader$MultiLineFileInputFormat.<init>() at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115) at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:882) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779) at org.apache.hadoop.mapreduce.Job.submit(Job.java:432) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447) at CustomRecordReader.main(CustomRecordReader.java:257) Caused by: java.lang.NoSuchMethodException: CustomRecordReader$MultiLineFileInputFormat.<init>() at java.lang.Class.getConstructor0(Class.java:2706) at java.lang.Class.getDeclaredConstructor(Class.java:1985) at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:109) ... 5 more -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-222) Shuffle should be refactored to a separate task by itself
[ https://issues.apache.org/jira/browse/MAPREDUCE-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784075#action_12784075 ] ZhuGuanyin commented on MAPREDUCE-222: -- I think it would be better if the shuffle and sort phases were separated from the reduce task. 1) In the current implementation, a rescheduled reduce needs to shuffle and sort again if the former reduce task failed. For example, the shuffle and sort phases cost a lot of time if a reduce needs to fetch map output from 100k maps. 2) We could shuffle and sort while another job's or task's reducer is running, which would maximize resource utilization. In the current implementation, reduce slots are consumed while a reduce is shuffling or waiting for maps to finish. 3) We could localize the reduce task on the tasktracker where it has already shuffled. > Shuffle should be refactored to a separate task by itself > - > > Key: MAPREDUCE-222 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-222 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Reporter: Devaraj Das > > Currently, shuffle phase is part of the reduce task. The idea here is to move > out the shuffle as a first-class task. This will improve the usage of the > network since we will then be able to schedule shuffle tasks independently, > and later on pin reduce tasks to those nodes. This will make most sense for > apps where there are multiple waves of reduces (the second wave of reduces > can directly start off doing the "reducer" phase). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1075) getQueue(String queue) in JobTracker would return NPE for invalid queue name
[ https://issues.apache.org/jira/browse/MAPREDUCE-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] V.V.Chaitanya Krishna updated MAPREDUCE-1075: - Status: Open (was: Patch Available) > getQueue(String queue) in JobTracker would return NPE for invalid queue name > > > Key: MAPREDUCE-1075 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1075 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: V.V.Chaitanya Krishna >Assignee: V.V.Chaitanya Krishna > Fix For: 0.21.0 > > Attachments: MAPREDUCE-1075-1.patch, MAPREDUCE-1075-2.patch, > MAPREDUCE-1075-3.patch, MAPREDUCE-1075-4.patch, MAPREDUCE-1075-5.patch, > MAPREDUCE-1075-6.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1075) getQueue(String queue) in JobTracker would return NPE for invalid queue name
[ https://issues.apache.org/jira/browse/MAPREDUCE-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] V.V.Chaitanya Krishna updated MAPREDUCE-1075: - Attachment: MAPREDUCE-1075-6.patch Uploading patch with the above comments implemented. > getQueue(String queue) in JobTracker would return NPE for invalid queue name > > > Key: MAPREDUCE-1075 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1075 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: V.V.Chaitanya Krishna >Assignee: V.V.Chaitanya Krishna > Fix For: 0.21.0 > > Attachments: MAPREDUCE-1075-1.patch, MAPREDUCE-1075-2.patch, > MAPREDUCE-1075-3.patch, MAPREDUCE-1075-4.patch, MAPREDUCE-1075-5.patch, > MAPREDUCE-1075-6.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1099) Setup and cleanup tasks could affect job latency if they are caught running on bad nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784067#action_12784067 ] ZhuGuanyin commented on MAPREDUCE-1099: --- We have encountered the same problem, so we just removed the setup and cleanup tasks (imported the patch https://issues.apache.org/jira/browse/MAPREDUCE-463 ) > Setup and cleanup tasks could affect job latency if they are caught running > on bad nodes > > > Key: MAPREDUCE-1099 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1099 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobtracker >Affects Versions: 0.20.1 >Reporter: Hemanth Yamijala > > We found cases on our clusters where a setup task got scheduled on a bad node > and took upwards of several minutes to run, adversely affecting job runtimes. > Speculation did not help here as speculation is not used for setup tasks. I > suspect the same could happen for cleanup tasks as well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-353) Allow shuffle read and connection timeouts to be configurable
[ https://issues.apache.org/jira/browse/MAPREDUCE-353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-353: -- Attachment: patch-353-ydist.txt Patch for Yahoo! distribution > Allow shuffle read and connection timeouts to be configurable > - > > Key: MAPREDUCE-353 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-353 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 0.21.0 >Reporter: Arun C Murthy >Assignee: Ravi Gummadi > Fix For: 0.21.0 > > Attachments: MR-353.patch, MR-353.v1.patch, patch-353-ydist.txt > > > It would be good for latency-sensitive applications to tune the shuffle > read/connection timeouts... in fact this made a huge difference to terasort > since we were seeing individual shuffles stuck for upwards of 60s and had to > have a very small read timeout. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1136) ConcurrentModificationException when tasktracker updates task status to jobtracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qi Liu updated MAPREDUCE-1136: -- Attachment: MAPREDUCE-1136.patch Patch for trunk. > ConcurrentModificationException when tasktracker updates task status to > jobtracker > -- > > Key: MAPREDUCE-1136 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1136 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 0.20.2, 0.21.0, 0.22.0 >Reporter: Qi Liu > Attachments: MAPREDUCE-1136.0.18.3.patch, MAPREDUCE-1136.patch > > > In Hadoop 0.18.3, the following exception happened during a job execution. It > does not happen often. > Here is the stack trace of the exception. > org.apache.hadoop.ipc.RemoteException: java.io.IOException: > java.util.ConcurrentModificationException > at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100) > at java.util.TreeMap$ValueIterator.next(TreeMap.java:1145) > at > org.apache.hadoop.mapred.JobTracker.getAllJobs(JobTracker.java:2376) > at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:890) > org.apache.hadoop.ipc.Client.call(Client.java:716) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
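A ConcurrentModificationException of this shape comes from one thread iterating a TreeMap's live view while another thread mutates the map: TreeMap's iterators are fail-fast. The usual fix (which the stack trace suggests getAllJobs needed) is to copy under the lock that guards mutation and iterate the snapshot. The class below is a minimal, hypothetical sketch, not JobTracker's actual code:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical sketch of snapshot-under-lock iteration over a shared TreeMap. */
class JobRegistry {
    private final TreeMap<String, String> jobs = new TreeMap<>();

    synchronized void addJob(String id, String status) {
        jobs.put(id, status);
    }

    // Unsafe if callers iterate this live view while addJob() runs concurrently:
    // the fail-fast iterator throws ConcurrentModificationException.
    Collection<String> liveView() {
        return jobs.values();
    }

    // Safe: copy under the same lock that guards mutation; callers iterate
    // the snapshot without holding the lock.
    synchronized List<String> snapshot() {
        return new ArrayList<>(jobs.values());
    }
}
```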
[jira] Commented: (MAPREDUCE-1185) URL to JT webconsole for running job and job history should be the same
[ https://issues.apache.org/jira/browse/MAPREDUCE-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784060#action_12784060 ] Jothi Padmanabhan commented on MAPREDUCE-1185: -- Trunk patch. Since you are already doing a remove of entries from the {{jobHistoryFileMap}} while iterating through it to handle the manual move scenario, I think the {{jobHistoryFileMap.remove()}} while deleting the history file is redundant and can be removed. > URL to JT webconsole for running job and job history should be the same > --- > > Key: MAPREDUCE-1185 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1185 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobtracker >Reporter: Sharad Agarwal >Assignee: Sharad Agarwal > Attachments: 1185_v1.patch, 1185_v2.patch, 1185_v3.patch, > 1185_v4.patch, 1185_v5.patch, patch-1185-1-ydist.txt, patch-1185-ydist.txt > > > The tracking url for running jobs and the jobs which are retired is > different. This creates problem for clients which caches the job running url > because soon it becomes invalid when job is retired. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1185) URL to JT webconsole for running job and job history should be the same
[ https://issues.apache.org/jira/browse/MAPREDUCE-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1185: --- Attachment: patch-1185-1-ydist.txt The test for redirection was missing in earlier Y!20 patch. Added the test in attached patch. > URL to JT webconsole for running job and job history should be the same > --- > > Key: MAPREDUCE-1185 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1185 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobtracker >Reporter: Sharad Agarwal >Assignee: Sharad Agarwal > Attachments: 1185_v1.patch, 1185_v2.patch, 1185_v3.patch, > 1185_v4.patch, 1185_v5.patch, patch-1185-1-ydist.txt, patch-1185-ydist.txt > > > The tracking url for running jobs and the jobs which are retired is > different. This creates problem for clients which caches the job running url > because soon it becomes invalid when job is retired. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1249) mapreduce.reduce.shuffle.read.timeout's default value should be 3 minutes, in mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1249: --- Assignee: Amareshwari Sriramadasu Status: Open (was: Patch Available) > mapreduce.reduce.shuffle.read.timeout's default value should be 3 minutes, in > mapred-default.xml > > > Key: MAPREDUCE-1249 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1249 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: task >Affects Versions: 0.21.0 >Reporter: Amareshwari Sriramadasu >Assignee: Amareshwari Sriramadasu >Priority: Blocker > Fix For: 0.21.0 > > Attachments: patch-1249-1.txt, patch-1249.txt > > > mapreduce.reduce.shuffle.read.timeout has a value of 30,000 (30 seconds) in > mapred-default.xml, whereas the default value in Fetcher code is 3 minutes. > It should be 3 minutes by default, as it was in pre MAPREDUCE-353. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1249) mapreduce.reduce.shuffle.read.timeout's default value should be 3 minutes, in mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1249: --- Status: Patch Available (was: Open) Test failures are unrelated to the patch. Resubmitting for hudson > mapreduce.reduce.shuffle.read.timeout's default value should be 3 minutes, in > mapred-default.xml > > > Key: MAPREDUCE-1249 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1249 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: task >Affects Versions: 0.21.0 >Reporter: Amareshwari Sriramadasu >Assignee: Amareshwari Sriramadasu >Priority: Blocker > Fix For: 0.21.0 > > Attachments: patch-1249-1.txt, patch-1249.txt > > > mapreduce.reduce.shuffle.read.timeout has a value of 30,000 (30 seconds) in > mapred-default.xml, whereas the default value in Fetcher code is 3 minutes. > It should be 3 minutes by default, as it was in pre MAPREDUCE-353. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1249) mapreduce.reduce.shuffle.read.timeout's default value should be 3 minutes, in mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1249: --- Attachment: patch-1249-1.txt Patch updating the doc. > mapreduce.reduce.shuffle.read.timeout's default value should be 3 minutes, in > mapred-default.xml > > > Key: MAPREDUCE-1249 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1249 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: task >Affects Versions: 0.21.0 >Reporter: Amareshwari Sriramadasu >Assignee: Amareshwari Sriramadasu >Priority: Blocker > Fix For: 0.21.0 > > Attachments: patch-1249-1.txt, patch-1249.txt > > > mapreduce.reduce.shuffle.read.timeout has a value of 30,000 (30 seconds) in > mapred-default.xml, whereas the default value in Fetcher code is 3 minutes. > It should be 3 minutes by default, as it was in pre MAPREDUCE-353. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1249) mapreduce.reduce.shuffle.read.timeout's default value should be 3 minutes, in mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784049#action_12784049 ] Amareshwari Sriramadasu commented on MAPREDUCE-1249: It is documented that the parameters mapreduce.reduce.shuffle.connect.timeout and mapreduce.reduce.shuffle.read.timeout are cluster-wide parameters. But, actually they are job-level parameters. I think the tag "Expert" conveys the fact that users are not supposed to play with it. > mapreduce.reduce.shuffle.read.timeout's default value should be 3 minutes, in > mapred-default.xml > > > Key: MAPREDUCE-1249 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1249 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: task >Affects Versions: 0.21.0 >Reporter: Amareshwari Sriramadasu >Priority: Blocker > Fix For: 0.21.0 > > Attachments: patch-1249.txt > > > mapreduce.reduce.shuffle.read.timeout has a value of 30,000 (30 seconds) in > mapred-default.xml, whereas the default value in Fetcher code is 3 minutes. > It should be 3 minutes by default, as it was in pre MAPREDUCE-353. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1229) [Mumak] Allow customization of job submission policy
[ https://issues.apache.org/jira/browse/MAPREDUCE-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784046#action_12784046 ] Hong Tang commented on MAPREDUCE-1229: -- Attached new patch that addresses the comments by Dick. bq. 1: Should TestSimulator*JobSubmission check to see whether the total "runtime" was reasonable for the Policy? Currently, each policy is tested as a separate test case. It may be hard to combine them and compare the virtual runtime, which is only present as console output. I did do some basic sanity check manually after the run. bq. 2: minor nit: Should SimulatorJobSubmissionPolicy/getPolicy(Configuration) use valueOf(policy.toUpper()) instead of looping through the types? Updated in the patch based on the suggestion. bq. 3: medium sized nit: in SimulatorJobClient.isOverloaded() there are two literals, 0.9 and 2.0F, that ought to be static private named values. Added final variables to represent the magic constants, and added comments. bq. 4: Here is my biggest point. The existing code cannot submit a job more often than once every five seconds when the jobs were spaced further apart than that and the policy is STRESS . bq. bq. Please consider adding code to call the processLoadProbingEvent core code when we processJobCompleteEvent or a processJobSubmitEvent . That includes potentially adding a new LoadProbingEvent . This can lead to an accumulation because each LoadProbingEvent replaces itself, so we should track the ones that are in flight in a PriorityQueue and only add a new LoadProbingEvent whenever the new event has a time stamp strictly earlier than the earliest one already in flight. This will limit us to two events in flight with the current adjustLoadProbingInterval . bq. bq. 
If you don't do that, then if a real dreadnaught of a job gets dropped into the system and the probing interval gets long it could take us a while to notice that we're okay to submit jobs, in the case where the job has many tasks finishing at about the same time, and we could submit tiny jobs as onsies every five seconds when the cluster is clear enough to accommodate lots of jobs. When the cluster can handle N jobs in less than 5N seconds for some N, we won't overload it with the existing code. I changed the minimum load probing interval to 1 second (from 5 seconds). Note that when a job is submitted, it could take a few seconds before JT assigns the map tasks to TTs with free map slots, so reducing this interval further could lead to artificial load spikes. I also added load checks after each job completion, and if the cluster is underloaded, we submit another job (and reset the load checking interval to the minimum value). This does bring a potential danger: when many jobs happen to complete at the same time, we could inject a lot of jobs into the system. But I think that risk is fairly low, so I would not worry much about it. > [Mumak] Allow customization of job submission policy > > > Key: MAPREDUCE-1229 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1229 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/mumak >Affects Versions: 0.21.0, 0.22.0 >Reporter: Hong Tang >Assignee: Hong Tang > Fix For: 0.21.0, 0.22.0 > > Attachments: mapreduce-1229-20091121.patch, > mapreduce-1229-20091123.patch, mapreduce-1229-20091130.patch > > > Currently, mumak replay job submission faithfully. To make mumak useful for > evaluation purposes, it would be great if we can support other job submission > policies such as sequential job submission, or stress job submission. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
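The in-flight bookkeeping Dick suggests, tracking pending LoadProbingEvents in a PriorityQueue and only scheduling a new probe when it would fire strictly earlier than the earliest one already pending, can be sketched roughly as follows (class and method names are illustrative, not the actual Mumak API):

```java
import java.util.PriorityQueue;

// Illustrative sketch: since each LoadProbingEvent replaces itself on firing,
// we track the timestamps of probes already in flight and refuse to add a new
// one unless it would fire strictly earlier than the earliest pending probe.
// Under the adaptive probing interval this bounds the number in flight.
public class ProbeTracker {
    // Virtual-clock timestamps (ms) of probes already scheduled.
    private final PriorityQueue<Long> inFlight = new PriorityQueue<>();

    // Schedule a probe at 'when' only if it beats the earliest pending probe.
    // Returns true if the probe was actually scheduled.
    public boolean maybeSchedule(long when) {
        Long earliest = inFlight.peek();
        if (earliest != null && when >= earliest) {
            return false; // an earlier (or equal) probe will fire first anyway
        }
        inFlight.add(when);
        return true;
    }

    // Called when the probe scheduled for this timestamp fires.
    public void fired(long when) {
        inFlight.remove(when);
    }
}
```

A job-completion or job-submission event would call maybeSchedule with the current virtual time plus the minimum probing interval, so completions can trigger prompt load checks without flooding the event queue.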
[jira] Updated: (MAPREDUCE-1229) [Mumak] Allow customization of job submission policy
[ https://issues.apache.org/jira/browse/MAPREDUCE-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Tang updated MAPREDUCE-1229: - Attachment: mapreduce-1229-20091130.patch > [Mumak] Allow customization of job submission policy > > > Key: MAPREDUCE-1229 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1229 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/mumak >Affects Versions: 0.21.0, 0.22.0 >Reporter: Hong Tang >Assignee: Hong Tang > Fix For: 0.21.0, 0.22.0 > > Attachments: mapreduce-1229-20091121.patch, > mapreduce-1229-20091123.patch, mapreduce-1229-20091130.patch > > > Currently, mumak replay job submission faithfully. To make mumak useful for > evaluation purposes, it would be great if we can support other job submission > policies such as sequential job submission, or stress job submission. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1229) [Mumak] Allow customization of job submission policy
[ https://issues.apache.org/jira/browse/MAPREDUCE-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Tang updated MAPREDUCE-1229: - Status: Open (was: Patch Available) > [Mumak] Allow customization of job submission policy > > > Key: MAPREDUCE-1229 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1229 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/mumak >Affects Versions: 0.21.0, 0.22.0 >Reporter: Hong Tang >Assignee: Hong Tang > Fix For: 0.21.0, 0.22.0 > > Attachments: mapreduce-1229-20091121.patch, > mapreduce-1229-20091123.patch > > > Currently, mumak replay job submission faithfully. To make mumak useful for > evaluation purposes, it would be great if we can support other job submission > policies such as sequential job submission, or stress job submission. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1251) c++ utils doesn't compile
[ https://issues.apache.org/jira/browse/MAPREDUCE-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784035#action_12784035 ] Hadoop QA commented on MAPREDUCE-1251: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12426461/MR-1251.patch against trunk revision 885530. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/281/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/281/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/281/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/281/console This message is automatically generated. 
> c++ utils doesn't compile > - > > Key: MAPREDUCE-1251 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1251 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0 > Environment: ubuntu karmic 64-bit >Reporter: Eli Collins >Assignee: Eli Collins > Attachments: HDFS-790-1.patch, HDFS-790.patch, MR-1251.patch > > > c++ utils doesn't compile on ubuntu karmic 64-bit. The latest patch for > HADOOP-5611 needs to be applied first. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1254) job.xml should add crc check in tasktracker and sub jvm.
job.xml should add crc check in tasktracker and sub jvm. Key: MAPREDUCE-1254 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1254 Project: Hadoop Map/Reduce Issue Type: New Feature Components: task, tasktracker Affects Versions: 0.22.0 Reporter: ZhuGuanyin Currently, job.xml is written to local disk in the tasktracker and the sub-JVM through ChecksumFileSystem, so crc checksum information already exists, but the file is loaded back without any crc check. A disk error could therefore let a mapred job finish successfully but produce wrong data. Example: if the tasktracker or sub task JVM fails to load job.xml successfully, it would load the default configuration, which may replace the mapper with IdentityMapper. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
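The proposed check, refusing to trust a job.xml whose bytes do not match the stored checksum, can be illustrated with plain java.util.zip.CRC32 (Hadoop's ChecksumFileSystem uses its own .crc sidecar format; this sketch only shows the idea, not the actual Hadoop code path):

```java
import java.util.zip.CRC32;

// Illustrative sketch of the requested behavior: verify the bytes of a
// configuration file against a previously recorded checksum before using it.
// A loader could fail the task when this returns false, instead of silently
// falling back to the default configuration.
public class CrcCheck {
    public static boolean verify(byte[] data, long expectedCrc) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue() == expectedCrc;
    }
}
```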
[jira] Commented: (MAPREDUCE-1247) Send out-of-band heartbeat to avoid fake lost tasktracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784019#action_12784019 ] ZhuGuanyin commented on MAPREDUCE-1247: --- I agree: separating the long-held lock from the heartbeat thread and never doing i/o operations while holding locks is the best solution. We tried that, but found it is not easy to achieve and will not be resolved soon, so I propose a temporary solution in the meantime. > Send out-of-band heartbeat to avoid fake lost tasktracker > - > > Key: MAPREDUCE-1247 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1247 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Reporter: ZhuGuanyin > > Currently the TaskTracker reports task status to the jobtracker through heartbeats. > Sometimes, when the tasktracker is locked to do some cleanup job, like > removing task temp data on disk, the heartbeat thread hangs for a long time > waiting for the lock, so the jobtracker concludes the tracker is lost and > reschedules all of its finished maps and unfinished reduces on other > tasktrackers. We call this a "fake lost tasktracker", and it is sometimes > unacceptable, especially when we run some large jobs. So we introduce an > out-of-band heartbeat mechanism to send an out-of-band heartbeat in that case. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1251) c++ utils doesn't compile
[ https://issues.apache.org/jira/browse/MAPREDUCE-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784017#action_12784017 ] Eli Collins commented on MAPREDUCE-1251: Thanks Todd. Also tested the patch against Centos 5.4 64-bit. > c++ utils doesn't compile > - > > Key: MAPREDUCE-1251 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1251 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0 > Environment: ubuntu karmic 64-bit >Reporter: Eli Collins >Assignee: Eli Collins > Attachments: HDFS-790-1.patch, HDFS-790.patch, MR-1251.patch > > > c++ utils doesn't compile on ubuntu karmic 64-bit. The latest patch for > HADOOP-5611 needs to be applied first. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1252) Shuffle deadlocks on wrong number of maps
[ https://issues.apache.org/jira/browse/MAPREDUCE-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784011#action_12784011 ] Hadoop QA commented on MAPREDUCE-1252: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12426479/mr-1252.patch against trunk revision 885530. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/154/console This message is automatically generated. > Shuffle deadlocks on wrong number of maps > - > > Key: MAPREDUCE-1252 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1252 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: task >Affects Versions: 0.21.0, 0.22.0 >Reporter: Owen O'Malley >Assignee: Owen O'Malley >Priority: Blocker > Fix For: 0.21.0, 0.22.0 > > Attachments: mr-1252.patch > > > The new shuffle assumes that the number of maps is correct. The new > JobSubmitter sets the old value. Something misfires in the middle causing: > 09/12/01 00:00:15 WARN conf.Configuration: mapred.job.split.file is > deprecated. Instead, use mapreduce.job.splitfile > 09/12/01 00:00:15 WARN conf.Configuration: mapred.map.tasks is deprecated. > Instead, use mapreduce.job.maps > But my reduces got stuck at 2 maps / 12 when there were only 2 maps in the > job. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1136) ConcurrentModificationException when tasktracker updates task status to jobtracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qi Liu updated MAPREDUCE-1136: -- Attachment: MAPREDUCE-1136.0.18.3.patch Patch for 0.18.3 branch. > ConcurrentModificationException when tasktracker updates task status to > jobtracker > -- > > Key: MAPREDUCE-1136 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1136 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 0.20.2, 0.21.0, 0.22.0 >Reporter: Qi Liu > Attachments: MAPREDUCE-1136.0.18.3.patch > > > In Hadoop 0.18.3, the following exception happened during a job execution. It > does not happen often. > Here is the stack trace of the exception. > org.apache.hadoop.ipc.RemoteException: java.io.IOException: > java.util.ConcurrentModificationException > at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100) > at java.util.TreeMap$ValueIterator.next(TreeMap.java:1145) > at > org.apache.hadoop.mapred.JobTracker.getAllJobs(JobTracker.java:2376) > at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:890) > org.apache.hadoop.ipc.Client.call(Client.java:716) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1251) c++ utils doesn't compile
[ https://issues.apache.org/jira/browse/MAPREDUCE-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783990#action_12783990 ] Todd Lipcon commented on MAPREDUCE-1251: +1 - started a new karmic VM on ec2, verified build failure, applied MR-1251.patch, and verified a successful build. This should be committed to branch-20 as well. From the 0.20.1 tarball, I had to apply HADOOP-5612, HADOOP-5611, and MR-1251.patch before pipes would build (this same patch applies cleanly to that tarball). Doesn't look like any of those are in branch-20, but they all are necessary if we consider build failure to be critical. > c++ utils doesn't compile > - > > Key: MAPREDUCE-1251 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1251 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0 > Environment: ubuntu karmic 64-bit >Reporter: Eli Collins >Assignee: Eli Collins > Attachments: HDFS-790-1.patch, HDFS-790.patch, MR-1251.patch > > > c++ utils doesn't compile on ubuntu karmic 64-bit. The latest patch for > HADOOP-5611 needs to be applied first. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1253) Making Mumak work with Capacity-Scheduler
Making Mumak work with Capacity-Scheduler - Key: MAPREDUCE-1253 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1253 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/mumak Affects Versions: 0.21.0, 0.22.0 Reporter: Anirban Dasgupta Assignee: Anirban Dasgupta Priority: Minor In order to make the capacity-scheduler work in the mumak simulation environment, we have to replace the job-initialization threads of the capacity scheduler with classes that perform event-based initialization. We propose to use aspectj to disable the threads of the JobInitializationPoller class used by the Capacity Scheduler, and then perform the corresponding initialization tasks through a simulation job-initialization class that receives periodic wake-up calls from the simulator engine. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1252) Shuffle deadlocks on wrong number of maps
[ https://issues.apache.org/jira/browse/MAPREDUCE-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated MAPREDUCE-1252: - Attachment: mr-1252.patch > Shuffle deadlocks on wrong number of maps > - > > Key: MAPREDUCE-1252 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1252 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: task >Affects Versions: 0.21.0, 0.22.0 >Reporter: Owen O'Malley >Assignee: Owen O'Malley >Priority: Blocker > Fix For: 0.21.0, 0.22.0 > > Attachments: mr-1252.patch > > > The new shuffle assumes that the number of maps is correct. The new > JobSubmitter sets the old value. Something misfires in the middle causing: > 09/12/01 00:00:15 WARN conf.Configuration: mapred.job.split.file is > deprecated. Instead, use mapreduce.job.splitfile > 09/12/01 00:00:15 WARN conf.Configuration: mapred.map.tasks is deprecated. > Instead, use mapreduce.job.maps > But my reduces got stuck at 2 maps / 12 when there were only 2 maps in the > job. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1252) Shuffle deadlocks on wrong number of maps
[ https://issues.apache.org/jira/browse/MAPREDUCE-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated MAPREDUCE-1252: - Status: Patch Available (was: Open) > Shuffle deadlocks on wrong number of maps > - > > Key: MAPREDUCE-1252 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1252 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: task >Affects Versions: 0.21.0, 0.22.0 >Reporter: Owen O'Malley >Assignee: Owen O'Malley >Priority: Blocker > Fix For: 0.21.0, 0.22.0 > > Attachments: mr-1252.patch > > > The new shuffle assumes that the number of maps is correct. The new > JobSubmitter sets the old value. Something misfires in the middle causing: > 09/12/01 00:00:15 WARN conf.Configuration: mapred.job.split.file is > deprecated. Instead, use mapreduce.job.splitfile > 09/12/01 00:00:15 WARN conf.Configuration: mapred.map.tasks is deprecated. > Instead, use mapreduce.job.maps > But my reduces got stuck at 2 maps / 12 when there were only 2 maps in the > job. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1252) Shuffle deadlocks on wrong number of maps
Shuffle deadlocks on wrong number of maps - Key: MAPREDUCE-1252 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1252 Project: Hadoop Map/Reduce Issue Type: Bug Components: task Affects Versions: 0.21.0, 0.22.0 Reporter: Owen O'Malley Assignee: Owen O'Malley Priority: Blocker Fix For: 0.21.0, 0.22.0 Attachments: mr-1252.patch The new shuffle assumes that the number of maps is correct. The new JobSubmitter sets the old value. Something misfires in the middle causing: 09/12/01 00:00:15 WARN conf.Configuration: mapred.job.split.file is deprecated. Instead, use mapreduce.job.splitfile 09/12/01 00:00:15 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps But my reduces got stuck at 2 maps / 12 when there were only 2 maps in the job. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1222) [Mumak] We should not include nodes with numeric ips in cluster topology.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783971#action_12783971 ] Hong Tang commented on MAPREDUCE-1222: -- @dick, thanks for the comments. I am combing through the jdk code too and struggling to find any explicit API. Finally, I found out that there are calls to directly translate ipv4/6 addresses from string literal form to Inet{4|6}Address objects, but they are not in the JDK; instead they are in sun.net.util.IPAddressUtil, which makes them unusable (unless we mandate that everybody use the Sun JVM). What bothers me about re-implementing RFC 2373 (IPv6 address architecture) is that it is a complicated scheme, and we would probably need a suite of unit tests to guarantee our implementation is correct - which sounds to me to be way beyond the scope of this jira (which should not even exist if HDFS-778 is fixed). > [Mumak] We should not include nodes with numeric ips in cluster topology. > - > > Key: MAPREDUCE-1222 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1222 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/mumak >Affects Versions: 0.21.0, 0.22.0 >Reporter: Hong Tang >Assignee: Hong Tang > Fix For: 0.21.0, 0.22.0 > > Attachments: IPv6-predicate.patch, mapreduce-1222-20091119.patch, > mapreduce-1222-20091121.patch > > > Rumen infers cluster topology by parsing input split locations from job > history logs. Due to HDFS-778, a cluster node may appear either as a numeric ip > or as a host name in job history logs. We should exclude nodes that appear as > numeric ips in cluster topology when we run mumak until a solution is found so > that numeric ips would never appear in input split locations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1222) [Mumak] We should not include nodes with numeric ips in cluster topology.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783962#action_12783962 ] Dick King commented on MAPREDUCE-1222: -- After I wrote my comment of 24/Nov/09 07:59 PM, I looked at the Java API because I came to wonder whether unescaping and using the Java API could be made to work by itself. I did look for alternatives before I created my big regular expression. The big problem is that Java doesn't really present any API that distinguishes numeric IP addresses from symbolic addresses. Although InetAddress.getByName(String) must have some means of parsing an IPV4 and IPV6 literal numeric address, this functionality is not presented to java.net.* users. InetAddress.getByName(String) will parse either a numeric address or a symbolic name and produce indistinguishable results. That piece of the API does not give us a means to distinguish the two. I was unable to find any other API that did make the distinction. The formats of numeric literal IPV4 and IPV6 internet addresses are fixed in RFCs and are extremely unlikely to be changed in the foreseeable future. We are therefore not exposed to any future-proofing risk. The only exposure we have is a possible future IPV8, but the ICANN is doing its best to make that unnecessary for a very long time. Considering that Apache already owns this regular expression, we should consider using it. I considered the simpler approach of treating any address that contains a colon character as a numeric IPV6 address, but colons are used as other punctuation, e.g., as the separator between an IP address and a port number. That solution felt to me to be too brittle and accident-prone, and doesn't solve the IPV8 problem. There is a continuum of IPV6 solutions ranging from "look for a colon" to the correct regular expression you see here, and no principled way to decide where to stop. > [Mumak] We should not include nodes with numeric ips in cluster topology. 
> - > > Key: MAPREDUCE-1222 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1222 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/mumak >Affects Versions: 0.21.0, 0.22.0 >Reporter: Hong Tang >Assignee: Hong Tang > Fix For: 0.21.0, 0.22.0 > > Attachments: IPv6-predicate.patch, mapreduce-1222-20091119.patch, > mapreduce-1222-20091121.patch > > > Rumen infers cluster topology by parsing input split locations from job > history logs. Due to HDFS-778, a cluster node may appear both as a numeric ip > or as a host name in job history logs. We should exclude nodes appeared as > numeric ips in cluster toplogy when we run mumak until a solution is found so > that numeric ips would never appear in input split locations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1251) c++ utils doesn't compile
[ https://issues.apache.org/jira/browse/MAPREDUCE-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated MAPREDUCE-1251: --- Status: Patch Available (was: Open) > c++ utils doesn't compile > - > > Key: MAPREDUCE-1251 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1251 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0 > Environment: ubuntu karmic 64-bit >Reporter: Eli Collins >Assignee: Eli Collins > Attachments: HDFS-790-1.patch, HDFS-790.patch, MR-1251.patch > > > c++ utils doesn't compile on ubuntu karmic 64-bit. The latest patch for > HADOOP-5611 needs to be applied first. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1251) c++ utils doesn't compile
[ https://issues.apache.org/jira/browse/MAPREDUCE-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated MAPREDUCE-1251: --- Status: Open (was: Patch Available) > c++ utils doesn't compile > - > > Key: MAPREDUCE-1251 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1251 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0 > Environment: ubuntu karmic 64-bit >Reporter: Eli Collins >Assignee: Eli Collins > Attachments: HDFS-790-1.patch, HDFS-790.patch, MR-1251.patch > > > c++ utils doesn't compile on ubuntu karmic 64-bit. The latest patch for > HADOOP-5611 needs to be applied first. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1222) [Mumak] We should not include nodes with numeric ips in cluster topology.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783948#action_12783948 ] Hong Tang commented on MAPREDUCE-1222: -- @dick, thanks for the help. The solution is more complex than I would like for the trivial problem this jira represents. I suggest we go for a less efficient, but more direct and thus easier-to-maintain solution: unescaping the dots and relying on java's IPv4 or IPv6 parsing code. > [Mumak] We should not include nodes with numeric ips in cluster topology. > - > > Key: MAPREDUCE-1222 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1222 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/mumak >Affects Versions: 0.21.0, 0.22.0 >Reporter: Hong Tang >Assignee: Hong Tang > Fix For: 0.21.0, 0.22.0 > > Attachments: IPv6-predicate.patch, mapreduce-1222-20091119.patch, > mapreduce-1222-20091121.patch > > > Rumen infers cluster topology by parsing input split locations from job > history logs. Due to HDFS-778, a cluster node may appear either as a numeric ip > or as a host name in job history logs. We should exclude nodes that appear as > numeric ips in cluster topology when we run mumak until a solution is found so > that numeric ips would never appear in input split locations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
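For the IPv4 half of the problem, a small validation routine needs no DNS lookup (InetAddress.getByName would resolve symbolic names, which is exactly what must be avoided here). This sketch deliberately ignores IPv6 literals, which are where the complexity discussed above lives:

```java
// Illustrative IPv4-literal check: four dot-separated decimal octets,
// each in 0..255. Purely syntactic, so it never touches the resolver.
// IPv6 literals (the hard case in this issue) are intentionally out of scope.
public class Ipv4Literal {
    public static boolean isIpv4(String s) {
        String[] parts = s.split("\\.", -1);
        if (parts.length != 4) return false;
        for (String p : parts) {
            if (p.isEmpty() || p.length() > 3) return false;
            for (int i = 0; i < p.length(); i++) {
                if (!Character.isDigit(p.charAt(i))) return false;
            }
            if (Integer.parseInt(p) > 255) return false; // max 999, no overflow
        }
        return true;
    }
}
```

A topology builder could drop any node name for which such a predicate returns true, which is the behavior this jira asks for on the IPv4 side.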
[jira] Moved: (MAPREDUCE-1251) c++ utils doesn't compile
[ https://issues.apache.org/jira/browse/MAPREDUCE-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins moved HDFS-790 to MAPREDUCE-1251: - Affects Version/s: (was: 0.22.0) (was: 0.20.2) (was: 0.20.1) (was: 0.21.0) 0.22.0 0.21.0 0.20.2 0.20.1 Key: MAPREDUCE-1251 (was: HDFS-790) Project: Hadoop Map/Reduce (was: Hadoop HDFS) > c++ utils doesn't compile > - > > Key: MAPREDUCE-1251 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1251 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0 > Environment: ubuntu karmic 64-bit >Reporter: Eli Collins >Assignee: Eli Collins > Attachments: HDFS-790-1.patch, HDFS-790.patch, MR-1251.patch > > > c++ utils doesn't compile on ubuntu karmic 64-bit. The latest patch for > HADOOP-5611 needs to be applied first. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1251) c++ utils doesn't compile
[ https://issues.apache.org/jira/browse/MAPREDUCE-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated MAPREDUCE-1251: --- Attachment: MR-1251.patch Patch for trunk, against MR where pipes lives. > c++ utils doesn't compile > - > > Key: MAPREDUCE-1251 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1251 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0 > Environment: ubuntu karmic 64-bit >Reporter: Eli Collins >Assignee: Eli Collins > Attachments: HDFS-790-1.patch, HDFS-790.patch, MR-1251.patch > > > c++ utils doesn't compile on ubuntu karmic 64-bit. The latest patch for > HADOOP-5611 needs to be applied first. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1247) Send out-of-band heartbeat to avoid fake lost tasktracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783899#action_12783899 ] Arun C Murthy commented on MAPREDUCE-1247: -- I agree with Todd: we should *never* do any i/o operation while holding locks... we added the taskcleanup thread long ago for precisely this reason. It is quite possible that we've since violated that - we should fix the primary cause rather than hide it with out-of-band heartbeats. > Send out-of-band heartbeat to avoid fake lost tasktracker > - > > Key: MAPREDUCE-1247 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1247 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Reporter: ZhuGuanyin > > Currently the TaskTracker reports task status to the jobtracker through heartbeats. > Sometimes, when the tasktracker is locked to do some cleanup job, like > removing task temp data on disk, the heartbeat thread hangs for a long time > waiting for the lock, so the jobtracker concludes the tracker is lost and > reschedules all of its finished maps and unfinished reduces on other > tasktrackers. We call this a "fake lost tasktracker", and it is sometimes > unacceptable, especially when we run some large jobs. So we introduce an > out-of-band heartbeat mechanism to send an out-of-band heartbeat in that case. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
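The pattern Arun refers to, handing slow disk I/O to a dedicated cleanup thread so the lock-holding heartbeat path never blocks on it, looks roughly like this (names are illustrative, not the actual TaskTracker code):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: instead of deleting task data while holding the tracker lock,
// enqueue the path and let a background daemon thread do the slow disk I/O.
public class CleanupQueue {
    private final BlockingQueue<String> paths = new LinkedBlockingQueue<>();

    public CleanupQueue() {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    String p = paths.take();
                    deleteRecursively(p); // slow I/O, outside any tracker lock
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "task-cleanup");
        t.setDaemon(true);
        t.start();
    }

    // Safe to call while holding the tracker lock: O(1), never touches disk.
    public boolean scheduleDelete(String path) {
        return paths.add(path);
    }

    static void deleteRecursively(String path) {
        // placeholder for the actual recursive delete of task temp data
    }
}
```

With cleanup decoupled this way, the heartbeat thread only ever contends for the lock with other fast, in-memory operations, removing the cause of the "fake lost tasktracker" symptom rather than masking it.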
[jira] Updated: (MAPREDUCE-698) Per-pool task limits for the fair scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Peterson updated MAPREDUCE-698: - Attachment: mapreduce-698-trunk-3.patch changes from previous patch: - extra newline in fairscheduler.java removed - removed "single test" changes from build-contrib.xml (they didn't accomplish what I wanted -- to run just a single test method) - Regarding checkAssignment, I made the change you suggested, but I'm not sure I'm testing things in the best way. The only thing I'm concerned with is that it ends up scheduling the right number from each pool, the only way I was able to get it to actually assign the jobs was to use checkAssignment. - in the UI, labels are "Max Share" - Removed Pool.numRunningTasks since it was only used from within PoolSchedulable, where this data is already available. - Moved cap from getDemand() to updateDemand(). - Documentation updated - Removed tabs. > Per-pool task limits for the fair scheduler > --- > > Key: MAPREDUCE-698 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-698 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: contrib/fair-share >Reporter: Matei Zaharia > Fix For: 0.21.0 > > Attachments: MAPREDUCE-698-prelim.patch, mapreduce-698-trunk-3.patch, > mapreduce-698-trunk.patch, mapreduce-698-trunk.patch > > > The fair scheduler could use a way to cap the share of a given pool similar > to MAPREDUCE-532. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
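The change above — moving the per-pool cap from getDemand() to updateDemand() — can be sketched as follows. This is a minimal illustration, not the actual fair-scheduler code; PoolDemand and its fields are hypothetical stand-ins:

```java
// Hypothetical sketch of capping a pool's demand inside updateDemand(),
// as the patch describes. Names are illustrative, not the real
// FairScheduler/PoolSchedulable classes.
public class PoolDemand {
    private int demand;          // total runnable tasks requested by jobs
    private final int maxShare;  // per-pool cap from config; -1 means uncapped

    public PoolDemand(int maxShare) { this.maxShare = maxShare; }

    // Recompute demand from job requests, applying the cap once here
    // rather than on every getDemand() call.
    public void updateDemand(int[] jobDemands) {
        int total = 0;
        for (int d : jobDemands) total += d;
        demand = (maxShare >= 0) ? Math.min(total, maxShare) : total;
    }

    public int getDemand() { return demand; }  // now a plain getter
}
```

Applying the cap at update time keeps getDemand() cheap and consistent within a scheduling pass, which matches the rationale for the move.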
[jira] Commented: (MAPREDUCE-1218) Collecting cpu and memory usage for TaskTrackers
[ https://issues.apache.org/jira/browse/MAPREDUCE-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783846#action_12783846 ] Hadoop QA commented on MAPREDUCE-1218: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12426171/MAPREDUCE-1218-rename.sh against trunk revision 885530. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/280/console This message is automatically generated. > Collecting cpu and memory usage for TaskTrackers > > > Key: MAPREDUCE-1218 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1218 > Project: Hadoop Map/Reduce > Issue Type: Sub-task >Affects Versions: 0.22.0 > Environment: linux >Reporter: Scott Chen >Assignee: Scott Chen > Fix For: 0.22.0 > > Attachments: MAPREDUCE-1218-rename.sh, MAPREDUCE-1218.patch > > > The information can be used for resource aware scheduling. > Note that this is related to MAPREDUCE-220. There the per task resource > information is collected. > This one collects the per machine information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1218) Collecting cpu and memory usage for TaskTrackers
[ https://issues.apache.org/jira/browse/MAPREDUCE-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Chen updated MAPREDUCE-1218: -- Fix Version/s: 0.22.0 Affects Version/s: 0.22.0 Status: Patch Available (was: Open) > Collecting cpu and memory usage for TaskTrackers > > > Key: MAPREDUCE-1218 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1218 > Project: Hadoop Map/Reduce > Issue Type: Sub-task >Affects Versions: 0.22.0 > Environment: linux >Reporter: Scott Chen >Assignee: Scott Chen > Fix For: 0.22.0 > > Attachments: MAPREDUCE-1218-rename.sh, MAPREDUCE-1218.patch > > > The information can be used for resource aware scheduling. > Note that this is related to MAPREDUCE-220. There the per task resource > information is collected. > This one collects the per machine information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1119) When tasks fail to report status, show tasks's stack dump before killing
[ https://issues.apache.org/jira/browse/MAPREDUCE-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783830#action_12783830 ] Aaron Kimball commented on MAPREDUCE-1119: -- * GridMix has been sporadically failing for a while now in Hudson, but not in a deterministic fashion. Cannot reproduce locally. * The fair scheduler test times out on both my trunk and MAPREDUCE-1119 branches locally. MAPREDUCE-1245? * TestJobHistory passes locally both on my trunk and MAPREDUCE-1119 branches. > When tasks fail to report status, show tasks's stack dump before killing > > > Key: MAPREDUCE-1119 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1119 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 0.22.0 >Reporter: Todd Lipcon >Assignee: Aaron Kimball > Attachments: MAPREDUCE-1119.2.patch, MAPREDUCE-1119.3.patch, > MAPREDUCE-1119.4.patch, MAPREDUCE-1119.5.patch, MAPREDUCE-1119.6.patch, > MAPREDUCE-1119.patch > > > When the TT kills tasks that haven't reported status, it should somehow > gather a stack dump for the task. This could be done either by sending a > SIGQUIT (so the dump ends up in stdout) or perhaps something like JDI to > gather the stack directly from Java. This may be somewhat tricky since the > child may be running as another user (so the SIGQUIT would have to go through > LinuxTaskController). This feature would make debugging these kinds of > failures much easier, especially if we could somehow get it into the > TaskDiagnostic message -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Work started: (MAPREDUCE-1078) Unit test for zero map jobs and killed jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on MAPREDUCE-1078 started by Anirban Dasgupta. > Unit test for zero map jobs and killed jobs > --- > > Key: MAPREDUCE-1078 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1078 > Project: Hadoop Map/Reduce > Issue Type: Sub-task >Affects Versions: 0.21.0, 0.22.0 >Reporter: Anirban Dasgupta >Assignee: Anirban Dasgupta > > Adding unit test for zero map jobs and killed jobs -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1224) Calling "SELECT t.* from AS t" to get meta information is too expensive for big tables
[ https://issues.apache.org/jira/browse/MAPREDUCE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783810#action_12783810 ] Aaron Kimball commented on MAPREDUCE-1224: -- Good to know that this works with SQL Server as well. Thanks for the patch. > Calling "SELECT t.* from AS t" to get meta information is too > expensive for big tables > -- > > Key: MAPREDUCE-1224 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1224 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/sqoop >Affects Versions: 0.20.1 > Environment: all platforms, generic jdbc driver >Reporter: Spencer Ho > Attachments: MAPREDUCE-1224.patch, SqlManager.java > > > The SqlManager uses the query, "SELECT t.* from AS t" to get table > spec is too expensive for big tables, and it was called twice to generate > column names and types. For tables that are big enough to be map-reduced, > this is too expensive to make sqoop useful. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1140) Per cache-file refcount can become negative when tasks release distributed-cache files
[ https://issues.apache.org/jira/browse/MAPREDUCE-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783801#action_12783801 ] Hudson commented on MAPREDUCE-1140: --- Integrated in Hadoop-Mapreduce-trunk-Commit #138 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/138/]) > Per cache-file refcount can become negative when tasks release > distributed-cache files > -- > > Key: MAPREDUCE-1140 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1140 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 0.20.2, 0.21.0, 0.22.0 >Reporter: Vinod K V >Assignee: Amareshwari Sriramadasu > Fix For: 0.22.0 > > Attachments: patch-1140-1.txt, patch-1140-2-ydist.txt, > patch-1140-2.txt, patch-1140-3.txt, patch-1140-ydist.txt, > patch-1140-ydist.txt, patch-1140.txt > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1231) Distcp is very slow
[ https://issues.apache.org/jira/browse/MAPREDUCE-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783802#action_12783802 ] Hudson commented on MAPREDUCE-1231: --- Integrated in Hadoop-Mapreduce-trunk-Commit #138 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/138/]) . Added a new DistCp option, -skipcrccheck, so that the CRC check during setup can be skipped. Contributed by Jothi Padmanabhan > Distcp is very slow > --- > > Key: MAPREDUCE-1231 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1231 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: distcp >Affects Versions: 0.20.1 >Reporter: Jothi Padmanabhan >Assignee: Jothi Padmanabhan > Fix For: 0.22.0 > > Attachments: mapred-1231-v1.patch, mapred-1231-v2.patch, > mapred-1231-v3.patch, mapred-1231-v3.patch, mapred-1231-y20-v2.patch, > mapred-1231-y20-v3.patch, mapred-1231-y20-v4.patch, mapred-1231-y20.patch, > mapred-1231.patch > > > Currently distcp does a checksums check in addition to file length check to > decide if a remote file has to be copied. If the number of files is high > (thousands), this checksum check is proving to be fairly costly leading to a > long time before the copy is started. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1250) Refactor job token to use a common token interface
Refactor job token to use a common token interface -- Key: MAPREDUCE-1250 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1250 Project: Hadoop Map/Reduce Issue Type: Improvement Components: security Reporter: Kan Zhang Assignee: Kan Zhang The idea is to use a common token interface for both the job token and the delegation token (HADOOP-6373) so that the RPC layer that uses them doesn't have to differentiate between them. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
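The shape of such a refactoring can be sketched as below. The interface and class names are purely illustrative assumptions — the actual common interface is the subject of HADOOP-6373, not shown here:

```java
// Hypothetical sketch of a common token-identifier interface shared by
// job tokens and delegation tokens, so RPC code handles both uniformly.
public class CommonToken {
    public interface TokenIdentifier {
        byte[] getBytes();   // serialized identifier sent over the wire
        String getKind();    // e.g. "JOB_TOKEN" or "DELEGATION_TOKEN"
    }

    public static class JobTokenIdentifier implements TokenIdentifier {
        private final String jobId;
        public JobTokenIdentifier(String jobId) { this.jobId = jobId; }
        public byte[] getBytes() { return jobId.getBytes(); }
        public String getKind() { return "JOB_TOKEN"; }
    }

    // The RPC layer works only against the interface, never the
    // concrete token type.
    public static String describe(TokenIdentifier id) {
        return id.getKind() + ":" + id.getBytes().length + " bytes";
    }
}
```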
[jira] Updated: (MAPREDUCE-1231) Distcp is very slow
[ https://issues.apache.org/jira/browse/MAPREDUCE-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-1231: -- Resolution: Fixed Fix Version/s: 0.22.0 Status: Resolved (was: Patch Available) I have committed this. Thanks, Jothi! > Distcp is very slow > --- > > Key: MAPREDUCE-1231 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1231 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: distcp >Affects Versions: 0.20.1 >Reporter: Jothi Padmanabhan >Assignee: Jothi Padmanabhan > Fix For: 0.22.0 > > Attachments: mapred-1231-v1.patch, mapred-1231-v2.patch, > mapred-1231-v3.patch, mapred-1231-v3.patch, mapred-1231-y20-v2.patch, > mapred-1231-y20-v3.patch, mapred-1231-y20-v4.patch, mapred-1231-y20.patch, > mapred-1231.patch > > > Currently distcp does a checksums check in addition to file length check to > decide if a remote file has to be copied. If the number of files is high > (thousands), this checksum check is proving to be fairly costly leading to a > long time before the copy is started. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1224) Calling "SELECT t.* from AS t" to get meta information is too expensive for big tables
[ https://issues.apache.org/jira/browse/MAPREDUCE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783776#action_12783776 ] Spencer Ho commented on MAPREDUCE-1224: --- @Aaron, This particular case that triggered the patch submission is for Microsoft SQL Server. For MySQL, I am using direct mode which works for most of the cases. > Calling "SELECT t.* from AS t" to get meta information is too > expensive for big tables > -- > > Key: MAPREDUCE-1224 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1224 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/sqoop >Affects Versions: 0.20.1 > Environment: all platforms, generic jdbc driver >Reporter: Spencer Ho > Attachments: MAPREDUCE-1224.patch, SqlManager.java > > > The SqlManager uses the query, "SELECT t.* from AS t" to get table > spec is too expensive for big tables, and it was called twice to generate > column names and types. For tables that are big enough to be map-reduced, > this is too expensive to make sqoop useful. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-763) Capacity scheduler should clean up reservations if it runs tasks on nodes other than where it has made reservations
[ https://issues.apache.org/jira/browse/MAPREDUCE-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783702#action_12783702 ] rahul k singh commented on MAPREDUCE-763: - Approach: for every job with a higher memory requirement we reserve a TT. Now, if a task from the job is assigned to an unreserved tasktracker, we simply remove a tasktracker from the existing reserved-TT list. This makes sure that we do not reserve more than required. JobInProgress maintains the lists of TTs reserved for maps and reduces; we can use the same lists, and the TT removed will just be the first element in the list. The above solution has a small starvation issue in the case of speculative tasks, because speculative tasks cannot run on certain kinds of TTs. For example: there is a job with 3 tips, namely tip1, tip2, tip3. For attempt tip1_1 we have reserved TT1. For attempt tip2_1 we have reserved TT2. For attempt tip3_1 we have reserved TT3. In the above case, if tip1_1 is assigned to TT4 then we simply unreserve TT1, as it is at the top of the list. Now if there is a second attempt tip1_2 for tip1, and it cannot run on TT2 or TT3, there can be slight starvation for tip1_2, as it has to wait until it gets a TT where it can run. But this is acceptable, as it is a comparatively remote case and the list of TTs is a dynamic structure. To make the above work exactly, we would need an attempt-id-to-reserved-TT mapping (any other suggestions?), which would require some significant changes. The above approach keeps the code changes simple and straightforward. It definitely alleviates the current situation, where the chance of going overboard with reservations is relatively high. 
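The removal scheme described above — drop the first tracker from the job's reserved list whenever a task lands on an unreserved tracker — can be sketched as follows. ReservationList is an illustrative stand-in, not the real capacity-scheduler or JobInProgress code:

```java
import java.util.LinkedList;

// Sketch of the proposed cleanup: reservations shrink whenever a task of
// a high-memory job is assigned to a tracker outside the reserved set,
// so total reservations never exceed what the job still needs.
public class ReservationList {
    private final LinkedList<String> reservedTrackers = new LinkedList<>();

    public void reserve(String tracker) { reservedTrackers.add(tracker); }

    // Called when a task is assigned to some tracker; if that tracker is
    // not one of ours, unreserve the oldest reservation.
    public void onTaskAssigned(String tracker) {
        if (!reservedTrackers.contains(tracker) && !reservedTrackers.isEmpty()) {
            reservedTrackers.removeFirst();
        }
    }

    public int reservedCount() { return reservedTrackers.size(); }
}
```

In the tip1/TT4 example from the comment, assigning to TT4 removes TT1 (the head of the list), which is exactly the behavior that can briefly starve a later attempt that could only have run on TT1.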
> Capacity scheduler should clean up reservations if it runs tasks on nodes > other than where it has made reservations > --- > > Key: MAPREDUCE-763 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-763 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/capacity-sched >Affects Versions: 0.21.0 >Reporter: Hemanth Yamijala >Assignee: rahul k singh > Fix For: 0.21.0 > > > Currently capacity scheduler makes a reservation on nodes for high memory > jobs that cannot currently run at the time. It could happen that in the > meantime other tasktrackers become free to run the tasks of this job. Ideally > in the next heartbeat from the reserved TTs the reservation should be > removed. Otherwise it could unnecessarily block capacity for a while (until > the TT has enough slots free to run a task of this job). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1249) mapreduce.reduce.shuffle.read.timeout's default value should be 3 minutes, in mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783650#action_12783650 ] Hadoop QA commented on MAPREDUCE-1249: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12426393/patch-1249.txt against trunk revision 884832. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/279/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/279/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/279/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/279/console This message is automatically generated. 
> mapreduce.reduce.shuffle.read.timeout's default value should be 3 minutes, in > mapred-default.xml > > > Key: MAPREDUCE-1249 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1249 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: task >Affects Versions: 0.21.0 >Reporter: Amareshwari Sriramadasu >Priority: Blocker > Fix For: 0.21.0 > > Attachments: patch-1249.txt > > > mapreduce.reduce.shuffle.read.timeout has a value of 30,000 (30 seconds) in > mapred-default.xml, whereas the default value in Fetcher code is 3 minutes. > It should be 3 minutes by default, as it was in pre MAPREDUCE-353. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1140) Per cache-file refcount can become negative when tasks release distributed-cache files
[ https://issues.apache.org/jira/browse/MAPREDUCE-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783643#action_12783643 ] Iyappan Srinivasan commented on MAPREDUCE-1140: --- | Regarding the tests, I spoke offline to Amarsri to understand the scenario executed by Karthikeyan in the comment above. It was not very clear why file1 was being added twice. Some more configuration details - that it was run on a single node and that max failures was set to 1 - should be documented for better understanding. - Configuration: the tests run with one node running the JT/NN and another node running the TT/DN. map.max.attempts and reduce.max.attempts are both set to 1, local.cache.size is set to 4 GB, and mapred.local.dir is set to only one spindle rather than all of them. This forces the TT to localize in the same path and try deleting the local cache files when the size exceeds 4 GB. The idea of the test case is: Run Job1 with cache files file1 and file2 - the job succeeds. Run Job2 with cache files file3 and file1; while file3 is being localized, remove file3 from DFS - Job2 fails. - Since file3 is deleted, the reference count of file1 should not be decremented twice (once during setup and once during cleanup). That's the objective of this scenario. Run Job3 with cache files file1, file1 (again) and file4, where file4 is huge (say 5 GB), larger than local.cache.size. - To make sure that the decrement happened properly, file1 is added twice. When file4, which exceeds the local cache size, is added, other files like file2 and file3 (which were used in the previous jobs) get deleted, but not file1 (because its reference count was correct). | To match the regression tests in trunk, I would suggest we also include in Job2 a file, say file5, which we should verify is not even localized (because file3 fails localization). Then we can include file5 in Job3 and make sure localization happens successfully. 
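The invariant this scenario checks — a failed localization must not decrement a file's count twice, and the count must never go negative — can be sketched with a guarded refcount. This is an illustrative model, not the actual TrackerDistributedCacheManager code:

```java
// Sketch of a guarded per-file reference count: release() refuses to
// drive the count negative, so a double release (e.g. once during setup
// and once during cleanup of a failed job) is detected instead of
// corrupting the count.
public class CacheRef {
    private int refCount = 0;

    public synchronized void acquire() { refCount++; }

    // Returns false instead of decrementing past zero.
    public synchronized boolean release() {
        if (refCount <= 0) return false;  // double release detected
        refCount--;
        return true;
    }

    public synchronized int count() { return refCount; }
}
```

With a correct count, a file still referenced by a running job (file1 above) survives cache eviction while unreferenced files (file2, file3) are deleted.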
This scenario is tested and the localization happens successfully. > Per cache-file refcount can become negative when tasks release > distributed-cache files > -- > > Key: MAPREDUCE-1140 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1140 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 0.20.2, 0.21.0, 0.22.0 >Reporter: Vinod K V >Assignee: Amareshwari Sriramadasu > Fix For: 0.22.0 > > Attachments: patch-1140-1.txt, patch-1140-2-ydist.txt, > patch-1140-2.txt, patch-1140-3.txt, patch-1140-ydist.txt, > patch-1140-ydist.txt, patch-1140.txt > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-896) Users can set non-writable permissions on temporary files for TT and can abuse disk usage.
[ https://issues.apache.org/jira/browse/MAPREDUCE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Gummadi updated MAPREDUCE-896: --- Attachment: MR-896.v1.patch Updated the patch so that it applies to the current trunk. Please review and provide your comments. > Users can set non-writable permissions on temporary files for TT and can > abuse disk usage. > -- > > Key: MAPREDUCE-896 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-896 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 0.21.0 >Reporter: Vinod K V >Assignee: Ravi Gummadi > Fix For: 0.21.0 > > Attachments: MR-896.patch, MR-896.v1.patch > > > As of now, irrespective of the TaskController in use, the TT itself does a full > delete of the local files created by itself or by job tasks. This step, depending > upon the TT's umask and the permissions set on files by the user (e.g. in > job-work/task-work or child.tmp directories), may or may not complete > successfully. This leaves an opportunity for abusing disk > space usage, either accidentally or intentionally, by the TT or users. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1248) Redundant memory copying in StreamKeyValUtil
[ https://issues.apache.org/jira/browse/MAPREDUCE-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783614#action_12783614 ] ZhuGuanyin commented on MAPREDUCE-1248: --- The same thing happens in KeyValueLineRecordReader.java when it calls the next() method. > Redundant memory copying in StreamKeyValUtil > > > Key: MAPREDUCE-1248 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1248 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/streaming >Reporter: Ruibang He >Priority: Minor > > I found that when MROutputThread collects the output of the Reducer, it calls > StreamKeyValUtil.splitKeyVal(), and two local byte arrays are allocated there > for each line of output. Later these two byte arrays are passed to the variables > key and val. There are two memory copies here: one is the > System.arraycopy() call, the other is inside key.set() / val.set(). > This doubles the memory copying for the whole output (which may lead to > higher CPU consumption) and causes frequent temporary object allocation. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1140) Per cache-file refcount can become negative when tasks release distributed-cache files
[ https://issues.apache.org/jira/browse/MAPREDUCE-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hemanth Yamijala updated MAPREDUCE-1140: Resolution: Fixed Fix Version/s: 0.22.0 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) I verified TestJobRetire passes on my machine with the patch as well. Hence, I committed this to trunk. Thanks, Amareshwari ! > Per cache-file refcount can become negative when tasks release > distributed-cache files > -- > > Key: MAPREDUCE-1140 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1140 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 0.20.2, 0.21.0, 0.22.0 >Reporter: Vinod K V >Assignee: Amareshwari Sriramadasu > Fix For: 0.22.0 > > Attachments: patch-1140-1.txt, patch-1140-2-ydist.txt, > patch-1140-2.txt, patch-1140-3.txt, patch-1140-ydist.txt, > patch-1140-ydist.txt, patch-1140.txt > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1247) Send out-of-band heartbeat to avoid fake lost tasktracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783604#action_12783604 ] ZhuGuanyin commented on MAPREDUCE-1247: --- We could make the out-of-band heartbeat thread in the TaskTracker optional (disabled by default, via a configurable parameter); small clusters running small jobs do not need it. The additional thread is very useful for clusters running large jobs. Our production Hadoop cluster became more robust and never sees fake lost tasktrackers any more. I can attach the patch if anyone is interested. > Send out-of-band heartbeat to avoid fake lost tasktracker > - > > Key: MAPREDUCE-1247 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1247 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Reporter: ZhuGuanyin > > Currently the TaskTracker reports task status to the JobTracker through heartbeats. > Sometimes, when the TaskTracker locks itself to do some cleanup job, like > removing task temp data on disk, the heartbeat thread hangs for a long time > while waiting for the lock, so the JobTracker assumes the tracker is lost and > reschedules all of its finished maps and unfinished reduces on other > TaskTrackers. We call this a "fake lost tasktracker", which is sometimes not > acceptable, especially when we run large jobs. So we introduce an out-of-band > heartbeat mechanism to send an out-of-band heartbeat in that case. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
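The shape of such an optional out-of-band heartbeat can be sketched as below — a separate daemon thread that pings on its own schedule, independent of any lock the main TaskTracker path holds. Everything here is illustrative (the real patch, its config key, and its RPC call are not shown):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of an out-of-band heartbeat thread: it pings the JobTracker on
// its own timer, so a long-held TaskTracker lock cannot starve
// heartbeats. In the real tracker ping() would be an RPC; here it just
// counts so the behavior is observable.
public class OobHeartbeat {
    private final AtomicInteger heartbeatsSent = new AtomicInteger();
    private volatile boolean running = true;

    public void ping() { heartbeatsSent.incrementAndGet(); }

    public Thread start(long intervalMillis) {
        Thread t = new Thread(() -> {
            while (running) {
                ping();  // sends even while the main TT lock is held
                try { Thread.sleep(intervalMillis); }
                catch (InterruptedException e) { return; }
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    public void stop() { running = false; }
    public int sent() { return heartbeatsSent.get(); }
}
```

Note Arun's objection above still stands: this masks the lock contention rather than removing it, which is why he prefers fixing the I/O-under-lock root cause.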
[jira] Commented: (MAPREDUCE-1248) Redundant memory copying in StreamKeyValUtil
[ https://issues.apache.org/jira/browse/MAPREDUCE-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783593#action_12783593 ] Ruibang He commented on MAPREDUCE-1248: --- I suggest removing the two local byte arrays and replacing the following code: key.set(keyBytes); val.set(valBytes); with: key.set(utf, start, keyLen); val.set(utf, splitPos+separatorLength, valLen); I have briefly tested the above in my cluster. It works, and the memory stops climbing. Any thoughts? > Redundant memory copying in StreamKeyValUtil > > > Key: MAPREDUCE-1248 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1248 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/streaming >Reporter: Ruibang He >Priority: Minor > > I found that when MROutputThread collects the output of the Reducer, it calls > StreamKeyValUtil.splitKeyVal(), and two local byte arrays are allocated there > for each line of output. Later these two byte arrays are passed to the variables > key and val. There are two memory copies here: one is the > System.arraycopy() call, the other is inside key.set() / val.set(). > This doubles the memory copying for the whole output (which may lead to > higher CPU consumption) and causes frequent temporary object allocation. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
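The suggestion above avoids the intermediate arrays by passing the shared line buffer plus offsets straight to set(), as Text.set(byte[], int, int) permits. A minimal sketch, with BytesHolder as a simplified stand-in for org.apache.hadoop.io.Text (the real splitKeyVal locates splitPos itself; here it is passed in):

```java
// Sketch of the single-copy split: the key and value are copied once,
// directly from the shared buffer, with no temporary keyBytes/valBytes
// arrays and no second System.arraycopy().
public class SplitKeyVal {
    public static class BytesHolder {
        private byte[] buf = new byte[0];
        private int len = 0;
        // One copy, straight from the source buffer at the given offset.
        public void set(byte[] src, int start, int length) {
            if (buf.length < length) buf = new byte[length];
            System.arraycopy(src, start, buf, 0, length);
            len = length;
        }
        public String asString() { return new String(buf, 0, len); }
    }

    // Split "key<sep>value" without allocating intermediate arrays.
    public static void splitKeyVal(byte[] utf, int start, int length,
                                   BytesHolder key, BytesHolder val,
                                   int splitPos, int separatorLength) {
        int keyLen = splitPos - start;
        int valStart = splitPos + separatorLength;
        int valLen = start + length - valStart;
        key.set(utf, start, keyLen);
        val.set(utf, valStart, valLen);
    }
}
```

One caveat worth checking before committing such a change: set(byte[], int, int) still copies into the Writable's own buffer, so the win is eliminating the extra allocation and copy, not sharing the buffer outright.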
[jira] Commented: (MAPREDUCE-1248) Redundant memory copying in StreamKeyValUtil
[ https://issues.apache.org/jira/browse/MAPREDUCE-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783591#action_12783591 ] Ruibang He commented on MAPREDUCE-1248: --- I have often observed the memory consumption in the reduce phase of Reducers climb to the heap limit and fall back repeatedly. This phenomenon is usually caused by frequent temporary object allocation. It impacts performance, since GC has to keep working constantly. > Redundant memory copying in StreamKeyValUtil > > > Key: MAPREDUCE-1248 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1248 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/streaming >Reporter: Ruibang He >Priority: Minor > > I found that when MROutputThread collects the output of the Reducer, it calls > StreamKeyValUtil.splitKeyVal(), and two local byte arrays are allocated there > for each line of output. Later these two byte arrays are passed to the variables > key and val. There are two memory copies here: one is the > System.arraycopy() call, the other is inside key.set() / val.set(). > This doubles the memory copying for the whole output (which may lead to > higher CPU consumption) and causes frequent temporary object allocation. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1249) mapreduce.reduce.shuffle.read.timeout's default value should be 3 minutes, in mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1249: --- Status: Patch Available (was: Open) > mapreduce.reduce.shuffle.read.timeout's default value should be 3 minutes, in > mapred-default.xml > > > Key: MAPREDUCE-1249 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1249 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: task >Affects Versions: 0.21.0 >Reporter: Amareshwari Sriramadasu >Priority: Blocker > Fix For: 0.21.0 > > Attachments: patch-1249.txt > > > mapreduce.reduce.shuffle.read.timeout has a value of 30,000 (30 seconds) in > mapred-default.xml, whereas the default value in Fetcher code is 3 minutes. > It should be 3 minutes by default, as it was in pre MAPREDUCE-353. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1249) mapreduce.reduce.shuffle.read.timeout's default value should be 3 minutes, in mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1249: --- Attachment: patch-1249.txt Patch changing the value in mapred-default.xml > mapreduce.reduce.shuffle.read.timeout's default value should be 3 minutes, in > mapred-default.xml > > > Key: MAPREDUCE-1249 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1249 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: task >Affects Versions: 0.21.0 >Reporter: Amareshwari Sriramadasu >Priority: Blocker > Fix For: 0.21.0 > > Attachments: patch-1249.txt > > > mapreduce.reduce.shuffle.read.timeout has a value of 30,000 (30 seconds) in > mapred-default.xml, whereas the default value in Fetcher code is 3 minutes. > It should be 3 minutes by default, as it was in pre MAPREDUCE-353. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1249) mapreduce.reduce.shuffle.read.timeout's default value should be 3 minutes, in mapred-default.xml
mapreduce.reduce.shuffle.read.timeout's default value should be 3 minutes, in mapred-default.xml
-------------------------------------------------------------------------------------------------

                 Key: MAPREDUCE-1249
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1249
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: task
    Affects Versions: 0.21.0
            Reporter: Amareshwari Sriramadasu
            Priority: Blocker
             Fix For: 0.21.0


mapreduce.reduce.shuffle.read.timeout has a value of 30,000 (30 seconds) in mapred-default.xml, whereas the default value in the Fetcher code is 3 minutes. It should be 3 minutes by default, as it was pre-MAPREDUCE-353.
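[Editorial note] The fix described above amounts to aligning the value in mapred-default.xml with the Fetcher code's 3-minute default. A sketch of the corrected property (the description text here is illustrative, not the exact wording shipped in mapred-default.xml):

```xml
<!-- mapred-default.xml: 3 minutes = 180000 ms, matching the Fetcher default -->
<property>
  <name>mapreduce.reduce.shuffle.read.timeout</name>
  <value>180000</value>
  <description>Expert: maximum time, in milliseconds, a reduce task
  waits while reading map output data during the shuffle.
  </description>
</property>
```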
[jira] Created: (MAPREDUCE-1248) Redundant memory copying in StreamKeyValUtil
Redundant memory copying in StreamKeyValUtil
---------------------------------------------

                 Key: MAPREDUCE-1248
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1248
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: contrib/streaming
            Reporter: Ruibang He
            Priority: Minor


I found that when MROutputThread collects the output of the Reducer, it calls StreamKeyValUtil.splitKeyVal(), which allocates two local byte arrays for each line of output. These two byte arrays are then passed to the key and val variables. Two memory copies happen here: one in the System.arraycopy() call, the other inside key.set() / val.set(). This doubles the memory copying for the whole output (which may increase CPU consumption) and causes frequent temporary object allocation.
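[Editorial note] The improvement suggested above can be sketched as follows: instead of allocating intermediate key/val byte arrays and copying twice (System.arraycopy plus Text.set), split the line by passing (buffer, offset, length) straight into the target's set() method, so each side is copied exactly once. SimpleText below is a self-contained stand-in for org.apache.hadoop.io.Text, whose set(byte[], int, int) has this shape; this is a sketch of the idea, not the actual StreamKeyValUtil code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SplitKeyVal {
    // Stand-in for org.apache.hadoop.io.Text.
    static final class SimpleText {
        byte[] bytes = new byte[0];
        int length;
        // One copy: straight from the source line into this object's buffer,
        // with no temporary byte[] in between.
        void set(byte[] src, int off, int len) {
            bytes = Arrays.copyOfRange(src, off, off + len);
            length = len;
        }
        public String toString() {
            return new String(bytes, 0, length, StandardCharsets.UTF_8);
        }
    }

    // Split line[0..lineLen) at the first tab into key and val, copying
    // each side exactly once (no intermediate key/val arrays).
    static void splitKeyVal(byte[] line, int lineLen,
                            SimpleText key, SimpleText val) {
        int tab = -1;
        for (int i = 0; i < lineLen; i++) {
            if (line[i] == '\t') { tab = i; break; }
        }
        if (tab == -1) {              // no separator: whole line is the key
            key.set(line, 0, lineLen);
            val.set(line, lineLen, 0);
        } else {
            key.set(line, 0, tab);
            val.set(line, tab + 1, lineLen - tab - 1);
        }
    }

    public static void main(String[] args) {
        byte[] line = "apple\t42".getBytes(StandardCharsets.UTF_8);
        SimpleText key = new SimpleText(), val = new SimpleText();
        splitKeyVal(line, line.length, key, val);
        System.out.println(key + " / " + val);
    }
}
```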
[jira] Commented: (MAPREDUCE-1082) Command line UI for queues' information is broken with hierarchical queues.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783579#action_12783579 ]

Hemanth Yamijala commented on MAPREDUCE-1082:
---------------------------------------------

Have a few comments on this patch:

- I don't know why readFields in JobQueueInfo needs to be overridden. In all the API, it is QueueInfo objects that are transferred over IPC, and I think it should remain that way.
- I think the test case needs to be an end-to-end test by definition, as the fix is both in JobQueueInfo and in the JobTracker, where, when we translate JobQueueInfos to QueueInfos, we fix the translation by walking the entire hierarchy. I would suggest a test that brings up a MiniMRCluster with hierarchical queues, submits a job to one of the queues, calls Cluster.getRootQueues, and verifies the returned QueueInfo information. We might need a package-private JobTracker.setQueueManager to enable setting up hierarchical queues with a MiniMRCluster.

> Command line UI for queues' information is broken with hierarchical queues.
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1082
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1082
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client, jobtracker
>    Affects Versions: 0.21.0
>            Reporter: Vinod K V
>            Assignee: V.V.Chaitanya Krishna
>            Priority: Blocker
>             Fix For: 0.21.0
>
>         Attachments: MAPREDUCE-1082-1.txt, MAPREDUCE-1082-2.patch
>
>
> When the command "./bin/mapred --config ~/tmp/conf/ queue -list" is run, it
> just hangs. I can see the following in the JT logs:
> {code}
> 2009-10-08 13:19:26,762 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 5 caught: java.lang.NullPointerException
>         at org.apache.hadoop.mapreduce.QueueInfo.write(QueueInfo.java:217)
>         at org.apache.hadoop.mapreduce.QueueInfo.write(QueueInfo.java:223)
>         at org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:159)
>         at org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:126)
>         at org.apache.hadoop.io.ObjectWritable.write(ObjectWritable.java:70)
>         at org.apache.hadoop.ipc.Server.setupResponse(Server.java:1074)
>         at org.apache.hadoop.ipc.Server.access$2400(Server.java:77)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:983)
> {code}
> Same is the case with "./bin/mapred --config ~/tmp/conf/ queue -info "
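[Editorial note] The stack trace above shows QueueInfo.write() recursing into its children (line 223 calls back into line 217) and hitting a NullPointerException, most likely because a field on some queue in the hierarchy is null. A minimal, self-contained sketch of a defensive write() that treats null collections and strings as empty; MiniQueueInfo is a hypothetical stand-in for org.apache.hadoop.mapreduce.QueueInfo, not the real class or its actual wire format:

```java
import java.io.*;
import java.util.*;

public class MiniQueueInfo {
    String name;                    // may be null on a partially built queue
    List<MiniQueueInfo> children;   // may be null for leaf queues

    MiniQueueInfo(String name) { this.name = name; }

    // Null-safe serialization: leaf queues with children == null are
    // written as having zero children instead of throwing an NPE.
    void write(DataOutput out) throws IOException {
        out.writeUTF(name == null ? "" : name);
        List<MiniQueueInfo> kids =
            (children == null) ? Collections.<MiniQueueInfo>emptyList() : children;
        out.writeInt(kids.size());
        for (MiniQueueInfo child : kids) {
            child.write(out);       // safe recursion: kids is never null
        }
    }

    public static void main(String[] args) throws IOException {
        MiniQueueInfo root = new MiniQueueInfo("root");
        root.children = Arrays.asList(new MiniQueueInfo("a"), new MiniQueueInfo("b"));
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        root.write(new DataOutputStream(buf));  // leaves have null children: no NPE
        System.out.println(buf.size() + " bytes written");
    }
}
```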
[jira] Assigned: (MAPREDUCE-763) Capacity scheduler should clean up reservations if it runs tasks on nodes other than where it has made reservations
[ https://issues.apache.org/jira/browse/MAPREDUCE-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan reassigned MAPREDUCE-763:
------------------------------------------------

    Assignee: rahul k singh  (was: Sreekanth Ramakrishnan)

> Capacity scheduler should clean up reservations if it runs tasks on nodes
> other than where it has made reservations
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-763
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-763
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.21.0
>            Reporter: Hemanth Yamijala
>            Assignee: rahul k singh
>             Fix For: 0.21.0
>
>
> Currently the capacity scheduler makes a reservation on nodes for high-memory
> jobs that cannot run at the time. It could happen that in the meantime other
> tasktrackers become free to run the tasks of this job. Ideally, in the next
> heartbeat from the reserved TTs, the reservation should be removed. Otherwise
> it could unnecessarily block capacity for a while (until the TT has enough
> slots free to run a task of this job).
[jira] Commented: (MAPREDUCE-1185) URL to JT webconsole for running job and job history should be the same
[ https://issues.apache.org/jira/browse/MAPREDUCE-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783576#action_12783576 ]

Amareshwari Sriramadasu commented on MAPREDUCE-1185:
----------------------------------------------------

test-patch and ant test passed on the Y!20 patch, except TestHdfsProxy.

> URL to JT webconsole for running job and job history should be the same
> -------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1185
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1185
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>            Reporter: Sharad Agarwal
>            Assignee: Sharad Agarwal
>         Attachments: 1185_v1.patch, 1185_v2.patch, 1185_v3.patch, 1185_v4.patch, 1185_v5.patch, patch-1185-ydist.txt
>
>
> The tracking URL for running jobs differs from the URL for retired jobs. This
> creates problems for clients that cache the running-job URL, because it
> becomes invalid once the job is retired.
[jira] Commented: (MAPREDUCE-1247) Send out-of-band heartbeat to avoid fake lost tasktracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783574#action_12783574 ]

ZhuGuanyin commented on MAPREDUCE-1247:
---------------------------------------

The taskCleanup thread locks the TaskTracker when it calls MapOutputFile.removeAll() through TaskTracker.purgeTask() to clean up a task, or TaskTracker.purgeJob() to clean up a job. If the map output files are larger than 50 GB and there are other I/O operations on the same disk, it can hold the TaskTracker lock long enough for the JobTracker to treat the TaskTracker as dead.

I think the current heartbeat thread has to handle too many things that are not its duty. Deadlocks in the TaskTracker may still happen and may go undetected in the current implementation, and I don't think it is the heartbeat's duty to detect deadlocks in the TaskTracker.

> Send out-of-band heartbeat to avoid fake lost tasktracker
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-1247
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1247
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: ZhuGuanyin
>
>
> Currently the TaskTracker reports task status to the JobTracker through
> heartbeats. Sometimes, when the TaskTracker locks itself to do some cleanup
> job, like removing task temp data on disk, the heartbeat thread hangs for a
> long time waiting for the lock, so the JobTracker concludes the tracker is
> lost and reschedules all its finished maps or unfinished reduces on other
> TaskTrackers. We call this a "fake lost tasktracker"; sometimes this is not
> acceptable, especially when running large jobs. So we introduce an
> out-of-band heartbeat mechanism to send an out-of-band heartbeat in that case.
[jira] Commented: (MAPREDUCE-1247) Send out-of-band heartbeat to avoid fake lost tasktracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783569#action_12783569 ]

Todd Lipcon commented on MAPREDUCE-1247:
----------------------------------------

My worry about a heartbeat thread that's entirely disconnected from the operation of the TT is that there are certain cases where the TT is "as good as dead" but not actually dead. For example, if the TT is in a deadlocked state, your "true heartbeat" would continue to function, whereas the TT is not healthy and should be considered dead.

I agree that the optimal system would separate these things and provide some kind of health-check interface to ensure that the service is actually getting work done. For a more achievable short-term goal, I think deferring these slow operations to other threads is the safer route. Admittedly I don't work on the guts of this part of the system much, so I will defer to those that do.

> Send out-of-band heartbeat to avoid fake lost tasktracker
> -----------------------------------------------------------
[jira] Commented: (MAPREDUCE-1140) Per cache-file refcount can become negative when tasks release distributed-cache files
[ https://issues.apache.org/jira/browse/MAPREDUCE-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783568#action_12783568 ]

Amareshwari Sriramadasu commented on MAPREDUCE-1140:
----------------------------------------------------

test-patch result for the 0.20 patch:
{noformat}
     [exec] -1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
     [exec]                         Please justify why no tests are needed for this patch.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
{noformat}

Regarding "-1 tests included": it is difficult to port the testcase to 0.20, because the code in 0.20 is all static methods.

All unit tests passed on my machine except TestHdfsProxy.

> Per cache-file refcount can become negative when tasks release
> distributed-cache files
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-1140
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1140
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.20.2, 0.21.0, 0.22.0
>            Reporter: Vinod K V
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-1140-1.txt, patch-1140-2-ydist.txt, patch-1140-2.txt, patch-1140-3.txt, patch-1140-ydist.txt, patch-1140-ydist.txt, patch-1140.txt
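[Editorial note] The bug class named in the issue title above can be sketched in a few lines: a naive per-file refcount map goes negative if a task releases a cache file more times than it acquired it, so release() needs a guard. CacheTracker below is an illustrative, self-contained model, not Hadoop's actual distributed-cache manager or its fix:

```java
import java.util.HashMap;
import java.util.Map;

public class CacheTracker {
    private final Map<String, Integer> refCounts = new HashMap<>();

    // Called when a task starts using a localized cache file.
    synchronized void acquire(String cacheFile) {
        refCounts.merge(cacheFile, 1, Integer::sum);
    }

    // Called when a task releases a cache file. Guarded so an unbalanced
    // release can never drive the count negative.
    synchronized void release(String cacheFile) {
        Integer count = refCounts.get(cacheFile);
        if (count == null || count <= 0) {
            // Double release: warn and ignore instead of going negative.
            System.err.println("WARN: unbalanced release for " + cacheFile);
            return;
        }
        if (count == 1) {
            refCounts.remove(cacheFile);  // now eligible for deletion
        } else {
            refCounts.put(cacheFile, count - 1);
        }
    }

    synchronized int refCount(String cacheFile) {
        return refCounts.getOrDefault(cacheFile, 0);
    }
}
```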
[jira] Commented: (MAPREDUCE-1247) Send out-of-band heartbeat to avoid fake lost tasktracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783566#action_12783566 ]

ZhuGuanyin commented on MAPREDUCE-1247:
---------------------------------------

The out-of-band heartbeat thread (or we could call it the true heartbeat thread) only sends the tasktracker's name to the jobtracker, and the jobtracker just updates its last-seen time. We could add a new interface to InterTrackerProtocol, so it doesn't add a lot of confusion or complexity.

> Send out-of-band heartbeat to avoid fake lost tasktracker
> -----------------------------------------------------------
[jira] Commented: (MAPREDUCE-1247) Send out-of-band heartbeat to avoid fake lost tasktracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783567#action_12783567 ]

Todd Lipcon commented on MAPREDUCE-1247:
----------------------------------------

Any chance you have job jars that contain many thousands of classes? MAPREDUCE-967 may help with the cleanup taking a long time. Nevertheless, I agree that some of those things should be deferred to other threads so the brunt of the work (e.g. IO-bound things) doesn't hold critical locks.

> Send out-of-band heartbeat to avoid fake lost tasktracker
> -----------------------------------------------------------
[jira] Commented: (MAPREDUCE-1247) Send out-of-band heartbeat to avoid fake lost tasktracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783564#action_12783564 ]

ZhuGuanyin commented on MAPREDUCE-1247:
---------------------------------------

We printed java jstacks when trackers became "fake lost" on Hadoop 0.19, and found:

- 7 times the heartbeat thread was waiting for the TaskTracker lock (5 times because the taskCleanup thread held it for a long time, 2 times because a reduce sub-JVM called TaskTracker.getMapCompletionEvents())
- 4 times the heartbeat thread was waiting for the TaskTracker.TaskInProgress lock (3 times because the taskCleanup thread held it for a long time, 1 time because the TaskLauncher held it for a long time)
- 2 times the heartbeat thread was waiting for the AllocatorPerContext lock

The heartbeat thread should only answer for the live-or-dead status of the tasktracker, but in the current implementation it has too many other things to do; we should let the heartbeat thread do only what it has to do.

> Send out-of-band heartbeat to avoid fake lost tasktracker
> -----------------------------------------------------------
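[Editorial note] The proposal discussed in the comments above can be sketched as a dedicated liveness thread that never takes the TaskTracker's main lock, so a long-held lock during task cleanup cannot delay it. This is an illustrative, self-contained sketch using java.util.concurrent; HeartbeatSink is a hypothetical stand-in for the proposed InterTrackerProtocol addition, not an actual Hadoop interface:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class OutOfBandHeartbeat {
    // Hypothetical RPC surface: only the tracker name crosses the wire,
    // and the receiver just updates the tracker's last-seen time.
    interface HeartbeatSink {
        void ping(String trackerName);
    }

    // Start a daemon thread that pings the sink at a fixed interval.
    // It shares no state with the main TaskTracker loop and takes no locks,
    // so slow cleanup work cannot block it.
    static ScheduledExecutorService start(String trackerName,
                                          HeartbeatSink sink,
                                          long intervalMillis) {
        ScheduledExecutorService exec =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "oob-heartbeat");
                t.setDaemon(true);
                return t;
            });
        exec.scheduleAtFixedRate(() -> sink.ping(trackerName),
                                 0, intervalMillis, TimeUnit.MILLISECONDS);
        return exec;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicLong lastSeen = new AtomicLong();
        ScheduledExecutorService exec = start("tracker_host1:50060",
            name -> lastSeen.set(System.currentTimeMillis()), 50);
        Thread.sleep(200);                 // let a few pings through
        exec.shutdownNow();
        System.out.println("last seen updated: " + (lastSeen.get() > 0));
    }
}
```

As Todd Lipcon notes above, such a "true heartbeat" proves only that the JVM is alive, not that the tracker is healthy; a deadlocked tracker would keep pinging, which is why a separate health-check signal (or deferring slow work off the locked path) is also needed.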