[ https://issues.apache.org/jira/browse/MAPREDUCE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated MAPREDUCE-1853:
-------------------------------------------

    Component/s: task
    Description: 
In MultipleOutputs there is
{code}
 private TaskAttemptContext getContext(String nameOutput) throws IOException {
   // The following trick leverages the instantiation of a record writer via
   // the job thus supporting arbitrary output formats.
   Job job = new Job(context.getConfiguration());
   job.setOutputFormatClass(getNamedOutputFormatClass(context, nameOutput));
   job.setOutputKeyClass(getNamedOutputKeyClass(context, nameOutput));
   job.setOutputValueClass(getNamedOutputValueClass(context, nameOutput));
   TaskAttemptContext taskContext =
     new TaskAttemptContextImpl(job.getConfiguration(),
                                context.getTaskAttemptID());
   return taskContext;
 }
{code}

so for every reduce call it creates a new Job instance ...which creates a new LocalJobRunner.
That does not sound like a good idea.
You end up with a flood of "jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized"

This should probably also be added to 0.22.

  was:
In MultipleOutputs there is
[code]
 private TaskAttemptContext getContext(String nameOutput) throws IOException {
   // The following trick leverages the instantiation of a record writer via
   // the job thus supporting arbitrary output formats.
   Job job = new Job(context.getConfiguration());
   job.setOutputFormatClass(getNamedOutputFormatClass(context, nameOutput));
   job.setOutputKeyClass(getNamedOutputKeyClass(context, nameOutput));
   job.setOutputValueClass(getNamedOutputValueClass(context, nameOutput));
   TaskAttemptContext taskContext =
     new TaskAttemptContextImpl(job.getConfiguration(),
                                context.getTaskAttemptID());
   return taskContext;
 }
[code]

so for every reduce call it creates a new Job instance ...which creates a new LocalJobRunner.
That does not sound like a good idea.
You end up with a flood of "jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized"

This should probably also be added to 0.22.

       Assignee: Torsten Curdt

> MultipleOutputs does not cache TaskAttemptContext
> -------------------------------------------------
>
>                 Key: MAPREDUCE-1853
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1853
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: task
>    Affects Versions: 0.21.0, 0.22.0
>         Environment: OSX 10.6
> java6
>            Reporter: Torsten Curdt
>            Assignee: Torsten Curdt
>            Priority: Critical
>             Fix For: 0.21.0, 0.22.0
>
>         Attachments: cache-task-attempts.diff
>
>
> In MultipleOutputs there is
> {code}
>  private TaskAttemptContext getContext(String nameOutput) throws IOException {
>    // The following trick leverages the instantiation of a record writer via
>    // the job thus supporting arbitrary output formats.
>    Job job = new Job(context.getConfiguration());
>    job.setOutputFormatClass(getNamedOutputFormatClass(context, nameOutput));
>    job.setOutputKeyClass(getNamedOutputKeyClass(context, nameOutput));
>    job.setOutputValueClass(getNamedOutputValueClass(context, nameOutput));
>    TaskAttemptContext taskContext =
>      new TaskAttemptContextImpl(job.getConfiguration(),
>                                 context.getTaskAttemptID());
>    return taskContext;
>  }
> {code}
> so for every reduce call it creates a new Job instance ...which creates a new
> LocalJobRunner.
> That does not sound like a good idea.
> You end up with a flood of "jvm.JvmMetrics: Cannot initialize JVM Metrics
> with processName=JobTracker, sessionId= - already initialized"
> This should probably also be added to 0.22.
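
The caching that the issue title asks for can be sketched roughly as follows. This is a minimal illustration and not the contents of cache-task-attempts.diff. The class name NamedOutputContextCache, the type parameter C, the factory function and the creations() counter are all hypothetical stand-ins, chosen so that the example compiles without Hadoop on the classpath. In MultipleOutputs the factory would presumably correspond to the existing body of getContext, which may throw IOException; the sketch leaves that out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch: keeps one context object per named output so that the
 * expensive construction (a new Job in the issue's code) happens only once
 * per name instead of once per call.
 */
final class NamedOutputContextCache<C> {
    // Cached contexts, keyed by the named output.
    private final Map<String, C> contexts = new HashMap<>();
    // Assumed stand-in for the code that builds a context from scratch.
    private final Function<String, C> factory;
    // Counts how often the factory actually ran (for illustration only).
    private int creations = 0;

    NamedOutputContextCache(Function<String, C> factory) {
        this.factory = factory;
    }

    /** Returns the cached context for nameOutput, creating it on first use. */
    C getContext(String nameOutput) {
        C cached = contexts.get(nameOutput);
        if (cached == null) {
            cached = factory.apply(nameOutput);
            creations++;
            contexts.put(nameOutput, cached);
        }
        return cached;
    }

    /** Number of contexts built so far. */
    int creations() {
        return creations;
    }
}
```

With a cache like this, repeated getContext("a") calls return the same instance, and the factory runs once per distinct name. The sketch is not thread-safe, which matches the assumption that a single task uses it from one thread.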