[jira] [Updated] (MAPREDUCE-1853) MultipleOutputs does not cache TaskAttemptContext
[ https://issues.apache.org/jira/browse/MAPREDUCE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated MAPREDUCE-1853:
---

    Component/s: task
    Description:
In MultipleOutputs there is

{code}
 private TaskAttemptContext getContext(String nameOutput) throws IOException {
   // The following trick leverages the instantiation of a record writer via
   // the job thus supporting arbitrary output formats.
   Job job = new Job(context.getConfiguration());
   job.setOutputFormatClass(getNamedOutputFormatClass(context, nameOutput));
   job.setOutputKeyClass(getNamedOutputKeyClass(context, nameOutput));
   job.setOutputValueClass(getNamedOutputValueClass(context, nameOutput));
   TaskAttemptContext taskContext =
     new TaskAttemptContextImpl(job.getConfiguration(),
                                context.getTaskAttemptID());
   return taskContext;
 }
{code}

so for every reduce call it creates a new Job instance ...which creates a new LocalJobRunner.
That does not sound like a good idea.

You end up with a flood of "jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized"

This should probably also be added to 0.22.

  was:
In MultipleOutputs there is

[code]
 private TaskAttemptContext getContext(String nameOutput) throws IOException {
   // The following trick leverages the instantiation of a record writer via
   // the job thus supporting arbitrary output formats.
   Job job = new Job(context.getConfiguration());
   job.setOutputFormatClass(getNamedOutputFormatClass(context, nameOutput));
   job.setOutputKeyClass(getNamedOutputKeyClass(context, nameOutput));
   job.setOutputValueClass(getNamedOutputValueClass(context, nameOutput));
   TaskAttemptContext taskContext =
     new TaskAttemptContextImpl(job.getConfiguration(),
                                context.getTaskAttemptID());
   return taskContext;
 }
[code]

so for every reduce call it creates a new Job instance ...which creates a new LocalJobRunner.
That does not sound like a good idea.

You end up with a flood of "jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized"

This should probably also be added to 0.22.

       Assignee: Torsten Curdt

> MultipleOutputs does not cache TaskAttemptContext
> -
>
> Key: MAPREDUCE-1853
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1853
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: task
> Affects Versions: 0.21.0, 0.22.0
> Environment: OSX 10.6
> java6
> Reporter: Torsten Curdt
> Assignee: Torsten Curdt
> Priority: Critical
> Fix For: 0.21.0, 0.22.0
>
> Attachments: cache-task-attempts.diff

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
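
The getContext() method quoted in the issue description builds a new Job on every call. The following is a minimal, stand-alone sketch of the caching idea named in the issue title, not the contents of cache-task-attempts.diff. It assumes that one context per named output can be reused for the lifetime of the task and that all calls come from a single thread. FakeContext, createContext() and creationCount() are hypothetical stand-ins so the example runs without Hadoop on the classpath; they are not part of the Hadoop API.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of per-named-output caching. FakeContext stands in for Hadoop's
 * TaskAttemptContext, and createContext() stands in for the expensive
 * "new Job(...)" path from the issue description. Both are assumptions
 * made for illustration only.
 */
public class ContextCacheSketch {

    /** Hypothetical placeholder for TaskAttemptContext. */
    static final class FakeContext {
        final String namedOutput;

        FakeContext(String namedOutput) {
            this.namedOutput = namedOutput;
        }
    }

    // One cached context per named output, filled lazily on first use.
    // A plain HashMap is used on the assumption of single-threaded access.
    private final Map<String, FakeContext> contexts =
        new HashMap<String, FakeContext>();

    // Counts how often the expensive path runs; for demonstration only.
    private int creations = 0;

    /** Stand-in for the expensive construction that the cache avoids repeating. */
    private FakeContext createContext(String nameOutput) throws IOException {
        creations++;
        return new FakeContext(nameOutput);
    }

    /** Returns the cached context, creating it only on the first request. */
    FakeContext getContext(String nameOutput) throws IOException {
        FakeContext ctx = contexts.get(nameOutput);
        if (ctx == null) {
            ctx = createContext(nameOutput);
            contexts.put(nameOutput, ctx);
        }
        return ctx;
    }

    int creationCount() {
        return creations;
    }

    public static void main(String[] args) throws IOException {
        ContextCacheSketch cache = new ContextCacheSketch();
        // Simulate many reduce calls writing to two named outputs.
        for (int i = 0; i < 1000; i++) {
            cache.getContext("text");
            cache.getContext("seq");
        }
        System.out.println("contexts created: " + cache.creationCount());
    }
}
```

With the cache in place, the expensive path runs once per named output rather than once per call, which is the behaviour the issue asks for.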
[jira] Updated: (MAPREDUCE-1853) MultipleOutputs does not cache TaskAttemptContext
[ https://issues.apache.org/jira/browse/MAPREDUCE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated MAPREDUCE-1853:
---

        Fix Version/s: 0.22.0
    Affects Version/s: 0.22.0

> MultipleOutputs does not cache TaskAttemptContext
> -
>
> Key: MAPREDUCE-1853
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1853
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 0.21.0, 0.22.0
> Environment: OSX 10.6
> java6
> Reporter: Torsten Curdt
> Priority: Critical
> Fix For: 0.21.0, 0.22.0
>
> Attachments: cache-task-attempts.diff

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1853) MultipleOutputs does not cache TaskAttemptContext
[ https://issues.apache.org/jira/browse/MAPREDUCE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated MAPREDUCE-1853:
---

    Hadoop Flags: [Reviewed]

> MultipleOutputs does not cache TaskAttemptContext
> -
>
> Key: MAPREDUCE-1853
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1853
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 0.21.0
> Environment: OSX 10.6
> java6
> Reporter: Torsten Curdt
> Priority: Critical
> Fix For: 0.21.0
>
> Attachments: cache-task-attempts.diff
[jira] Updated: (MAPREDUCE-1853) MultipleOutputs does not cache TaskAttemptContext
[ https://issues.apache.org/jira/browse/MAPREDUCE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated MAPREDUCE-1853:
---

           Status: Resolved  (was: Patch Available)
    Fix Version/s: 0.21.0
                   (was: 0.22.0)
       Resolution: Fixed

I just committed this to trunk and branch 0.21. Thanks, Torsten!

> MultipleOutputs does not cache TaskAttemptContext
> -
>
> Key: MAPREDUCE-1853
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1853
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 0.21.0
> Environment: OSX 10.6
> java6
> Reporter: Torsten Curdt
> Priority: Critical
> Fix For: 0.21.0
>
> Attachments: cache-task-attempts.diff
[jira] Updated: (MAPREDUCE-1853) MultipleOutputs does not cache TaskAttemptContext
[ https://issues.apache.org/jira/browse/MAPREDUCE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated MAPREDUCE-1853:
---

           Status: Patch Available  (was: Open)
    Fix Version/s: 0.22.0

Changes look good.

> MultipleOutputs does not cache TaskAttemptContext
> -
>
> Key: MAPREDUCE-1853
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1853
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 0.21.0
> Environment: OSX 10.6
> java6
> Reporter: Torsten Curdt
> Priority: Critical
> Fix For: 0.22.0
>
> Attachments: cache-task-attempts.diff
[jira] Updated: (MAPREDUCE-1853) MultipleOutputs does not cache TaskAttemptContext
[ https://issues.apache.org/jira/browse/MAPREDUCE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Torsten Curdt updated MAPREDUCE-1853:
-

    Attachment:     (was: cache-task-attempts.diff)

> MultipleOutputs does not cache TaskAttemptContext
> -
>
> Key: MAPREDUCE-1853
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1853
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 0.21.0
> Environment: OSX 10.6
> java6
> Reporter: Torsten Curdt
> Priority: Critical
> Attachments: cache-task-attempts.diff
[jira] Updated: (MAPREDUCE-1853) MultipleOutputs does not cache TaskAttemptContext
[ https://issues.apache.org/jira/browse/MAPREDUCE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Torsten Curdt updated MAPREDUCE-1853:
-

    Attachment: cache-task-attempts.diff

with --no-prefix

> MultipleOutputs does not cache TaskAttemptContext
> -
>
> Key: MAPREDUCE-1853
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1853
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 0.21.0
> Environment: OSX 10.6
> java6
> Reporter: Torsten Curdt
> Priority: Critical
> Attachments: cache-task-attempts.diff
[jira] Updated: (MAPREDUCE-1853) MultipleOutputs does not cache TaskAttemptContext
[ https://issues.apache.org/jira/browse/MAPREDUCE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Torsten Curdt updated MAPREDUCE-1853:
-

    Attachment: cache-task-attempts.diff

> MultipleOutputs does not cache TaskAttemptContext
> -
>
> Key: MAPREDUCE-1853
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1853
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 0.21.0
> Environment: OSX 10.6
> java6
> Reporter: Torsten Curdt
> Priority: Critical
> Attachments: cache-task-attempts.diff