[jira] [Updated] (TEZ-3165) Allow Inputs/Outputs to be initialized serially, control processor initialization relative to Inputs/Outputs

2016-04-14 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-3165:
-
Attachment: TEZ-3165.4-branch-0.7.patch

> Allow Inputs/Outputs to be initialized serially, control processor 
> initialization relative to Inputs/Outputs
> 
>
> Key: TEZ-3165
> URL: https://issues.apache.org/jira/browse/TEZ-3165
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: TEZ-3165.1.patch, TEZ-3165.2.patch, TEZ-3165.3.patch, 
> TEZ-3165.4-branch-0.7.patch, TEZ-3165.4.patch
>
>
> 2016-03-13 23:55:17,162 [INFO] [main] 
> |runtime.LogicalIOProcessorRuntimeTask|: Initializing 
> LogicalIOProcessorRuntimeTask with TaskSpec: DAGName : 
> PigLatin:Script.pig-0_scope-0, VertexName: scope-203, VertexParallelism: 
> 2707, TaskAttemptID:attempt_1, 
> processorName=org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor,
>  inputSpecListSize=1, outputSpecListSize=1, inputSpecList=[{{ 
> sourceVertexName=scope-0, physicalEdgeCount=1, 
> inputClassName=org.apache.tez.mapreduce.input.MRInput }}, ], 
> outputSpecList=[{{ destinationVertexName=scope-28, physicalEdgeCount=0, 
> outputClassName=org.apache.tez.mapreduce.output.MROutput }}, ]
> 2016-03-13 23:55:17,164 [INFO] [main] |resources.MemoryDistributor|: 
> InitialMemoryDistributor (isEnabled=true) invoked with: numInputs=1, 
> numOutputs=1, JVM.maxFree=1059061760, 
> allocatorClassName=org.apache.tez.runtime.library.resources.WeightedScalingMemoryDistributor
> 2016-03-13 23:55:17,175 [INFO] [TezChild] |task.TezTaskRunner|: Initializing 
> task, taskAttemptId=attempt_1
> 2016-03-13 23:55:17,182 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: 
> Routing events from heartbeat response to task, 
> currentTaskAttemptId=attempt_1, eventCount=1 fromEventId=0 nextFromEventId=0
> 2016-03-13 23:55:17,212 [INFO] [I/O Setup 1 Initialize: {scope-28}] 
> |Configuration.deprecation|: mapreduce.inputformat.class is deprecated. 
> Instead, use mapreduce.job.inputformat.class
> 2016-03-13 23:55:17,214 [INFO] [I/O Setup 1 Initialize: {scope-28}] 
> |Configuration.deprecation|: fs.default.name is deprecated. Instead, use 
> fs.defaultFS
> 2016-03-13 23:55:17,223 [INFO] [I/O Setup 1 Initialize: {scope-28}] 
> |counters.Limits|: Counter limits initialized with parameters:  
> GROUP_NAME_MAX=256, MAX_GROUPS=1000, COUNTER_NAME_MAX=128, MAX_COUNTERS=5000
> 2016-03-13 23:55:17,228 [INFO] [I/O Setup 0 Initialize: {scope-0}] 
> |input.MRInput|: scope-0 using newmapreduce API=true, split via event=true, 
> numPhysicalInputs=1
> 2016-03-13 23:55:17,233 [INFO] [I/O Setup 0 Initialize: {scope-0}] 
> |input.MRInput|: Initialized MRInput: scope-0
> 2016-03-13 23:55:17,345 [INFO] [TezChild] |data.SchemaTupleBackend|: Key 
> [pig.schematuple] was not set... will not generate code.
> 2016-03-13 23:55:17,400 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: Initialized processor
> 2016-03-13 23:55:17,400 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: Waiting for 2 initializers to finish
> 2016-03-13 23:55:17,400 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: Waiting for 1 initializers to finish
> 2016-03-13 23:55:17,400 [INFO] [TezChild] |task.TezTaskRunner|: Encounted an 
> error while executing task: attempt_1
> java.lang.RuntimeException: could not instantiate 
> 'com.twitter.elephantbird.pig.store.SequenceFileStorage' with arguments '[-c 
> com.twitter.elephantbird.pig.util.TextConverter, -c 
> com.twitter.elephantbird.pig.util.TextConverter]'
>   at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:766)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getStoreFunc(POStore.java:250)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:76)
>   at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigOutputFormatTez.getRecordWriter(PigOutputFormatTez.java:43)
>   at org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:399)
>   at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable._callInternal(LogicalIOProcessorRuntimeTask.java:506)
>   at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:489)
>   at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:474)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> 

[jira] [Updated] (TEZ-3165) Allow Inputs/Outputs to be initialized serially, control processor initialization relative to Inputs/Outputs

2016-04-14 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-3165:
-
Attachment: TEZ-3165.4.patch

Thanks for the review [~hitesh] and [~sseth]. Addressed comments and started 
the commit process.

> Allow Inputs/Outputs to be initialized serially, control processor 
> initialization relative to Inputs/Outputs
> 
>
> Key: TEZ-3165
> URL: https://issues.apache.org/jira/browse/TEZ-3165
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: TEZ-3165.1.patch, TEZ-3165.2.patch, TEZ-3165.3.patch, 
> TEZ-3165.4.patch
>
>

[jira] [Updated] (TEZ-3165) Allow Inputs/Outputs to be initialized serially, control processor initialization relative to Inputs/Outputs

2016-03-23 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-3165:

Summary: Allow Inputs/Outputs to be initialized serially, control processor 
initialization relative to Inputs/Outputs  (was: Parallel initialization of 
inputs, outputs, and processor can cause NoSuchMethodException)
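
The new summary names two behaviours: initializing Inputs/Outputs serially, and controlling whether the processor initializes before or alongside them. The log in the issue description shows the old behaviour: MRInput (scope-0) and MROutput (scope-28) initialize on separate "I/O Setup" threads while the processor initializes on the TezChild thread, and the failure surfaces from MROutput.initialize inside a FutureTask. The Java sketch below models both knobs with a plain ExecutorService. The class InitOrderSketch, its run method, and its two boolean flags are hypothetical illustrations only; they are not the Tez API and not the configuration properties added by the patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical model of processor vs. Input/Output initialization ordering (not Tez code). */
public class InitOrderSketch {

    /** Runs the processor and I/O initializers; returns the order in which they completed. */
    public static List<String> run(boolean processorFirst, boolean ioSerially) {
        List<String> order = Collections.synchronizedList(new ArrayList<>());
        List<Callable<Void>> ioInitializers = List.of(
                record(order, "input:scope-0"),
                record(order, "output:scope-28"));

        // A single worker thread runs the I/O initializers one at a time, in submission order;
        // otherwise each initializer gets its own thread and they run concurrently.
        ExecutorService pool = Executors.newFixedThreadPool(ioSerially ? 1 : ioInitializers.size());
        try {
            if (processorFirst) {
                order.add("processor"); // processor finishes before any I/O task is submitted
            }
            List<Future<Void>> futures = new ArrayList<>();
            for (Callable<Void> init : ioInitializers) {
                futures.add(pool.submit(init));
            }
            if (!processorFirst) {
                order.add("processor"); // overlaps with I/O initialization, as in the log
            }
            for (Future<Void> f : futures) {
                try {
                    f.get(); // analogous to "Waiting for N initializers to finish"
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for initializers", e);
                } catch (ExecutionException e) {
                    // One failing initializer fails the whole task attempt.
                    throw new IllegalStateException("initializer failed", e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
        return order;
    }

    private static Callable<Void> record(List<String> order, String name) {
        return () -> {
            order.add(name);
            return null;
        };
    }

    public static void main(String[] args) {
        System.out.println(run(true, true));
    }
}
```

With both flags set, main prints [processor, input:scope-0, output:scope-28]; with both cleared, the three entries can appear in any order, which is the kind of nondeterminism that lets a thread-unsafe StoreFunc or InputFormat fail only intermittently.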

> Allow Inputs/Outputs to be initialized serially, control processor 
> initialization relative to Inputs/Outputs
> 
>
> Key: TEZ-3165
> URL: https://issues.apache.org/jira/browse/TEZ-3165
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: TEZ-3165.1.patch, TEZ-3165.2.patch, TEZ-3165.3.patch
>
>