[ 
https://issues.apache.org/jira/browse/TEZ-3165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201770#comment-15201770
 ] 

Jonathan Eagles commented on TEZ-3165:
--------------------------------------

[~sseth], it was found that both Pig's 
[HBaseStorage|https://github.com/apache/pig/blob/branch-0.14/src/org/apache/pig/backend/hadoop/hbase/HBaseStorage.java]
 and Elephant Bird's 
[SequenceFileStorage|https://github.com/twitter/elephant-bird/blob/master/pig/src/main/java/com/twitter/elephantbird/pig/util/SequenceFileConfig.java]
 use OptionBuilder, which is not thread-safe (see the [Thread-safety 
notice|https://commons.apache.org/proper/commons-cli/javadocs/api-release/org/apache/commons/cli/OptionBuilder.html]).

{noformat}
myinput = load 'hbase://mydb:mytable' using 
org.apache.pig.backend.hadoop.hbase.HBaseStorage('d:m','-loadKey true $OPTIONS')
...
store output into 'myoutput' using 
com.twitter.elephantbird.pig.store.SequenceFileStorage('-c 
com.twitter.elephantbird.pig.util.TextConverter','-c 
com.twitter.elephantbird.pig.util.TextConverter');
{noformat}
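
A minimal sketch of the hazard, assuming a builder that keeps its pending settings in static fields, which is how OptionBuilder holds its state. {{StaticBuilder}} and the option names are hypothetical stand-ins, not Commons CLI code, and the interleaving of two callers is replayed step by step on one thread so that the outcome is deterministic.

```java
import java.util.ArrayList;
import java.util.List;

public class StaticBuilderHazard {
    /** Hypothetical stand-in for a builder that keeps pending settings in static fields. */
    static final class StaticBuilder {
        private static String pendingName;    // shared by every caller in the JVM
        private static boolean pendingHasArg; // shared as well

        static void withName(String name) { pendingName = name; }
        static void hasArg() { pendingHasArg = true; }

        /** Builds a description from the shared fields, then resets them. */
        static String create() {
            String built = pendingName + (pendingHasArg ? "=<arg>" : "");
            pendingName = null;
            pendingHasArg = false;
            return built;
        }
    }

    /** Replays one possible interleaving of two callers, A and B. */
    static List<String> replayInterleaving() {
        List<String> results = new ArrayList<>();
        StaticBuilder.withName("converter"); // A starts an option that takes an argument
        StaticBuilder.hasArg();
        StaticBuilder.withName("flag");      // B starts a plain flag before A has finished
        results.add(StaticBuilder.create()); // A finishes and picks up B's name
        results.add(StaticBuilder.create()); // B finishes after the state was already reset
        return results;
    }

    public static void main(String[] args) {
        System.out.println(replayInterleaving());
    }
}
```

In this replay caller A ends up with {{flag=<arg>}} and caller B with {{null}}: neither gets the option it asked for. With real threads the interleaving depends on timing, which would be consistent with a failure that only shows up on some task attempts.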

In this case we need to be able to completely serialize the initialization of 
Pig's processor, inputs, and outputs to avoid this race condition. Fixes to Pig 
and Elephant Bird are in progress, but this change provides a compatibility mode 
for the most widely used versions, as well as for user-defined functions that 
potentially have the same issue.
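
One way to picture serialized initialization is to submit every initializer to a single-threaded executor instead of a pool. The sketch below is only an illustration under that assumption and is not the TEZ-3165 patch; {{SerializedInit}}, {{runInitializers}}, and the {{serialize}} flag are hypothetical names, and the sleep stands in for real initialization work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class SerializedInit {
    /**
     * Runs `count` dummy initializers and returns the highest number that were
     * observed running at the same time.
     */
    static int runInitializers(int count, boolean serialize) {
        // Hypothetical switch: one worker thread forces initializers to run one by one.
        ExecutorService pool = serialize
            ? Executors.newSingleThreadExecutor()
            : Executors.newFixedThreadPool(count);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger maxRunning = new AtomicInteger();
        try {
            List<Future<Void>> futures = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                futures.add(pool.submit((Callable<Void>) () -> {
                    int now = running.incrementAndGet();
                    maxRunning.accumulateAndGet(now, Math::max);
                    Thread.sleep(20); // stand-in for processor/input/output setup
                    running.decrementAndGet();
                    return null;
                }));
            }
            // Wait for every initializer, surfacing any failure to the caller.
            for (Future<Void> f : futures) {
                try {
                    f.get();
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException("initializer failed", e);
                }
            }
        } finally {
            pool.shutdown();
        }
        return maxRunning.get();
    }

    public static void main(String[] args) {
        System.out.println("serialized max concurrency: " + runInitializers(3, true));
        System.out.println("parallel max concurrency:   " + runInitializers(3, false));
    }
}
```

With {{serialize}} set, at most one initializer is ever in flight, so code that relies on shared static state cannot overlap with itself. The cost is that the initialization times add up instead of overlapping, which is why this would suit an opt-in compatibility setting rather than a default.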

> Parallel initialization of inputs, outputs, and processor can cause 
> NoSuchMethodException
> -----------------------------------------------------------------------------------------
>
>                 Key: TEZ-3165
>                 URL: https://issues.apache.org/jira/browse/TEZ-3165
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>         Attachments: TEZ-3165.1.patch
>
>
> 2016-03-13 23:55:17,162 [INFO] [main] 
> |runtime.LogicalIOProcessorRuntimeTask|: Initializing 
> LogicalIOProcessorRuntimeTask with TaskSpec: DAGName : 
> PigLatin:Script.pig-0_scope-0, VertexName: scope-203, VertexParallelism: 
> 2707, TaskAttemptID:attempt_1, 
> processorName=org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor,
>  inputSpecListSize=1, outputSpecListSize=1, inputSpecList=[{{ 
> sourceVertexName=scope-0, physicalEdgeCount=1, 
> inputClassName=org.apache.tez.mapreduce.input.MRInput }}, ], 
> outputSpecList=[{{ destinationVertexName=scope-28, physicalEdgeCount=0, 
> outputClassName=org.apache.tez.mapreduce.output.MROutput }}, ]
> 2016-03-13 23:55:17,164 [INFO] [main] |resources.MemoryDistributor|: 
> InitialMemoryDistributor (isEnabled=true) invoked with: numInputs=1, 
> numOutputs=1, JVM.maxFree=1059061760, 
> allocatorClassName=org.apache.tez.runtime.library.resources.WeightedScalingMemoryDistributor
> 2016-03-13 23:55:17,175 [INFO] [TezChild] |task.TezTaskRunner|: Initializing 
> task, taskAttemptId=attempt_1
> 2016-03-13 23:55:17,182 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: 
> Routing events from heartbeat response to task, 
> currentTaskAttemptId=attempt_1, eventCount=1 fromEventId=0 nextFromEventId=0
> 2016-03-13 23:55:17,212 [INFO] [I/O Setup 1 Initialize: {scope-28}] 
> |Configuration.deprecation|: mapreduce.inputformat.class is deprecated. 
> Instead, use mapreduce.job.inputformat.class
> 2016-03-13 23:55:17,214 [INFO] [I/O Setup 1 Initialize: {scope-28}] 
> |Configuration.deprecation|: fs.default.name is deprecated. Instead, use 
> fs.defaultFS
> 2016-03-13 23:55:17,223 [INFO] [I/O Setup 1 Initialize: {scope-28}] 
> |counters.Limits|: Counter limits initialized with parameters:  
> GROUP_NAME_MAX=256, MAX_GROUPS=1000, COUNTER_NAME_MAX=128, MAX_COUNTERS=5000
> 2016-03-13 23:55:17,228 [INFO] [I/O Setup 0 Initialize: {scope-0}] 
> |input.MRInput|: scope-0 using newmapreduce API=true, split via event=true, 
> numPhysicalInputs=1
> 2016-03-13 23:55:17,233 [INFO] [I/O Setup 0 Initialize: {scope-0}] 
> |input.MRInput|: Initialized MRInput: scope-0
> 2016-03-13 23:55:17,345 [INFO] [TezChild] |data.SchemaTupleBackend|: Key 
> [pig.schematuple] was not set... will not generate code.
> 2016-03-13 23:55:17,400 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: Initialized processor
> 2016-03-13 23:55:17,400 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: Waiting for 2 initializers to finish
> 2016-03-13 23:55:17,400 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: Waiting for 1 initializers to finish
> 2016-03-13 23:55:17,400 [INFO] [TezChild] |task.TezTaskRunner|: Encounted an 
> error while executing task: attempt_1
> java.lang.RuntimeException: could not instantiate 
> 'com.twitter.elephantbird.pig.store.SequenceFileStorage' with arguments '[-c 
> com.twitter.elephantbird.pig.util.TextConverter, -c 
> com.twitter.elephantbird.pig.util.TextConverter]'
>       at 
> org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:766)
>       at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getStoreFunc(POStore.java:250)
>       at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:76)
>       at 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigOutputFormatTez.getRecordWriter(PigOutputFormatTez.java:43)
>       at 
> org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:399)
>       at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable._callInternal(LogicalIOProcessorRuntimeTask.java:506)
>       at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:489)
>       at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:474)
>       at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.reflect.InvocationTargetException
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>       at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
>       at 
> org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:734)
>       ... 14 more
> Caused by: java.lang.RuntimeException: Failed to create WritableConverter 
> instance
>       at 
> com.twitter.elephantbird.pig.util.SequenceFileConfig.getWritableConverter(SequenceFileConfig.java:225)
>       at 
> com.twitter.elephantbird.pig.util.SequenceFileConfig.<init>(SequenceFileConfig.java:101)
>       at 
> com.twitter.elephantbird.pig.store.SequenceFileStorage$Config.<init>(SequenceFileStorage.java:89)
>       at 
> com.twitter.elephantbird.pig.store.SequenceFileStorage.<init>(SequenceFileStorage.java:190)
>       ... 19 more
> Caused by: java.lang.NoSuchMethodException: 
> com.twitter.elephantbird.pig.util.TextConverter.<init>(java.lang.String)
>       at java.lang.Class.getConstructor0(Class.java:3074)
>       at java.lang.Class.getConstructor(Class.java:1817)
>       at 
> com.twitter.elephantbird.pig.util.SequenceFileConfig.getWritableConverter(SequenceFileConfig.java:213)
>       ... 22 more



