Yes, spark-submit --jars is where we started looking for the missing class.
The class isn’t found on the remote executor, so we looked in the jars
actually downloaded into the executor’s work dir. The PIO assembly jars are
there and do contain the classes. That work dir would be on the executor’s
classpath, right? I’m not sure what else you are asking.
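
Roughly the kind of check we used against the downloaded assembly (a minimal
sketch in Scala; the default jar path below is only an example, point it at
whatever actually lands in the work dir):

import java.util.jar.JarFile
import scala.collection.JavaConverters._

object CheckJarForClass {
  def main(args: Array[String]): Unit = {
    // Example path only; use the assembly copied into the executor's
    // work dir (work/<app-id>/<executor-id>/...).
    val jarPath = if (args.nonEmpty) args(0) else "pio-assembly-0.12.1.jar"
    val target  = "org/apache/hadoop/hbase/protobuf/ProtobufUtil.class"
    val jar = new JarFile(jarPath)
    try {
      val found = jar.entries().asScala.exists(_.getName == target)
      println(s"$target present in $jarPath: $found")
    } finally jar.close()
  }
}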

Are you asking about SPARK_CLASSPATH in spark-env.sh? The default should
include the work subdir for the job, I believe, and it can only be appended
to, so we couldn’t have messed that up as long as it points first to the
work/job-number dir, right?
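
For what it’s worth, SPARK_CLASSPATH is deprecated in newer Spark in favor of
the extraClassPath conf properties, so if the version matters, this is the
kind of setting I mean (a sketch only, not PIO’s actual launch code, and the
jar paths are just examples):

import org.apache.spark.SparkConf

object ClasspathConfSketch {
  // Sketch only: PIO builds its own SparkConf; this just shows the conf keys.
  // All jar paths below are examples, not the real deployment layout.
  val conf: SparkConf = new SparkConf()
    .setAppName("pio-train")
    // Ship the assembly to executors (same effect as spark-submit --jars).
    .set("spark.jars", "/opt/pio/lib/pio-assembly-0.12.1.jar")
    // Also put it explicitly on the driver and executor classpaths; the
    // executor path must exist on every worker node.
    .set("spark.driver.extraClassPath", "/opt/pio/lib/pio-assembly-0.12.1.jar")
    .set("spark.executor.extraClassPath", "/opt/pio/lib/pio-assembly-0.12.1.jar")
}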

I guess the root of my question is: how can the jars be downloaded to the
executor’s work dir and yet the classes we know are in those jars still not
be found?
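
One detail that may matter, though I haven’t confirmed it is what’s happening
here: the error is "Could not initialize class", which on the JVM usually
means the class *was* found but its static initializer failed earlier; only
the first failure shows the real cause, and every later use reports
NoClassDefFoundError. A minimal Scala sketch of that behavior (nothing here
is PIO code):

// Stand-in for something like ProtobufUtil hitting an incompatible
// dependency in its static initializer.
object InitFails {
  val broken: Int = sys.error("simulated static-init failure")
}

object Demo {
  def main(args: Array[String]): Unit = {
    // First access: ExceptionInInitializerError with the real cause.
    try println(InitFails.broken) catch { case t: Throwable => println(t) }
    // Second access: NoClassDefFoundError: Could not initialize class InitFails$
    try println(InitFails.broken) catch { case t: Throwable => println(t) }
  }
}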


From: Donald Szeto <don...@apache.org>
Reply: user@predictionio.apache.org
Date: May 29, 2018 at 1:27:03 PM
To: user@predictionio.apache.org
Subject: Re: Spark cluster error

Sorry, what I meant was the actual spark-submit command that PIO was using.
It should be in the log.

What Spark version was that? I recall classpath issues with certain
versions of Spark.

On Thu, May 24, 2018 at 4:52 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

> Thanks Donald,
>
> We have:
>
>    - built pio with hbase 1.4.3, which is what we have deployed
>    - verified that the `ProtobufUtil` class is in the pio hbase assembly
>    - verified the assembly is passed in --jars to spark-submit
>    - verified that the executors receive and store the assemblies in the
>    FS work dir on the worker machines
>    - verified that hashes match the original assembly so the class is
>    being received by every executor
>
> However, the executor is unable to find the class.
>
> This seems just short of impossible, but clearly it is possible. How can the
> executor deserialize the code but then not find the class later?
>
> Not sure what you mean by the classpath going into the cluster. The class
> that is not found does seem to be in the pio 0.12.1 hbase assembly; isn’t
> this where it should come from?
>
> Thanks again
> p
>
>
> From: Donald Szeto <don...@apache.org>
> Reply: user@predictionio.apache.org
> Date: May 24, 2018 at 2:10:24 PM
> To: user@predictionio.apache.org
> Subject: Re: Spark cluster error
>
> 0.12.1 packages HBase 0.98.5-hadoop2 in the storage driver assembly.
> Looking at the Git history, it has not changed in a while.
>
> Do you have the exact classpath that has gone into your Spark cluster?
>
> On Wed, May 23, 2018 at 1:30 PM, Pat Ferrel <p...@actionml.com> wrote:
>
>> A source build did not fix the problem. Has anyone run PIO 0.12.1 on a
>> Spark cluster? The issue seems to be how to pass the correct code to Spark
>> so it can connect to HBase:
>>
>> [ERROR] [TransportRequestHandler] Error while invoking
>> RpcHandler#receive() for one-way message.
>> [ERROR] [TransportRequestHandler] Error while invoking
>> RpcHandler#receive() for one-way message.
>> Exception in thread "main" org.apache.spark.SparkException: Job aborted
>> due to stage failure: Task 4 in stage 0.0 failed 4 times, most recent
>> failure: Lost task 4.3 in stage 0.0 (TID 18, 10.68.9.147, executor 0):
>> java.lang.NoClassDefFoundError: Could not initialize class
>> org.apache.hadoop.hbase.protobuf.ProtobufUtil
>>     at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.convertStringToScan(TableMapReduceUtil.java:521)
>>     at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:110)
>>     at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:170)
>>     at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:134)
>>     at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:69)
>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>>
>> Now that we have these pluggable DBs, did I miss something? This works
>> with master=local but not with a remote Spark master.
>>
>> I’ve passed the hbase-client jar in the --jars part of spark-submit and it
>> still fails; what am I missing?
>>
>>
>> From: Pat Ferrel <p...@actionml.com>
>> Reply: Pat Ferrel <p...@actionml.com>
>> Date: May 23, 2018 at 8:57:32 AM
>> To: user@predictionio.apache.org
>> Subject: Spark cluster error
>>
>> The same CLI works using a local Spark master, but fails using a remote
>> master for a cluster due to a missing class def for protobuf used in hbase.
>> We are using the binary dist 0.12.1. Is this known? Is there a workaround?
>>
>> We are now trying a source build in the hope that the class will be put in
>> the assembly passed to Spark. The reasoning is that the remote executors
>> don’t contain the hbase classes, while a local executor does because of
>> some local classpath. If the source-built assembly does not have these
>> classes, we will have the same problem: namely, how to get protobuf to the
>> executors.
>>
>> Has anyone seen this?
>>
>>
>
