Re: zeppelin (or spark-shell) with HBase fails on executor level

2015-11-18 Thread Ted Yu
Interesting.

I will be watching your PR.

On Wed, Nov 18, 2015 at 7:51 AM, 임정택  wrote:

> Ted,
>
> I suspect I hit this issue:
> https://issues.apache.org/jira/browse/SPARK-11818
> Could you take a look at the issue and verify that it makes sense?
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)

Re: zeppelin (or spark-shell) with HBase fails on executor level

2015-11-18 Thread 임정택
Ted,

I suspect I hit this issue: https://issues.apache.org/jira/browse/SPARK-11818
Could you take a look at the issue and verify that it makes sense?

Thanks,
Jungtaek Lim (HeartSaVioR)


Re: zeppelin (or spark-shell) with HBase fails on executor level

2015-11-18 Thread Ted Yu
Here is related code:

  private static void checkDefaultsVersion(Configuration conf) {
    if (conf.getBoolean("hbase.defaults.for.version.skip", Boolean.FALSE)) return;

    String defaultsVersion = conf.get("hbase.defaults.for.version");
    String thisVersion = VersionInfo.getVersion();

    if (!thisVersion.equals(defaultsVersion)) {
      throw new RuntimeException(
        "hbase-default.xml file seems to be for an older version of HBase (" +
        defaultsVersion + "), this version is " + thisVersion);
    }
  }

A (null) there means that "hbase.defaults.for.version" was not set in the
other hbase-default.xml.

Can you retrieve the classpath of the Spark task so that we can get more clues?
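One way to gather that from inside a job (a minimal sketch, assuming a
spark-shell session; the probe itself is illustrative, not from the original
thread) is to run a single task that reports the executor JVM's classpath and
every copy of hbase-default.xml its classloader can see:

    import java.lang.management.ManagementFactory
    import scala.collection.JavaConverters._

    // One partition -> one task on one executor; it returns the JVM
    // classpath plus the URL of each hbase-default.xml on the classpath.
    val lines = sc.parallelize(Seq(1), 1).mapPartitions { _ =>
      val cp = ManagementFactory.getRuntimeMXBean.getClassPath
      val copies = Thread.currentThread().getContextClassLoader
        .getResources("hbase-default.xml").asScala.map(_.toString)
      Iterator("executor classpath: " + cp) ++ copies
    }.collect()
    lines.foreach(println)

Two URLs pointing at different jars would confirm the duplicate
hbase-default.xml theory.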


Cheers

On Tue, Nov 17, 2015 at 10:06 PM, 임정택  wrote:

> Ted,
>
> Thanks for the reply.
>
> My fat jar's only Spark-related dependency is spark-core, marked as
> "provided".
> It seems Spark only adds hbase-common 0.98.7-hadoop2, in the spark-examples
> module.
>
> And if there are two hbase-default.xml files in the classpath, shouldn't one
> of them be loaded, instead of showing (null)?
>
> Best,
> Jungtaek Lim (HeartSaVioR)
Re: zeppelin (or spark-shell) with HBase fails on executor level

2015-11-17 Thread Ted Yu
I am a bit curious:
HBase depends on HDFS.
Has HDFS support for Mesos been fully implemented?

Last time I checked, there was still work to be done. 

Thanks

> On Nov 17, 2015, at 1:06 AM, 임정택  wrote:
> 
> Oh, one thing I missed: I built a Spark 1.4.1 cluster on a 6-node Mesos
> 0.22.1 H/A (via ZK) cluster.

Re: zeppelin (or spark-shell) with HBase fails on executor level

2015-11-17 Thread 임정택
Ted,

Could you elaborate, please?

I maintain separate HBase and Mesos clusters for some reasons, and I can
make it work via spark-submit, or via spark-shell / zeppelin with a newly
initialized SparkContext.

Thanks,
Jungtaek Lim (HeartSaVioR)

2015-11-17 22:17 GMT+09:00 Ted Yu :

> I am a bit curious:
> HBase depends on HDFS.
> Has HDFS support for Mesos been fully implemented?
>
> Last time I checked, there was still work to be done.
>
> Thanks

Re: zeppelin (or spark-shell) with HBase fails on executor level

2015-11-17 Thread Ted Yu
I see - your HBase cluster is separate from the Mesos cluster.
I somehow got the (incorrect) impression that the HBase cluster runs on Mesos.

On Tue, Nov 17, 2015 at 7:53 PM, 임정택  wrote:

> Ted,
>
> Could you elaborate, please?
>
> I maintain separate HBase and Mesos clusters for some reasons, and I can
> make it work via spark-submit, or via spark-shell / zeppelin with a newly
> initialized SparkContext.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
Re: zeppelin (or spark-shell) with HBase fails on executor level

2015-11-17 Thread 임정택
Ted,

Thanks for the reply.

My fat jar's only Spark-related dependency is spark-core, marked as
"provided".
It seems Spark only adds hbase-common 0.98.7-hadoop2, in the spark-examples
module.

And if there are two hbase-default.xml files in the classpath, shouldn't one
of them be loaded, instead of showing (null)?

Best,
Jungtaek Lim (HeartSaVioR)



2015-11-18 13:50 GMT+09:00 Ted Yu :

> Looks like there are two hbase-default.xml files in the classpath: one for
> 0.98.6 and another for 0.98.7-hadoop2 (used by Spark).
>
> You can specify hbase.defaults.for.version.skip as true in your
> hbase-site.xml.
>
> Cheers
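For reference, the suggested override is a single property in the
hbase-site.xml bundled with the application; a minimal sketch (assuming the
file is packaged into the fat jar, which this thread does not spell out):

    <!-- Skip the hbase-default.xml version check described above. -->
    <property>
      <name>hbase.defaults.for.version.skip</name>
      <value>true</value>
    </property>

Note this only silences the version check; the duplicate hbase-default.xml
on the classpath remains.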

Re: zeppelin (or spark-shell) with HBase fails on executor level

2015-11-17 Thread 임정택
Oh, one thing I missed: I built a Spark 1.4.1 cluster on a 6-node Mesos
0.22.1 H/A (via ZK) cluster.


zeppelin (or spark-shell) with HBase fails on executor level

2015-11-17 Thread 임정택
Hi all,

I'm evaluating zeppelin to run a driver which interacts with HBase.
I use a fat jar to include the HBase dependencies, and I see failures at the
executor level.
I thought it was a zeppelin issue, but it fails on spark-shell, too.

I loaded the fat jar via the --jars option,

> ./bin/spark-shell --jars hbase-included-assembled.jar

and ran driver code using the provided SparkContext instance, and saw
failures in the spark-shell console and executor logs.
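The driver code is essentially a newAPIHadoopRDD scan over HBase's
TableInputFormat, as the stack traces below show; a minimal sketch of that
shape (the table name here is a placeholder, not from the original post):

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.Result
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat

    // Build an HBase scan RDD; TableInputFormat.setConf is where the
    // executors fail in the traces below.
    val hbaseConf = HBaseConfiguration.create()
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "some_table") // placeholder
    val rdd = sc.newAPIHadoopRDD(hbaseConf, classOf[TableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])
    rdd.count()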

Below are the stack traces:

org.apache.spark.SparkException: Job aborted due to stage failure:
Task 55 in stage 0.0 failed 4 times, most recent failure: Lost task
55.3 in stage 0.0 (TID 281, ):
java.lang.NoClassDefFoundError: Could not initialize class
org.apache.hadoop.hbase.client.HConnectionManager
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:197)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:159)
at 
org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:101)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:128)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:104)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:66)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)


15/11/16 18:59:57 ERROR Executor: Exception in task 14.0 in stage 0.0 (TID 14)
java.lang.ExceptionInInitializerError
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:197)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:159)
at 
org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:101)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:128)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:104)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:66)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: hbase-default.xml file seems to
be for and old version of HBase (null), this version is
0.98.6-cdh5.2.0
at 
org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:73)
at 

Re: zeppelin (or spark-shell) with HBase fails on executor level

2015-11-17 Thread 임정택
I just made it work on both sides (zeppelin and spark-shell) by initializing
another SparkContext and running against that.
But since this feels like a workaround, I'd love to hear about a proper way
(or a cleaner workaround) to resolve this.
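For reference, the workaround amounts to something like this minimal sketch
(the app name is a placeholder; the jar is the same one passed to --jars):

    import org.apache.spark.{SparkConf, SparkContext}

    // Stop the provided SparkContext and build a fresh one that ships the
    // fat jar to the executors itself.
    sc.stop()
    val conf = new SparkConf()
      .setAppName("hbase-evaluation") // placeholder name
      .setJars(Seq("hbase-included-assembled.jar"))
    val sc2 = new SparkContext(conf)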
Please let me know if you have any suggestions.

Best,
Jungtaek Lim (HeartSaVioR)
