Interesting. I will be watching your PR.
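In the meantime, for anyone else hitting this: a quick way to test the classloader theory from the same spark-shell session is to check whether the executors can see hbase-default.xml at all. This is an untested sketch; the partition count is arbitrary and only meant to reach every executor:

  sc.parallelize(1 to 100, 100).map { _ =>
    // A null URL means the resource lookup failed on that executor.
    val cl = Thread.currentThread().getContextClassLoader
    (java.net.InetAddress.getLocalHost.getHostName,
      String.valueOf(cl.getResource("hbase-default.xml")))
  }.distinct().collect().foreach(println)

If that prints null on the executors while the same lookup on the driver returns a real URL, it lines up with the resource-loading suspicion.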
On Wed, Nov 18, 2015 at 7:51 AM, 임정택 <kabh...@gmail.com> wrote:

> Ted,
>
> I suspect I hit the issue
> https://issues.apache.org/jira/browse/SPARK-11818
> Could you take a look at the issue and verify that it makes sense?
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> 2015-11-18 20:32 GMT+09:00 Ted Yu <yuzhih...@gmail.com>:
>
>> Here is the related code:
>>
>>   private static void checkDefaultsVersion(Configuration conf) {
>>     if (conf.getBoolean("hbase.defaults.for.version.skip", Boolean.FALSE)) return;
>>     String defaultsVersion = conf.get("hbase.defaults.for.version");
>>     String thisVersion = VersionInfo.getVersion();
>>     if (!thisVersion.equals(defaultsVersion)) {
>>       throw new RuntimeException(
>>         "hbase-default.xml file seems to be for an older version of HBase (" +
>>           defaultsVersion + "), this version is " + thisVersion);
>>     }
>>   }
>>
>> A null here means that "hbase.defaults.for.version" was not set in the
>> hbase-default.xml that was actually loaded.
>>
>> Can you retrieve the classpath of the Spark task so that we have more
>> clues?
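>> Something like this, run from the same spark-shell session, should show it
>> (an untested sketch; the partition count is arbitrary and just meant to
>> reach every executor):
>>
>>   // Collect the JVM classpath of each executor that runs a task.
>>   // Note: jars shipped with --jars are fetched and added by the executor's
>>   // classloader at runtime, so they may not appear in java.class.path.
>>   sc.parallelize(1 to 100, 100).map { _ =>
>>     (java.net.InetAddress.getLocalHost.getHostName,
>>       System.getProperty("java.class.path"))
>>   }.distinct().collect().foreach { case (host, cp) =>
>>     println(s"$host => $cp")
>>   }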
>> Cheers
>>
>> On Tue, Nov 17, 2015 at 10:06 PM, 임정택 <kabh...@gmail.com> wrote:
>>
>>> Ted,
>>>
>>> Thanks for the reply.
>>>
>>> The only Spark-related dependency of my fat jar is spark-core, and it is
>>> marked as "provided".
>>> It seems Spark only pulls in hbase-common 0.98.7-hadoop2 in its
>>> spark-examples module.
>>>
>>> Also, if there are two hbase-default.xml files on the classpath, shouldn't
>>> one of them be loaded, instead of (null) showing up?
>>>
>>> Best,
>>> Jungtaek Lim (HeartSaVioR)
>>>
>>> 2015-11-18 13:50 GMT+09:00 Ted Yu <yuzhih...@gmail.com>:
>>>
>>>> Looks like there are two hbase-default.xml files on the classpath: one
>>>> for 0.98.6 and another for 0.98.7-hadoop2 (used by Spark).
>>>>
>>>> You can set hbase.defaults.for.version.skip to true in your
>>>> hbase-site.xml.
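>>>> For example, with a property block like this (merge it into whatever
>>>> your hbase-site.xml already contains):
>>>>
>>>>   <property>
>>>>     <name>hbase.defaults.for.version.skip</name>
>>>>     <value>true</value>
>>>>   </property>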
>>>> Cheers
>>>>
>>>> On Tue, Nov 17, 2015 at 1:01 AM, 임정택 <kabh...@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'm evaluating Zeppelin to run a driver which interacts with HBase.
>>>>> I use a fat jar to include the HBase dependencies, and I see failures at
>>>>> the executor level.
>>>>> I thought it was a Zeppelin issue, but it fails on spark-shell, too.
>>>>>
>>>>> I loaded the fat jar via the --jars option,
>>>>>
>>>>> > ./bin/spark-shell --jars hbase-included-assembled.jar
>>>>>
>>>>> ran the driver code using the provided SparkContext instance, and saw
>>>>> failures in the spark-shell console and the executor logs.
>>>>>
>>>>> Below are the stack traces:
>>>>>
>>>>> org.apache.spark.SparkException: Job aborted due to stage failure: Task
>>>>> 55 in stage 0.0 failed 4 times, most recent failure: Lost task 55.3 in
>>>>> stage 0.0 (TID 281, <svr hostname>): java.lang.NoClassDefFoundError:
>>>>> Could not initialize class org.apache.hadoop.hbase.client.HConnectionManager
>>>>> at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:197)
>>>>> at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:159)
>>>>> at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:101)
>>>>> at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:128)
>>>>> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:104)
>>>>> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:66)
>>>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>>>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>>>>> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>>>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>>>>> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
>>>>> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>>>> at org.apache.spark.scheduler.Task.run(Task.scala:70)
>>>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>
>>>>> Driver stacktrace:
>>>>> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273)
>>>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
>>>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
>>>>> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>>>> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>>> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
>>>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
>>>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
>>>>> at scala.Option.foreach(Option.scala:236)
>>>>> at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
>>>>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
>>>>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
>>>>> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>>>>>
>>>>> 15/11/16 18:59:57 ERROR Executor: Exception in task 14.0 in stage 0.0 (TID 14)
>>>>> java.lang.ExceptionInInitializerError
>>>>> at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:197)
>>>>> at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:159)
>>>>> at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:101)
>>>>> at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:128)
>>>>> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:104)
>>>>> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:66)
>>>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>>>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>>>>> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>>>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>>>>> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
>>>>> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>>>> at org.apache.spark.scheduler.Task.run(Task.scala:70)
>>>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>> Caused by: java.lang.RuntimeException: hbase-default.xml file seems to be
>>>>> for and old version of HBase (null), this version is 0.98.6-cdh5.2.0
>>>>> at org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:73)
>>>>> at org.apache.hadoop.hbase.HBaseConfiguration.addHbaseResources(HBaseConfiguration.java:105)
>>>>> at org.apache.hadoop.hbase.HBaseConfiguration.create(HBaseConfiguration.java:116)
>>>>> at org.apache.hadoop.hbase.client.HConnectionManager.<clinit>(HConnectionManager.java:222)
>>>>> ... 18 more
>>>>>
>>>>> Please note that it runs smoothly via spark-submit.
>>>>>
>>>>> Btw, if the issue is that hbase-default.xml is not loaded properly
>>>>> (maybe because of the classloader), note that it loads properly at the
>>>>> driver level:
>>>>>
>>>>> import org.apache.hadoop.hbase.HBaseConfiguration
>>>>> val conf = HBaseConfiguration.create()
>>>>> println(conf.get("hbase.defaults.for.version"))
>>>>>
>>>>> This prints "0.98.6-cdh5.2.0".
>>>>>
>>>>> I'm using Spark 1.4.1 (hadoop-2.4 binary), Zeppelin 0.5.5, and HBase
>>>>> 0.98.6-CDH5.2.0.
>>>>>
>>>>> Thanks in advance!
>>>>>
>>>>> Best,
>>>>> Jungtaek Lim (HeartSaVioR)
>>>
>>>
>>> --
>>> Name : 임 정택
>>> Blog : http://www.heartsavior.net / http://dev.heartsavior.net
>>> Twitter : http://twitter.com/heartsavior
>>> LinkedIn : http://www.linkedin.com/in/heartsavior
>
>
> --
> Name : 임 정택
> Blog : http://www.heartsavior.net / http://dev.heartsavior.net
> Twitter : http://twitter.com/heartsavior
> LinkedIn : http://www.linkedin.com/in/heartsavior