I wrote a simple program to read data from HBase, the program works find in Cloudera backed by HDFS. The program works fine on SPARK RUNTIME 1.6 on Cloudera. But does NOT work on EMR with Spark Runtime 2.2.1.
But getting an exception while testing data on EMR with S3. // Spark conf SparkConf sparkConf = new SparkConf().setMaster("local[4]").setAppName("My App"); JavaSparkContext jsc = new JavaSparkContext(sparkConf); // Hbase conf Configuration conf = HBaseConfiguration.create(); conf.set("hbase.zookeeper.quorum","localhost"); conf.set("hbase.zookeeper.property.client.port","2181"); // Submit scan into hbase conf // conf.set(TableInputFormat.SCAN, TableMapReduceUtil.convertScanToString(scan)); conf.set(TableInputFormat.INPUT_TABLE, "mytable"); conf.set(TableInputFormat.SCAN_ROW_START, "startrow"); conf.set(TableInputFormat.SCAN_ROW_STOP, "endrow"); // Get RDD JavaPairRDD<ImmutableBytesWritable, Result> source = jsc .newAPIHadoopRDD(conf, TableInputFormat.class, ImmutableBytesWritable.class, Result.class); // Process RDD System.out.println("&&&&&&&&&&&&&&&&&&&&&&& " + source.count()); 0 down vote favorite I wrote a simple program to read data from HBase, the program works find in Cloudera backed by HDFS. But getting an exception while testing data on EMR with S3. // Spark conf SparkConf sparkConf = new SparkConf().setMaster("local[4]").setAppName("My App"); JavaSparkContext jsc = new JavaSparkContext(sparkConf); // Hbase conf Configuration conf = HBaseConfiguration.create(); conf.set("hbase.zookeeper.quorum","localhost"); conf.set("hbase.zookeeper.property.client.port","2181"); // Submit scan into hbase conf // conf.set(TableInputFormat.SCAN, TableMapReduceUtil.convertScanToString(scan)); conf.set(TableInputFormat.INPUT_TABLE, "mytable"); conf.set(TableInputFormat.SCAN_ROW_START, "startrow"); conf.set(TableInputFormat.SCAN_ROW_STOP, "endrow"); // Get RDD JavaPairRDD<ImmutableBytesWritable, Result> source = jsc .newAPIHadoopRDD(conf, TableInputFormat.class, ImmutableBytesWritable.class, Result.class); // Process RDD System.out.println("&&&&&&&&&&&&&&&&&&&&&&& " + source.count()); 18/05/04 00:22:02 INFO MetricRegistries: Loaded MetricRegistries class org.apache.hadoop.hbase.metrics.impl.MetricRegistriesImpl 18/05/04 00:22:02 ERROR TableInputFormat: java.io.IOException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238) Caused by: java.lang.IllegalAccessError: tried to access class org.apache.hadoop.metrics2.lib.MetricsInfoImpl from class org.apache.hadoop.metrics2.lib.DynamicMetricsRegistry at org.apache.hadoop.metrics2.lib.DynamicMetricsRegistry.newGauge(DynamicMetricsRegistry.java:139) at org.apache.hadoop.hbase.zookeeper.MetricsZooKeeperSourceImpl.(MetricsZooKeeperSourceImpl.java:59) at org.apache.hadoop.hbase.zookeeper.MetricsZooKeeperSourceImpl.(MetricsZooKeeperSourceImpl.java:51) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at java.lang.Class.newInstance(Class.java:442) at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380) ... 42 more Exception in thread "main" java.io.IOException: Cannot create a record reader because of a previous error. Please look at the previous logs lines from the task's full log for more details. at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:270) at org.apache.hadoop.hbase.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:256) at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:125) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.rdd.RDD.partitions(RDD.scala:250) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2094) at org.apache.spark.rdd.RDD.count(RDD.scala:1158) at org.apache.spark.api.java.JavaRDDLike$class.count(JavaRDDLike.scala:455) at org.apache.spark.api.java.AbstractJavaRDDLike.count(JavaRDDLike.scala:45) at HbaseScan.main(HbaseScan.java:60) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.lang.IllegalStateException: The input format instance has not been properly initialized. Ensure you call initializeTable either in your constructor or initialize method at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getTable(TableInputFormatBase.java:652) at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:265) ... 20 more With ALL APACHE HBASE LIBS: e.hadoop.hbase.metrics.impl.MetricRegistriesImpl 18/05/04 04:05:54 ERROR TableInputFormat: java.io.IOException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240) at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:218) at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:119) at org.apache.hadoop.hbase.mapreduce.TableInputFormat.initialize(TableInputFormat.java:202) at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:259) at org.apache.hadoop.hbase.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:256) at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:125) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.rdd.RDD.partitions(RDD.scala:250) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2094) at org.apache.spark.rdd.RDD.count(RDD.scala:1158) at org.apache.spark.api.java.JavaRDDLike$class.count(JavaRDDLike.scala:455) at org.apache.spark.api.java.AbstractJavaRDDLike.count(JavaRDDLike.scala:45) at HbaseScan.main(HbaseScan.java:60) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238) ... 24 more Caused by: java.lang.RuntimeException: Could not create interface org.apache.hadoop.hbase.zookeeper.MetricsZooKeeperSource Is the hadoop compatibility jar on the classpath? at org.apache.hadoop.hbase.CompatibilitySingletonFactory.getInstance(CompatibilitySingletonFactory.java:75) at org.apache.hadoop.hbase.zookeeper.MetricsZooKeeper.(MetricsZooKeeper.java:38) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.(RecoverableZooKeeper.java:130) at org.apache.hadoop.hbase.zookeeper.ZKUtil.connect(ZKUtil.java:143) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.(ZooKeeperWatcher.java:181) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.(ZooKeeperWatcher.java:155) at org.apache.hadoop.hbase.client.ZooKeeperKeepAliveConnection.(ZooKeeperKeepAliveConnection.java:43) at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveZooKeeperWatcher(ConnectionManager.java:1737) -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ --------------------------------------------------------------------- To unsubscribe e-mail: user-unsubscr...@spark.apache.org