Unable to write data to HBase: app hangs without any error
Hello guys, I have a Spark SQL app which writes some data to HBase, but the app hangs without any exception or error. Here is my code:

    // based on: https://hbase.apache.org/book.html#scala
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes

    val sparkMasterUrlDev = "spark://master60:7077"
    val sparkMasterUrlLocal = "local[2]"

    val sparkConf = new SparkConf()
      .setAppName("HbaseConnector2")
      .setMaster(sparkMasterUrlDev)
      .set("spark.executor.memory", "10g")
    val sc = new SparkContext(sparkConf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    val hc = new org.apache.spark.sql.hive.HiveContext(sc)

    val hivetable = hc.sql("select * from house_id_city_pv_range")
    hivetable.persist()

    val c = new CalendarTool        // project helper; returns yesterday's date
    val yesterday = c.getDate
    val stringA = "00000000"        // zero padding for the fixed-width row key parts

    hivetable.repartition(6).foreachPartition { y =>
      val conf = HBaseConfiguration.create()
      conf.set("hbase.zookeeper.quorum", "master60,slave61,slave62")
      conf.set("hbase.zookeeper.property.clientPort", "2181")
      conf.set("hbase.rootdir", "hdfs://master60:10001/hbase")
      val connection = ConnectionFactory.createConnection(conf)
      val table = connection.getTable(TableName.valueOf("id_pv"))
      try {
        y.foreach { x =>
          // pad each id to 8 characters, then append yesterday's date
          // (assumes the ids never exceed 8 digits)
          val rowkeyp1 = x.getInt(2).toString
          val rowkeyp2 = stringA.substring(0, 8 - rowkeyp1.length)
          val rowkeyp3 = x.getInt(1).toString
          val rowkeyp4 = stringA.substring(0, 8 - rowkeyp3.length)
          val rowkey = rowkeyp1 + rowkeyp2 + rowkeyp3 + rowkeyp4 + yesterday
          val theput = new Put(Bytes.toBytes(rowkey))
          theput.addColumn(Bytes.toBytes("id"), Bytes.toBytes(x.getInt(0).toString), Bytes.toBytes(x.getInt(0)))
          theput.addColumn(Bytes.toBytes("pv"), Bytes.toBytes(x.getInt(3).toString), Bytes.toBytes(x.getInt(3)))
          table.put(theput)
        }
      } finally {
        table.close()
        connection.close()
      }
    }

Last 20 lines of my Spark app log:

    16/07/13 17:18:33 INFO DAGScheduler: looking for newly runnable stages
    16/07/13 17:18:33 INFO DAGScheduler: running: Set()
    16/07/13 17:18:33 INFO DAGScheduler: waiting: Set(ResultStage 1)
    16/07/13 17:18:33 INFO DAGScheduler: failed: Set()
    16/07/13 17:18:33 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[5] at foreachPartition at HbaseConnector2.scala:33), which has no missing parents
    16/07/13 17:18:33 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 4.0 KB, free 101.0 KB)
    16/07/13 17:18:33 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.3 KB, free 103.4 KB)
    16/07/13 17:18:33 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 10.0.10.60:48953 (size: 2.3 KB, free: 4.1 GB)
    16/07/13 17:18:33 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1006
    16/07/13 17:18:33 INFO DAGScheduler: Submitting 6 missing tasks from ResultStage 1 (MapPartitionsRDD[5] at foreachPartition at HbaseConnector2.scala:33)
    16/07/13 17:18:33 INFO TaskSchedulerImpl: Adding task set 1.0 with 6 tasks
    16/07/13 17:18:33 INFO FairSchedulableBuilder: Added task set TaskSet_1 tasks to pool default
    16/07/13 17:18:33 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, slave62, partition 0,NODE_LOCAL, 2430 bytes)
    16/07/13 17:18:33 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 2, slave62, partition 1,NODE_LOCAL, 2430 bytes)
    16/07/13 17:18:33 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID 3, slave62, partition 2,NODE_LOCAL, 2430 bytes)
    16/07/13 17:18:33 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on slave62:38108 (size: 2.3 KB, free: 7.0 GB)
    16/07/13 17:18:33 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to slave62:52360
    16/07/13 17:18:33 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 138 bytes
    16/07/13 17:18:36 INFO TaskSetManager: Starting task 3.0 in stage 1.0 (TID 4, slave61, partition 3,ANY, 2430 bytes)
    16/07/13 17:18:36 INFO TaskSetManager: Starting task 4.0 in stage 1.0 (TID 5, master60, partition 4,ANY, 2430 bytes)
    16/07/13 17:18:36 INFO TaskSetManager: Starting task 5.0 in stage 1.0 (TID 6, slave61, partition 5,ANY, 2430 bytes)
    16/07/13 17:18:37 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on master60:56270 (size: 2.3 KB, free: 7.0 GB)
    16/07/13 17:18:37 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on slave61:47971 (size: 2.3 KB, free: 7.0 GB)
    16/07/13 17:18:38 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to master60:33085
    16/07/13 17:18:38 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to slave61:36961

The log in the regionserver:

    2016-07-13 17:27:44,189 INFO [HBase-Metrics2-1] impl.MetricsSystemImpl: Stopping HBase metrics system...
    2016-07-13 17:27:44,191 INFO [HBase-Metrics2-1] impl.MetricsSystemImpl: HBase metrics system stopped.
    2016-07-13 17:27:44,692
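A note on the write path above: each record issues a blocking table.put, and a common cause of a Spark-to-HBase job that hangs with no error is an HBase client that cannot reach the ZooKeeper quorum or the region servers; the client then retries quietly for a long time. A batched alternative is to let a BufferedMutator collect puts and flush them when it is closed. The sketch below is only a sketch against the HBase 1.x client API: the row key is simplified to a single id (not the full composite key built above), and whether it resolves the hang depends on where the tasks are actually blocking.

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes

    hivetable.repartition(6).foreachPartition { rows =>
      val conf = HBaseConfiguration.create()
      conf.set("hbase.zookeeper.quorum", "master60,slave61,slave62")
      conf.set("hbase.zookeeper.property.clientPort", "2181")
      val connection = ConnectionFactory.createConnection(conf)
      // BufferedMutator batches puts client-side and writes them in bulk
      val mutator = connection.getBufferedMutator(TableName.valueOf("id_pv"))
      try {
        rows.foreach { x =>
          val put = new Put(Bytes.toBytes(x.getInt(2).toString))
          put.addColumn(Bytes.toBytes("pv"), Bytes.toBytes(x.getInt(3).toString), Bytes.toBytes(x.getInt(3)))
          mutator.mutate(put)
        }
      } finally {
        mutator.close()      // flushes any puts still sitting in the buffer
        connection.close()
      }
    }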
Reply: [Marketing Mail] Re: question about SparkSQL loading hbase tables
Hi Ted, yes, I checked HBase's GitHub and the doc, but it seems there is no published dependency for the hbase-spark module. I searched http://mvnrepository.com/search?q=HBase-Spark and no result matches, and https://hbase.apache.org/book.html#spark does not list an hbase-spark dependency (or anything related) either.

-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: June 29, 2016 11:24
To: user@hbase.apache.org
Subject: [Marketing Mail] Re: question about SparkSQL loading hbase tables

There is no hbase release with full support for SparkSQL yet.

For #1, the classes / directories are (master branch):

./hbase-spark/src/main/java/org/apache/hadoop/hbase/spark/example/hbasecontext
./hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/hbasecontext
./hbase-spark/src/main/scala/org/apache/spark/sql/datasources/hbase/HBaseTableCatalog.scala
./hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/datasources/HBaseSparkConf.scala

For documentation, see HBASE-15473.

On Tue, Jun 28, 2016 at 7:13 PM, 罗辉 <luo...@ifeng.com> wrote:
> ...
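A follow-up sketch for anyone reading along: once the hbase-spark module is built from the HBase master branch (at the paths Ted lists above; as of this thread it is not published to the Maven repositories) and placed on the classpath, a catalog-driven read looks roughly like the code below. This is a hedged sketch: the catalog JSON, table name, and column mapping are hypothetical placeholders, and the exact catalog schema should be checked against HBASE-15473.

    import org.apache.spark.sql.datasources.hbase.HBaseTableCatalog

    // Hypothetical catalog: maps DataFrame columns onto HBase column families.
    // The table and column names are placeholders, not from this thread.
    val catalog = """{
      "table":{"namespace":"default", "name":"pv_table"},
      "rowkey":"key",
      "columns":{
        "key":{"cf":"rowkey", "col":"key", "type":"string"},
        "pv":{"cf":"pv", "col":"count", "type":"int"}
      }
    }"""

    val df = sqlContext.read
      .options(Map(HBaseTableCatalog.tableCatalog -> catalog))
      .format("org.apache.hadoop.hbase.spark")
      .load()
    df.show()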
question about SparkSQL loading hbase tables
Hi there,

I am using SparkSQL to read from hbase, however:

1. I find some APIs are not available in my dependencies. Where do I add them:

org.apache.hadoop.hbase.spark.example.hbasecontext
org.apache.spark.sql.datasources.hbase.HBaseTableCatalog
org.apache.hadoop.hbase.spark.datasources.HBaseSparkConf

2. Is there a complete code example of how to use SparkSQL to read/write from hbase?

The document I referred to is this: http://hbase.apache.org/book.html#_sparksql_dataframes. It seems that this is a snapshot for 2.0, while I am using hbase 1.2.1 + spark 1.6.1 + hadoop 2.7.1.

In my app, I want to load the entire hbase table into SparkSQL. My code:

    import org.apache.spark._
    import org.apache.hadoop.hbase._
    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.spark.example.hbasecontext
    import org.apache.spark.sql.datasources.hbase.HBaseTableCatalog
    import org.apache.hadoop.hbase.spark.datasources.HBaseSparkConf

    object HbaseConnector {
      def main(args: Array[String]) {
        val tableName = args(0)
        val sparkMasterUrlDev = "spark://hadoopmaster:7077"
        val sparkMasterUrlLocal = "local[2]"

        val sparkConf = new SparkConf()
          .setAppName("HbaseConnector for table " + tableName)
          .setMaster(sparkMasterUrlDev)
          .set("spark.executor.memory", "10g")
        val sc = new SparkContext(sparkConf)
        val sqlContext = new org.apache.spark.sql.SQLContext(sc)

        val conf = HBaseConfiguration.create()
        conf.set("hbase.zookeeper.quorum", "z1,z2,z3")
        conf.set("hbase.zookeeper.property.clientPort", "2181")
        conf.set("hbase.rootdir", "hdfs://hadoopmaster:8020/hbase")
        // val hbaseContext = new HBaseContext(sc, conf)

        // writeCatalog and tsSpecified are not defined in this snippet
        val pv = sqlContext.read
          .options(Map(HBaseTableCatalog.tableCatalog -> writeCatalog,
                       HBaseSparkConf.TIMESTAMP -> tsSpecified.toString))
          .format("org.apache.hadoop.hbase.spark")
          .load()
        pv.write.saveAsTable(tableName)
      }
    }

My POM file is attached as well.

Thanks for the help.

San.Luo
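On question 2, the write direction mirrors the read. The sketch below is a hypothetical counterpart to the code above, again assuming the hbase-spark module from the master branch is on the classpath; df stands for any DataFrame whose schema matches the catalog, and HBaseTableCatalog.newTable (asking the connector to create the table with the given number of regions if it does not exist) should be verified against the build you end up with.

    import org.apache.spark.sql.datasources.hbase.HBaseTableCatalog

    // Same hypothetical catalog shape as in the read sketch earlier in the thread
    val writeCatalog = """{
      "table":{"namespace":"default", "name":"pv_table"},
      "rowkey":"key",
      "columns":{
        "key":{"cf":"rowkey", "col":"key", "type":"string"},
        "pv":{"cf":"pv", "col":"count", "type":"int"}
      }
    }"""

    df.write
      .options(Map(HBaseTableCatalog.tableCatalog -> writeCatalog,
                   HBaseTableCatalog.newTable -> "5"))   // create with 5 regions if absent
      .format("org.apache.hadoop.hbase.spark")
      .save()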