Hello everyone, I have a Spark SQL app that writes data to HBase, but it hangs without any exception or error. Here is my code:

// code base: https://hbase.apache.org/book.html#scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{ConnectionFactory, HTable, Put}
import org.apache.hadoop.hbase.util.Bytes

val sparkMasterUrlDev = "spark://master60:7077"
val sparkMasterUrlLocal = "local[2]"
val sparkConf = new SparkConf()
  .setAppName("HbaseConnector2")
  .setMaster(sparkMasterUrlDev)
  .set("spark.executor.memory", "10g")
val sc = new SparkContext(sparkConf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val hc = new org.apache.spark.sql.hive.HiveContext(sc)
val hivetable = hc.sql("select * from house_id_city_pv_range")
hivetable.persist()
val c = new CalendarTool
val yesterday = c.getDate
val stringA = "aaaaaaaa"

hivetable.repartition(6).foreachPartition { y =>
  println("================================================")
  val conf = new HBaseConfiguration()
  conf.set("hbase.zookeeper.quorum", "master60,slave61,slave62")
  conf.set("hbase.zookeeper.property.clientPort", "2181")
  conf.set("hbase.rootdir", "hdfs://master60:10001/hbase")
  val table = new HTable(conf, "id_pv")
  val connection = ConnectionFactory.createConnection(conf)
  val admin = connection.getAdmin()
  y.foreach { x =>
    val rowkeyp1 = x.getInt(2).toString()
    val rowkeyp2 = stringA.substring(0, 8 - rowkeyp1.length())
    val rowkeyp3 = x.getInt(1).toString()
    val rowkeyp4 = stringA.substring(0, 8 - rowkeyp3.length())
    val rowkeyp5 = yesterday
    val rowkey = rowkeyp1 + rowkeyp2 + rowkeyp3 + rowkeyp4 + rowkeyp5
    val theput = new Put(Bytes.toBytes(yesterday))
    theput.add(Bytes.toBytes("id"), Bytes.toBytes(x.getInt(0).toString()), Bytes.toBytes(x.getInt(0)))
    theput.add(Bytes.toBytes("pv"), Bytes.toBytes(x.getInt(3).toString()), Bytes.toBytes(x.getInt(3)))
    table.put(theput)
  }
}

Last 20 lines of my Spark app log:

16/07/13 17:18:33 INFO DAGScheduler: looking for newly runnable stages
16/07/13 17:18:33 INFO DAGScheduler: running: Set()
16/07/13 17:18:33 INFO DAGScheduler: waiting: Set(ResultStage 1)
16/07/13 17:18:33 INFO DAGScheduler: failed: Set()
16/07/13 17:18:33 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[5] at foreachPartition at HbaseConnector2.scala:33), which has no missing parents
16/07/13 17:18:33 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 4.0 KB,
free 101.0 KB)
16/07/13 17:18:33 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.3 KB, free 103.4 KB)
16/07/13 17:18:33 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 10.0.10.60:48953 (size: 2.3 KB, free: 4.1 GB)
16/07/13 17:18:33 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1006
16/07/13 17:18:33 INFO DAGScheduler: Submitting 6 missing tasks from ResultStage 1 (MapPartitionsRDD[5] at foreachPartition at HbaseConnector2.scala:33)
16/07/13 17:18:33 INFO TaskSchedulerImpl: Adding task set 1.0 with 6 tasks
16/07/13 17:18:33 INFO FairSchedulableBuilder: Added task set TaskSet_1 tasks to pool default
16/07/13 17:18:33 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, slave62, partition 0,NODE_LOCAL, 2430 bytes)
16/07/13 17:18:33 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 2, slave62, partition 1,NODE_LOCAL, 2430 bytes)
16/07/13 17:18:33 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID 3, slave62, partition 2,NODE_LOCAL, 2430 bytes)
16/07/13 17:18:33 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on slave62:38108 (size: 2.3 KB, free: 7.0 GB)
16/07/13 17:18:33 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to slave62:52360
16/07/13 17:18:33 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 138 bytes
16/07/13 17:18:36 INFO TaskSetManager: Starting task 3.0 in stage 1.0 (TID 4, slave61, partition 3,ANY, 2430 bytes)
16/07/13 17:18:36 INFO TaskSetManager: Starting task 4.0 in stage 1.0 (TID 5, master60, partition 4,ANY, 2430 bytes)
16/07/13 17:18:36 INFO TaskSetManager: Starting task 5.0 in stage 1.0 (TID 6, slave61, partition 5,ANY, 2430 bytes)
16/07/13 17:18:37 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on master60:56270 (size: 2.3 KB, free: 7.0 GB)
16/07/13 17:18:37 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on slave61:47971 (size: 2.3 KB, free: 7.0 GB)
16/07/13 17:18:38 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to master60:33085
16/07/13 17:18:38 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to slave61:36961

The log in the regionserver:

2016-07-13 17:27:44,189 INFO [HBase-Metrics2-1] impl.MetricsSystemImpl: Stopping HBase metrics system...
2016-07-13 17:27:44,191 INFO [HBase-Metrics2-1] impl.MetricsSystemImpl: HBase metrics system stopped.
2016-07-13 17:27:44,692 INFO [HBase-Metrics2-1] impl.MetricsConfig: loaded properties from hadoop-metrics2-hbase.properties
2016-07-13 17:27:44,694 INFO [HBase-Metrics2-1] impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2016-07-13 17:27:44,694 INFO [HBase-Metrics2-1] impl.MetricsSystemImpl: HBase metrics system started
2016-07-13 17:32:34,740 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=6.59 MB, freeSize=6.22 GB, max=6.23 GB, blockCount=8, accesses=205, hits=191, hitRatio=93.17%, cachingAccesses=202, cachingHits=191, cachingHitsRatio=94.55%, evictions=89, evicted=3, evictedPerRun=0.033707864582538605
2016-07-13 17:32:44,189 INFO [HBase-Metrics2-1] impl.MetricsSystemImpl: Stopping HBase metrics system...
2016-07-13 17:32:44,191 INFO [HBase-Metrics2-1] impl.MetricsSystemImpl: HBase metrics system stopped.
2016-07-13 17:32:44,692 INFO [HBase-Metrics2-1] impl.MetricsConfig: loaded properties from hadoop-metrics2-hbase.properties
2016-07-13 17:32:44,694 INFO [HBase-Metrics2-1] impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2016-07-13 17:32:44,694 INFO [HBase-Metrics2-1] impl.MetricsSystemImpl: HBase metrics system started
2016-07-13 17:37:34,740 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=6.59 MB, freeSize=6.22 GB, max=6.23 GB, blockCount=8, accesses=268, hits=254, hitRatio=94.78%, cachingAccesses=265, cachingHits=254, cachingHitsRatio=95.85%, evictions=119, evicted=3, evictedPerRun=0.02521008439362049
2016-07-13 17:37:44,190 INFO [HBase-Metrics2-1] impl.MetricsSystemImpl: Stopping HBase metrics system...
2016-07-13 17:37:44,191 INFO [HBase-Metrics2-1] impl.MetricsSystemImpl: HBase metrics system stopped.
2016-07-13 17:37:44,692 INFO [HBase-Metrics2-1] impl.MetricsConfig: loaded properties from hadoop-metrics2-hbase.properties
2016-07-13 17:37:44,693 INFO [HBase-Metrics2-1] impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2016-07-13 17:37:44,694 INFO [HBase-Metrics2-1] impl.MetricsSystemImpl: HBase metrics system started
2016-07-13 17:42:34,740 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=6.59 MB, freeSize=6.22 GB, max=6.23 GB, blockCount=8, accesses=331, hits=317, hitRatio=95.77%, cachingAccesses=328, cachingHits=317, cachingHitsRatio=96.65%, evictions=149, evicted=3, evictedPerRun=0.020134227350354195
2016-07-13 17:42:44,190 INFO [HBase-Metrics2-1] impl.MetricsSystemImpl: Stopping HBase metrics system...
2016-07-13 17:42:44,191 INFO [HBase-Metrics2-1] impl.MetricsSystemImpl: HBase metrics system stopped.
2016-07-13 17:42:44,692 INFO [HBase-Metrics2-1] impl.MetricsConfig: loaded properties from hadoop-metrics2-hbase.properties
2016-07-13 17:42:44,694 INFO [HBase-Metrics2-1] impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2016-07-13 17:42:44,694 INFO [HBase-Metrics2-1] impl.MetricsSystemImpl: HBase metrics system started

I see no exceptions or errors in the HMaster and regionserver logs, only the metrics system messages shown above.
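For reference, the rowkey layout that the inner loop builds can be sketched as a small standalone snippet (the "aaaaaaaa" filler is the `stringA` from my code; the ids and date below are hypothetical stand-ins for `x.getInt(2)`, `x.getInt(1)`, and `yesterday`):

```scala
// Standalone sketch of the rowkey construction from the foreachPartition loop.
// It implicitly assumes each id has fewer than 8 digits; a longer id would make
// the substring call throw StringIndexOutOfBoundsException.
object RowkeySketch {
  private val filler = "aaaaaaaa" // same role as stringA in the app

  // Right-pad a numeric id to 8 characters with the filler, mirroring
  // id.toString + stringA.substring(0, 8 - id.toString.length).
  def pad(id: Int): String = {
    val s = id.toString
    s + filler.substring(0, 8 - s.length)
  }

  // rowkey = padded city id ++ padded house id ++ date (rowkeyp1..rowkeyp5).
  def rowkey(cityId: Int, houseId: Int, date: String): String =
    pad(cityId) + pad(houseId) + date
}
```

For example, `RowkeySketch.rowkey(12, 345, "20160712")` yields "12aaaaaa345aaaaa20160712".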
I also tried to disable metrics by commenting out every uncommented line in $HBASE_HOME/conf/hadoop-metrics2-hbase.properties, but that brought no improvement. ZooKeeper did not append any new logs after I submitted the app. Any ideas are welcome and appreciated.

San.Luo