unable to write data to hbase without any error

2016-07-13 Thread
Hello guys,
I have a Spark SQL app that writes some data to HBase; however, the app hangs without any exception or error.
Here is my code:
// code base: https://hbase.apache.org/book.html#scala
val sparkMasterUrlDev = "spark://master60:7077"
val sparkMasterUrlLocal = "local[2]"

val sparkConf = new SparkConf()
  .setAppName("HbaseConnector2")
  .setMaster(sparkMasterUrlDev)
  .set("spark.executor.memory", "10g")

val sc = new SparkContext(sparkConf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val hc = new org.apache.spark.sql.hive.HiveContext(sc)
val hivetable = hc.sql("select * from house_id_city_pv_range")
hivetable.persist()

val c = new CalendarTool
val yesterday = c.getDate
val stringA = ""

hivetable.repartition(6).foreachPartition { y =>
  println("")
  val conf = new HBaseConfiguration()
  conf.set("hbase.zookeeper.quorum", "master60,slave61,slave62")
  conf.set("hbase.zookeeper.property.clientPort", "2181")
  conf.set("hbase.rootdir", "hdfs://master60:10001/hbase")
  val table = new HTable(conf, "id_pv")
  val connection = ConnectionFactory.createConnection(conf)
  val admin = connection.getAdmin()
  y.foreach { x =>
    val rowkeyp1 = x.getInt(2).toString()
    val rowkeyp2 = stringA.substring(0, 8 - rowkeyp1.length())
    val rowkeyp3 = x.getInt(1).toString()
    val rowkeyp4 = stringA.substring(0, 8 - rowkeyp3.length())
    val rowkeyp5 = yesterday
    val rowkey = rowkeyp1 + rowkeyp2 + rowkeyp3 + rowkeyp4 + rowkeyp5
    val theput = new Put(Bytes.toBytes(yesterday))
    theput.add(Bytes.toBytes("id"), Bytes.toBytes(x.getInt(0).toString()), Bytes.toBytes(x.getInt(0)))
    theput.add(Bytes.toBytes("pv"), Bytes.toBytes(x.getInt(3).toString()), Bytes.toBytes(x.getInt(3)))
    table.put(theput)
  }
}
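
One thing that stands out in the snippet: every partition creates an HTable, a Connection, and an Admin that are never flushed or closed, so puts can sit in the client-side write buffer and each executor keeps ZooKeeper/RPC resources open, which can look like a hang with no error. Below is a minimal sketch (not the original code) of how the per-partition write could be restructured with explicit cleanup; the table name, column indices, and the 8-character zero padding are assumptions carried over from the snippet above, so adjust them to the real schema:

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

hivetable.repartition(6).foreachPartition { rows =>
  // One configuration and one Connection per partition, reused for all rows.
  val conf = HBaseConfiguration.create()
  conf.set("hbase.zookeeper.quorum", "master60,slave61,slave62")
  conf.set("hbase.zookeeper.property.clientPort", "2181")

  val connection = ConnectionFactory.createConnection(conf)
  val table = connection.getTable(TableName.valueOf("id_pv"))
  try {
    rows.foreach { x =>
      // Pad the two id components to 8 characters (assumed intent of the substring logic).
      val part1 = x.getInt(2).toString
      val part2 = x.getInt(1).toString
      val rowkey = ("0" * (8 - part1.length)) + part1 +
                   ("0" * (8 - part2.length)) + part2 + yesterday

      val put = new Put(Bytes.toBytes(rowkey)) // use the computed rowkey, not yesterday
      put.addColumn(Bytes.toBytes("id"), Bytes.toBytes(x.getInt(0).toString), Bytes.toBytes(x.getInt(0)))
      put.addColumn(Bytes.toBytes("pv"), Bytes.toBytes(x.getInt(3).toString), Bytes.toBytes(x.getInt(3)))
      table.put(put)
    }
  } finally {
    // close() flushes buffered puts and releases connection resources even if a row fails.
    table.close()
    connection.close()
  }
}

Also worth double-checking: in the original code the Put key is Bytes.toBytes(yesterday), so every row writes to the same rowkey even though rowkey is computed and never used.
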
Last 20 lines of my Spark app log:
16/07/13 17:18:33 INFO DAGScheduler: looking for newly runnable stages
16/07/13 17:18:33 INFO DAGScheduler: running: Set()
16/07/13 17:18:33 INFO DAGScheduler: waiting: Set(ResultStage 1)
16/07/13 17:18:33 INFO DAGScheduler: failed: Set()
16/07/13 17:18:33 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[5] at foreachPartition at HbaseConnector2.scala:33), which has no missing parents
16/07/13 17:18:33 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 4.0 KB, free 101.0 KB)
16/07/13 17:18:33 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.3 KB, free 103.4 KB)
16/07/13 17:18:33 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 10.0.10.60:48953 (size: 2.3 KB, free: 4.1 GB)
16/07/13 17:18:33 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1006
16/07/13 17:18:33 INFO DAGScheduler: Submitting 6 missing tasks from ResultStage 1 (MapPartitionsRDD[5] at foreachPartition at HbaseConnector2.scala:33)
16/07/13 17:18:33 INFO TaskSchedulerImpl: Adding task set 1.0 with 6 tasks
16/07/13 17:18:33 INFO FairSchedulableBuilder: Added task set TaskSet_1 tasks to pool default
16/07/13 17:18:33 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, slave62, partition 0,NODE_LOCAL, 2430 bytes)
16/07/13 17:18:33 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 2, slave62, partition 1,NODE_LOCAL, 2430 bytes)
16/07/13 17:18:33 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID 3, slave62, partition 2,NODE_LOCAL, 2430 bytes)
16/07/13 17:18:33 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on slave62:38108 (size: 2.3 KB, free: 7.0 GB)
16/07/13 17:18:33 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to slave62:52360
16/07/13 17:18:33 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 138 bytes
16/07/13 17:18:36 INFO TaskSetManager: Starting task 3.0 in stage 1.0 (TID 4, slave61, partition 3,ANY, 2430 bytes)
16/07/13 17:18:36 INFO TaskSetManager: Starting task 4.0 in stage 1.0 (TID 5, master60, partition 4,ANY, 2430 bytes)
16/07/13 17:18:36 INFO TaskSetManager: Starting task 5.0 in stage 1.0 (TID 6, slave61, partition 5,ANY, 2430 bytes)
16/07/13 17:18:37 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on master60:56270 (size: 2.3 KB, free: 7.0 GB)
16/07/13 17:18:37 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on slave61:47971 (size: 2.3 KB, free: 7.0 GB)
16/07/13 17:18:38 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to master60:33085
16/07/13 17:18:38 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to slave61:36961

The log on the regionserver:
2016-07-13 17:27:44,189 INFO  [HBase-Metrics2-1] impl.MetricsSystemImpl: Stopping HBase metrics system...
2016-07-13 17:27:44,191 INFO  [HBase-Metrics2-1] impl.MetricsSystemImpl: HBase metrics system stopped.
2016-07-13 17:27:44,692

RE: [Marketing Mail] Re: question about SparkSQL loading hbase tables

2016-07-03 Thread
Hi Ted, yes, I checked HBase's GitHub and the docs, but it seems there isn't an available dependency for the hbase-spark module.
I checked http://mvnrepository.com/search?q=HBase-Spark; no result matches.
I also checked https://hbase.apache.org/book.html#spark; there is no hbase-spark dependency or anything related there either.
And

-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: June 29, 2016 11:24
To: user@hbase.apache.org
Subject: [Marketing Mail] Re: question about SparkSQL loading hbase tables

There is no HBase release with full support for SparkSQL yet.
For #1, the classes / directories are (master branch):

./hbase-spark/src/main/java/org/apache/hadoop/hbase/spark/example/hbasecontext
./hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/hbasecontext
./hbase-spark/src/main/scala/org/apache/spark/sql/datasources/hbase/HBaseTableCatalog.scala
./hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/datasources/HBaseSparkConf.scala

For documentation, see HBASE-15473.
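
To make the above concrete, here is a minimal sketch of how HBaseContext from the hbase-spark module is typically used once the module is built from the master branch; it is adapted from the reference guide examples rather than from a released API, and the table name "t1", column family "cf", and the sample RDD are made-up placeholders:

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

object BulkPutSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("BulkPutSketch"))
    val conf = HBaseConfiguration.create()

    // HBaseContext serializes and broadcasts the configuration to the executors.
    val hbaseContext = new HBaseContext(sc, conf)

    // Hypothetical (rowkey, value) pairs to write.
    val rdd = sc.parallelize(Seq(("row1", 1), ("row2", 2)))

    // bulkPut converts each record to a Put on the executors and writes it to the table.
    hbaseContext.bulkPut[(String, Int)](
      rdd,
      TableName.valueOf("t1"),
      record => {
        val put = new Put(Bytes.toBytes(record._1))
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("c"), Bytes.toBytes(record._2))
        put
      })

    sc.stop()
  }
}
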




question about SparkSQL loading hbase tables

2016-06-28 Thread
Hi there,
I am using SparkSQL to read from HBase; however:

1. I find some APIs are not available in my dependencies. Where do I add the following?

org.apache.hadoop.hbase.spark.example.hbasecontext

org.apache.spark.sql.datasources.hbase.HBaseTableCatalog

org.apache.hadoop.hbase.spark.datasources.HBaseSparkConf

2. Is there complete example code on how to use SparkSQL to read from and write to HBase?

The document I referred to is this: http://hbase.apache.org/book.html#_sparksql_dataframes. It seems that this is a snapshot for 2.0, while I am using HBase 1.2.1 + Spark 1.6.1 + Hadoop 2.7.1.



In my app, I want to load the entire HBase table into SparkSQL.
My code:

import org.apache.spark._
import org.apache.hadoop.hbase._
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.spark.example.hbasecontext
import org.apache.spark.sql.datasources.hbase.HBaseTableCatalog
import org.apache.hadoop.hbase.spark.datasources.HBaseSparkConf

object HbaseConnector {
  def main(args: Array[String]) {
    val tableName = args(0)
    val sparkMasterUrlDev = "spark://hadoopmaster:7077"
    val sparkMasterUrlLocal = "local[2]"

    val sparkConf = new SparkConf()
      .setAppName("HbaseConnector for table " + tableName)
      .setMaster(sparkMasterUrlDev)
      .set("spark.executor.memory", "10g")
    val sc = new SparkContext(sparkConf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    val conf = new HBaseConfiguration()
    conf.set("hbase.zookeeper.quorum", "z1,z2,z3")
    conf.set("hbase.zookeeper.property.clientPort", "2181")
    conf.set("hbase.rootdir", "hdfs://hadoopmaster:8020/hbase")
    //val hbaseContext = new HBaseContext(sc, conf)

    val pv = sqlContext.read
      .options(Map(HBaseTableCatalog.tableCatalog -> writeCatalog, HBaseSparkConf.TIMESTAMP -> tsSpecified.toString))
      .format("org.apache.hadoop.hbase.spark")
      .load()
    pv.write.saveAsTable(tableName)
  }
}

My POM file is attached as well.

Thanks for the help.

San.Luo

