I want to read and write data from HBase using PySpark, but I am getting the error below. Please help.

My code:

from pyspark import SparkContext
from pyspark.sql import SQLContext
sc = SparkContext()
sqlContext = SQLContext(sc)


# ZooKeeper quorum and the HBase table to scan.
sparkconf = {
    "hbase.zookeeper.quorum": "localhost",
    "hbase.mapreduce.inputtable": "test"
}
# These converter classes ship in the Spark examples jar.
keyConv = ("org.apache.spark.examples.pythonconverters."
           "ImmutableBytesWritableToStringConverter")
valueConv = ("org.apache.spark.examples.pythonconverters."
             "HBaseResultToStringConverter")

# Load the HBase table as an RDD of (row key, row) string pairs.
hbase_rdd = sc.newAPIHadoopRDD(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
    "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "org.apache.hadoop.hbase.client.Result",
    keyConverter=keyConv,
    valueConverter=valueConv,
    conf=sparkconf)
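
I run the script with spark-submit and the Spark examples jar on the classpath, since that is where the pythonconverters classes live (the jar path and script name below are placeholders for my local ones):

spark-submit --jars /path/to/spark-examples.jar read_hbase.py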


This raises the following error:

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
: java.io.IOException: No table was provided.
        at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:130)
        at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:121)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
        at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1284)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
        at org.apache.spark.rdd.RDD.take(RDD.scala:1279)
        at org.apache.spark.api.python.SerDeUtil$.pairRDDToPython(SerDeUtil.scala:203)
        at org.apache.spark.api.python.PythonRDD$.newAPIHadoopRDD(PythonRDD.scala:582)
        at org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD(PythonRDD.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:280)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:211)
        at java.lang.Thread.run(Thread.java:745)
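
Separately, for the write half, my plan is to follow the hbase_outputformat.py example that ships with Spark. This is an untested sketch; the table name, column family, qualifier, and row values are placeholders:

write_conf = {
    "hbase.zookeeper.quorum": "localhost",
    "hbase.mapred.outputtable": "test",
    "mapreduce.outputformat.class": "org.apache.hadoop.hbase.mapreduce.TableOutputFormat",
    "mapreduce.job.output.key.class": "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "mapreduce.job.output.value.class": "org.apache.hadoop.io.Writable"
}
writeKeyConv = ("org.apache.spark.examples.pythonconverters."
                "StringToImmutableBytesWritableConverter")
writeValueConv = ("org.apache.spark.examples.pythonconverters."
                  "StringListToPutConverter")

# Each element is (row key, [row key, column family, qualifier, value]).
sc.parallelize([("row1", ["row1", "f1", "q1", "value1"])]) \
    .saveAsNewAPIHadoopDataset(conf=write_conf,
                               keyConverter=writeKeyConv,
                               valueConverter=writeValueConv)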


-- 
Md Mehrab Alam

https://www.linkedin.com/in/iammehrabalam
https://github.com/iammehrabalam
