HBase connector does not read ZK configuration from Spark session
I am trying to write a Spark program that reads data from HBase and stores it in a DataFrame. It runs perfectly with hbase-site.xml in the $SPARK_HOME/conf folder, but I am facing a few issues here.

Issue 1: passing the hbase-site.xml location with the --files parameter in client mode (it works in cluster mode). When I removed hbase-site.xml from $SPARK_HOME/conf and tried to run in client mode over YARN, passing the file with --files, I keep getting the exception below, which I think means it is not picking up the ZooKeeper configuration from hbase-site.xml:

  spark-submit \
    --master yarn \
    --deploy-mode client \
    --files /home/siddesh/hbase-site.xml \
    --class com.orzota.rs.json.HbaseConnector \
    --packages com.hortonworks:shc:1.0.0-2.0-s_2.11 \
    --repositories http://repo.hortonworks.com/content/groups/public/ \
    target/scala-2.11/test-0.1-SNAPSHOT.jar

  18/02/22 01:43:09 INFO ClientCnxn: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
  18/02/22 01:43:09 WARN ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
  java.net.ConnectException: Connection refused
          at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
          at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
          at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
          at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)

However, it works fine when I run it in cluster mode.

Issue 2: passing the HBase configuration details through the Spark session, which I cannot get to work in either client or cluster mode.
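For reference, the hbase-site.xml being passed carries the ZooKeeper settings along these lines (a minimal sketch; the quorum hosts are placeholders, taken from the config keys used later in this post):

```xml
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>ip1,ip2,ip3</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>
```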
Instead of passing the entire hbase-site.xml, I am trying to add the configuration directly in the code, as configuration parameters on the SparkSession, e.g.:

  val spark = SparkSession
    .builder()
    .appName(name)
    .config("hbase.zookeeper.property.clientPort", "2181")
    .config("hbase.zookeeper.quorum", "ip1,ip2,ip3")
    .config("spark.hbase.host", "zookeeperquorum")
    .getOrCreate()

  val json_df = spark.read
    .option("catalog", catalog_read)
    .format("org.apache.spark.sql.execution.datasources.hbase")
    .load()

This does not work in cluster mode either. Can anyone help me with a solution, or an explanation of why this is happening? Are there any workarounds?
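For context, catalog_read above is an SHC catalog JSON that maps an HBase table to DataFrame columns. A minimal sketch of such a catalog follows; the table name test_table, column family cf, and column names are hypothetical, not from the original post:

```scala
// Hypothetical SHC catalog: maps the HBase table "test_table" to DataFrame columns.
// "rowkey" is SHC's reserved column-family name for the row key.
val catalog_read: String =
  s"""{
     |  "table": {"namespace": "default", "name": "test_table"},
     |  "rowkey": "key",
     |  "columns": {
     |    "id":   {"cf": "rowkey", "col": "key",  "type": "string"},
     |    "json": {"cf": "cf",     "col": "json", "type": "string"}
     |  }
     |}""".stripMargin

// It would then be used as in the question:
// val json_df = spark.read
//   .option("catalog", catalog_read)
//   .format("org.apache.spark.sql.execution.datasources.hbase")
//   .load()
```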
Reg: Reading a CSV file with a String label into LabeledPoint
Hi, I am trying to read a CSV with a few double attributes and a String label. How can I convert it to an RDD of LabeledPoint so that I can run it with the Spark MLlib classification algorithms? I have tried the LabeledPoint constructor (it lives in the regression package), but it accepts only a label in double format. Is there any other way to handle the string label and convert it into an RDD? Regards Siddesh
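One common approach (a sketch, not from the original post) is to build a label-to-index map and convert each string label to a double before constructing LabeledPoint; with DataFrames, Spark ML's StringIndexer does this same mapping. Below is a minimal standalone illustration of the conversion step; the CSV rows and labels are made up, and a local case class stands in for org.apache.spark.mllib.regression.LabeledPoint so the sketch runs without a Spark dependency:

```scala
// Stand-in for org.apache.spark.mllib.regression.LabeledPoint,
// so this sketch runs without Spark on the classpath.
case class LabeledPoint(label: Double, features: Array[Double])

// Hypothetical CSV rows: double features followed by a string label.
val rows = Seq("1.0,2.0,spam", "0.5,1.5,ham", "2.0,3.0,spam")

// Assign each distinct string label a double index (spam -> 0.0, ham -> 1.0 here).
val labelIndex: Map[String, Double] =
  rows.map(_.split(",").last).distinct.zipWithIndex
    .map { case (lbl, i) => lbl -> i.toDouble }.toMap

// Convert each row: numeric columns become the feature vector,
// the string label becomes its double index.
val points: Seq[LabeledPoint] = rows.map { line =>
  val cols = line.split(",")
  LabeledPoint(labelIndex(cols.last), cols.init.map(_.toDouble))
}
```

With real Spark, the same logic applies over sc.textFile(...) to produce an RDD[LabeledPoint] that the MLlib classifiers accept.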