unsubscribe

2023-11-30 Thread Dharmin Siddesh J
unsubscribe


Unsubscribe

2021-07-07 Thread Dharmin Siddesh J



Unsubscribe

2021-01-26 Thread Dharmin Siddesh J
Unsubscribe


HBase connector does not read ZK configuration from Spark session

2018-02-22 Thread Dharmin Siddesh J
I am trying to write a Spark program that reads data from HBase and stores
it in a DataFrame.

I am able to run it perfectly with hbase-site.xml in the $SPARK_HOME/conf
folder, but I am facing a few issues here.

Issue 1

The first issue is passing the hbase-site.xml location with the --files
parameter when submitting in client mode (it works in cluster mode).


When I remove hbase-site.xml from $SPARK_HOME/conf and try to execute in
client mode by passing it with the --files parameter over YARN, I keep
getting the exception below (which I think means it is not taking the
ZooKeeper configuration from hbase-site.xml).

spark-submit \
  --master yarn \
  --deploy-mode client \
  --files /home/siddesh/hbase-site.xml \
  --class com.orzota.rs.json.HbaseConnector \
  --packages com.hortonworks:shc:1.0.0-2.0-s_2.11 \
  --repositories http://repo.hortonworks.com/content/groups/public/ \
  target/scala-2.11/test-0.1-SNAPSHOT.jar

    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
18/02/22 01:43:09 INFO ClientCnxn: Opening socket connection to server
localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL
(unknown error)
18/02/22 01:43:09 WARN ClientCnxn: Session 0x0 for server null, unexpected
error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)

However, it works fine when I run it in cluster mode.
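
One workaround I am considering (only a rough sketch, unverified) is to
resolve the shipped hbase-site.xml via SparkFiles on the driver and add it to
the SparkContext's Hadoop configuration. Whether the SHC connector actually
reads its HBase settings from that configuration is an assumption on my part:

import org.apache.hadoop.fs.Path
import org.apache.spark.SparkFiles
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("HbaseConnector").getOrCreate()

// A file shipped with --files is copied into a Spark-managed temp directory;
// SparkFiles.get resolves its local path on the driver.
val hbaseSitePath = SparkFiles.get("hbase-site.xml")

// Add it as a Hadoop configuration resource so hbase.zookeeper.quorum and
// hbase.zookeeper.property.clientPort are loaded from the shipped file.
spark.sparkContext.hadoopConfiguration.addResource(new Path(hbaseSitePath))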


Issue 2

The second issue is passing the HBase configuration details through the
Spark session, which I cannot get to work in either client or cluster mode.


Instead of passing the entire hbase-site.xml, I am trying to add the
configuration directly in the code as configuration parameters on the
SparkSession, e.g.:


val spark = SparkSession
  .builder()
  .appName(name)
  .config("hbase.zookeeper.property.clientPort", "2181")
  .config("hbase.zookeeper.quorum", "ip1,ip2,ip3")
  .config("spark.hbase.host", "zookeeperquorum")
  .getOrCreate()

val json_df = spark.read
  .option("catalog", catalog_read)
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .load()

This is not working in cluster mode either.
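
A variant I have also sketched (again unverified, and it assumes the
connector reads the Hadoop configuration rather than the Spark conf) is to
pass the ZooKeeper settings with the spark.hadoop. prefix, which Spark copies
into the Hadoop configuration, or to set them on the SparkContext's Hadoop
configuration directly:

val spark = SparkSession
  .builder()
  .appName(name)
  // spark.hadoop.* entries are copied into the Hadoop configuration
  .config("spark.hadoop.hbase.zookeeper.quorum", "ip1,ip2,ip3")
  .config("spark.hadoop.hbase.zookeeper.property.clientPort", "2181")
  .getOrCreate()

// Equivalent direct form: set the keys on the Hadoop configuration itself.
spark.sparkContext.hadoopConfiguration.set("hbase.zookeeper.quorum", "ip1,ip2,ip3")
spark.sparkContext.hadoopConfiguration.set("hbase.zookeeper.property.clientPort", "2181")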


Can anyone help me with a solution, or an explanation of why this is
happening? Are there any workarounds?


Hortonworks Spark-HBase Connector does not read ZooKeeper configuration from Spark session config (Spark on YARN)

2018-02-22 Thread Dharmin Siddesh J
Hi

I am trying to write Spark code that reads data from HBase and stores it
in a DataFrame.
I am able to run it perfectly with hbase-site.xml in the $SPARK_HOME/conf
folder.
But I am facing a few issues here.

Issue 1: Passing the hbase-site.xml location with the --files parameter when
submitting in client mode (it works in cluster mode).

When I remove hbase-site.xml from spark/conf and try to execute in client
mode by passing it with the --files parameter over YARN, I keep getting the
following exception, which I think means it is not taking the ZooKeeper
configuration from hbase-site.xml. However, it works fine when I run it in
cluster mode.
Sample command:

spark-submit --master yarn --deploy-mode client \
  --files /home/siddesh/hbase-site.xml \
  --class com.orzota.rs.json.HbaseConnector \
  --packages com.hortonworks:shc:1.0.0-2.0-s_2.11 \
  --repositories http://repo.hortonworks.com/content/groups/public/ \
  target/scala-2.11/test-0.1-SNAPSHOT.jar

    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
18/02/22 01:43:09 INFO ClientCnxn: Opening socket connection to server
localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL
(unknown error)
18/02/22 01:43:09 WARN ClientCnxn: Session 0x0 for server null, unexpected
error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)

Issue 2: Passing the HBase configuration details through the Spark session
(not working in either cluster or client mode).
Instead of passing the entire hbase-site.xml, I am trying to add the
configuration directly in the Spark code as config parameters on the
SparkSession. The following is a sample SparkSession snippet:

val spark = SparkSession
  .builder()
  .appName(name)
  .config("hbase.zookeeper.property.clientPort", "2181")
  .config("hbase.zookeeper.quorum", "ip1,ip2,ip3")
  .config("spark.hbase.host", "zookeeperquorum")
  .getOrCreate()

val json_df = spark.read
  .option("catalog", catalog_read)
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .load()

But it is not working in cluster mode, while issue 1 continues in client
mode.

Can anyone help me with a solution, or an explanation of why this is
happening? Are there any workarounds?

regards
Sid


Reg: Reading a CSV file with a String label into LabeledPoint

2016-03-15 Thread Dharmin Siddesh J
Hi

I am trying to read a CSV with a few double attributes and a String label.
How can I convert it to a LabeledPoint RDD so that I can run it with the
Spark MLlib classification algorithms?

I have tried the LabeledPoint constructor (it is available only under the
regression package), but it accepts only a double-format label. Is there any
other way to point out the string label and convert it into an RDD?
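
To make the question concrete, here is a rough sketch of the conversion I am
after, assuming Spark 1.6 with the spark-csv package; the file path and the
column names (f1, f2, label) are hypothetical. The string label is first
mapped to a numeric index with StringIndexer and then used to build the
LabeledPoint RDD:

import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("csv-to-labeledpoint"))
val sqlContext = new SQLContext(sc)

// Read the CSV into a DataFrame (path and column names are hypothetical).
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("/path/to/data.csv")

// Convert the string label column into a numeric index column.
val indexed = new StringIndexer()
  .setInputCol("label")
  .setOutputCol("labelIndex")
  .fit(df)
  .transform(df)

// Build the LabeledPoint RDD expected by the MLlib classifiers.
val labeled = indexed.rdd.map { row =>
  LabeledPoint(
    row.getAs[Double]("labelIndex"),
    Vectors.dense(row.getAs[Double]("f1"), row.getAs[Double]("f2")))
}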

Regards
Siddesh