Re: Spark reading from HBase using hbase-connectors - any benefit from localization?

2023-01-06 Thread Aaron Grubb
like HDFS or HBase. Spark is used to process the data stored in such distributed systems. In case there is a Spark application which is processing data stored in HDFS, for example PARQUET files on HDFS, Spark will attempt to place computation tasks alongside HDFS blocks. With HDFS the Spark driver co

Re: Spark reading from HBase using hbase-connectors - any benefit from localization?

2023-01-05 Thread Mich Talebzadeh
Locality in simple terms means doing computation on the node where data resides. As you are already aware, Spark is a cluster computing system. It is not a storage system like HDFS or HBase. Spark is used to process the data stored in such distributed systems. In case there is a spark application

Re: Spark reading from HBase using hbase-connectors - any benefit from localization?

2023-01-05 Thread Aaron Grubb
(3.3.4) - HBase RegionServer (2.4.15) - LLAP on YARN (3.1.3) So to answer your questions directly, putting Spark on the Hadoop nodes is the first idea that I had in order to colocate Spark with HBase for reads (HBase is sharing nodes with Hadoop to answer the second question). However, what currently

Re: Spark reading from HBase using hbase-connectors - any benefit from localization?

2023-01-05 Thread Mich Talebzadeh
Few questions - As I understand you already have a Hadoop cluster. Are you going to put your spark as Hadoop nodes? - Where is your HBase cluster? Is it sharing nodes with Hadoop or does it have its own cluster? I looked at that link and it does not say much. Essentially you want to use HBase

Spark reading from HBase using hbase-connectors - any benefit from localization?

2023-01-05 Thread Aaron Grubb
(cross-posting from the HBase user list as I didn't receive a reply there) Hello, I'm completely new to Spark and evaluating setting up a cluster either in YARN or standalone. Our idea for the general workflow is create a concatenated dataframe using historical pickle/parquet files (whichever

How can I read two different hbase cluster with kerberos

2021-08-22 Thread igyu
I have two hbase clusters with kerberos enabled. I want to run a spark application at clusterA to read clusterB with kerberos. In my code I add an initKerberos function like this: sparkSession.sparkContext.addFile("hdfs://clusterA/krb5ClusterB.conf") sparkSession.sparkContext.addFile("

about spark on hbase problem

2021-08-17 Thread igyu
System.setProperty("java.security.krb5.conf", config.getJSONObject("auth").getString("krb5")) val conf = HBaseConfiguration.create() val zookeeper = config.getString("zookeeper") val port = config.getString("port") conf.set(HConstants.ZOOKEEPER_QUORUM, zookeeper)

spark hbase

2021-04-20 Thread KhajaAsmath Mohammed
Hi, I have tried multiple ways to use hbase-spark and none of them works as expected. SHC and hbase-spark library are loading all the data on executors and it is running for ever. https://ramottamado.dev/how-to-use-hbase-fuzzyrowfilter-in-spark/ Above link has the solution that I am looking

Re: Spark submit hbase issue

2021-04-14 Thread Mich Talebzadeh
Try adding hbase-site.xml file to %SPARK_HOME%\conf and see if it works HTH view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any
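The suggestion above boils down to making the HBase client configuration visible to Spark. For reference, a minimal hbase-site.xml sketch is shown below — the hostnames and port are hypothetical placeholders, not values from this thread:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- ZooKeeper ensemble the HBase client should contact
       (replace with your own hosts) -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
  </property>
  <!-- Default ZooKeeper client port -->
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>
```

If this file is not on the driver/executor classpath, the HBase client falls back to `localhost`, which matches the symptom reported in this thread.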

Spark submit hbase issue

2021-04-14 Thread KhajaAsmath Mohammed
Hi, Spark submit is connecting to localhost instead of the zookeeper mentioned in hbase-site.xml. This same program works in the IDE, which picks up hbase-site.xml. What am I missing in spark submit? > >  > spark-submit --driver-class-path > C:\Users\mdkha\bitbucket\clx-spark
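A quick way to debug the symptom above is to check what ZooKeeper quorum the client-side configuration would actually resolve to. The sketch below parses an hbase-site.xml-style document with the standard library; the XML is an inline stand-in for a real file, and the hostnames are made up:

```python
# Parse hbase-site.xml and report the ZooKeeper quorum the HBase client
# would use. If the property is absent (file not on the classpath), the
# client falls back to "localhost" - the behaviour seen in this thread.
import xml.etree.ElementTree as ET

HBASE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
  </property>
</configuration>"""

def zk_quorum(xml_text: str, default: str = "localhost") -> str:
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == "hbase.zookeeper.quorum":
            return prop.findtext("value")
    return default  # what an unconfigured client effectively does

print(zk_quorum(HBASE_SITE))        # the configured quorum
print(zk_quorum("<configuration/>"))  # localhost (the fallback)
```

Running this against the file actually shipped with `--files` (or placed in the conf directory) confirms whether spark-submit is seeing the configuration at all.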

Spark Hbase Hive error in EMR

2021-04-09 Thread KhajaAsmath Mohammed
Hi, I am trying to connect hbase which sits on top of hive as external table. I am getting below exception. Am I missing anything to pass here? 21/04/09 18:08:11 INFO ZooKeeper: Client environment:user.dir=/ 21/04/09 18:08:11 INFO ZooKeeper: Initiating client connection, connectString=localhost

Re: Spark RDD + HBase: adoption trend

2021-01-20 Thread Sean Owen
into that API for certain operations. If that's a connector to read data from HBase - you probably do want to return DataFrames ideally. Unless you're relying on very specific APIs from very specific versions, I wouldn't think a distro's Spark or HBase is much different? On Wed, Jan 20, 2021 at 7:44 AM Marco

Re: Spark RDD + HBase: adoption trend

2021-01-20 Thread Jacek Laskowski
Hi Marco, IMHO RDD is only for very sophisticated use cases that very few Spark devs would be capable of. I consider RDD API a sort of Spark assembler and most Spark devs should stick to Dataset API. Speaking of HBase, see https://github.com/GoogleCloudPlatform/java-docs-samples/tree/master

Spark RDD + HBase: adoption trend

2021-01-20 Thread Marco Firrincieli
Hi, my name is Marco and I'm one of the developers behind  https://github.com/unicredit/hbase-rdd  a project we are currently reviewing for various reasons. We were basically wondering if RDD "is still a thing" nowadays (we see lots of usage for DataFrames or Datasets) and we're no

Re: how to integrate hbase and hive in spark3.0.1?

2021-01-09 Thread michael.yang
Hi all, We also encountered these exceptions when integrating Spark 3.0.1 with hive 2.1.1-cdh6.1.0 and hbase 2.1.0-cdh-6.1.0. Does anyone have any ideas to solve these exceptions? Thanks in advance. Best. Michael Yang -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com

Re: how to manage HBase connections in Executors of Spark Streaming ?

2020-11-25 Thread chen kevin
. best practices about how to manage Hbase connections with kerberos authentication, the demo.java is the code about how to get the hbase connection. From: big data Date: Tuesday, November 24, 2020 at 1:58 PM To: "user@spark.apache.org" Subject: how to manage HBase connections in

how to manage HBase connections in Executors of Spark Streaming ?

2020-11-23 Thread big data
Hi, Are there any best practices about how to manage Hbase connections with kerberos authentication in a Spark Streaming (YARN) environment? I want to know how executors manage the HBase connections: how to create them, close them, and refresh them when Kerberos tickets expire. Thanks.
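The usual answer to the connection-management part of this question is to open one connection per partition on the executor (via foreachPartition), reuse it for every record, and close it in a finally block. A minimal sketch of that pattern, with a fake connection class standing in for a real HBase client so it runs standalone:

```python
# Sketch: one connection per partition, reused for all records in it,
# always closed. FakeConnection is a hypothetical stand-in for a real
# HBase client (e.g. a happybase-style Connection).

class FakeConnection:
    opened = 0  # counts how many connections were created in total

    def __init__(self):
        FakeConnection.opened += 1

    def put(self, row_key, data):
        pass  # a real client would write the cell here

    def close(self):
        pass  # a real client would release sockets here

def write_partition(records):
    # Called once per partition on the executor; in Spark this is
    # rdd.foreachPartition(write_partition).
    conn = FakeConnection()
    try:
        for row_key, data in records:
            conn.put(row_key, data)
    finally:
        conn.close()  # release even if a put raised

# Simulate two partitions of records.
partitions = [
    [("r1", {"cf:a": "1"}), ("r2", {"cf:a": "2"})],
    [("r3", {"cf:a": "3"})],
]
for part in partitions:
    write_partition(part)

print(FakeConnection.opened)  # 2 - one connection per partition
```

The Kerberos side (keytab login and ticket renewal) is orthogonal to this pattern and is handled by the cluster's security configuration rather than per-connection code.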

how to integrate hbase and hive in spark3.0.1?

2020-09-18 Thread ??????
hello, I am using spark3.0.1. I want to integrate hive and hbase, but I don't know which hive and hbase versions to choose. I had re-compiled spark source and installed spark3.0.1 with hive and Hadoop, but I encountered the error below, anyone who can help? root@namenode bin]# ./spark-sql 20/09/18 23

Re: Needed some best practices to integrate Spark with HBase

2020-07-20 Thread YogeshGovi
I also need good docs on this. Especially integrating pyspark with hive reading tables from hbase. -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Error in using hbase-spark connector

2020-03-11 Thread PRAKASH GOPALSAMY
Hi Team, We are trying to read an hbase table from spark using the hbase-spark connector. But our job is failing in the pushdown part of the filter in stage 0, due to the below error. Kindly help us to resolve this issue. Caused by: java.lang.NoClassDefFoundError: scala/collection/immutable/StringOps

Re: Spark reading from Hbase throws java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods

2020-02-23 Thread Jörn Franke
native_2.11-3.5.3.jar, \ > json4s-jackson_2.11-3.5.3.jar, \ > hbase-client-1.2.3.jar, \ > hbase-common-1.2.3.jar > > Now I still get the same error! > > scala> val df = withCatalog(catalog) > java.lang.NoSuchMethodError: > org.json4s.jackson.Jso

Re: Spark reading from Hbase throws java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods

2020-02-23 Thread Sean Busbey
Hi Mich! Please try to keep your thread on a single mailing list. It's much easier to have things show up on a new list if you give a brief summary of the discussion and a pointer to the original thread (lists.apache.org is great for this). It looks like you're using "SHC" aka the

Re: Spark reading from Hbase throws java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods

2020-02-23 Thread Mich Talebzadeh
.1-s_2.11.jar, \ > json4s-native_2.11-3.5.3.jar, \ > json4s-jackson_2.11-3.5.3.jar, \ > hbase-client-1.2.3.jar, \ > hbase-common-1.2.3.jar > > Now I still get the same error! > > scala> val df = withCatalog(ca

Re: Spark reading from Hbase throws java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods

2020-02-17 Thread Mich Talebzadeh
I stripped everything from the jar list. This is all I have: spark-shell --jars shc-core-1.1.1-2.1-s_2.11.jar, \ json4s-native_2.11-3.5.3.jar, \ json4s-jackson_2.11-3.5.3.jar, \ hbase-client-1.2.3.jar, \ hbase-common-1.2.3.jar Now I still

Re: Spark reading from Hbase throws java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods

2020-02-17 Thread Mich Talebzadeh
Let me check and confirm. regards, Mich On Mon, 17 Feb 2020 at 21:33, Jörn Franke wrote: > Is there a reason why different Scala (it seems at least 2.10/2.11) > versions are mixed? This never works. > Do you include by accident a dependency to with an old Scala version? Ie > th

Re: Spark reading from Hbase throws java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods

2020-02-17 Thread Jörn Franke
Is there a reason why different Scala (it seems at least 2.10/2.11) versions are mixed? This never works. Do you include by accident a dependency with an old Scala version? Ie the Hbase datasource maybe? > On 17.02.2020 at 22:15, Mich Talebzadeh wrote: > >  > Thanks Muthu,

Re: Spark reading from Hbase throws java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods

2020-02-17 Thread Muthu Jayakumar
b 2020 at 20:28, Muthu Jayakumar wrote: > >> I suspect the spark job is somehow having an incorrect (newer) version of >> json4s in the classpath. json4s 3.5.3 is the utmost version that can be >> used. >> >> Thanks, >> Muthu >> >> On Mon, Feb 17, 2020

Re: Spark reading from Hbase throws java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods

2020-02-17 Thread Mich Talebzadeh
wrote: > >> Hi, >> >> Spark version 2.4.3 >> Hbase 1.2.7 >> >> Data is stored in Hbase as Json. example of a row shown below >> [image: image.png] >> I am trying to read this table in Spark Scala >> >> import org.apache.spark.

Re: Spark reading from Hbase throws java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods

2020-02-17 Thread Muthu Jayakumar
I suspect the spark job is somehow having an incorrect (newer) version of json4s in the classpath. json4s 3.5.3 is the utmost version that can be used. Thanks, Muthu On Mon, Feb 17, 2020, 06:43 Mich Talebzadeh wrote: > Hi, > > Spark version 2.4.3 > Hbase 1.2.7 > > Data

Spark reading from Hbase throws java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods

2020-02-17 Thread Mich Talebzadeh
Hi, Spark version 2.4.3 Hbase 1.2.7 Data is stored in Hbase as Json. example of a row shown below [image: image.png] I am trying to read this table in Spark Scala import org.apache.spark.sql.{SQLContext, _} import org.apache.spark.sql.execution.datasources.hbase._ import org.apache.spark

Re: Putting record in HBase with Spark - error get regions.

2019-05-28 Thread Guillermo Ortiz Fernández
> I'm executing a load process into HBase with spark. (around 150M record). > At the end of the process there are a lot of fail tasks. > > I get this error: > > 19/05/28 11:02:31 ERROR client.AsyncProcess: Failed to get region location > org.apache.hadoop.hbase.Table

Putting record in HBase with Spark - error get regions.

2019-05-28 Thread Guillermo Ortiz Fernández
I'm executing a load process into HBase with spark (around 150M records). At the end of the process there are a lot of failed tasks. I get this error: 19/05/28 11:02:31 ERROR client.AsyncProcess: Failed to get region location org.apache.hadoop.hbase.TableNotFoundException: my_table

Re: Using newApiHadoopRDD for reading from HBase

2018-06-29 Thread Biplob Biswas
s data > from HBase. > > 1. Does it load all the data from a scan operation directly in memory? > 2. According to my understanding, the data is loaded from different > regions to different executors, is that assumption/understanding correct? > 3. If it does load all the data from t

Using newApiHadoopRDD for reading from HBase

2018-06-28 Thread Biplob Biswas
Hi, I had a few questions regarding the way *newApiHadoopRDD *accesses data from HBase. 1. Does it load all the data from a scan operation directly in memory? 2. According to my understanding, the data is loaded from different regions to different executors, is that assumption/understanding

Re: load hbase data using spark

2018-06-20 Thread vaquar khan
Why do you need a tool? You can directly connect to Hbase using spark. Regards, Vaquar khan On Jun 18, 2018 4:37 PM, "Lian Jiang" wrote: Hi, I am considering tools to load hbase data using spark. One choice is https://github.com/Huawei-Spark/Spark-SQL-on-HBase. However, this seems to be o

load hbase data using spark

2018-06-18 Thread Lian Jiang
Hi, I am considering tools to load hbase data using spark. One choice is https://github.com/Huawei-Spark/Spark-SQL-on-HBase. However, this seems to be out-of-date (e.g. "This version of 1.0.0 requires Spark 1.4.0."). Which tool should I use for this purpose? Thanks for any hint.

Write data from Hbase using Spark Failing with NPE

2018-05-23 Thread Alchemist
I am using Spark to write data to Hbase. I can read data just fine but write is failing with the following exception. I found a similar issue that got resolved by adding *site.xml and hbase JARs. But it is not working for me.       JavaPairRDD<ImmutableBytesWritable, Put>  tab

Getting Data From Hbase using Spark is Extremely Slow

2018-05-17 Thread SparkUser6
I have written four lines of simple spark program to process data in Phoenix table: queryString = getQueryFullString( );// Get data from Phoenix table select col from table JavaPairRDD phRDD = jsc.newAPIHadoopRDD(

Spark with HBase on Spark Runtime 2.2.1

2018-05-05 Thread SparkUser6
I wrote a simple program to read data from HBase. The program works fine in Cloudera backed by HDFS, on SPARK RUNTIME 1.6. But it does NOT work on EMR with Spark Runtime 2.2.1: I am getting an exception while testing data on EMR with S3. // Spark conf

NullPointerException when scanning HBase table

2018-04-30 Thread Huiliang Zhang
Hi, In my spark job, I need to scan HBase table. I set up a scan with custom filters. Then I use newAPIHadoopRDD function to get a JavaPairRDD variable X. The problem is when no records inside HBase matches my filters, the call X.isEmpty() or X.count() will cause

spark hbase connector

2018-04-17 Thread Lian Jiang
Hi, My spark jobs need to talk to hbase and I am not sure which spark hbase connector is recommended: https://hortonworks.com/blog/spark-hbase-dataframe-based-hbase-connector/ https://phoenix.apache.org/phoenix_spark.html Or there is any other better solutions. Appreciate any guidance.

Re: HBase connector does not read ZK configuration from Spark session

2018-02-23 Thread Deepak Sharma
working directory but you still have to read it and use the properties to be set in conf. Thanks Deepak On Fri, Feb 23, 2018 at 10:25 AM, Dharmin Siddesh J < siddeshjdhar...@gmail.com> wrote: > I am trying to write a Spark program that reads data from HBase and store > it in DataFra

Re: HBase connector does not read ZK configuration from Spark session

2018-02-22 Thread Jorge Machado
Can it be that you are missing the HBASE_HOME var ? Jorge Machado > On 23 Feb 2018, at 04:55, Dharmin Siddesh J <siddeshjdhar...@gmail.com> wrote: > > I am trying to write a Spark program that reads data from HBase and store it > in DataFrame. > > I am

HBase connector does not read ZK configuration from Spark session

2018-02-22 Thread Dharmin Siddesh J
I am trying to write a Spark program that reads data from HBase and stores it in a DataFrame. I am able to run it perfectly with hbase-site.xml in the $SPARK_HOME/conf folder, but I am facing a few issues here. Issue 1 The first issue is passing the hbase-site.xml location with the --files parameter

Hortonworks Spark-Hbase-Connector does not read zookeeper configurations from spark session config ??(Spark on Yarn)

2018-02-22 Thread Dharmin Siddesh J
Hi I am trying to write spark code that reads data from Hbase and stores it in a DataFrame. I am able to run it perfectly with hbase-site.xml in the $spark-home/conf folder. But I am facing a few issues here. Issue 1: Passing the hbase-site.xml location with the --file parameter submitted through client mode

Re: Bulk load to HBase

2017-10-22 Thread Jörn Franke
better. BTW if you need to use Spark then go for 2.x - it is also available in HDP. > On 22. Oct 2017, at 10:20, Pradeep <pradeep.mi...@mail.com> wrote: > > We are on Hortonworks 2.5 and very soon upgrading to 2.6. Spark version 1.6.2. > > We have large volume of data that

Bulk load to HBase

2017-10-22 Thread Pradeep
We are on Hortonworks 2.5 and very soon upgrading to 2.6. Spark version 1.6.2. We have a large volume of data that we bulk load to HBase using ImportTsv. The Map Reduce job is very slow and we are looking for options to use spark to improve performance. Please let me know if this can be optimized

Re: NullPointerException error while saving Scala Dataframe to HBase

2017-10-01 Thread Marco Mistroni
Hi The question is getting to the list. I have no experience in hbase... though, having seen similar stuff when saving a df somewhere else, it might have to do with the properties you need to set to let spark know it is dealing with hbase? Don't you need to set some properties on the spark

Re: HDP 2.5 - Python - Spark-On-Hbase

2017-09-30 Thread Debabrata Ghosh
Ayan, Did you get the HBase connection to work through Pyspark as well? I have got the Spark - HBase connection working with Scala (via HBaseContext). However, I eventually want to get this working within a Pyspark code - Would you have some suitable code snippets

Re: NullPointerException error while saving Scala Dataframe to HBase

2017-09-30 Thread mailfordebu
repeatedly hitting a NullPointerException > error while saving a Scala Dataframe to HBase. Please can you help resolving > this for me. Here is the code snippet: > > scala> def catalog = s"""{ > ||"table":{"namespace":"default"

NullPointerException error while saving Scala Dataframe to HBase

2017-09-30 Thread Debabrata Ghosh
Dear All, Greetings ! I am repeatedly hitting a NullPointerException error while saving a Scala Dataframe to HBase. Please can you help resolving this for me. Here is the code snippet: scala> def catalog = s"""{ ||"table":{"nam
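The catalog in the snippet above is a JSON document handed to SHC. A frequent source of NullPointerExceptions is a catalog string that does not parse, or one whose rowkey entry is missing the reserved `"rowkey"` column family. Building the catalog programmatically avoids hand-quoting mistakes; the table and column names below are hypothetical, not taken from the thread:

```python
# Build an SHC-style catalog with json.dumps instead of a hand-written
# triple-quoted string, then sanity-check its structure before use.
import json

catalog = json.dumps({
    "table": {"namespace": "default", "name": "my_table"},
    "rowkey": "key",
    "columns": {
        # the rowkey column must use the reserved column family "rowkey"
        "col0": {"cf": "rowkey", "col": "key", "type": "string"},
        "col1": {"cf": "cf1", "col": "value", "type": "string"},
    },
})

parsed = json.loads(catalog)  # would raise if the JSON were malformed
rowkey_cols = {c["col"] for c in parsed["columns"].values()
               if c["cf"] == "rowkey"}
assert parsed["rowkey"] in rowkey_cols  # rowkey mapping is present
print(parsed["table"]["name"])  # my_table
```

In Scala the resulting string is what `withCatalog` passes to the connector; validating it as JSON first turns an opaque NPE into an immediate, readable parse error.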

Needed some best practices to integrate Spark with HBase

2017-09-29 Thread Debabrata Ghosh
Dear All, Greetings ! I needed some best practices for integrating Spark with HBase. Would you be able to point me to some useful resources / URL's to your convenience please. Thanks, Debu

kylin 2.1.1 for hbase 0.98

2017-09-04 Thread yuyong . zhai
how to build kylin(v2.1.0) Binary Package for hbase0.98?

Re: HDP 2.5 - Python - Spark-On-Hbase

2017-06-28 Thread ayan guha
Hi Thanks to all of you, I could get the HBase connector working. Some details around namespaces are still pending, but overall it is working well. Now, as usual, I would like to use the same concept in Structured Streaming. Is there any similar way I can use writeStream.format and use

Re: HDP 2.5 - Python - Spark-On-Hbase

2017-06-26 Thread Weiqing Yang
- The ability to configure closure serializer >>- HTTPBroadcast >>- TTL-based metadata cleaning >>- *Semi-private class org.apache.spark.Logging. We suggest you use >>slf4j directly.* >>- SparkContext.metricsSystem >> >> Thanks,

Re: HDP 2.5 - Python - Spark-On-Hbase

2017-06-26 Thread ayan guha
*From:* ayan guha [mailto:guha.a...@gmail.com] > *Sent:* Monday, June 26, 2017 6:26 AM > *To:* Weiqing Yang > *Cc:* user > *Subject:* Re: HDP 2.5 - Python - Spark-On-Hbase > > > > Hi > > > > I am using following: > > > > --packages com.hortonwork

RE: HDP 2.5 - Python - Spark-On-Hbase

2017-06-25 Thread Mahesh Sawaiker
Yang Cc: user Subject: Re: HDP 2.5 - Python - Spark-On-Hbase Hi I am using following: --packages com.hortonworks:shc:1.0.0-1.6-s_2.10 --repositories http://repo.hortonworks.com/content/groups/public/ Is it compatible with Spark 2.X? I would like to use it Best Ayan On Sat, Jun 24, 2017 at 2

Re: HDP 2.5 - Python - Spark-On-Hbase

2017-06-25 Thread ayan guha
Hi I am using following: --packages com.hortonworks:shc:1.0.0-1.6-s_2.10 --repositories http://repo.hortonworks.com/content/groups/public/ Is it compatible with Spark 2.X? I would like to use it Best Ayan On Sat, Jun 24, 2017 at 2:09 AM, Weiqing Yang wrote: >

Re: HDP 2.5 - Python - Spark-On-Hbase

2017-06-23 Thread Weiqing Yang
Yes. What SHC version were you using? If you hit any issues, you can post them in SHC github issues. There are some threads about this. On Fri, Jun 23, 2017 at 5:46 AM, ayan guha wrote: > Hi > > Is it possible to use SHC from Hortonworks with pyspark? If so, any > working

HDP 2.5 - Python - Spark-On-Hbase

2017-06-23 Thread ayan guha
Hi Is it possible to use SHC from Hortonworks with pyspark? If so, any working code sample available? Also, I faced an issue while running the samples with Spark 2.0 "Caused by: java.lang.ClassNotFoundException: org.apache.spark.Logging" Any workaround? Thanks in advance -- Best

One question / kerberos, yarn-cluster -> connection to hbase

2017-05-25 Thread sudhir37
Facing one issue with Kerberos enabled Hadoop/CDH cluster. We are trying to run a streaming job on yarn-cluster, which interacts with Kafka (direct stream), and hbase. Somehow, we are not able to connect to hbase in the cluster mode. We use keytab to login to hbase. This is what we do: spark

Re: One question / kerberos, yarn-cluster -> connection to hbase

2017-05-24 Thread Michael Gummelt
<sud...@infoobjects.com> wrote: > Facing one issue with Kerberos enabled Hadoop/CDH cluster. > > > > We are trying to run a streaming job on yarn-cluster, which interacts with > Kafka (direct stream), and hbase. > > > > Somehow, we are not able to connect

One question / kerberos, yarn-cluster -> connection to hbase

2017-05-24 Thread Sudhir Jangir
Facing one issue with Kerberos enabled Hadoop/CDH cluster. We are trying to run a streaming job on yarn-cluster, which interacts with Kafka (direct stream), and hbase. Somehow, we are not able to connect to hbase in the cluster mode. We use keytab to login to hbase. This is what we

hbase + spark + hdfs

2017-05-08 Thread mathieu ferlay
Hi everybody. I’m totally new in Spark and I want to know one thing that I have not managed to find. I have a full Ambari install with hbase, Hadoop and spark. My code reads and writes in hdfs via hbase. Thus, as I understood, all data stored are in bytes format in hdfs. Now, I know that it’s possible

hbase + spark + hdfs

2017-05-05 Thread mathieu ferlay
Hi everybody. I'm totally new in Spark and I want to know one thing that I have not managed to find. I have a full Ambari install with hbase, Hadoop and spark. My code reads and writes in hdfs via hbase. Thus, as I understood, all data stored are in bytes format in hdfs. Now, I know that it's possible

Re: Graph Analytics on HBase with HGraphDB and Spark GraphFrames

2017-04-03 Thread Weiqing Yang
il.com> > wrote: > >> Interesting! >> >> -- >> *From:* Robert Yokota <rayok...@gmail.com> >> *Sent:* Sunday, April 2, 2017 9:40:07 AM >> *To:* user@spark.apache.org >> *Subject:* Graph Analytics on HBase with HGraphDB and Spark GraphFrames >

Re: Graph Analytics on HBase with HGraphDB and Spark GraphFrames

2017-04-02 Thread Irving Duran
*To:* user@spark.apache.org > *Subject:* Graph Analytics on HBase with HGraphDB and Spark GraphFrames > > Hi, > > In case anyone is interested in analyzing graphs in HBase with Apache > Spark GraphFrames, this might be helpful: > > https://yokota.blog/2017/04/02/graph-analytics-on-hbase-with-hgraphdb-and-spark-graphframes/ >

Re: Graph Analytics on HBase with HGraphDB and Spark GraphFrames

2017-04-02 Thread Felix Cheung
Interesting! From: Robert Yokota <rayok...@gmail.com> Sent: Sunday, April 2, 2017 9:40:07 AM To: user@spark.apache.org Subject: Graph Analytics on HBase with HGraphDB and Spark GraphFrames Hi, In case anyone is interested in analyzing graphs in HBase with

Graph Analytics on HBase with HGraphDB and Spark GraphFrames

2017-04-02 Thread Robert Yokota
Hi, In case anyone is interested in analyzing graphs in HBase with Apache Spark GraphFrames, this might be helpful: https://yokota.blog/2017/04/02/graph-analytics-on-hbase-with-hgraphdb-and-spark-graphframes/

getting error while storing data in Hbase

2017-04-01 Thread Chintan Bhatt
Hello all, I'm running following command in Hbase shell: create "sample","cf" and getting following error ERROR: org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to connect to ZooKeeper but the connection closes immediately. This could be a sign that the

Reusing HBase connection in transformations

2017-02-14 Thread DandyDev
Hi! I'm struggling with the following problem: I have a couple of Spark Streaming jobs that keep state (using mapWithState, and in one case updateStateByKey) and write their results to HBase. One of the Streaming jobs, needs the results that the other Streaming job writes to HBase. How it's

Re: HBase Spark

2017-02-03 Thread Benjamin Kim
On Thu, Feb 2, 2017 at 5:43 PM, Benjamin Kim <bbuil...@gmail.com > <mailto:bbuil...@gmail.com>> wrote: > Hi Asher, > > I modified the pom to be the same Spark (1.6.0), HBase (1.2.0), and Java > (1.8) version as our installation. The Scala (2.10.5) vers

Re: HBase Spark

2017-02-03 Thread Asher Krim
seeing this locally, you might want to >> check which version of the scala sdk your IDE is using >> >> Asher Krim >> Senior Software Engineer >> >> On Thu, Feb 2, 2017 at 5:43 PM, Benjamin Kim <bbuil...@gmail.com> wrote: >> >>> Hi Asher, >&

Re: HBase Spark

2017-02-03 Thread Benjamin Kim
, you might want to >> check which version of the scala sdk your IDE is using >> >> Asher Krim >> Senior Software Engineer >> >> >> On Thu, Feb 2, 2017 at 5:43 PM, Benjamin Kim <bbuil...@gmail.com >> <mailto:bbuil...@gmail.com>> w

Re: HBase Spark

2017-02-03 Thread Asher Krim
ich version of the scala sdk your IDE is using > > Asher Krim > Senior Software Engineer > > On Thu, Feb 2, 2017 at 5:43 PM, Benjamin Kim <bbuil...@gmail.com> wrote: > >> Hi Asher, >> >> I modified the pom to be the same Spark (1.6.0), HBase (1.2.0), and Java &

Re: HBase Spark

2017-02-03 Thread Benjamin Kim
ly, you might want to > check which version of the scala sdk your IDE is using > > Asher Krim > Senior Software Engineer > > > On Thu, Feb 2, 2017 at 5:43 PM, Benjamin Kim <bbuil...@gmail.com > <mailto:bbuil...@gmail.com>> wrote: > Hi Asher, > > I modifie

Re: HBase Spark

2017-02-03 Thread Benjamin Kim
o, if you're seeing this locally, you might want to > check which version of the scala sdk your IDE is using > > Asher Krim > Senior Software Engineer > > On Thu, Feb 2, 2017 at 5:43 PM, Benjamin Kim <bbuil...@gmail.com> wrote: > > Hi Asher, > > I modified th

Re: HBase Spark

2017-02-03 Thread Asher Krim
uil...@gmail.com> wrote: > Hi Asher, > > I modified the pom to be the same Spark (1.6.0), HBase (1.2.0), and Java > (1.8) version as our installation. The Scala (2.10.5) version is already > the same as ours. But I’m still getting the same error. Can you think of > anything

Re: HBase Spark

2017-02-02 Thread Benjamin Kim
Hi Asher, I modified the pom to be the same Spark (1.6.0), HBase (1.2.0), and Java (1.8) version as our installation. The Scala (2.10.5) version is already the same as ours. But I’m still getting the same error. Can you think of anything else? Cheers, Ben > On Feb 2, 2017, at 11:06 AM, As

Re: HBase Spark

2017-02-02 Thread Asher Krim
) > at org.apache.spark.sql.execution.datasources.hbase. > DefaultSource.createRelation(HBaseRelation.scala:51) > at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply( > ResolvedDataSource.scala:158) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.s

Re: HBase Spark

2017-02-02 Thread Benjamin Kim
ltSource.createRelation(HBaseRelation.scala:51) at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119) If you can please help, I would be grateful. Cheers, Ben >

Re: HBase Spark

2017-01-31 Thread Benjamin Kim
Elek, If I cannot use the HBase Spark module, then I’ll give it a try. Thanks, Ben > On Jan 31, 2017, at 1:02 PM, Marton, Elek <h...@anzix.net> wrote: > > > I tested this one with hbase 1.2.4: > > https://github.com/hortonworks-spark/shc > > Marton > >

Re: HBase Spark

2017-01-31 Thread Marton, Elek
I tested this one with hbase 1.2.4: https://github.com/hortonworks-spark/shc Marton On 01/31/2017 09:17 PM, Benjamin Kim wrote: Does anyone know how to backport the HBase Spark module to HBase 1.2.0? I tried to build it from source, but I cannot get it to work. Thanks, Ben

HBase Spark

2017-01-31 Thread Benjamin Kim
Does anyone know how to backport the HBase Spark module to HBase 1.2.0? I tried to build it from source, but I cannot get it to work. Thanks, Ben - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Re: Hbase and Spark

2017-01-29 Thread Sudev A C
Hi Masf, Do try the official Hbase Spark. https://hbase.apache.org/book.html#spark I think you will have to build the jar from source and run your spark program with --packages . https://spark-packages.org/package/hortonworks-spark/shc says it's not yet published to Spark packages or Maven Repo

Hbase and Spark

2017-01-29 Thread Masf
I'm trying to build an application where it is necessary to do bulkGets and bulkLoad on Hbase. I think that I could use this component https://github.com/hortonworks-spark/shc *Is it a good option??* But* I can't import it in my project*. Sbt cannot resolve the hbase connector. This is my build.sbt

Re: Approach: Incremental data load from HBASE

2017-01-06 Thread Chetan Khatri
Ayan, Thanks Correct I am not thinking RDBMS terms, i am wearing NoSQL glasses ! On Fri, Jan 6, 2017 at 3:23 PM, ayan guha <guha.a...@gmail.com> wrote: > IMHO you should not "think" HBase in RDMBS terms, but you can use > ColumnFilters to filter out new records > >

Re: Approach: Incremental data load from HBASE

2017-01-06 Thread ayan guha
IMHO you should not "think" HBase in RDBMS terms, but you can use ColumnFilters to filter out new records On Fri, Jan 6, 2017 at 7:22 PM, Chetan Khatri <chetan.opensou...@gmail.com> wrote: > Hi Ayan, > > I mean by Incremental load from HBase, weekly running batch job

Re: Approach: Incremental data load from HBASE

2017-01-06 Thread Chetan Khatri
Hi Ayan, By incremental load from HBase I mean: a weekly batch job takes rows from an HBase table and dumps them out to Hive. The next time I run the job it should only take newly arrived rows. Same as if we use Sqoop for incremental load from RDBMS to Hive with the below command, sqoop job --create myssb1

Re: Approach: Incremental data load from HBASE

2017-01-04 Thread ayan guha
Hi Chetan What do you mean by incremental load from HBase? There is a timestamp marker for each cell, but not at Row level. On Wed, Jan 4, 2017 at 10:37 PM, Chetan Khatri <chetan.opensou...@gmail.com> wrote: > Ted Yu, > > You understood wrong, i said Incremental load fro
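The cell-level timestamp mentioned above is exactly what makes this incremental pattern possible: each batch reads only cells newer than the last run's watermark. In real code that filter is pushed to the server with `Scan.setTimeRange`; the sketch below just models the same logic in memory with made-up data:

```python
# Model incremental load as "cells with timestamp > last_run_ts".
# Each cell is (row, column, value, timestamp); HBase keeps the
# timestamp per cell, not per row.
cells = [
    ("row1", "cf:a", "v1", 1000),
    ("row1", "cf:a", "v2", 2000),  # later version of the same cell
    ("row2", "cf:a", "v3", 3000),
]

def incremental(cells, last_run_ts):
    # What Scan.setTimeRange(last_run_ts + 1, MAX) would return.
    return [c for c in cells if c[3] > last_run_ts]

new_cells = incremental(cells, last_run_ts=1500)
print([c[0] for c in new_cells])  # ['row1', 'row2']
```

The watermark (here `last_run_ts`) must be persisted between runs, which is the part Sqoop's `--incremental lastmodified` mode handles for RDBMS sources and that has to be done manually for HBase.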

Re: Approach: Incremental data load from HBASE

2017-01-04 Thread Chetan Khatri
Ted Yu, You understood wrong, i said Incremental load from HBase to Hive, individually you can say Incremental Import from HBase. On Wed, Dec 21, 2016 at 10:04 PM, Ted Yu <yuzhih...@gmail.com> wrote: > Incremental load traditionally means generating hfiles an

Reading specific column family and columns in Hbase table through spark

2016-12-29 Thread Mich Talebzadeh
Hi, I have a routine in Spark that iterates through Hbase rows and tries to read columns. My question is how can I read the correct ordering of columns? example val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat], classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable
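On the ordering question in this post: HBase stores and returns cells sorted lexicographically by column family and then qualifier, as raw bytes, not in insertion order. That means `col10` comes back before `col2`. A small demonstration with made-up qualifiers:

```python
# HBase returns qualifiers in byte-wise lexicographic order.
qualifiers = [b"col2", b"col10", b"col1"]

hbase_order = sorted(qualifiers)  # the order a Scan/Get would yield
print(hbase_order)  # [b'col1', b'col10', b'col2'] - note col10 < col2

# To recover numeric order, either zero-pad qualifiers when writing
# (col01, col02, col10) or reorder client-side after reading:
numeric_order = sorted(qualifiers, key=lambda q: int(q[3:]))
print(numeric_order)  # [b'col1', b'col2', b'col10']
```

So in the `newAPIHadoopRDD` routine above, iterating a Result's cells already gives a deterministic order; it just is byte order rather than the order the columns were written in.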

Re: Approach: Incremental data load from HBASE

2016-12-23 Thread Chetan Khatri
Ted Correct, In my case i want Incremental Import from HBASE and Incremental load to Hive. Both approach discussed earlier with Indexing seems accurate to me. But like Sqoop support Incremental import and load for RDBMS, Is there any tool which supports Incremental import from HBase ? On Wed

Re: Approach: Incremental data load from HBASE

2016-12-21 Thread Ted Yu
Incremental load traditionally means generating hfiles and using org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles to load the data into hbase. For your use case, the producer needs to find rows where the flag is 0 or 1. After such rows are obtained, it is up to you how the result

Re: Approach: Incremental data load from HBASE

2016-12-21 Thread Chetan Khatri
Ok, Sure will ask. But what would be generic best practice solution for Incremental load from HBASE. On Wed, Dec 21, 2016 at 8:42 PM, Ted Yu <yuzhih...@gmail.com> wrote: > I haven't used Gobblin. > You can consider asking Gobblin mailing list of the first option. > > The s

Re: Approach: Incremental data load from HBASE

2016-12-21 Thread Ted Yu
for Distributed Incremental > load from HBase, Is there any *tool / incubactor tool* which satisfy > requirement ? > > *Approach 1:* > > Write Kafka Producer and maintain manually column flag for events and > ingest it with Linkedin Gobblin to HDFS / S3. > > *Approach 2:* > >

Approach: Incremental data load from HBASE

2016-12-21 Thread Chetan Khatri
Hello Guys, I would like to understand different approaches for Distributed Incremental load from HBase. Is there any *tool / incubator tool* which satisfies the requirement? *Approach 1:* Write Kafka Producer and maintain manually column flag for events and ingest it with Linkedin Gobblin to HDFS

Re: How to set NameSpace while storing from Spark to HBase using saveAsNewAPIHadoopDataSet

2016-12-19 Thread Rabin Banerjee
HI All, >> >> I am trying to save data from Spark into HBase using saveHadoopDataSet >> API . Please refer the below code . Code is working fine .But the table is >> getting stored in the default namespace.how to set the NameSpace in the >> below code? >

Re: How to set NameSpace while storing from Spark to HBase using saveAsNewAPIHadoopDataSet

2016-12-19 Thread Dhaval Modi
Replace the table name with "<namespace>:<tableName>" Regards, Dhaval Modi On 19 December 2016 at 13:10, Rabin Banerjee <dev.rabin.baner...@gmail.com> wrote: > HI All, > > I am trying to save data from Spark into HBase using saveHadoopDataSet > API . Please refer the below code . Code is working fine

How to set NameSpace while storing from Spark to HBase using saveAsNewAPIHadoopDataSet

2016-12-19 Thread Rabin Banerjee
HI All, I am trying to save data from Spark into HBase using the saveHadoopDataSet API. Please refer to the below code. The code is working fine, but the table is getting stored in the default namespace. How to set the NameSpace in the below code? wordCounts.foreachRDD ( rdd = { val conf
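The fix discussed in this thread comes down to how HBase addresses tables: a table in a non-default namespace is named `<namespace>:<table>`, and that full name is what goes into TableOutputFormat's output-table property (`hbase.mapred.outputtable`). A tiny illustration, with the Hadoop Configuration modelled as a plain dict and hypothetical names:

```python
# Qualify an HBase table name with its namespace, as expected by
# TableOutputFormat.OUTPUT_TABLE ("hbase.mapred.outputtable").
def output_table(namespace: str, table: str) -> str:
    # Tables in the "default" namespace may be addressed by bare name.
    return table if namespace == "default" else f"{namespace}:{table}"

conf = {"hbase.mapred.outputtable": output_table("myns", "wordcounts")}
print(conf["hbase.mapred.outputtable"])  # myns:wordcounts
```

In the Scala job above, passing the colon-qualified name where the bare table name was used is what moves the writes out of the default namespace.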
