Re: Hive From Spark: Jdbc VS sparkContext

2017-11-22 Thread Nicolas Paris
Hey. Finally I improved the Spark-Hive SQL performance a lot. I had a problem with a topology_script.py that produced huge error traces in the logs and reduced Spark performance in Python mode. I corrected the Python 2 scripts to be Python 3 ready. I had some problems with broadcast variables while
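For readers hitting the same issue: topology_script.py is the Hadoop rack-awareness script the cluster calls to map host names or IPs to rack paths, and the version shipped with some distributions is Python 2 only (it uses the `print` statement). A minimal sketch of a Python 3 ready variant — the rack mapping and addresses here are made up for illustration, not taken from the thread:

```python
#!/usr/bin/env python3
# Minimal sketch of a Hadoop rack-topology script made Python 3 ready.
# The key change from the stock Python 2 script is using the print()
# function instead of the print statement.
import sys

# Hypothetical host/IP -> rack mapping; real scripts usually read this
# from a data file next to the script.
RACK_MAP = {
    "10.0.1.11": "/dc1/rack1",
    "10.0.1.12": "/dc1/rack2",
}
DEFAULT_RACK = "/default/rack"

def resolve(hosts):
    """Return one rack path per host, falling back to the default rack."""
    return [RACK_MAP.get(h, DEFAULT_RACK) for h in hosts]

if __name__ == "__main__":
    # Hadoop passes one or more hosts as arguments and expects one
    # rack string per host on stdout.
    print(" ".join(resolve(sys.argv[1:])))
```

A broken or slow topology script is called very often by the NameNode and ResourceManager, which is consistent with the huge log traces and degraded performance described above.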

Re: Hive From Spark: Jdbc VS sparkContext

2017-11-05 Thread ayan guha
Yes, my thought exactly. Kindly let me know if you need any help to port it to PySpark. On Mon, Nov 6, 2017 at 8:54 AM, Nicolas Paris wrote: > On Nov 5, 2017 at 22:46, ayan guha wrote: > > Thank you for the clarification. That was my understanding too. However > how to > >

Re: Hive From Spark: Jdbc VS sparkContext

2017-11-05 Thread Nicolas Paris
On Nov 5, 2017 at 22:46, ayan guha wrote: > Thank you for the clarification. That was my understanding too. However how to > provide the upper bound as it changes for every call in real life. For example > it is not required for sqoop. True. AFAIK sqoop begins by doing a "select

Re: Hive From Spark: Jdbc VS sparkContext

2017-11-05 Thread ayan guha
Thank you for the clarification. That was my understanding too. However, how to provide the upper bound, as it changes for every call in real life? For example it is not required for sqoop. On Mon, 6 Nov 2017 at 8:20 am, Nicolas Paris wrote: > On Nov 5, 2017 at 22:02, ayan

Re: Hive From Spark: Jdbc VS sparkContext

2017-11-05 Thread Nicolas Paris
On Nov 5, 2017 at 22:02, ayan guha wrote: > Can you confirm if JDBC DF Reader actually loads all data from source to > driver > memory and then distributes to the executors? Apparently yes, when not using a partition column. > And this is true even when a > partition column is provided? No,
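For context on the partition-column behaviour discussed above: when `partitionColumn`, `lowerBound`, `upperBound` and `numPartitions` are supplied to the JDBC reader, Spark turns the numeric range into one WHERE clause per partition, so each task fetches its own slice directly from the database instead of everything going through the driver. A simplified sketch of that split logic — not Spark's exact implementation; the column name and bounds are examples:

```python
# Sketch of how a JDBC reader can split a numeric partition column into
# per-partition WHERE clauses. Simplified from Spark's approach.
def column_partitions(column, lower, upper, num_partitions):
    stride = (upper - lower) // num_partitions
    clauses = []
    for i in range(num_partitions):
        lo = lower + i * stride
        if i == 0:
            # First slice also catches values below lowerBound (and NULLs).
            clauses.append(f"{column} < {lo + stride} OR {column} IS NULL")
        elif i == num_partitions - 1:
            # Last slice catches values above upperBound.
            clauses.append(f"{column} >= {lo}")
        else:
            clauses.append(f"{column} >= {lo} AND {column} < {lo + stride}")
    return clauses
```

Note that with this scheme the bounds only shape the split, they do not filter rows: the first and last clauses are open-ended, so an imprecise upper bound still returns all the data, it just skews the partition sizes. That is also why a tool can precompute bounds with a cheap min/max query, as sqoop does.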

Re: Hive From Spark: Jdbc VS sparkContext

2017-11-05 Thread ayan guha
Hi Can you confirm if JDBC DF Reader actually loads all data from source to driver memory and then distributes to the executors? And this is true even when a partition column is provided? Best Ayan On Mon, Nov 6, 2017 at 3:00 AM, David Hodeffi < david.hode...@niceactimize.com> wrote: > Testing

RE: Hive From Spark: Jdbc VS sparkContext

2017-11-05 Thread David Hodeffi
Testing Spark group e-mail

Re: Hive From Spark: Jdbc VS sparkContext

2017-11-05 Thread Nicolas Paris
On Nov 5, 2017 at 14:11, Gourav Sengupta wrote: > thanks a ton for your kind response. Have you used SPARK Session? I think > that > hiveContext is a very old way of solving things in SPARK, and since then new > algorithms have been introduced in SPARK. I will give sparkSession a try.

Re: Hive From Spark: Jdbc VS sparkContext

2017-11-05 Thread Gourav Sengupta
Hi Nicolas, thanks a ton for your kind response. Have you used SPARK Session? I think that hiveContext is a very old way of solving things in SPARK, and since then new algorithms have been introduced in SPARK. It will be a lot of help, given how kind you have been in sharing your experience,

Re: Hive From Spark: Jdbc VS sparkContext

2017-11-05 Thread Nicolas Paris
Hi. After some testing, I have been quite disappointed with the hiveContext way of accessing hive tables. The main problem is resource allocation: I have tons of users and they get a limited subset of workers. Then this does not allow querying huge datasets, because too little memory is allocated (or maybe I

Re: Hive From Spark: Jdbc VS sparkContext

2017-10-15 Thread Gourav Sengupta
Hi Nicolas, without the hive thrift server, if you try to run a select * on a table which has around 10,000 partitions, SPARK will give you some surprises. PRESTO works fine in these scenarios, and I am sure SPARK community will soon learn from their algorithms. Regards, Gourav On Sun, Oct 15,

Re: Hive From Spark: Jdbc VS sparkContext

2017-10-15 Thread Nicolas Paris
> I do not think that SPARK will automatically determine the partitions. > Actually > it does not automatically determine the partitions. In case a table has a few > million records, it all goes through the driver. Hi Gourav. Actually the Spark JDBC driver is able to deal directly with partitions.

Re: Hive From Spark: Jdbc VS sparkContext

2017-10-15 Thread Nicolas Paris
Hi Gourav > what if the table has partitions and sub-partitions? Well, this also works with multiple ORC files having the same schema: val people = sqlContext.read.format("orc").load("hdfs://cluster/people*") Am I missing something? > And you do not want to access the entire data? This works for

Re: Hive From Spark: Jdbc VS sparkContext

2017-10-15 Thread Gourav Sengupta
Hi Nicolas, what if the table has partitions and sub-partitions? And you do not want to access the entire data? Regards, Gourav On Sun, Oct 15, 2017 at 12:55 PM, Nicolas Paris wrote: > On Oct 3, 2017 at 20:08, Nicolas Paris wrote: > > I wonder the differences

Re: Hive From Spark: Jdbc VS sparkContext

2017-10-15 Thread Nicolas Paris
On Oct 3, 2017 at 20:08, Nicolas Paris wrote: > I wonder about the differences when accessing HIVE tables in two different ways: > - with jdbc access > - with sparkContext Well, there is also a third way to access the hive data from spark: - with direct file access (here ORC format) For example: val

Re: Hive From Spark: Jdbc VS sparkContext

2017-10-13 Thread Kabeer Ahmed
My take on this might sound a bit different. Here are few points to consider below: 1. Going through Hive JDBC means that the application is restricted by the # of queries that can be compiled. HS2 can only compile one SQL at a time and if users have bad SQL, it can take a long time just to

Re: Hive From Spark: Jdbc VS sparkContext

2017-10-13 Thread Nicolas Paris
> In case a table has a few > million records, it all goes through the driver. This sounds clear in JDBC mode: the driver gets all the rows and then spreads the RDD over the executors. I'd say that most use cases deal with SQL to aggregate huge datasets and retrieve a small amount of rows to be

Re: Hive From Spark: Jdbc VS sparkContext

2017-10-10 Thread Gourav Sengupta
. Regards, Gourav Sengupta On Tue, Oct 10, 2017 at 10:14 PM, weand <andreas.we...@gmail.com> wrote: > Is Hive from Spark via JDBC working for you? In case it does, I would be > interested in your setup :-) > > We can't get this working. See bug here, especially my last comment: > http

RE: Hive From Spark: Jdbc VS sparkContext

2017-10-10 Thread Walia, Reema
Is Hive from Spark via JDBC working for you? In case it does, I would be interested in your setup :-) We can't get this working. See bug here, especially my last comment: https://issues.apache.org

Re: Hive From Spark: Jdbc VS sparkContext

2017-10-10 Thread weand
Is Hive from Spark via JDBC working for you? In case it does, I would be interested in your setup :-) We can't get this working. See bug here, especially my last comment: https://issues.apache.org/jira/browse/SPARK-21063 Regards Andreas -- Sent from: http://apache-spark-user-list.1001560.n3

Re: Hive From Spark: Jdbc VS sparkContext

2017-10-10 Thread ayan guha
That is not correct, IMHO. If I am not wrong, Spark will still load data in the executors, by running some stats on the data itself to identify partitions. On Tue, Oct 10, 2017 at 9:23 PM, 郭鹏飞 wrote: > > > On Oct 4, 2017 at 2:08 AM, Nicolas Paris wrote: > >

Re: Hive From Spark: Jdbc VS sparkContext

2017-10-10 Thread 郭鹏飞
> On Oct 4, 2017 at 2:08 AM, Nicolas Paris wrote: > > Hi > > I wonder about the differences when accessing HIVE tables in two different ways: > - with jdbc access > - with sparkContext > > I would say that jdbc is better since it uses HIVE, which is based on > map-reduce / TEZ, and then works on

Re: Hive From Spark: Jdbc VS sparkContext

2017-10-04 Thread ayan guha
Well, the obvious point is security. Ranger and Sentry can secure jdbc endpoints only. On the performance aspect, I am equally curious. On Wed, 4 Oct 2017 at 10:30 pm, Gourav Sengupta wrote: > Hi, > > I am genuinely curious to see whether any one responds to this

Re: Hive From Spark: Jdbc VS sparkContext

2017-10-04 Thread Gourav Sengupta
Hi, I am genuinely curious to see whether anyone responds to this question. It's very hard to shake off JAVA, OOP and JDBC :) Regards, Gourav Sengupta On Tue, Oct 3, 2017 at 7:08 PM, Nicolas Paris wrote: > Hi > > I wonder the differences accessing HIVE tables in two

Hive From Spark: Jdbc VS sparkContext

2017-10-03 Thread Nicolas Paris
Hi. I wonder about the differences when accessing HIVE tables in two different ways: - with jdbc access - with sparkContext. I would say that jdbc is better since it uses HIVE, which is based on map-reduce / TEZ, and then works on disk. Using spark rdd can lead to memory errors on very huge datasets. Anybody

Re: Error trying to connect to Hive from Spark (Yarn-Cluster Mode)

2016-09-17 Thread Mich Talebzadeh
On 16 September 2016 at 19:53, <anupama.gangad...@

RE: Error trying to connect to Hive from Spark (Yarn-Cluster Mode)

2016-09-17 Thread anupama . gangadhar
Is your Hive Thrift Server up and running on port 10001 (jdbc:hive2)? Do the following: netstat -alnp | grep 10001 and see whether it is actually running. HTH Dr Mich Talebzadeh LinkedIn https://www.linkedin.com/profile/view

RE: Error trying to connect to Hive from Spark (Yarn-Cluster Mode)

2016-09-17 Thread anupama . gangadhar
Hi, @Deepak I have used a separate user keytab (not the hadoop services keytab) and am able to connect to Hive via a simple java program. I am able to connect to Hive from spark-shell as well. However, when I submit a spark job using this same keytab, I see the issue. Does the cache have a role to play here

Re: Error trying to connect to Hive from Spark (Yarn-Cluster Mode)

2016-09-16 Thread Deepak Sharma
Hi Anupama. To me it looks like an issue with the SPN with which you are trying to connect to hive2, i.e. hive@hostname. Are you able to connect to hive from spark-shell? Try getting the ticket using any other user keytab, but not the hadoop services keytab, and then try running the spark submit. Thanks

Re: Error trying to connect to Hive from Spark (Yarn-Cluster Mode)

2016-09-16 Thread Mich Talebzadeh
On 16 September 2016 at 19:53, <anupama.gangad...@daimler.com> wrote: > Hi, > > > > I am trying to connect to Hive from Spar

Error trying to connect to Hive from Spark (Yarn-Cluster Mode)

2016-09-16 Thread anupama . gangadhar
Hi, I am trying to connect to Hive from a Spark application in a Kerberized cluster and get the following exception. Spark version is 1.4.1 and Hive is 1.2.1. Outside of spark the connection goes through fine. Am I missing any configuration parameters? java.sql.SQLException: Could not open

Unable To access Hive From Spark

2016-04-15 Thread Amit Singh Hora
Hi All, I am trying to access hive from Spark but am getting an exception: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw- Code: String logFile = "hdfs://hdp23ha/logs"; // Should be

create table in hive from spark-sql

2015-09-23 Thread Mohit Singh
Probably a noob question. But I am trying to create a hive table using spark-sql. Here is what I am trying to do: hc = HiveContext(sc) hdf = hc.parquetFile(output_path) data_types = hdf.dtypes schema = "(" + " ,".join(map(lambda x: x[0] + " " + x[1], data_types)) +")" hc.sql(" CREATE TABLE IF
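The schema string built in the snippet above can be factored into a small helper. A minimal sketch of the same idea, assuming `dtypes` is the list of `(column, type)` pairs that `DataFrame.dtypes` returns; the table and column names below are illustrative:

```python
# Sketch: turn (column, type) pairs, as returned by DataFrame.dtypes,
# into the column list of a CREATE TABLE statement.
def schema_clause(dtypes):
    return "(" + ", ".join(f"{name} {dtype}" for name, dtype in dtypes) + ")"

def create_table_sql(table, dtypes):
    # IF NOT EXISTS makes re-running the job idempotent.
    return f"CREATE TABLE IF NOT EXISTS {table} {schema_clause(dtypes)}"
```

The resulting string would then be passed to `hc.sql(...)` as in the message above. Note this simple string formatting assumes trusted column names; it does no quoting or escaping.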

Re: No suitable driver found error, Create table in hive from spark sql

2015-02-19 Thread Todd Nist
-in-hive-from-spark-sql-tp21714p21715.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h

No suitable driver found error, Create table in hive from spark sql

2015-02-18 Thread Dhimant
No suitable driver found error, Create table in hive from spark sql. I am trying to execute the following example. SPARKGIT: spark/examples/src/main/scala/org/apache/spark/examples/sql/hive/HiveFromSpark.scala My setup: hadoop 1.6, spark 1.2, hive 1.0, mysql server (installed via yum install

Re: No suitable driver found error, Create table in hive from spark sql

2015-02-18 Thread Dhimant
/No-suitable-driver-found-error-Create-table-in-hive-from-spark-sql-tp21714p21715.html

RE: Hive From Spark

2014-08-25 Thread Andrew Lee
the results merged. Still seeing guava 14.0.1, so I don't think SPARK-2848 has been merged yet. Will be great to have someone confirm or clarify the expectation. From: l...@yahoo-inc.com.INVALID To: van...@cloudera.com; alee...@hotmail.com CC: user@spark.apache.org Subject: Re: Hive From Spark

Re: Hive From Spark

2014-08-22 Thread Du Li
people use spark-sql? I'm trying to understand the rationale and motivation behind this script, any idea? Date: Thu, 21 Aug 2014 16:31:08 -0700 Subject: Re: Hive From Spark From: van...@cloudera.com To: l...@yahoo-inc.com.invalid CC: user@spark.apache.org; u...@spark.incubator.apache.org

Re: Hive From Spark

2014-08-21 Thread Du Li
user@spark.apache.org, u...@spark.incubator.apache.org Subject: RE: Hive From Spark Hi All, Currently, if you are running Spark HiveContext API with Hive 0.12

Re: Hive From Spark

2014-08-21 Thread Marcelo Vanzin
user@spark.apache.org, u...@spark.incubator.apache.org Subject: RE: Hive From Spark Hi All, Currently, if you are running Spark HiveContext API with Hive 0.12, it won't work due to the following 2 libraries which are not consistent with Hive 0.12

RE: Hive From Spark

2014-07-22 Thread Andrew Lee
to work soon. Please let me know if there's any help or feedback I can provide. Thanks Sean. From: so...@cloudera.com Date: Mon, 21 Jul 2014 18:36:10 +0100 Subject: Re: Hive From Spark To: user@spark.apache.org I haven't seen anyone actively 'unwilling' -- I hope not. See discussion at https

RE: Hive From Spark

2014-07-21 Thread Andrew Lee
u...@spark.incubator.apache.org Subject: RE: Hive From Spark Date: Mon, 21 Jul 2014 01:14:19 + JiaJia, I've checked out the latest 1.0 branch and then did the following steps: SPARK_HIVE=true sbt/sbt clean assembly cd examples ../bin/run-example sql.hive.HiveFromSpark It works

Re: Hive From Spark

2014-07-21 Thread Sean Owen
I haven't seen anyone actively 'unwilling' -- I hope not. See discussion at https://issues.apache.org/jira/browse/SPARK-2420 where I sketch what a downgrade means. I think it just hasn't gotten a looking over. Contrary to what I thought earlier, the conflict does in fact cause problems in theory,

RE: Hive From Spark

2014-07-20 Thread Cheng, Hao
Subject: RE: Hive From Spark Hi Cheng Hao, Thank you very much for your reply. Basically, the program runs on Spark 1.0.0 and Hive 0.12.0. Some setup of the environment is done by running SPARK_HIVE=true sbt/sbt assembly/assembly, including the jar in all the workers, and copying the hive

RE: Hive From Spark

2014-07-18 Thread JiajiaJing
in context: http://apache-spark-user-list.1001560.n3.nabble.com/Hive-From-Spark-tp10110p10215.html

Hive From Spark

2014-07-17 Thread JiajiaJing
.nabble.com/Hive-From-Spark-tp10110.html