Hey
I finally managed to improve the spark-hive SQL performance a lot.
I had a problem with a topology_script.py that produced huge error
traces in the logs and degraded Spark performance in python mode. I just
corrected the python2 scripts to be python3 ready.
I also had a problem with a broadcast variable while
Yes, my thought exactly. Kindly let me know if you need any help to port it
to pyspark.
On Mon, Nov 6, 2017 at 8:54 AM, Nicolas Paris wrote:
> On 05 Nov 2017 at 22:46, ayan guha wrote:
> > Thank you for the clarification. That was my understanding too. However
> > how to
On 05 Nov 2017 at 22:46, ayan guha wrote:
> Thank you for the clarification. That was my understanding too. However, how
> to provide the upper bound, as it changes for every call in real life? For
> example, it is not required for sqoop.
True. AFAIK sqoop begins by doing a
"select min(<split col>), max(<split col>) from <table>" to compute the
bounds itself.
Thank you for the clarification. That was my understanding too. However, how
to provide the upper bound, as it changes for every call in real life? For
example, it is not required for sqoop.
On Mon, 6 Nov 2017 at 8:20 am, Nicolas Paris wrote:
> On 05 Nov 2017 at 22:02, ayan
On 05 Nov 2017 at 22:02, ayan guha wrote:
> Can you confirm if JDBC DF Reader actually loads all data from source to
> driver memory and then distributes to the executors?
Apparently yes, when no partition column is used.
> And this is true even when a
> partition column is provided?
No,
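For reference, a minimal sketch of the partitioned read through the
DataFrameReader options (untested; url, table, column and bounds are
placeholders). Each executor then fetches its own slice of the table:
val df = spark.read.format("jdbc")
  .option("url", "jdbc:postgresql://host:5432/db")
  .option("dbtable", "people")
  .option("partitionColumn", "id")
  .option("lowerBound", "1")
  .option("upperBound", "1000000")
  .option("numPartitions", "8")
  .load()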
Hi
Can you confirm if JDBC DF Reader actually loads all data from source to
driver memory and then distributes to the executors? And this is true even
when a partition column is provided?
Best
Ayan
On Mon, Nov 6, 2017 at 3:00 AM, David Hodeffi <
david.hode...@niceactimize.com> wrote:
> Testing
Testing Spark group e-mail
On 05 Nov 2017 at 14:11, Gourav Sengupta wrote:
> thanks a ton for your kind response. Have you used SPARK Session? I think
> that hiveContext is a very old way of solving things in SPARK, and since
> then new algorithms have been introduced in SPARK.
I will give sparkSession a try.
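For what it's worth, a minimal sketch of the sparkSession route (Spark 2.x;
the app name and table are placeholders):
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder()
  .appName("hive-access")   // placeholder name
  .enableHiveSupport()      // takes over what hiveContext used to do
  .getOrCreate()
val df = spark.sql("select * from my_db.my_table")  // placeholder table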
Hi Nicolas,
thanks a ton for your kind response. Have you used SPARK Session? I think
that hiveContext is a very old way of solving things in SPARK, and since
then new algorithms have been introduced in SPARK.
It will be a lot of help, given how kind you have been by sharing your
experience,
Hi
After some testing, I have been quite disappointed with the hiveContext way
of accessing hive tables.
The main problem is resource allocation: I have tons of users and they
get a limited subset of workers. This does not allow querying huge
datasets because too little memory is allocated (or maybe I
Hi Nicolas,
without the hive thrift server, if you try to run a select * on a table
which has around 10,000 partitions, SPARK will give you some surprises.
PRESTO works fine in these scenarios, and I am sure the SPARK community will
soon learn from their algorithms.
Regards,
Gourav
On Sun, Oct 15,
> I do not think that SPARK will automatically determine the partitions.
> Actually it does not automatically determine the partitions. In case a
> table has a few million records, it all goes through the driver.
Hi Gourav
Actually the spark jdbc driver is able to deal directly with partitions.
Hi Gourav
> what if the table has partitions and sub-partitions?
Well this also works with multiple ORC files having the same schema:
val people = sqlContext.read.format("orc").load("hdfs://cluster/people*")
Am I missing something?
> And you do not want to access the entire data?
This works for
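One way to avoid reading the entire dataset is to point the reader at a
sub-tree of the table's directory layout. A sketch, assuming Spark 1.6+ and
that the table is laid out in partition directories (paths are placeholders;
the basePath option keeps the partition columns in the schema):
val slice = sqlContext.read.format("orc")
  .option("basePath", "hdfs://cluster/warehouse/people")
  .load("hdfs://cluster/warehouse/people/year=2017/month=10")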
Hi Nicolas,
what if the table has partitions and sub-partitions? And you do not want to
access the entire data?
Regards,
Gourav
On Sun, Oct 15, 2017 at 12:55 PM, Nicolas Paris wrote:
> On 03 Oct 2017 at 20:08, Nicolas Paris wrote:
> > I wonder about the differences
On 03 Oct 2017 at 20:08, Nicolas Paris wrote:
> I wonder about the differences accessing HIVE tables in two different ways:
> - with jdbc access
> - with sparkContext
Well there is also a third way to access the hive data from spark:
- with direct file access (here ORC format)
For example:
val people = sqlContext.read.format("orc").load("hdfs://cluster/people*")
My take on this might sound a bit different. Here are a few points to
consider:
1. Going through Hive JDBC means that the application is restricted by the
number of queries that can be compiled. HS2 can only compile one SQL at a
time, and if users have bad SQL, it can take a long time just to
> In case a table has a few
> million records, it all goes through the driver.
This sounds clear in JDBC mode: the driver gets all the rows and then
spreads the RDD over the executors.
I'd say that most use cases deal with SQL to aggregate huge datasets
and retrieve a small number of rows to be
Regards,
Gourav Sengupta
On Tue, Oct 10, 2017 at 10:14 PM, weand <andreas.we...@gmail.com> wrote:
> Is Hive from Spark via JDBC working for you? In case it does, I would be
> interested in your setup :-)
>
> We can't get this working. See bug here, especially my last comment:
> https://issues.apache.org/jira/browse/SPARK-21063
Is Hive from Spark via JDBC working for you? In case it does, I would be
interested in your setup :-)
We can't get this working. See bug here, especially my last comment:
https://issues.apache.org/jira/browse/SPARK-21063
Regards
Andreas
That is not correct, IMHO. If I am not wrong, Spark will still load data in
the executors, by running some stats on the data itself to identify
partitions.
On Tue, Oct 10, 2017 at 9:23 PM, 郭鹏飞 wrote:
>
> On 4 Oct 2017, at 2:08 AM, Nicolas Paris wrote:
>
> Hi
>
> I wonder about the differences accessing HIVE tables in two different ways:
> - with jdbc access
> - with sparkContext
>
> I would say that jdbc is better since it uses HIVE, which is based on
> map-reduce / TEZ, and then works on
Well the obvious point is security. Ranger and Sentry can secure jdbc
endpoints only. On the performance aspect, I am equally curious.
On Wed, 4 Oct 2017 at 10:30 pm, Gourav Sengupta
wrote:
> Hi,
>
> I am genuinely curious to see whether any one responds to this
Hi,
I am genuinely curious to see whether anyone responds to this question.
It's very hard to shake off JAVA, OOPs and JDBCs :)
Regards,
Gourav Sengupta
On Tue, Oct 3, 2017 at 7:08 PM, Nicolas Paris wrote:
> Hi
>
> I wonder the differences accessing HIVE tables in two
Hi
I wonder about the differences accessing HIVE tables in two different ways:
- with jdbc access
- with sparkContext
I would say that jdbc is better since it uses HIVE, which is based on
map-reduce / TEZ, and then works on disk.
Using spark RDDs can lead to memory errors on very huge datasets.
Anybody
> On 16 September 2016 at 19:53, <anupama.gangad...@daimler.com> wrote:
ing to connect to Hive from Spark (Yarn-Cluster Mode)
Is your Hive Thrift Server up and running on port 10001
(jdbc:hive2://<host>:10001)?
Do the following:
netstat -alnp | grep 10001
and see whether it is actually running.
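You could also try connecting to it directly with beeline (host is a
placeholder):
beeline -u "jdbc:hive2://<host>:10001"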
HTH
Dr Mich Talebzadeh
Hi,
@Deepak
I have used a separate user keytab (not the hadoop services keytab) and am
able to connect to Hive via a simple java program.
I am able to connect to Hive from spark-shell as well. However, when I submit
a spark job using this same keytab, I see the issue.
Does the cache have a role to play here?
Hi Anupama
To me it looks like an issue with the SPN with which you are trying to
connect to hive2, i.e. hive@hostname.
Are you able to connect to hive from spark-shell?
Try getting the ticket using any other user keytab (not the hadoop services
keytab) and then try running the spark submit.
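On YARN it may also be worth handing the keytab to spark-submit itself, so
the job logs in with it (I believe these flags exist from Spark 1.4 on; the
principal, keytab path and jar are placeholders):
spark-submit --master yarn-cluster \
  --principal anupama@EXAMPLE.COM \
  --keytab /path/to/user.keytab \
  my-app.jar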
Thanks
On 16 September 2016 at 19:53, <anupama.gangad...@daimler.com> wrote:
> Hi,
>
>
>
> I am trying to connect to Hive from Spar
Hi,
I am trying to connect to Hive from a Spark application in a Kerberized
cluster and get the following exception. Spark version is 1.4.1 and Hive is
1.2.1. Outside of spark the connection goes through fine.
Am I missing any configuration parameters?
java.sql.SQLException: Could not open
Hi All,
I am trying to access hive from Spark but am getting this exception:
The root scratch dir: /tmp/hive on HDFS should be writable. Current
permissions are: rw-rw-rw-
Code:
String logFile = "hdfs://hdp23ha/logs"; // Should be
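A commonly suggested fix, assuming HDFS permissions really are the problem
(the scratch directory needs the execute bit, which rw-rw-rw- lacks):
hdfs dfs -chmod -R 777 /tmp/hive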
Probably a noob question.
But I am trying to create a hive table using spark-sql.
Here is what I am trying to do:
hc = HiveContext(sc)
hdf = hc.parquetFile(output_path)
data_types = hdf.dtypes
schema = "(" + " ,".join(map(lambda x: x[0] + " " + x[1], data_types)) +")"
hc.sql(" CREATE TABLE IF
No suitable driver found error, Create table in hive from spark sql.
I am trying to execute the following example.
SPARKGIT:
spark/examples/src/main/scala/org/apache/spark/examples/sql/hive/HiveFromSpark.scala
My setup: hadoop 1.6, spark 1.2, hive 1.0, mysql server (installed via yum
install
the results merged. Still seeing guava 14.0.1, so I don't think SPARK-2848
has been merged yet.
It will be great to have someone confirm or clarify the expectation.
From: l...@yahoo-inc.com.INVALID
To: van...@cloudera.com; alee...@hotmail.com
CC: user@spark.apache.org
Subject: Re: Hive From Spark
people use spark-sql? I'm trying to understand the rationale and motivation
behind this script, any idea?
Date: Thu, 21 Aug 2014 16:31:08 -0700
Subject: Re: Hive From Spark
From: van...@cloudera.com
To: l...@yahoo-inc.com.invalid
CC: user@spark.apache.org; u...@spark.incubator.apache.org
To: user@spark.apache.org
CC: u...@spark.incubator.apache.org
Subject: RE: Hive From Spark
Hi All,
Currently, if you are running Spark HiveContext API with Hive 0.12, it won't
work due to the following 2 libraries which are not consistent with Hive
0.12
to work soon. Please let me know if
there's any help or feedback I can provide.
Thanks Sean.
From: so...@cloudera.com
Date: Mon, 21 Jul 2014 18:36:10 +0100
Subject: Re: Hive From Spark
To: user@spark.apache.org
I haven't seen anyone actively 'unwilling' -- I hope not. See
discussion at https://issues.apache.org/jira/browse/SPARK-2420
To: user@spark.apache.org; u...@spark.incubator.apache.org
Subject: RE: Hive From Spark
Date: Mon, 21 Jul 2014 01:14:19 +
JiaJia, I've checked out the latest 1.0 branch and then done the following
steps:
SPARK_HIVE=true sbt/sbt clean assembly
cd examples
../bin/run-example sql.hive.HiveFromSpark
It works
I haven't seen anyone actively 'unwilling' -- I hope not. See
discussion at https://issues.apache.org/jira/browse/SPARK-2420 where I
sketch what a downgrade means. I think it just hasn't gotten a looking
over.
Contrary to what I thought earlier, the conflict does in fact cause
problems in theory,
Subject: RE: Hive From Spark
Hi Cheng Hao,
Thank you very much for your reply.
Basically, the program runs on Spark 1.0.0 and Hive 0.12.0.
Some setups of the environment are done by running SPARK_HIVE=true sbt/sbt
assembly/assembly, including the jar in all the workers, and copying the
hive