Re: Hive on Spark vs Spark on Hive(HiveContext)

2021-07-01 Thread Mich Talebzadeh
you can do

    from pyspark.sql import SparkSession
    from pyspark import SparkContext
    from pyspark.sql import SQLContext
    from pyspark.sql import HiveContext

and use it like below:

    sqltext = ""
    if (spark.sql("SHOW TABLES IN test like 'randomDataPy'").count() == 1):
        rows = spark.
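
A minimal Scala sketch of the same existence check (assuming a Hive-enabled SparkSession named spark; the database and table names are taken from the snippet):

    import org.apache.spark.sql.SparkSession

    object TableCheck {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("TableCheck")
          .enableHiveSupport()
          .getOrCreate()

        // Only read the table if it already exists in database `test`.
        if (spark.sql("SHOW TABLES IN test LIKE 'randomDataPy'").count() == 1) {
          val rows = spark.table("test.randomDataPy").count()
          println(s"randomDataPy has $rows rows")
        }
      }
    }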

Re: Hive on Spark vs Spark on Hive(HiveContext)

2021-07-01 Thread Pralabh Kumar
>> 1. One is Hive on Spark, which is similar to changing the execution >> engine in Hive queries, like Tez. >> 2. Another one is migrating Hive queries to HiveContext/SparkSQL, an >> approach used by Facebook and presented at a Spark conference.

Re: Hive on Spark vs Spark on Hive(HiveContext)

2021-07-01 Thread Mich Talebzadeh
approaches: > 1. One is Hive on Spark, which is similar to changing the execution > engine in Hive queries, like Tez. > 2. Another one is migrating Hive queries to HiveContext/SparkSQL, an > approach used by Facebook and presented at a Spark conference.

Hive on Spark vs Spark on Hive(HiveContext)

2021-07-01 Thread Pralabh Kumar
is migrating Hive queries to HiveContext/SparkSQL, an approach used by Facebook and presented at a Spark conference: https://databricks.com/session/experiences-migrating-hive-workload-to-sparksql#:~:text=Spark%20SQL%20in%20Apache%20Spark,SQL%20with%20minimal%20user%20intervention . Can you

HiveContext on Spark 1.6 Linkage Error:ClassCastException

2017-02-14 Thread Enrico DUrso
Hello guys, hope all of you are OK. I am trying to use HiveContext on Spark 1.6. I am developing in Eclipse and placed hive-site.xml on the classpath, so that I use the Hive instance running on my cluster instead of creating a local metastore and a local warehouse. So far so good

not able to connect to table using hiveContext

2016-11-01 Thread vinay parekar
Hi there, I am trying to get some table data using the Spark hiveContext. I am getting an exception: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table rnow_imports_text. null at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1158

Re: HiveContext is Serialized?

2016-10-26 Thread Mich Talebzadeh
Thanks Sean. I believe you are referring to the statement below: "You can't use the HiveContext or SparkContext in a distribution operation. It has nothing to do with for loops. The fact that they're serializable is misleading. It's there, I believe, because these objects may be inadvert

Re: HiveContext is Serialized?

2016-10-26 Thread Sean Owen
> It is the driver that has the info needed to schedule and manage > distributed jobs and that is by design. > This is narrowly about using the HiveContext or SparkContext directly. Of > course SQL operations are distributed. > On Wed, Oct 26, 2016, 10:03

Re: HiveContext is Serialized?

2016-10-26 Thread Mich Talebzadeh
On 26 October 2016 at 09:06, Sean Owen <so...@cloudera.com> wrote: > It is the driver that has the info needed to schedule and manage > distributed jobs and that is by design. > This is narrowly abo

Re: HiveContext is Serialized?

2016-10-26 Thread Sean Owen
It is the driver that has the info needed to schedule and manage distributed jobs and that is by design. This is narrowly about using the HiveContext or SparkContext directly. Of course SQL operations are distributed. On Wed, Oct 26, 2016, 10:03 Mich Talebzadeh <mich.talebza...@gmail.com>

Re: HiveContext is Serialized?

2016-10-26 Thread ayan guha
> Sean, thank you for making it clear. It was helpful. > Regards, > Ajay > On Wednesday, October 26, 2016, Sean Owen <so...@cloudera.com> wrote: >> This usage is fine, because you are only using the HiveContext locally on >> the driver. It's

Re: HiveContext is Serialized?

2016-10-26 Thread Mich Talebzadeh
Hi Sean, Your point: "You can't use the HiveContext or SparkContext in a distribution operation..." Is this because of a design issue? Case in point: if I create a DF from an RDD and register it as a tempTable, does this imply that any sql calls on that table are localised and not distrib

Re: HiveContext is Serialized?

2016-10-25 Thread Ajay Chander
Sean, thank you for making it clear. It was helpful. Regards, Ajay On Wednesday, October 26, 2016, Sean Owen <so...@cloudera.com> wrote: > This usage is fine, because you are only using the HiveContext locally on > the driver. It's applied in a function that's used on a Scal

Re: HiveContext is Serialized?

2016-10-25 Thread Sunita Arvind
only using the HiveContext locally on > the driver. It's applied in a function that's used on a Scala collection. > You can't use the HiveContext or SparkContext in a distribution operation. > It has nothing to do with for loops. > The fact that they're serializable is misleading. It's

Re: HiveContext is Serialized?

2016-10-25 Thread Sean Owen
This usage is fine, because you are only using the HiveContext locally on the driver. It's applied in a function that's used on a Scala collection. You can't use the HiveContext or SparkContext in a distribution operation. It has nothing to do with for loops. The fact that they're serializable

Re: HiveContext is Serialized?

2016-10-25 Thread Ajay Chander
>> Jeff, thanks for your response. I see the error below in the logs. Do you think >> it has anything to do with hiveContext? Do I have to serialize it before >> using it inside foreach? >> 16/10/19 15:16:23 ERROR schedul

Re: HiveContext is Serialized?

2016-10-25 Thread Sunita Arvind
I see the error below in the logs. Do you think > it has anything to do with hiveContext? Do I have to serialize it before > using it inside foreach? > 16/10/19 15:16:23 ERROR scheduler.LiveListenerBus: Listener SQLListener > threw an exception >

Re: HiveContext is Serialized?

2016-10-25 Thread Ajay Chander
Jeff, thanks for your response. I see the error below in the logs. Do you think it has anything to do with hiveContext? Do I have to serialize it before using it inside foreach? 16/10/19 15:16:23 ERROR scheduler.LiveListenerBus: Listener SQLListener threw an exception java.lang.NullPointerException

Re: HiveContext is Serialized?

2016-10-25 Thread Jeff Zhang
In your sample code you can use hiveContext in the foreach, as it is a Scala List foreach operation, which runs on the driver side. But you cannot use hiveContext in RDD.foreach. Ajay Chander <itsche...@gmail.com> wrote on Wednesday, 26 October 2016 at 11:28 AM: > Hi Everyone, > > I was thinking if I can
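
A short sketch of the distinction Jeff describes (Scala, Spark 1.x APIs; table names are illustrative): the first loop traverses a plain Scala collection on the driver, while the second would ship the closure to executors.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("DriverSideHive"))
    val hiveContext = new HiveContext(sc)

    // Fine: List.foreach runs on the driver, where hiveContext lives.
    List("table_a", "table_b").foreach { t =>
      hiveContext.sql(s"SELECT COUNT(*) FROM $t").show()
    }

    // Not fine: RDD.foreach runs on executors, which cannot use hiveContext.
    // sc.parallelize(Seq("table_a", "table_b")).foreach { t =>
    //   hiveContext.sql(s"SELECT COUNT(*) FROM $t")
    // }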

HiveContext is Serialized?

2016-10-25 Thread Ajay Chander
Hi Everyone, I was thinking if I can use hiveContext inside foreach like below:

    object Test {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
        val sc = new SparkContext(conf)
        val hiveContext = new HiveContext(sc)
        val dataElementsFile = args(0)
        val deDF

Re: Creating HiveContext withing Spark streaming

2016-09-08 Thread Mich Talebzadeh
    sparkConf.set("spark.streaming.driver.writeAheadLog.closeFileAfterWrite", "true")
    sparkConf.set("spark.streaming.receiver.writeAheadLog.closeFileAfterWrite", "true")
    val batchInterval = 2
    // Create the streamingContext
    val streamingContext =

Re: Creating HiveContext withing Spark streaming

2016-09-08 Thread Todd Nist
= new StreamingContext(sc, Seconds(batchInterval)) val HiveContext = new HiveContext(sc) Or remove/replace the line in red from your code and just set val sparkContext = streamingContext.sparkContext: val streamingContext = new StreamingContext(sparkConf, Seconds(batchInterval)) val
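
A minimal sketch of Todd's second suggestion (Scala, Spark 1.x streaming API): build the StreamingContext first, then reuse its SparkContext for the HiveContext, so only one SparkContext exists in the application.

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val sparkConf = new SparkConf().setAppName("StreamingWithHive")
    val batchInterval = 2
    val streamingContext = new StreamingContext(sparkConf, Seconds(batchInterval))
    // Reuse the streaming context's SparkContext rather than creating another.
    val hiveContext = new HiveContext(streamingContext.sparkContext)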

Re: Creating HiveContext withing Spark streaming

2016-09-08 Thread Mich Talebzadeh
closeFileAfterWrite", "true") var sqltext = "" val batchInterval = 2 val streamingContext = new StreamingContext(sparkConf, Seconds(batchInterval)) With the above settings, Spark streaming works fine. *However, after adding the first line below (in red)*

Creating HiveContext withing Spark streaming

2016-09-08 Thread Mich Talebzadeh
Hi, This may not be feasible in Spark streaming. I am trying to create a HiveContext in Spark streaming within the streaming context:

    // Create a local StreamingContext with two working threads and a batch interval of 2 seconds.
    val sparkConf = new SparkConf().setAppName

Re: Table registered using registerTempTable not found in HiveContext

2016-08-11 Thread Mich Talebzadeh
In Spark 2 you create a temp table from a df using HiveContext:

    val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    scala> s.registerTempTable("tmp")
    scala> HiveContext.sql("select count(1) from tmp")
    res18: org.apache.spark.sql.DataFrame = [count(1): bigi

Re: Table registered using registerTempTable not found in HiveContext

2016-08-11 Thread Richard M
How are you calling registerTempTable from hiveContext? It appears to be a private method.

Re: SPARKSQL with HiveContext My job fails

2016-08-04 Thread Mich Talebzadeh
Well the error states Exception in thread thread_name: java.lang.OutOfMemoryError: GC Overhead limit exceeded Cause: The detail message "GC overhead limit exceeded" indicates that the garbage collector is

SPARKSQL with HiveContext My job fails

2016-08-04 Thread Vasu Devan
Hi Team, My Spark job fails with below error : Could you please advice me what is the problem with my job. Below is my error stack: 16/08/04 05:11:06 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-14] shutting down ActorSystem [sparkDriver]

HiveContext: difficulties in accessing tables in Hive schemas/databases other than the default database

2016-07-19 Thread satyajit vegesna
Hi All, I have been trying to access tables from other schemas, apart from default, to pull data into a dataframe. I was successful doing it using the default schema in the hive database. But when I try any other schema/database in hive, I am getting the error below. (Have also not seen any examples

Re: HiveContext

2016-07-01 Thread Mich Talebzadeh
utilizes these indexes to move the filter operation to the data > loading phase, by reading only data that potentially includes required rows. > My doubt is: when we give some query to hiveContext on an ORC table using > spark with > sqlContext.setConf("spark.sql.orc.fi

HiveContext

2016-07-01 Thread manish jaiswal
to the start of the row group. ORC utilizes these indexes to move the filter operation to the data loading phase, by reading only data that potentially includes required rows. My doubt is: when we give some query to hiveContext on an ORC table using Spark with sqlContext.setConf
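
The setting being referenced is presumably spark.sql.orc.filterPushdown; a hedged sketch of how it is typically enabled (Scala; the table and column names are illustrative):

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)  // sc: an existing SparkContext
    // Enable ORC predicate pushdown so row-group indexes can skip data.
    hiveContext.setConf("spark.sql.orc.filterPushdown", "true")
    val filtered = hiveContext.sql("SELECT * FROM orc_table WHERE id = 42")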

HiveContext

2016-06-30 Thread manish jaiswal
-- Forwarded message -- From: "manish jaiswal" <manishsr...@gmail.com> Date: Jun 30, 2016 17:35 Subject: HiveContext To: <user@spark.apache.org>, <user-subscr...@spark.apache.org>, <user-h...@spark.apache.org> Cc: Hi, I am new to Spark. I foun

Re: hivecontext error

2016-06-14 Thread Ted Yu
Which release of Spark are you using? Can you show the full error trace? Thanks On Tue, Jun 14, 2016 at 6:33 PM, Tejaswini Buche <tejaswini.buche0...@gmail.com> wrote: > I am trying to use hivecontext in spark. The following statements are > running fine: > from

hivecontext error

2016-06-14 Thread Tejaswini Buche
I am trying to use hivecontext in spark. The following statements are running fine: from pyspark.sql import HiveContext sqlContext = HiveContext(sc) But when I run the statement below, sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)") I get the following err
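
For reference, the same statement through a HiveContext in Scala (a sketch; it assumes a Spark build with Hive support and an existing SparkContext sc):

    import org.apache.spark.sql.hive.HiveContext

    val sqlContext = new HiveContext(sc)
    sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")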

Re: HiveContext: Unable to load AWS credentials from any provider in the chain

2016-06-10 Thread Daniel Haviv
I'm using EC2 instances. Thank you, Daniel. > On 9 Jun 2016, at 16:49, Gourav Sengupta wrote: > Hi, > Are you using EC2 instances or a local cluster behind a firewall? > Regards, > Gourav Sengupta >> On Wed, Jun 8, 2016 at 4:34 PM, Daniel Haviv >>

Re: HiveContext: Unable to load AWS credentials from any provider in the chain

2016-06-09 Thread Gourav Sengupta
Hi, are you using EC2 instances or a local cluster behind a firewall? Regards, Gourav Sengupta On Wed, Jun 8, 2016 at 4:34 PM, Daniel Haviv <daniel.ha...@veracity-group.com> wrote: > Hi, > I'm trying to create a table on s3a but I keep hitting the following error: > Exception in thread

Re: HiveContext: Unable to load AWS credentials from any provider in the chain

2016-06-09 Thread Steve Loughran
On 9 Jun 2016, at 06:17, Daniel Haviv wrote: Hi, I've set these properties both in core-site.xml and hdfs-site.xml with no luck. Thank you. Daniel That's not good. I'm afraid I don't know what version of s3a is in the

Re: HiveContext: Unable to load AWS credentials from any provider in the chain

2016-06-08 Thread Daniel Haviv
Hi, I've set these properties both in core-site.xml and hdfs-site.xml with no luck. Thank you. Daniel > On 9 Jun 2016, at 01:11, Steve Loughran wrote: > > >> On 8 Jun 2016, at 16:34, Daniel Haviv >> wrote: >> >> Hi, >> I'm trying to

Re: HiveContext: Unable to load AWS credentials from any provider in the chain

2016-06-08 Thread Steve Loughran
On 8 Jun 2016, at 16:34, Daniel Haviv wrote: Hi, I'm trying to create a table on s3a but I keep hitting the following error: Exception in thread "main" org.apache.hadoop.hive.ql.metadata.HiveException:

Re: When queried through hiveContext, does Hive execute these queries using its execution engine (default is map-reduce), or does Spark just read the data and perform those queries itself?

2016-06-08 Thread lalit sharma
To add to what Vikash said above, a bit more on the internals: 1. There are two components which work together to achieve Hive + Spark integration: a. HiveContext, which extends SQLContext and adds Hive-specific logic, e.g. loading jars to talk to the underlying metastore db, loading configs in hive

Re: When queried through hiveContext, does Hive execute these queries using its execution engine (default is map-reduce), or does Spark just read the data and perform those queries itself?

2016-06-08 Thread Vikash Pareek
, tungsten to optimize queries) to execute the query and generate results faster than Hive (MapReduce). Using HiveContext means connecting to the Hive metastore db. Thus, HiveContext can access Hive metadata, which includes the location of data, serializations and de-serializations, and compression codecs

HiveContext: Unable to load AWS credentials from any provider in the chain

2016-06-08 Thread Daniel Haviv
Hi, I'm trying to create a table on s3a but I keep hitting the following error: Exception in thread "main" org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:com.cloudera.com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain) I
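
One hedged sketch of the usual first step for this error: supply the standard s3a credential keys, either in core-site.xml or programmatically on the Hadoop configuration (values below are placeholders; assumes an existing SparkContext sc):

    sc.hadoopConfiguration.set("fs.s3a.access.key", "YOUR_ACCESS_KEY")
    sc.hadoopConfiguration.set("fs.s3a.secret.key", "YOUR_SECRET_KEY")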

When queried through hiveContext, does Hive execute these queries using its execution engine (default is map-reduce), or does Spark just read the data and perform those queries itself?

2016-06-08 Thread Himanshu Mehra
So what happens underneath when we query a hive table using hiveContext? 1. Does Spark talk to the metastore to get the data location on hdfs and read the data from there to perform those queries? 2. Or does Spark pass those queries to hive, and hive executes those queries on the table and returns

Re: hivecontext and date format

2016-06-01 Thread Mich Talebzadeh
http://talebzadehmich.wordpress.com On 1 June 2016 at 12:16, pseudo oduesp <pseudo20...@gmail.com> wrote: > Hi, > can I ask how we can convert a string like dd/mm/ to a date type in > hivecontext? > I tried unix_timestamp and date format but I get null. > Thank you.

hivecontext and date format

2016-06-01 Thread pseudo oduesp
Hi, can I ask how we can convert a string like dd/mm/ to a date type in hivecontext? I tried unix_timestamp and date format but I get null. Thank you.
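
A hedged sketch of the usual Hive SQL answer (assuming the strings are dd/MM/yyyy; the column and table names are illustrative): unix_timestamp parses the string against the given pattern, and to_date converts the result to a date.

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)  // sc: an existing SparkContext
    val df = hiveContext.sql(
      "SELECT to_date(from_unixtime(unix_timestamp(date_str, 'dd/MM/yyyy'))) AS d FROM some_table")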

Re: HiveContext standalone => without a Hive metastore

2016-05-30 Thread Michael Segel
> On 26 May 2016 at 19:09, Gerard Maas <gerard.m...@gmail.com> wrot

Re: HiveContext standalone => without a Hive metastore

2016-05-30 Thread Gerard Maas
Michael, Mitch, Silvio, thanks! The own-directory requirement is the issue. We are running the Spark Notebook, which uses the same dir per server (i.e. for all notebooks), so this issue prevents us from running 2 notebooks using HiveContext. I'll look into a proper Hive installation, and I'm glad to know

Re: HiveContext standalone => without a Hive metastore

2016-05-26 Thread Michael Armbrust
You can also just make sure that each user is using their own directory. A rough example can be found in TestHive. Note: in Spark 2.0 there should be no need to use HiveContext unless you need to talk to a metastore. On Thu, May 26, 2016 at 1:36 PM, Mich Talebzadeh <mich.talebza...@gmail.

Re: Problem instantiation of HiveContext

2016-05-26 Thread Ian

Re: HiveContext standalone => without a Hive metastore

2016-05-26 Thread Gerard Maas
Thanks a lot for the advice! I found out why the standalone hiveContext would not work: it was trying to deploy a Derby db and the user had no rights to create the dir where the db is stored: Caused by: java.sql.SQLException: Failed to create database 'metastore_db', see the next exception

Re: HiveContext standalone => without a Hive metastore

2016-05-26 Thread Mich Talebzadeh
To use HiveContext, which is basically an SQL API within Spark, without a proper Hive setup does not make sense. It is a superset of Spark's SQLContext. In addition, simple things like registerTempTable may not work. HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id

Re: HiveContext standalone => without a Hive metastore

2016-05-26 Thread Silvio Fiorito
Hi Gerard, I’ve never had an issue using the HiveContext without a hive-site.xml configured. However, one issue you may have is if multiple users are starting the HiveContext from the same path, they’ll all be trying to store the default Derby metastore in the same location. Also, if you want
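
A minimal sketch of what the thread converges on (Scala, Spark 1.x; no hive-site.xml assumed): the HiveContext itself works standalone, but each process writes an embedded Derby metastore under its working directory, so concurrent users need separate directories.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("StandaloneHive"))
    // Without hive-site.xml, an embedded Derby metastore is created in
    // ./metastore_db -- run each user/notebook from its own directory.
    val hiveContext = new HiveContext(sc)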

Re: HiveContext standalone => without a Hive metastore

2016-05-26 Thread Mich Talebzadeh
Hi Gerard, I am not sure the so-called independence is feasible. I gather you want to use HiveContext for your SQL queries, and sqlContext only provides a subset of HiveContext. Try this: val sc = new SparkContext(conf) // Create sqlContext based on HiveContext val sqlContext = new HiveContext(sc

HiveContext standalone => without a Hive metastore

2016-05-26 Thread Gerard Maas
Hi, I'm helping some folks set up an analytics cluster with Spark. They want to use the HiveContext to enable the Window functions on DataFrames(*), but they don't have any Hive installation, nor do they need one at the moment (if it's not necessary for this feature). When we try to create a Hive

Re: SQLContext and HiveContext parse a query string differently ?

2016-05-13 Thread Hao Ren
Basically, I want to run the following query: select 'a\'b', case(null as Array) However, neither HiveContext nor SQLContext can execute it without an exception. I have tried sql(select 'a\'b', case(null as Array)) and df.selectExpr("'a\'b'", "case(null as Array)") N

Re:Re:Re: Re:Re: Will the HiveContext cause memory leak ?

2016-05-12 Thread kramer2...@126.com
functions in the queries.

Re: Will the HiveContext cause memory leak ?

2016-05-12 Thread Ted Yu
> functions in the queries.

Re:Re: Re:Re: Will the HiveContext cause memory leak ?

2016-05-12 Thread kramer2...@126.com
functions in the queries.

RE: SQLContext and HiveContext parse a query string differently ?

2016-05-12 Thread Yong Zhang
Not sure what you mean? You want to have exactly one query running fine in both sqlContext and HiveContext? The query parsers are different; why do you want this feature? Do I understand your question correctly? Yong Date: Thu, 12 May 2016 13:09:34 +0200 Subject: SQLContext

Re: SQLContext and HiveContext parse a query string differently ?

2016-05-12 Thread Mich Talebzadeh
> object Test extends App { > val sc = new SparkContext("local[2]", "test", new SparkConf) > val hiveContext = new HiveContext(sc) > val sqlContext = new SQLContext(sc) > val context = hiveContext > // val context =

SQLContext and HiveContext parse a query string differently ?

2016-05-12 Thread Hao Ren
    import org.apache.spark.{SparkConf, SparkContext}

    object Test extends App {
      val sc = new SparkContext("local[2]", "test", new SparkConf)
      val hiveContext = new HiveContext(sc)
      val sqlContext = new SQLContext(sc)
      val context = hiveContext
      // val context = sqlContext
      import context.implicits._
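
A small harness for the comparison being made, reusing sc, hiveContext and sqlContext from the snippet above (a sketch; which contexts accept the query is exactly what the thread is probing):

    val contexts = Seq(hiveContext, sqlContext)
    val query = """select 'a\'b'"""

    // Run the same string through both parsers and report which accept it.
    contexts.foreach { ctx =>
      try ctx.sql(query).show()
      catch { case e: Exception =>
        println(s"${ctx.getClass.getSimpleName}: ${e.getMessage}")
      }
    }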

Re:Re: Will the HiveContext cause memory leak ?

2016-05-11 Thread kramer2...@126.com

Re: Will the HiveContext cause memory leak ?

2016-05-11 Thread kramer2...@126.com
Sorry, I have to make a correction again. It may still be a memory leak, because at last the memory usage goes up again... eventually the stream program crashed.

Re: Will the HiveContext cause memory leak ?

2016-05-11 Thread kramer2...@126.com
is using almost 10,000 times more memory than my workload. Does that mean I need to prepare 1 TB of RAM if the workload is 100 MB?

Re:Re: Will the HiveContext cause memory leak ?

2016-05-10 Thread 李明伟
It looks a little complicated, but it is just some Window functions on a dataframe. I use the HiveContext because SQLContext does not support window functions yet. Without the 4 lines, my code can run all night; adding them causes the memory leak and the program will crash in a few hours. I will provide the whol

Re: Will the HiveContext cause memory leak ?

2016-05-10 Thread Ted Yu
e'],dataframe['bits'], > rank.alias('rank')).filter("rank<=2") > It looks a little complicated, but it is just some Window functions on a > dataframe. I use the HiveContext because SQLContext does not support window > functions yet. Without the 4 lines, my code can run all nig

Will the HiveContext cause memory leak ?

2016-05-10 Thread kramer2...@126.com
It looks a little complicated, but it is just some Window functions on a dataframe. I use the HiveContext because SQLContext does not support window functions yet. Without the 4 lines, my code can run all night; adding them causes the memory leak and the program will crash in a few hours. I will provide the w

Re: How to stop hivecontext

2016-04-15 Thread Ted Yu
You can call the stop() method. > On Apr 15, 2016, at 5:21 AM, ram kumar <ramkumarro...@gmail.com> wrote: > Hi, > I started hivecontext as: > val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc); > I want to sto

How to stop hivecontext

2016-04-15 Thread ram kumar
Hi, I started hivecontext as: val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc); I want to stop this sql context. Thanks

HiveContext in spark

2016-04-12 Thread Selvam Raman
I am not able to use the insert, update and delete commands in HiveContext. I am using Spark 1.6.1 and Hive 1.1.0. Please find the error below. scala> hc.sql("delete from trans_detail where counter=1"); 16/04/12 14:58:45 INFO ParseDriver: Parsing command: delete from

Re: HiveContext unable to recognize the delimiter of Hive table in textfile partitioned by date

2016-04-11 Thread Shiva Achari
Hi All, In the above scenario, if the field delimiter is Hive's default then Spark is able to parse the data as expected; hence I believe this is a bug. Regards, Shiva Achari On Tue, Apr 5, 2016 at 8:15 PM, Shiva Achari wrote: > Hi, > > I have created a hive

Spark demands HiveContext but I use only SqlContext

2016-04-11 Thread AlexModestov
None.org.apache.spark.sql.hive.HiveContext. : java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

HiveContext unable to recognize the delimiter of Hive table in textfile partitioned by date

2016-04-05 Thread Shiva Achari
Hi, I have created a hive external table stored as textfile, partitioned by event_date Date. How do we specify a specific CSV format while reading the Hive table in Spark? The environment is: 1. Spark 1.5.0 - cdh5.5.1, using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit

Spark SQL(Hive query through HiveContext) always creating 31 partitions

2016-04-04 Thread nitinkak001
I am running hive queries using HiveContext from my Spark code. No matter which query I run and how much data it is, it always generates 31 partitions. Does anybody know the reason? Is there a predefined/configurable setting for it? I essentially need more partitions. I am using this code snippet
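
Two hedged options for getting more partitions (Scala; note that spark.sql.shuffle.partitions only affects shuffle/aggregation stages, while the partitioning of the initial read follows the input splits; the table name is illustrative):

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)  // sc: an existing SparkContext
    // Raise the partition count used after shuffles...
    hiveContext.setConf("spark.sql.shuffle.partitions", "200")
    // ...or repartition a specific result explicitly.
    val df = hiveContext.sql("SELECT * FROM some_table")
    val moreParts = df.repartition(200)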

FW: How to get the singleton instance of SQLContext/HiveContext: val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)

2016-03-04 Thread Jelez Raditchkov
From: je...@hotmail.com To: yuzhih...@gmail.com Subject: RE: How to get the singleton instance of SQLContext/HiveContext: val sqlContext = SQLContext.getOrCreate(rdd.sparkContext) Date: Fri, 4 Mar 2016 14:09:20 -0800 The code below is from the sources; is this what you are asking for? class

Re: How to get the singleton instance of SQLContext/HiveContext: val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)

2016-03-04 Thread Ted Yu
bq. However the method does not seem inherited to HiveContext. Can you clarify the above observation? HiveContext extends SQLContext. On Fri, Mar 4, 2016 at 1:23 PM, jelez <je...@hotmail.com> wrote: > What is the best approach to use getOrCreate for streaming job with >

How to get the singleton instance of SQLContext/HiveContext: val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)

2016-03-04 Thread jelez
What is the best approach to using getOrCreate for a streaming job with HiveContext? It seems for SQLContext the recommended approach is to use getOrCreate: https://spark.apache.org/docs/latest/streaming-programming-guide.html#dataframe-and-sql-operations val sqlContext = SQLContext.getOrCreate

How to get the singleton instance of SQLContext/HiveContext: val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)

2016-03-04 Thread Jelez Raditchkov
What is the best approach to using getOrCreate for a streaming job with HiveContext? It seems for SQLContext the recommended approach is to use getOrCreate: https://spark.apache.org/docs/latest/streaming-programming-guide.html#dataframe-and-sql-operations val sqlContext =

Re: SPARK SQL HiveContext Error

2016-03-01 Thread Gourav Sengupta
ia articles >>> val count = succinctRDD.count("the") >>> >>> // Now suppose we want to find all offsets in the collection at which >>> "Berkeley" occurs; and >>> // create an RDD containing all resulting offsets >>> val o

Re: SPARK SQL HiveContext Error

2016-03-01 Thread Gourav Sengupta
me", StringType, false), > StructField("Length", IntegerType, true), > StructField("Area", DoubleType, false), > StructField("Airport", BooleanType, true))) > > // Create an RDD of Rows with some data > val cityRDD = sc.parallelize(Seq( > Row("San Francisco", 12, 44.52, true), > Row("Palo Alto", 12, 22.33, false), > Row("Munich", 8, 3.14, true))) > > > val hiveContext = new HiveContext(sc) > > //val sqlContext = new org.apache.spark.sql.SQLContext(sc) > > } > } > > > - > > > > Regards, > Gourav Sengupta >

SPARK SQL HiveContext Error

2016-03-01 Thread Gourav Sengupta
    StructField("Name", StringType, false),
    StructField("Length", IntegerType, true),
    StructField("Area", DoubleType, false),
    StructField("Airport", BooleanType, true)))

    // Create an RDD of Rows with some data
    val cityRDD = sc.parallelize(Seq(
      Row("San

Re: Creating HiveContext in Spark-Shell fails

2016-02-15 Thread Gavin Yue
This sqlContext is an instance of HiveContext; do not be confused by the name. > On Feb 16, 2016, at 12:51, Prabhu Joseph <prabhujose.ga...@gmail.com> wrote: > Hi All, > On creating HiveContext in spark-shell, it fails with > Caused by: ERROR XSDB6: An

Re: Creating HiveContext in Spark-Shell fails

2016-02-15 Thread Prabhu Joseph
org.apache.spark.sql.hive.HiveContext] > res0: Boolean = true > On Mon, Feb 15, 2016 at 8:51 PM, Prabhu Joseph <prabhujose.ga...@gmail.com> wrote: >> Hi All, >> On creating HiveContext in spark-shell, it fails with >> Caused by:

Re: Creating HiveContext in Spark-Shell fails

2016-02-15 Thread Mark Hamstra
:help for more information. scala> sqlContext.isInstanceOf[org.apache.spark.sql.hive.HiveContext] res0: Boolean = true On Mon, Feb 15, 2016 at 8:51 PM, Prabhu Joseph <prabhujose.ga...@gmail.com> wrote: > Hi All, > > On creating HiveContext in spark-shell, fails with > >

Creating HiveContext in Spark-Shell fails

2016-02-15 Thread Prabhu Joseph
Hi All, On creating HiveContext in spark-shell, it fails with: Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /SPARK/metastore_db. Spark-Shell has already created metastore_db for SqlContext. Spark context available as sc. SQL context available as sqlContext

Re: [External] Re: Spark 1.6.0 HiveContext NPE

2016-02-08 Thread Shipper, Jay [USA]
"user@spark.apache.org" <user@spark.apache.org> Subject: [External] Re: Spark 1.6.0 HiveContext NPE Was there any other exception in the client log? Just want to find the cause for this NPE. Thanks On Wed, Feb 3, 2016

Re: Spark 1.6.0 HiveContext NPE

2016-02-05 Thread Ted Yu
a NullPointerException from HiveContext. It's happening while it > tries to load some tables via JDBC from an external database (not Hive), > using context.read().jdbc(): > — > java.lang.NullPointerException > at > org.apache.spark.sql.hive.client.ClientWra

Re: [External] Re: Spark 1.6.0 HiveContext NPE

2016-02-04 Thread Ted Yu
Shipper, Jay [USA] <shipper_...@bah.com> > wrote: >> It was just renamed recently: https://github.com/apache/spark/pull/10981 >> As SessionState is entirely managed by Spark's code, it still seems like >> this is a bug with Spark 1.6.0, and not with how

Re: [External] Re: Spark 1.6.0 HiveContext NPE

2016-02-03 Thread Shipper, Jay [USA]
It was just renamed recently: https://github.com/apache/spark/pull/10981 As SessionState is entirely managed by Spark’s code, it still seems like this is a bug with Spark 1.6.0, and not with how our application is using HiveContext. But I’d feel more confident filing a bug if someone else

Re: [External] Re: Spark 1.6.0 HiveContext NPE

2016-02-03 Thread Ted Yu
2016 at 12:06 PM > To: "user@spark.apache.org" <user@spark.apache.org> > Subject: Re: [External] Re: Spark 1.6.0 HiveContext NPE > Right, I could already tell that from the stack trace and looking at > Spark's code. What I'm trying to determine is why that's coming back a

Re: [External] Re: Spark 1.6.0 HiveContext NPE

2016-02-03 Thread Ted Yu
code, it still seems like > this is a bug with Spark 1.6.0, and not with how our application is using > HiveContext. But I’d feel more confident filing a bug if someone else > could confirm they’re having this issue with Spark 1.6.0. Ideally, we > should also have some simple proof of

Spark 1.6.0 HiveContext NPE

2016-02-03 Thread Shipper, Jay [USA]
I’m upgrading an application from Spark 1.4.1 to Spark 1.6.0, and I’m getting a NullPointerException from HiveContext. It’s happening while it tries to load some tables via JDBC from an external database (not Hive), using context.read().jdbc(): — java.lang.NullPointerException

Re: [External] Re: Spark 1.6.0 HiveContext NPE

2016-02-03 Thread Shipper, Jay [USA]
2016 at 12:04 PM To: Jay Shipper <shipper_...@bah.com> Cc: "user@spark.apache.org" <user@spark.apache.org> Subject: [External] Re: Spark 1.6.0 HiveContext NPE Looks li

Re: Spark 1.6.0 HiveContext NPE

2016-02-03 Thread Ted Yu
> getting a NullPointerException from HiveContext. It’s happening while it > tries to load some tables via JDBC from an external database (not Hive), > using context.read().jdbc(): > > — > java.lang.NullPointerException > at > org.apache.spark.sql.hive.client.ClientWra

Re: [External] Re: Spark 1.6.0 HiveContext NPE

2016-02-03 Thread Shipper, Jay [USA]
.apache.org>" <user@spark.apache.org<mailto:user@spark.apache.org>> Subject: Re: [External] Re: Spark 1.6.0 HiveContext NPE Right, I could already tell that from the stack trace and looking at Spark’s code. What I’m trying to determine is why that’s coming back as null now, ju

Sharing HiveContext in Spark JobServer / getOrCreate

2016-01-25 Thread Deenar Toraskar
Hi, I am using a shared sparkContext for all of my Spark jobs. Some of the jobs use HiveContext, but there isn't a getOrCreate method on HiveContext which would allow reuse of an existing HiveContext. Such a method exists only on SQLContext (def getOrCreate(sparkContext: SparkContext): SQLContext

Re: Sharing HiveContext in Spark JobServer / getOrCreate

2016-01-25 Thread Ted Yu
Have you noticed the following method of HiveContext? * Returns a new HiveContext as new session, which will have separated SQLConf, UDF/UDAF, * temporary tables and SessionState, but sharing the same CacheManager, IsolatedClientLoader * and Hive client (both of execution and metadata
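
A minimal sketch of the pattern Ted is pointing at (Spark 1.6: newSession() on a shared HiveContext, in place of a getOrCreate equivalent; assumes an existing SparkContext sc):

    import org.apache.spark.sql.hive.HiveContext

    val sharedHiveContext = new HiveContext(sc)  // created once, reused by all jobs
    // Each job gets an isolated session (own SQLConf, UDFs, temp tables)
    // while sharing the same CacheManager and Hive client.
    val jobSession = sharedHiveContext.newSession()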

Re: Sharing HiveContext in Spark JobServer / getOrCreate

2016-01-25 Thread Deenar Toraskar
On 25 January 2016 at 21:09, Deenar Toraskar <deenar.toras...@thinkreactive.co.uk> wrote: > No I hadn't. This is useful, but in some cases we do want to share the > same temporary tables between jobs, so really wanted a getOrCreate > equivalent on HiveContext. > Deenar

How HiveContext can read subdirectories

2016-01-07 Thread Arkadiusz Bicz
Hi, Can Spark, using HiveContext external tables, read subdirectories? Example:

    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql._
    import sqlContext.implicits._
    // prepare data and create subdirectories with parquet
    val df = Seq("id1" -> 1, "id2"
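
A hedged sketch of settings commonly used so Hive table scans descend into subdirectories (behaviour varies by Spark/Hive version; the table name is illustrative):

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)  // sc: an existing SparkContext
    hiveContext.setConf("mapred.input.dir.recursive", "true")
    hiveContext.setConf("hive.mapred.supports.subdirectories", "true")
    val df = hiveContext.sql("SELECT * FROM external_table")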
