Re: [pyspark delta] [delta][Spark SQL]: Getting an Analysis Exception. The associated location (path) is not empty

2022-08-02 Thread Sean Owen
That isn't the issue - the table does not exist anyway, but the storage path does. On Tue, Aug 2, 2022 at 6:48 AM Stelios Philippou wrote: > Hi Kumba. > > The SQL structure is a bit different for > CREATE OR REPLACE TABLE > > You can only do the following: > CREATE TABLE IF NOT EXISTS

Re: [pyspark delta] [delta][Spark SQL]: Getting an Analysis Exception. The associated location (path) is not empty

2022-08-02 Thread Stelios Philippou
Hi Kumba. The SQL structure is a bit different for CREATE OR REPLACE TABLE. You can only do the following: CREATE TABLE IF NOT EXISTS. See https://spark.apache.org/docs/3.3.0/sql-ref-syntax-ddl-create-table-datasource.html On Tue, 2 Aug 2022 at 14:38, Sean Owen wrote: > I don't think "CREATE OR

Re: [pyspark delta] [delta][Spark SQL]: Getting an Analysis Exception. The associated location (path) is not empty

2022-08-02 Thread Sean Owen
I don't think "CREATE OR REPLACE TABLE" exists (in SQL?); this isn't a VIEW. Delete the path first; that's simplest. On Tue, Aug 2, 2022 at 12:55 AM Kumba Janga wrote: > Thanks Sean! That was a simple fix. I changed it to "Create or Replace > Table" but now I am getting the following error. I

Re: [pyspark delta] [delta][Spark SQL]: Getting an Analysis Exception. The associated location (path) is not empty

2022-08-02 Thread ayan guha
Hi, I strongly suggest printing the prepared SQL statements and trying them in raw form. The error you posted points to a syntax error. On Tue, 2 Aug 2022 at 3:56 pm, Kumba Janga wrote: > Thanks Sean! That was a simple fix. I changed it to "Create or Replace > Table" but now I am getting the following

Re: [pyspark delta] [delta][Spark SQL]: Getting an Analysis Exception. The associated location (path) is not empty

2022-08-01 Thread Kumba Janga
Thanks Sean! That was a simple fix. I changed it to "Create or Replace Table" but now I am getting the following error. I am still researching solutions but so far no luck. ParseException: mismatched input '' expecting {'ADD', 'AFTER', 'ALL', 'ALTER', 'ANALYZE', 'AND', 'ANTI', 'ANY', 'ARCHIVE',

Re: [pyspark delta] [delta][Spark SQL]: Getting an Analysis Exception. The associated location (path) is not empty

2022-08-01 Thread Sean Owen
Pretty much what it says? You are creating a table over a path that already has data in it. You can't do that without mode=overwrite at least, if that's what you intend. On Mon, Aug 1, 2022 at 7:29 PM Kumba Janga wrote: > > >- Component: Spark Delta, Spark SQL >- Level: Beginner >-
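
A minimal sketch of the overwrite approach for a non-empty target location (the path and data are hypothetical; "delta" assumes the Delta Lake package is on the classpath, otherwise "parquet" works the same way):

    val df = spark.range(10).toDF("id")        // stand-in for the real dataset
    df.write
      .format("delta")                         // or "parquet"
      .mode("overwrite")                       // allows writing over a location that already has data
      .save("/tmp/sandbox/events")             // hypothetical path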

Re: [Spark SQL]: Does Spark SQL support WAITFOR?

2022-05-19 Thread Someshwar Kale
Hi Ram, Have you seen this Stack Overflow question and response: https://stackoverflow.com/questions/39685744/apache-spark-how-to-cancel-job-in-code-and-kill-running-tasks If not, please have a look; it seems to cover a similar problem. Regards, Someshwar Kale On Fri, May 20, 2022 at 7:34 AM

Re: [Spark SQL]: Does Spark SQL support WAITFOR?

2022-05-19 Thread Artemis User
WAITFOR is part of Transact-SQL and is Microsoft SQL Server specific; it is not supported by Spark SQL. If you want to impose a delay in a Spark program, you may want to use the thread sleep function in Java or Scala. Hope this helps... On 5/19/22 1:45 PM, K. N. Ramachandran wrote: Hi
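
For reference, a minimal sketch of imposing a delay without WAITFOR, either directly in driver code or through a hypothetical "sleep" function exposed to SQL via a UDF (called once per row):

    Thread.sleep(5000)                         // 5-second pause in the driver (Scala/Java)

    // hypothetical UDF callable from Spark SQL
    spark.udf.register("sleep_ms", (ms: Int) => { Thread.sleep(ms.toLong); ms })
    spark.sql("SELECT sleep_ms(5000)").collect()   // blocks roughly 5 seconds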

Re: [Spark SQL]: Does Spark SQL support WAITFOR?

2022-05-19 Thread K. N. Ramachandran
Hi Sean, I'm trying to test a timeout feature in a tool that uses Spark SQL. Basically, if a long-running query exceeds a configured threshold, then the query should be canceled. I couldn't see a simple way to make a "sleep" SQL statement to test the timeout. Instead, I just ran a "select
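
A rough sketch of the approach from the Stack Overflow thread linked above: run the query under a job group on a separate thread, then cancel that group once the configured threshold elapses (table name and threshold are hypothetical):

    import scala.concurrent.Future
    import scala.concurrent.ExecutionContext.Implicits.global

    val query = Future {
      // set the job group on the thread that actually submits the Spark jobs
      spark.sparkContext.setJobGroup("timeout-test", "query under test", interruptOnCancel = true)
      spark.sql("SELECT count(*) FROM big_table a CROSS JOIN big_table b").collect()
    }

    Thread.sleep(30000)                        // configured timeout, e.g. 30s
    if (!query.isCompleted) spark.sparkContext.cancelJobGroup("timeout-test")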

Re: [Spark SQL]: Does Spark SQL support WAITFOR?

2022-05-17 Thread Sean Owen
I don't think that is standard SQL? What are you trying to do, and why not do it outside SQL? On Tue, May 17, 2022 at 6:03 PM K. N. Ramachandran wrote: > Gentle ping. Any info here would be great. > > Regards, > Ram > > On Sun, May 15, 2022 at 5:16 PM K. N. Ramachandran > wrote: > >> Hello

Re: [Spark SQL]: Does Spark SQL support WAITFOR?

2022-05-17 Thread K. N. Ramachandran
Gentle ping. Any info here would be great. Regards, Ram On Sun, May 15, 2022 at 5:16 PM K. N. Ramachandran wrote: > Hello Spark Users Group, > > I've just recently started working on tools that use Apache Spark. > When I try WAITFOR in the spark-sql command line, I just get: > > Error: Error

RE: DataSource API v2 & Spark-SQL

2021-07-02 Thread Lavelle, Shawn
Thanks for following up, I will give this a go! ~ Shawn -Original Message- From: roizaig Sent: Thursday, April 29, 2021 7:42 AM To: user@spark.apache.org Subject: Re: DataSource API v2 & Spark-SQL You can create a custom data source following this blog <http://roizaig.blogs

Re: DataSource API v2 & Spark-SQL

2021-04-29 Thread roizaig
You can create a custom data source following this blog. It shows how to read a Java log file using the Spark v3 API as an example. -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

Re: [Spark SQL]: Does Spark SQL can have better performance?

2021-04-29 Thread Mich Talebzadeh
Hi, your query parquetFile = spark.read.parquet("path/to/hdfs") parquetFile.createOrReplaceTempView("parquetFile") spark.sql("SELECT * FROM parquetFile WHERE field1 = 'value' ORDER BY timestamp LIMIT 1") will be lazily evaluated and won't do anything until the sql statement is actioned with
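
To make the lazy evaluation visible, a small sketch using the same hypothetical path and column names: explain() shows the plan without running anything, and only an action such as show() triggers execution.

    val parquetFile = spark.read.parquet("path/to/hdfs")    // lazy: no data read yet
    parquetFile.createOrReplaceTempView("parquetFile")
    val top = spark.sql(
      "SELECT * FROM parquetFile WHERE field1 = 'value' ORDER BY timestamp LIMIT 1")
    top.explain()                                           // prints the plan, still no execution
    top.show()                                              // action: the query actually runs here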

Re: Integration testing Framework Spark SQL Scala

2020-11-02 Thread Lars Albertsson
Hi, Sorry for the very slow reply - I am far behind in my mailing list subscriptions. You'll find a few slides covering the topic in this presentation: https://www.slideshare.net/lallea/test-strategies-for-data-processing-pipelines-67244458 Video here: https://vimeo.com/192429554 Regards,

RE: DataSource API v2 & Spark-SQL

2020-08-03 Thread Lavelle, Shawn
v2 & Spark-SQL That's a bad error message. Basically you can't make a spark native catalog

Re: DataSource API v2 & Spark-SQL

2020-08-03 Thread Russell Spitzer
That's a bad error message. Basically you can't make a spark native catalog reference for a dsv2 source. You have to use that Datasources catalog or use the programmatic API. Both dsv1 and dsv2 programmatic APIs work (plus or minus some options) On Mon, Aug 3, 2020, 7:28 AM Lavelle, Shawn wrote:

Re: Integration testing Framework Spark SQL Scala

2020-02-25 Thread Ruijing Li
Just wanted to follow up on this. If anyone has any advice, I’d be interested in learning more! On Thu, Feb 20, 2020 at 6:09 PM Ruijing Li wrote: > Hi all, > > I’m interested in hearing the community’s thoughts on best practices to do > integration testing for spark sql jobs. We run a lot of

Re: Using Percentile in Spark SQL

2019-11-11 Thread Jerry Vinokurov
I don't think the Spark configuration is what you want to focus on. It's hard to say without knowing the specifics of the job or the data volume, but you should be able to accomplish this with the percent_rank function in SparkSQL and a smart partitioning of the data. If your data has a lot of

Re: Using Percentile in Spark SQL

2019-11-11 Thread Tzahi File
Currently, I'm using the percentile_approx function with Hive. I'm looking for a better way to run this function, or another way to get the same result with Spark, but faster and without using gigantic instances. I'm trying to optimize this job by changing the Spark configuration. If you have any

Re: Using Percentile in Spark SQL

2019-11-11 Thread Muthu Jayakumar
If you require higher precision, you may have to write a custom UDAF. In my case, I ended up storing the data as a key-value ordered list of histograms. Thanks Muthu On Mon, Nov 11, 2019, 20:46 Patrick McCarthy wrote: > Depending on your tolerance for error you could also use >

Re: Using Percentile in Spark SQL

2019-11-11 Thread Patrick McCarthy
Depending on your tolerance for error you could also use percentile_approx(). On Mon, Nov 11, 2019 at 10:14 AM Jerry Vinokurov wrote: > Do you mean that you are trying to compute the percent rank of some data? > You can use the SparkSQL percent_rank function for that, but I don't think > that's
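
A minimal sketch of percentile_approx in Spark SQL (table and column names are made up; the third argument trades accuracy for memory):

    spark.sql("""
      SELECT percentile_approx(latency_ms, 0.95, 10000) AS p95
      FROM events
    """).show()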

Re: Using Percentile in Spark SQL

2019-11-11 Thread Jerry Vinokurov
Do you mean that you are trying to compute the percent rank of some data? You can use the SparkSQL percent_rank function for that, but I don't think that's going to give you any improvement over calling the percentRank function on the data frame. Are you currently using a user-defined function for

Re: how to get spark-sql lineage

2019-05-16 Thread Arun Mahadevan
You can check out https://github.com/hortonworks-spark/spark-atlas-connector/ On Wed, 15 May 2019 at 19:44, lk_spark wrote: > hi,all: > When I use spark , if I run some SQL to do ETL how can I get > lineage info. I found that , CDH spark have some config about lineage : >

Re: how to get spark-sql lineage

2019-05-16 Thread Gabor Somogyi
Hi, spark.lineage.enabled is Cloudera specific and doesn't work with vanilla Spark. BR, G On Thu, May 16, 2019 at 4:44 AM lk_spark wrote: > hi,all: > When I use spark , if I run some SQL to do ETL how can I get > lineage info. I found that , CDH spark have some config about lineage :

Re: [Spark SQL]: Does Spark SQL 2.3+ support UDT?

2018-11-26 Thread Suny Tyagi
Thanks and Regards, Suny Tyagi Phone No : 9885027192 On Mon, Nov 26, 2018 at 10:31 PM Suny Tyagi wrote: > Hi Team, > > > I was going through this ticket > https://issues.apache.org/jira/browse/SPARK-7768?jql=text%20~%20%22public%20udt%22 > and > could not understand that if spark support UDT

Re: [Spark SQL] why spark sql hash() returns the same hash value though the keys/expr are not the same

2018-09-28 Thread Thakrar, Jayesh
Not sure I get what you mean… I ran the query that you had – and don't get the same hash as you. From: Gokula Krishnan D Date: Friday, September 28, 2018 at 10:40 AM To: "Thakrar, Jayesh" Cc: user Subject: Re: [Spark SQL] why spark sql hash() returns the same hash value thoug

Re: [Spark SQL] why spark sql hash() returns the same hash value though the keys/expr are not the same

2018-09-28 Thread Gokula Krishnan D
Hello Jayesh, I have masked the input values with . Thanks & Regards, Gokula Krishnan (Gokul) On Wed, Sep 26, 2018 at 2:20 PM Thakrar, Jayesh < jthak...@conversantmedia.com> wrote: > Cannot reproduce your situation. > > Can you share Spark version? > > > > Welcome to > >

Re: [Spark SQL] why spark sql hash() returns the same hash value though the keys/expr are not the same

2018-09-26 Thread Thakrar, Jayesh
Cannot reproduce your situation. Can you share Spark version? [spark-shell welcome banner] version 2.2.0 Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_92) Type
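
A quick way to compare hash() outputs directly in spark-shell (the function is Murmur3-based, so different inputs should normally produce different values):

    spark.sql("SELECT hash('a') AS h1, hash('b') AS h2, hash('a', 'b') AS h3").show()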

Re: [SPARK-SQL] Does spark-sql have Authorization built in?

2017-09-18 Thread Arun Khetarpal
Ping. I did some digging around in the code base - I see that this is not present currently. Just looking for an acknowledgement. Regards, Arun > On 15-Sep-2017, at 8:43 PM, Arun Khetarpal wrote: > > Hi - > > Wanted to understand if spark sql has GRANT and

Re: [SPARK-SQL] Does spark-sql have Authorization built in?

2017-09-16 Thread Jörn Franke
It depends on the permissions the user has on the local file system or HDFS, so there is no need to have grant/revoke. > On 15. Sep 2017, at 17:13, Arun Khetarpal wrote: > > Hi - > > Wanted to understand if spark sql has GRANT and REVOKE statements available? > Is

Re: [SPARK-SQL] Does spark-sql have Authorization built in?

2017-09-16 Thread Akhil Das
I guess no. I came across a test case where they are marked as Unsupported, you can see it here. However, the one running inside Databricks has support for this.

Re: Apache Drill vs Spark SQL

2017-04-07 Thread Pierce Lamb
Hi Kant, If you are interested in using Spark alongside a database to serve real time queries, there are many options. Almost every popular database has built some sort of connector to Spark. I've listed a majority of them and tried to delineate them in some way in this StackOverflow answer:

Re: Un-exploding / denormalizing Spark SQL help

2017-02-08 Thread Everett Anderson
On Wed, Feb 8, 2017 at 1:14 PM, ayan guha wrote: > Will a SQL solution be acceptable? > I'm very curious to see how it'd be done in raw SQL if you're up for it! I think the 2 programmatic solutions so far are viable, though, too. (By the way, thanks everyone for the

Re: Un-exploding / denormalizing Spark SQL help

2017-02-08 Thread ayan guha
Will a SQL solution be acceptable? On Thu, 9 Feb 2017 at 4:01 am, Xiaomeng Wan wrote: > You could also try pivot. > > On 7 February 2017 at 16:13, Everett Anderson > wrote: > > > > On Tue, Feb 7, 2017 at 2:21 PM, Michael Armbrust

Re: Un-exploding / denormalizing Spark SQL help

2017-02-08 Thread Xiaomeng Wan
You could also try pivot. On 7 February 2017 at 16:13, Everett Anderson wrote: > > > On Tue, Feb 7, 2017 at 2:21 PM, Michael Armbrust > wrote: > >> I think the fastest way is likely to use a combination of conditionals >> (when / otherwise),

Re: Un-exploding / denormalizing Spark SQL help

2017-02-07 Thread Everett Anderson
On Tue, Feb 7, 2017 at 2:21 PM, Michael Armbrust wrote: > I think the fastest way is likely to use a combination of conditionals > (when / otherwise), first (ignoring nulls), while grouping by the id. > This should get the answer with only a single shuffle. > > Here is an

Re: Un-exploding / denormalizing Spark SQL help

2017-02-07 Thread Michael Armbrust
I think the fastest way is likely to use a combination of conditionals (when / otherwise), first (ignoring nulls), while grouping by the id. This should get the answer with only a single shuffle. Here is an example
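
A sketch of that suggestion with hypothetical column names — for an input DataFrame df with (id, key, value) rows, group by id and pick the non-null value per target column using first(..., ignoreNulls = true) wrapped around a conditional:

    import org.apache.spark.sql.functions.{col, first, when}

    val wide = df.groupBy("id").agg(
      first(when(col("key") === "name", col("value")), ignoreNulls = true).as("name"),
      first(when(col("key") === "city", col("value")), ignoreNulls = true).as("city")
    )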

Re: Un-exploding / denormalizing Spark SQL help

2017-02-07 Thread Jacek Laskowski
Hi Everett, That's pretty much what I'd do. Can't think of a way to beat your solution. Why do you "feel vaguely uneasy about it"? I'd also check out the execution plan (with explain) to see how it's gonna work at runtime. I may have seen groupBy + join be better than window (there were more

Re: Un-exploding / denormalizing Spark SQL help

2017-02-07 Thread Everett Anderson
On Tue, Feb 7, 2017 at 12:50 PM, Jacek Laskowski wrote: > Hi, > > Could groupBy and withColumn or UDAF work perhaps? I think window could > help here too. > This seems to work, but I do feel vaguely uneasy about it. :) // First add a 'rank' column which is priority order just

Re: Un-exploding / denormalizing Spark SQL help

2017-02-07 Thread Jacek Laskowski
Hi, Could groupBy and withColumn or UDAF work perhaps? I think window could help here too. Jacek On 7 Feb 2017 8:02 p.m., "Everett Anderson" wrote: > Hi, > > I'm trying to un-explode or denormalize a table like > > +---++-+--++ > |id

Re: Tableau BI on Spark SQL

2017-01-30 Thread Todd Nist
Hi Mich, You could look at http://www.exasol.com/. It works very well with Tableau without the need to extract the data. Also in V6, it has the virtual schemas which would allow you to access data in Spark, Hive, Oracle, or other sources. May be outside of what you are looking for, it works

Re: Tableau BI on Spark SQL

2017-01-30 Thread Jörn Franke
With a lot of data (TB) it is not that good, hence the extraction. Otherwise you have to wait every time you do drag and drop. With the extracts it is better. > On 30 Jan 2017, at 22:59, Mich Talebzadeh wrote: > > Thanks Jorn, > > So Tableau uses its own in-memory

Re: Tableau BI on Spark SQL

2017-01-30 Thread Mich Talebzadeh
Thanks Jorn, So Tableau uses its own in-memory representation as I guessed. Now the question is how performance is when accessing data in Oracle tables. Dr Mich Talebzadeh LinkedIn https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: Tableau BI on Spark SQL

2017-01-30 Thread Jörn Franke
Depending on the size of the data I recommend scheduling a regular extract in Tableau. There Tableau converts it to an internal in-memory representation outside of Spark (it can also exist on disk if memory is too small) and then uses it within Tableau. Accessing the database directly is not

Re: help!!!----issue with spark-sql type cast from long to longwritable

2017-01-30 Thread Alex
Hi All, If I modify the code as below, the Hive UDF works in spark-sql but it gives different results. Please let me know the difference between the two versions of code below. 1) public Object get(Object name) { int pos = getPos((String)name); if(pos<0) return null;

Re: help!!!----issue with spark-sql type cast from long to longwritable

2017-01-30 Thread Alex
How to debug Hive UDFs?! On Jan 24, 2017 5:29 PM, "Sirisha Cheruvu" wrote: > Hi Team, > > I am trying to keep below code in get method and calling that get method in > another hive UDF > and running the hive UDF using Hive Context.sql procedure.. > > > switch (f) { > case

Re: help!!!----issue with spark-sql type cast from long to longwritable

2017-01-30 Thread Alex
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error: java.lang.Double cannot be cast to org.apache.hadoop.hive.serde2.io.DoubleWritable] Getting the below error while running the Hive UDF on Spark, but the UDF works perfectly fine in Hive. public Object get(Object name) {

Re: help!!!----issue with spark-sql type cast from long to longwritable

2017-01-24 Thread Takeshi Yamamuro
Hi, Could you show us the whole code to reproduce that? // maropu On Wed, Jan 25, 2017 at 12:02 AM, Deepak Sharma wrote: > Can you try writing the UDF directly in spark and register it with spark > sql or hive context ? > Or do you want to reuse the existing UDF jar for

Re: help!!!----issue with spark-sql type cast from long to longwritable

2017-01-24 Thread Deepak Sharma
Can you try writing the UDF directly in Spark and registering it with the Spark SQL or Hive context? Or do you want to reuse the existing UDF jar for Hive in Spark? Thanks Deepak On Jan 24, 2017 5:29 PM, "Sirisha Cheruvu" wrote: > Hi Team, > > I am trying to keep below code in
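
A minimal sketch of the first option — defining and registering the function natively in Spark instead of reusing the Hive UDF jar (assumes a Spark 2.x SparkSession; function, column, and table names are made up):

    // Scala function registered directly with Spark SQL; no Hive writables involved
    spark.udf.register("squared", (x: Double) => x * x)
    spark.sql("SELECT squared(price) FROM my_table").show()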

Re: time to run Spark SQL query

2016-11-28 Thread ayan guha
They should take the same time if everything else is constant. On 28 Nov 2016 23:41, "Hitesh Goyal" wrote: > Hi team, I am using spark SQL for accessing the amazon S3 bucket data. > > If I run a sql query by using normal SQL syntax like below > > 1) DataFrame

Re: How to use Spark SQL to connect to Cassandra from Spark-Shell?

2016-11-11 Thread Russell Spitzer
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/14_data_frames.md Has all the information about Dataframes/ SparkSql On Fri, Nov 11, 2016 at 8:52 AM kant kodali wrote: > Wait I cannot create CassandraSQLContext from spark-shell. is this only > for
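
Based on that documentation, a sketch of reading a Cassandra table as a DataFrame and querying it with Spark SQL (keyspace, table, and predicate are hypothetical, and the spark-cassandra-connector package must be on the classpath with spark.cassandra.connection.host configured):

    val df = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "my_ks", "table" -> "my_table"))
      .load()

    df.createOrReplaceTempView("my_table")
    spark.sql("SELECT * FROM my_table WHERE user_id = 42").show()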

Re: How to use Spark SQL to connect to Cassandra from Spark-Shell?

2016-11-11 Thread kant kodali
Wait I cannot create CassandraSQLContext from spark-shell. is this only for enterprise versions? Thanks! On Fri, Nov 11, 2016 at 8:14 AM, kant kodali wrote: > https://academy.datastax.com/courses/ds320-analytics- > apache-spark/spark-sql-spark-sql-basics > > On Fri, Nov 11,

Re: How to use Spark SQL to connect to Cassandra from Spark-Shell?

2016-11-11 Thread kant kodali
https://academy.datastax.com/courses/ds320-analytics-apache-spark/spark-sql-spark-sql-basics On Fri, Nov 11, 2016 at 8:11 AM, kant kodali wrote: > Hi, > > This is spark-cassandra-connector > but I am looking > more for

Re: How to use Spark SQL to connect to Cassandra from Spark-Shell?

2016-11-11 Thread kant kodali
Hi, This is spark-cassandra-connector but I am looking more for how to use SPARK SQL and expose as a JDBC server for Cassandra. Thanks! On Fri, Nov 11, 2016 at 8:07 AM, Yong Zhang wrote: > Read the document on

Re: How to use Spark SQL to connect to Cassandra from Spark-Shell?

2016-11-11 Thread Yong Zhang
Read the document on https://github.com/datastax/spark-cassandra-connector Yong From: kant kodali Sent: Friday, November 11, 2016 11:04 AM To: user @spark Subject: How to use Spark SQL to connect to Cassandra from Spark-Shell? How to use

Re: Efficient filtering on Spark SQL dataframes with ordered keys

2016-11-02 Thread Michael David Pedersen
Awesome, thank you Michael for the detailed example! I'll look into whether I can use this approach for my use case. If so, I could avoid the overhead of repeatedly registering a temp table for one-off queries, instead registering the table once and relying on the injected strategy. Don't know

Re: Efficient filtering on Spark SQL dataframes with ordered keys

2016-11-01 Thread Michael Armbrust
registerTempTable is backed by an in-memory hash table that maps table name (a string) to a logical query plan. Fragments of that logical query plan may or may not be cached (but calling register alone will not result in any materialization of results). In Spark 2.0 we renamed this function to
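
In Spark 2.x terms, a small sketch of that distinction (path and view name are hypothetical): registering the view is just a name-to-plan mapping; materialization only happens if you cache explicitly and then run an action.

    val df = spark.read.parquet("/data/events")        // hypothetical path
    df.createOrReplaceTempView("events")               // name -> logical plan, nothing materialized
    spark.catalog.cacheTable("events")                 // mark for caching (still lazy)
    spark.sql("SELECT count(*) FROM events").show()    // first action populates the cache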

Re: Efficient filtering on Spark SQL dataframes with ordered keys

2016-11-01 Thread Mich Talebzadeh
it would be great if we establish this. I know in Hive these temporary tables "CREATE TEMPORARY TABLE ..." are private to the session and are put in a hidden staging directory as below /user/hive/warehouse/.hive-staging_hive_2016-07-10_22-58-47_319_5605745346163312826-10 and removed when the

Re: Efficient filtering on Spark SQL dataframes with ordered keys

2016-11-01 Thread Michael David Pedersen
Thanks for the link, I hadn't come across this. According to https://forums.databricks.com/questions/400/what-is-the-difference-between-registertemptable-a.html > > and I quote > > "registerTempTable() > > registerTempTable() creates an in-memory table that is scoped to the > cluster in which

Re: Efficient filtering on Spark SQL dataframes with ordered keys

2016-11-01 Thread Mich Talebzadeh
A bit of gray area here I am afraid, I was trying to experiment with it According to https://forums.databricks.com/questions/400/what-is-the-difference-between-registertemptable-a.html and I quote "registerTempTable() registerTempTable() creates an in-memory table that is scoped to the cluster

Re: Efficient filtering on Spark SQL dataframes with ordered keys

2016-11-01 Thread Michael David Pedersen
Hi again Mich, "But the thing is that I don't explicitly cache the tempTables ..". > > I believe tempTable is created in-memory and is already cached > That surprises me since there is a sqlContext.cacheTable method to explicitly cache a table in memory. Or am I missing something? This could

Re: Efficient filtering on Spark SQL dataframes with ordered keys

2016-10-31 Thread Mich Talebzadeh
well I suppose one can drop tempTable as below scala> df.registerTempTable("tmp") scala> spark.sql("select count(1) from tmp").show +--------+ |count(1)| +--------+ | 904180| +--------+ scala> spark.sql("drop table if exists tmp") res22: org.apache.spark.sql.DataFrame = [] Also your point

Re: Efficient filtering on Spark SQL dataframes with ordered keys

2016-10-31 Thread Michael David Pedersen
Hi Mich, Thank you again for your reply. As I see you are caching the table already sorted > > val keyValRDDSorted = keyValRDD.sortByKey().cache > > and the next stage is you are creating multiple tempTables (different > ranges) that cache a subset of rows already cached in RDD. The data stored

Re: Efficient filtering on Spark SQL dataframes with ordered keys

2016-10-31 Thread Mich Talebzadeh
Hi Michael, As I see you are caching the table already sorted val keyValRDDSorted = keyValRDD.sortByKey().cache and the next stage is you are creating multiple tempTables (different ranges) that cache a subset of rows already cached in RDD. The data stored in tempTable is in Hive columnar

Re: Efficient filtering on Spark SQL dataframes with ordered keys

2016-10-31 Thread Michael David Pedersen
Hi Mich, Thank you for your quick reply! What type of table is the underlying table? Is it Hbase, Hive ORC or what? > It is a custom datasource, but ultimately backed by HBase. > By Key you mean a UNIQUE ID or something similar and then you do multiple > scans on the tempTable which stores

Re: Efficient filtering on Spark SQL dataframes with ordered keys

2016-10-31 Thread Mich Talebzadeh
Hi Michael. What type of table is the underlying table? Is it Hbase, Hive ORC or what? By Key you mean a UNIQUE ID or something similar and then you do multiple scans on the tempTable which stores data using in-memory columnar format. That is the optimisation of tempTable storage as far as I

Re: PostgresSql queries vs spark sql

2016-10-23 Thread Selvam Raman
I found it. We can use pivot, which is similar to crosstab in Postgres. Thank you. On Oct 17, 2016 10:00 PM, "Selvam Raman" wrote: > Hi, > > Please share me some idea if you work on this earlier. > How can i develop postgres CROSSTAB function in spark. > > Postgres Example > >
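
A sketch of pivot as the rough equivalent of Postgres crosstab, with made-up column names:

    import org.apache.spark.sql.functions.sum

    val wide = df.groupBy("row_key")
      .pivot("category")              // distinct values of "category" become columns
      .agg(sum("amount"))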

Re: Want to test spark-sql-kafka but get unresolved dependency error

2016-10-14 Thread Cody Koeninger
I can't be sure, no. On Fri, Oct 14, 2016 at 3:06 AM, Julian Keppel wrote: > Okay, thank you! Can you say, when this feature will be released? > > 2016-10-13 16:29 GMT+02:00 Cody Koeninger : >> >> As Sean said, it's unreleased. If you want to try

Re: Want to test spark-sql-kafka but get unresolved dependency error

2016-10-14 Thread Julian Keppel
Okay, thank you! Can you say when this feature will be released? 2016-10-13 16:29 GMT+02:00 Cody Koeninger : > As Sean said, it's unreleased. If you want to try it out, build spark > > http://spark.apache.org/docs/latest/building-spark.html > > The easiest way to include

Re: Want to test spark-sql-kafka but get unresolved dependency error

2016-10-13 Thread Cody Koeninger
As Sean said, it's unreleased. If you want to try it out, build spark http://spark.apache.org/docs/latest/building-spark.html The easiest way to include the jar is probably to use mvn install to put it in your local repository, then link it in your application's mvn or sbt build file as
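
For the sbt case, a minimal sketch of what that linking might look like (the version string is hypothetical and depends on what your local build installed):

    // build.sbt
    resolvers += Resolver.mavenLocal
    libraryDependencies += "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.1.0-SNAPSHOT"  // hypothetical locally-built version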

Re: OOM when running Spark SQL by PySpark on Java 8

2016-10-13 Thread Shady Xu
All nodes of my YARN cluster are running on Java 7, but I submit the job from a Java 8 client. I realised I run the job in yarn cluster mode and that's why setting '--driver-java-options' is effective. Now the problem is, why does submitting a job from a Java 8 client to a Java 7 cluster cause a

Re: OOM when running Spark SQL by PySpark on Java 8

2016-10-13 Thread Sean Owen
You can specify it; it just doesn't do anything but cause a warning in Java 8. It won't work in general to have such a tiny PermGen. If it's working it means you're on Java 8 because it's ignored. You should set MaxPermSize if anything, not PermSize. However the error indicates you are not using

Re: OOM when running Spark SQL by PySpark on Java 8

2016-10-13 Thread Shady Xu
Solved the problem by specifying the PermGen size when submitting the job (even to just a few MB). Seems Java 8 has removed the Permanent Generation space, thus corresponding JVM arguments are ignored. But I can still use --driver-java-options "-XX:PermSize=80M -XX:MaxPermSize=100m" to specify

Re: OOM when running Spark SQL by PySpark on Java 8

2016-10-13 Thread Sean Owen
The error doesn't say you're out of memory, but says you're out of PermGen. If you see this, you aren't running Java 8 AFAIK, because 8 has no PermGen. But if you're running Java 7, and you go investigate what this error means, you'll find you need to increase PermGen. This is mentioned in the

Re: Want to test spark-sql-kafka but get unresolved dependency error

2016-10-13 Thread Mich Talebzadeh
add --jars /spark-streaming-kafka_2.10-1.5.1.jar (you may need to download the jar file or any newer version) to spark-shell. I also have spark-streaming-kafka-assembly_2.10-1.6.1.jar on the --jars list. HTH Dr Mich Talebzadeh LinkedIn

Re: Want to test spark-sql-kafka but get unresolved dependency error

2016-10-13 Thread Sean Owen
I don't believe that's been released yet. It looks like it was merged into branches about a week ago. You're looking at unreleased docs too - have a look at http://spark.apache.org/docs/latest/ for the latest released docs. On Thu, Oct 13, 2016 at 9:24 AM JayKay

Re: java.lang.NoClassDefFoundError: org/apache/spark/sql/Dataset

2016-10-07 Thread kant kodali
perfect! That fixes it all! On Fri, Oct 7, 2016 1:29 AM, Denis Bolshakov bolshakov.de...@gmail.com wrote: You need to have spark-sql; right now you are missing it. On 7 Oct 2016 at 11:12, "kant kodali" wrote: Here are the jar files on my classpath after doing a

Re: java.lang.NoClassDefFoundError: org/apache/spark/sql/Dataset

2016-10-05 Thread kant kodali
I am running locally so they all are on one host On Wed, Oct 5, 2016 3:12 PM, Jakob Odersky ja...@odersky.com wrote: Are all spark and scala versions the same? By "all" I mean the master, worker and driver instances.

Re: ArrayType support in Spark SQL

2016-09-25 Thread Koert Kuipers
not pretty but this works: import org.apache.spark.sql.{functions => sqlf} df.withColumn("array", sqlf.udf({ () => Seq(1, 2, 3) }).apply()) On Sun, Sep 25, 2016 at 6:13 PM, Jason White wrote: > It seems that `functions.lit` doesn't support ArrayTypes. To reproduce: > >
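
An alternative sketch that avoids the UDF, assuming a Spark version where functions.array is available:

    import org.apache.spark.sql.functions.{array, lit}

    df.withColumn("array", array(lit(1), lit(2), lit(3)))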

Re: Reporting errors from spark sql

2016-08-21 Thread Jacek Laskowski
Hi, See https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParseDriver.scala#L65 to learn how Spark SQL parses SQL texts. It could give you a way out. Regards, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache

Re: read parquetfile in spark-sql error

2016-07-25 Thread cj
..@qq.com>; Cc: "user"<user@spark.apache.org>; "lian.cs.zju"<lian.cs@gmail.com>; Subject: Re: read parquetfile in spark-sql error I hope the below sample helps you: val parquetDF = hiveContext.read.parquet("hdfs://.parquet") parquetDF.re

Re: read parquetfile in spark-sql error

2016-07-25 Thread Takeshi Yamamuro
Hi, Seems your query was not consistent with the HQL syntax. You'd be better off re-checking the definitions: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable // maropu On Mon, Jul 25, 2016 at 11:36 PM, Kabeer Ahmed wrote: >

Re: read parquetfile in spark-sql error

2016-07-25 Thread Kabeer Ahmed
I hope the below sample helps you: val parquetDF = hiveContext.read.parquet("hdfs://.parquet") parquetDF.registerTempTable("parquetTable") sql("SELECT * FROM parquetTable").collect().foreach(println) Kabeer. Sent from Nylas

Re: XML Processing using Spark SQL

2016-05-12 Thread Mail.com
Hi Arun, Could you try using StAX or JAXB? Thanks, Pradeep > On May 12, 2016, at 8:35 PM, Hyukjin Kwon wrote: > > Hi Arunkumar, > > > I guess your records are self-closing ones. > > There is an issue open here, https://github.com/databricks/spark-xml/issues/92 > >

Re: XML Processing using Spark SQL

2016-05-12 Thread Hyukjin Kwon
Hi Arunkumar, I guess your records are self-closing ones. There is an issue open here: https://github.com/databricks/spark-xml/issues/92 This is about XmlInputFormat.scala, and it seems a bit tricky to handle the case, so I have left it open until now. Thanks! 2016-05-13 5:03 GMT+09:00 Arunkumar

Re: Weird results with Spark SQL Outer joins

2016-05-03 Thread Kevin Peng
counts. They are basically > pulling > >>>>>> out > >>>>>> selected columns from the query, but there is no roll up happening > or > >>>>>> anything that would possible make it suspicious that there is any > >>>>

Re: Weird results with Spark SQL Outer joins

2016-05-03 Thread Michael Segel
re >>>>>> producing >>>>>> the same counts, the natural suspicions is that the tables are >>>>>> identical, >>>>>> but I when I run the following two queries: >>>>>> >>>>>> scala> sqlCont

Re: Weird results with Spark SQL Outer joins

2016-05-03 Thread Kevin Peng
tered by date being above 2016-01-03. Since all the joins are > >>>> > producing > >>>> > the same counts, the natural suspicions is that the tables are > >>>> > identical, > >>>> > but I when I run the following two q

Re: Weird results with Spark SQL Outer joins

2016-05-03 Thread Davies Liu
n_promo_lt where date >>>> >>='2016-01-03'").count >>>> > >>>> > res14: Long = 34158 >>>> > >>>> > scala> sqlContext.sql("select * from dps_pin_promo_lt where date >>>> >>='2016-01-03'").count >

Re: Weird results with Spark SQL Outer joins

2016-05-03 Thread Cesar Flores
> >>> > The above two queries filter out the data based on date used by the >>> joins of >>> > 2016-01-03 and you can see the row count between the two tables are >>> > different, which is why I am suspecting something is wrong with the >>> outer

Re: Weird results with Spark SQL Outer joins

2016-05-03 Thread Gourav Sengupta
wrong counts for >> > dps.count, the real value is res16: Long = 42694 >> > >> > >> > Thanks, >> > >> > >> > KP >> > >> > >> > >> > >> > On Mon, May 2, 2016 at 12:50 PM, Yong Zhang <java8...@hotmail.

Re: parquet table in spark-sql

2016-05-03 Thread Sandeep Nemuri
We don't need any delimiters for the Parquet file format. On Tue, May 3, 2016 at 5:31 AM, Varadharajan Mukundan wrote: > Hi, > > Yes, it is not needed. Delimiters are needed only for text files. > > On Tue, May 3, 2016 at 12:49 PM, 喜之郎 <251922...@qq.com> wrote: > >> hi, I

Re: parquet table in spark-sql

2016-05-03 Thread Varadharajan Mukundan
Hi, Yes, it is not needed. Delimiters are needed only for text files. On Tue, May 3, 2016 at 12:49 PM, 喜之郎 <251922...@qq.com> wrote: > hi, I want to ask a question about parquet table in spark-sql table. > > I think that parquet have schema information in its own file. > so you don't need define

Re: Weird results with Spark SQL Outer joins

2016-05-03 Thread Kevin Peng
> >> For dps with 42632 rows, and swig with 42034 rows, if dps full outer > join > >> with swig on 3 columns; with additional filters, get the same resultSet > row > >> count as dps left outer join with swig on 3 columns, with additional > >> filters,

Re: Weird results with Spark SQL Outer joins

2016-05-03 Thread Davies Liu
same resultSet row count as dps right outer join >> with swig on 3 columns, with same additional filters. >> >> Without knowing your data, I cannot see the reason that has to be a bug in >> the spark. >> >> Am I misunderstanding your bug? >> >> Yong >

Re: Weird results with Spark SQL Outer joins

2016-05-02 Thread Gourav Sengupta
>> >> Without knowing your data, I cannot see the reason that has to be a bug >> in the spark. >> >> Am I misunderstanding your bug? >> >> Yong >> >> -- >> From: kpe...@gmail.com >> Date: Mon, 2 May 201

Re: Weird results with Spark SQL Outer joins

2016-05-02 Thread Kevin Peng
ight outer join > with swig on 3 columns, with same additional filters. > > Without knowing your data, I cannot see the reason that has to be a bug in > the spark. > > Am I misunderstanding your bug? > > Yong > > ------ > From: kpe...@gmail.com

RE: Weird results with Spark SQL Outer joins

2016-05-02 Thread Yong Zhang
: Mon, 2 May 2016 12:11:18 -0700 Subject: Re: Weird results with Spark SQL Outer joins To: gourav.sengu...@gmail.com CC: user@spark.apache.org Gourav, I wish that was case, but I have done a select count on each of the two tables individually and they return back different number of rows

Re: Weird results with Spark SQL Outer joins

2016-05-02 Thread Kevin Peng
Gourav, I wish that was the case, but I have done a select count on each of the two tables individually and they return different numbers of rows: dps.registerTempTable("dps_pin_promo_lt") swig.registerTempTable("swig_pin_promo_lt") dps.count() RESULT: 42632 swig.count() RESULT: 42034
