Re: Issue with Materialized Views in Spark SQL

2024-05-03 Thread Mich Talebzadeh
provided is correct to the best of my knowledge but of course cannot be guaranteed. It is essential to note that, as with any advice, quote "one test result is worth one-thousand expert opinions" (Wernher von Braun <https://en.wikipedia.org/wiki/Wernher_von_Braun>).

Re: Issue with Materialized Views in Spark SQL

2024-05-03 Thread Mich Talebzadeh
quote "one test result is worth one-thousand expert opinions" (Wernher von Braun <https://en.wikipedia.org/wiki/Wernher_von_Braun>). On Fri, 3 May 2024 at 00:54, Mich Talebzadeh wrote: > An issue I encountered while wor

Re: Issue with Materialized Views in Spark SQL

2024-05-02 Thread Jungtaek Lim
e. You can initiate a feature request and ask the community to include it in the roadmap. On Fri, May 3, 2024 at 12:01 PM Mich Talebzadeh wrote: > An issue I encountered while working with Materialized Views in Spark SQL. > It appears that there is an inconsistency between the behavior o

Re: Issue with Materialized Views in Spark SQL

2024-05-02 Thread Walaa Eldin Moustafa
Thanks, Walaa. On Thu, May 2, 2024 at 4:55 PM Mich Talebzadeh wrote: > An issue I encountered while working with Materialized Views in Spark SQL. > It appears that there is an inconsistency between the behavior of > Materialized Views in Spark SQL and Hive. > > When attemp

Issue with Materialized Views in Spark SQL

2024-05-02 Thread Mich Talebzadeh
An issue I encountered while working with Materialized Views in Spark SQL. It appears that there is an inconsistency between the behavior of Materialized Views in Spark SQL and Hive. When attempting to execute a statement like DROP MATERIALIZED VIEW IF EXISTS test.mv in Spark SQL, I encountered
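Since Spark SQL has no materialized-view DDL, the usual workaround is to emulate one with an ordinary table rebuilt on a schedule. A hedged sketch (all table and column names here are illustrative, not from the thread):

```sql
-- Emulate DROP/CREATE MATERIALIZED VIEW with a plain CTAS table
DROP TABLE IF EXISTS test.mv;
CREATE TABLE test.mv AS
SELECT customer_id, COUNT(*) AS order_cnt
FROM test.orders
GROUP BY customer_id;
```

Re-running both statements refreshes the "view"; it lacks the automatic incremental maintenance a real materialized view would provide.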

How to use Structured Streaming in Spark SQL

2024-04-22 Thread ????
In Flink, you can create streaming tables using Flink SQL and connect them directly to CDC and Kafka through SQL alone. How can I do streaming computation with SQL in Spark? 308027...@qq.com

Re: Validate spark sql

2023-12-25 Thread tianlangstudio
s://www.tianlang.tech/ -- From: ram manickam Sent: Monday, 25 December 2023 12:58 To: Mich Talebzadeh Cc: Nicholas Chammas; user Subject: Re: Validate spark sql Thanks Mich, Nicholas. I tried looking over the Stack Overflow post and none of them seems to cov

SQL GROUP BY alias with dots, was: Spark SQL question

2023-02-07 Thread Enrico Minack
Hi, you are right, that is an interesting question. Looks like GROUP BY is doing something funny / magic here (spark-shell 3.3.1 and 3.5.0-SNAPSHOT): With an alias, it behaves as you have pointed out: spark.range(3).createTempView("ids_without_dots") spark.sql("SELECT * FROM
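For readers hitting the same parsing surprise, a sketch of the backtick workaround (alias name illustrative; how the unquoted form resolves varies by Spark version, as the thread shows):

```sql
-- Without backticks, GROUP BY my.alias is parsed as a qualified name
-- (column `alias` of relation `my`); backticks make it a single identifier:
SELECT id % 2 AS `my.alias`
FROM ids_without_dots
GROUP BY `my.alias`;
```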

Re: [Spark SQL]: Is it possible that spark SQL appends "SELECT 1 " to the query

2022-11-18 Thread Sean Owen
Taking this off list. Start here: https://github.com/apache/spark/blob/70ec696bce7012b25ed6d8acec5e2f3b3e127f11/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala#L144 Look at subclasses of JdbcDialect too, like TeradataDialect. Note that you are using an old unsupported version
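The probe Sean points at can be sketched in a few lines. This is an illustration of the dialect contract, not Spark's actual Scala code, and the exact query each dialect emits may differ:

```python
def table_exists_query(table: str, dialect: str = "default") -> str:
    """Shape of JdbcDialect.getTableExistsQuery: a cheap probe Spark runs
    before reading over JDBC, which is why an unexpected-looking query
    shows up in the database's query log."""
    if dialect == "limit_style":
        # some dialects probe with a LIMIT variant instead
        return f"SELECT 1 FROM {table} LIMIT 1"
    # default style: returns no rows, but fails fast if the table is missing
    return f"SELECT * FROM {table} WHERE 1=0"

print(table_exists_query("mydb.orders"))  # -> SELECT * FROM mydb.orders WHERE 1=0
```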

Re: [Spark SQL]: Is it possible that spark SQL appends "SELECT 1 " to the query

2022-11-17 Thread Sean Owen
> It impacts the performance. Can we have any alternate solution for this? > Thanks, Rama > On Thu, Nov 17, 2022, 10:17 PM Sean Owen wrote:

Re: [Spark SQL]: Is it possible that spark SQL appends "SELECT 1 " to the query

2022-11-17 Thread Sean Owen
xistence of the table upfront. It is nearly a no-op query; can it have a perf impact? On Thu, Nov 17, 2022 at 10:42 AM Ramakrishna Rayudu <ramakrishna560.ray...@gmail.com> wrote: > Hi Team,

Re: [Spark SQL]: Is it possible that spark SQL appends "SELECT 1 " to the query

2022-11-17 Thread Ramakrishna Rayudu
On Thu, Nov 17, 2022 at 10:42 AM Ramakrishna Rayudu <ramakrishna560.ray...@gmail.com> wrote: > Hi Team, > I am facing one issue. Can you please help me on this. > <https://stackoverflow

Re: [Spark SQL]: Is it possible that spark SQL appends "SELECT 1 " to the query

2022-11-17 Thread Sean Owen
on this. > <https://stackoverflow.com/posts/74477662/timeline> > We are connecting to Teradata from Spark SQL with the API below

Re: [Spark SQL]: Is it possible that spark SQL appends "SELECT 1 " to the query

2022-11-17 Thread Sean Owen
his. > <https://stackoverflow.com/posts/74477662/timeline> > We are connecting to Teradata from Spark SQL with the API below: > Dataset jdbcDF = spark.read().jdbc(connectionUrl, tableQuery, connectionPropertie

[Spark SQL]: Is it possible that spark SQL appends "SELECT 1 " to the query

2022-11-17 Thread Ramakrishna Rayudu
Hi Team, I am facing one issue. Can you please help me on this. <https://stackoverflow.com/posts/74477662/timeline> We are connecting to Teradata from Spark SQL with the API below: Dataset jdbcDF = spark.read().jdbc(connectionUrl, tableQuery, connecti

Re: Why the same INSERT OVERWRITE sql , final table file produced by spark sql is larger than hive sql?

2022-10-12 Thread Chartist
From: Sadha Chilukoori, Date: 10/12/2022 08:27, To: Chartist <13289341...@163.com>, Subject: Re: Why the same INSERT OVERWRITE sql, final table file produced by spark sql is larger than hive sql? I have faced the same problem, where Hive and Spark ORC were using the

Re: Why the same INSERT OVERWRITE sql , final table file produced by spark sql is larger than hive sql?

2022-10-11 Thread Sadha Chilukoori
arts'='1', 'spark.sql.sources.schema.part.0'='xxx SOME OMITTED CONTENT xxx', 'spark.sql.sources.schema.partCol.0'='pt', 'transient_lastDdlTime'='1653484849') *ENV:* hive version 2.1.1, spark version 2.4.4 *hadoop fs -du -h Result:* *[hive sql]:* *735.2 M /user/

Why the same INSERT OVERWRITE sql , final table file produced by spark sql is larger than hive sql?

2022-10-11 Thread Chartist
') ENV: hive version 2.1.1, spark version 2.4.4. hadoop fs -du -h result: [hive sql]: 735.2 M /user/hive/warehouse/mytable/pt=20220518 [spark sql]: 1.1 G /user/hive/warehouse/mytable/pt=20220518 How could this happen? Could it be caused by the different ORC versions? Any replies
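One commonly reported cause of this size gap (an assumption here, not confirmed in the thread) is a codec mismatch: Spark's ORC writer defaults to Snappy, while Hive 2.x ORC defaults to ZLIB. A sketch of the first thing to check:

```sql
-- Align Spark's ORC codec with Hive's default, then rerun the INSERT OVERWRITE
SET spark.sql.orc.compression.codec=zlib;
```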

depolying stage-level scheduling for Spark SQL and how to expose RDD code from Spark SQL?

2022-09-29 Thread Chenghao Lyu
Hi, I am trying to deploy the stage-level scheduling for Spark SQL. Since the current stage-level scheduling only supports the RDD-APIs, I want to expose the RDD transformation code from my Spark SQL code (with SQL syntax). Can you provide any pointers on how to do it? Stage level scheduling

Re: EXT: Re: Spark SQL

2022-09-15 Thread Vibhor Gupta
nction, does the underlying thread get killed when a TimeoutExc... stackoverflow.com  Regards, Vibhor From: Gourav Sengupta Sent: Thursday, September 15, 2022 10:22 PM To: Mayur Benodekar Cc: user ; i...@spark.apache.org Subject: EXT: Re: Spark SQL EXTERNAL: Repor

Re: [Spark SQL] Omit Create Table Statement in Spark Sql

2022-08-09 Thread pengyh
you have to saveAsTable or create a view to run a SQL query. As the title says, does Spark SQL have a feature like the Flink Catalog that omits the `CREATE TABLE` statement so you can write the SQL query directly?

[Spark SQL] Omit Create Table Statement in Spark Sql

2022-08-09 Thread 阿强
As the title says, does Spark SQL have a feature like the Flink Catalog that omits the `CREATE TABLE` statement so you can write the SQL query directly? comeonyfzhu <comeonyf...@163.com>

Re: [pyspark delta] [delta][Spark SQL]: Getting an Analysis Exception. The associated location (path) is not empty

2022-08-02 Thread Sean Owen
> 'UNIQUE', 'UNKNOWN', 'UNLOCK', 'UNSET', 'UPDATE', 'USE', 'USER', 'USING', 'VALUES', 'VIEW', 'VIEWS', 'WHEN', 'WHERE', 'WINDOW', 'WITH', IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 23) == SQL == CREATE OR REPLACE TABL

Re: [pyspark delta] [delta][Spark SQL]: Getting an Analysis Exception. The associated location (path) is not empty

2022-08-02 Thread Stelios Philippou
EW', 'VIEWS', 'WHEN', 'WHERE', 'WINDOW', 'WITH', IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 23) == SQL == CREATE OR REPLACE TABLE On Mon, Aug 1, 2022 at 8:32 PM Sean Owen wrote: > Pretty muc

Re: [pyspark delta] [delta][Spark SQL]: Getting an Analysis Exception. The associated location (path) is not empty

2022-08-02 Thread Sean Owen
On Mon, Aug 1, 2022 at 8:32 PM Sean Owen wrote: > Pretty much what it says? You are creating a table over a path that already has data in it. You can't do that without mode=overwrite at least, if that's what you intend. On Mon, Aug 1, 2022

Re: [pyspark delta] [delta][Spark SQL]: Getting an Analysis Exception. The associated location (path) is not empty

2022-08-02 Thread ayan guha
On Mon, Aug 1, 2022 at 8:32 PM Sean Owen wrote: > Pretty much what it says? You are creating a table over a path that already has data in it. You can't do that without mode=overwrite at least, if that's what you intend. On Mon, Aug 1, 2022 at 7:29 PM

Re: [pyspark delta] [delta][Spark SQL]: Getting an Analysis Exception. The associated location (path) is not empty

2022-08-01 Thread Kumba Janga
data in it. You can't do that without mode=overwrite at least, if that's what you intend. On Mon, Aug 1, 2022 at 7:29 PM Kumba Janga wrote: > - Component: Spark Delta, Spark SQL - Level: Beginner - Scenario: Debug, How-to

Re: [pyspark delta] [delta][Spark SQL]: Getting an Analysis Exception. The associated location (path) is not empty

2022-08-01 Thread Sean Owen
Pretty much what it says? You are creating a table over a path that already has data in it. You can't do that without mode=overwrite at least, if that's what you intend. On Mon, Aug 1, 2022 at 7:29 PM Kumba Janga wrote: > - Component: Spark Delta, Spark SQL - Lev

[pyspark delta] [delta][Spark SQL]: Getting an Analysis Exception. The associated location (path) is not empty

2022-08-01 Thread Kumba Janga
- Component: Spark Delta, Spark SQL - Level: Beginner - Scenario: Debug, How-to *Python in Jupyter:* import pyspark import pyspark.sql.functions from pyspark.sql import SparkSession spark = ( SparkSession .builder .appName("programming") .mas

Re: [Spark SQL]: Does Spark SQL support WAITFOR?

2022-05-19 Thread Someshwar Kale
Artemis User wrote: > WAITFOR is part of the Transact-SQL and it's Microsoft SQL server > specific, not supported by Spark SQL. If you want to impose a delay in a > Spark program, you may want to use the thread sleep function in Java or > Scala. Hope this helps... > > On 5/1

Re: [Spark SQL]: Does Spark SQL support WAITFOR?

2022-05-19 Thread Artemis User
WAITFOR is part of the Transact-SQL and it's Microsoft SQL server specific, not supported by Spark SQL.  If you want to impose a delay in a Spark program, you may want to use the thread sleep function in Java or Scala.  Hope this helps... On 5/19/22 1:45 PM, K. N. Ramachandran wrote: Hi Sean

Re: [Spark SQL]: Does Spark SQL support WAITFOR?

2022-05-19 Thread K. N. Ramachandran
Hi Sean, I'm trying to test a timeout feature in a tool that uses Spark SQL. Basically, if a long-running query exceeds a configured threshold, then the query should be canceled. I couldn't see a simple way to make a "sleep" SQL statement to test the timeout. Instead, I just ran a
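Lacking a WAITFOR, the timeout path can be exercised driver-side. A minimal sketch in pure Python, where `time.sleep` stands in for the long-running Spark SQL statement (with real Spark you would cancel via a job group rather than `future.cancel()`):

```python
import concurrent.futures
import time

def run_with_timeout(query_fn, timeout_s):
    """Run query_fn, cancelling it if it exceeds timeout_s seconds."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(query_fn)
        try:
            return future.result(timeout=timeout_s)
        except concurrent.futures.TimeoutError:
            # stand-in for spark.sparkContext.cancelJobGroup(...)
            future.cancel()
            return "cancelled"

slow_query = lambda: (time.sleep(0.5), "done")[1]  # fake long-running query
print(run_with_timeout(slow_query, timeout_s=0.1))  # -> cancelled
```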

Re: [Spark SQL]: Does Spark SQL support WAITFOR?

2022-05-17 Thread Sean Owen
wrote: > Hello Spark Users Group, > I've just recently started working on tools that use Apache Spark. > When I try WAITFOR in the spark-sql command line, I just get: > Error: Error running query: org.apache.spark.sql.catalyst.parser.P

Re: [Spark SQL]: Does Spark SQL support WAITFOR?

2022-05-17 Thread K. N. Ramachandran
Gentle ping. Any info here would be great. Regards, Ram On Sun, May 15, 2022 at 5:16 PM K. N. Ramachandran wrote: > Hello Spark Users Group, > > I've just recently started working on tools that use Apache Spark. > When I try WAITFOR in the spark-sql command line, I just get: >

[Spark SQL]: Does Spark SQL support WAITFOR?

2022-05-15 Thread K. N. Ramachandran
Hello Spark Users Group, I've just recently started working on tools that use Apache Spark. When I try WAITFOR in the spark-sql command line, I just get: Error: Error running query: org.apache.spark.sql.catalyst.parser.ParseException: mismatched input 'WAITFOR' expecting (.. list of allowed

Re: {EXT} Re: Spark sql slowness in Spark 3.0.1

2022-04-15 Thread Anil Dasari
Hello, DF is checkpointed here, so it is written to HDFS. DF is written in parquet format with default parallelism. Thanks. From: wilson Date: Thursday, April 14, 2022 at 2:54 PM To: user@spark.apache.org Subject: {EXT} Re: Spark sql slowness in Spark 3.0.1 just curious, where to write

Re: Show create table on a Hive Table in Spark SQL - Treats CHAR, VARCHAR as STRING

2022-03-12 Thread Venkatesan Muniappan
Hi, does anybody else have a better suggestion for my problem? Thanks, Venkat 2016173438 On Fri, Mar 11, 2022 at 4:43 PM Venkatesan Muniappan <m.venkatbe...@gmail.com> wrote: > ok. I work for an org where such upgrades take a few months. Not an immediate task. > Thanks, Venkat

Re: Show create table on a Hive Table in Spark SQL - Treats CHAR, VARCHAR as STRING

2022-03-11 Thread Venkatesan Muniappan
ok. I work for an org where such upgrades take a few months. Not an immediate task. Thanks, Venkat 2016173438 On Fri, Mar 11, 2022 at 4:38 PM Mich Talebzadeh wrote: > yes in spark 3.1.1. Best to upgrade it to spark 3+. > > > >view my Linkedin profile >

Re: Show create table on a Hive Table in Spark SQL - Treats CHAR, VARCHAR as STRING

2022-03-11 Thread Venkatesan Muniappan
Thank you. I am trying to get the table definition for the existing tables. BTW, the create and show command that you executed, was it on Spark 3.x ? . Thanks, Venkat 2016173438 On Fri, Mar 11, 2022 at 4:28 PM Mich Talebzadeh wrote: > Well I do not know what has changed. However, this should

Re: Show create table on a Hive Table in Spark SQL - Treats CHAR, VARCHAR as STRING

2022-03-11 Thread Mich Talebzadeh
yes in spark 3.1.1. Best to upgrade it to spark 3+. view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of

Re: Show create table on a Hive Table in Spark SQL - Treats CHAR, VARCHAR as STRING

2022-03-11 Thread Mich Talebzadeh
Well I do not know what has changed. However, this should not affect your work. Try to create the table in Spark: sqltext: String = CREATE TABLE IF NOT EXISTS test.etcs (ID INT, CLUSTERED INT, SCATTERED INT, RANDOMISED INT, RANDOM_STRING VARCHAR(50), SMALL_VC

Re: Show create table on a Hive Table in Spark SQL - Treats CHAR, VARCHAR as STRING

2022-03-11 Thread Venkatesan Muniappan
Thank you Mich Talebzadeh for your answer. It's good to know that VARCHAR and CHAR are properly showing in Spark 3. Do you know what changed in Spark 3 that made this possible?. Or how can I achieve the same output in Spark 2.4.1? If there are some conf options, that would be helpful. Thanks,

Re: Show create table on a Hive Table in Spark SQL - Treats CHAR, VARCHAR as STRING

2022-03-11 Thread Mich Talebzadeh
Hive 3.1.1, Spark 3.1.1. Your Stack Overflow issue raised, and I quote: "I have a need to generate DDL statements for Hive tables & views programmatically. I tried using Spark and Beeline for this task. Beeline takes around 5-10 seconds for each of the statements whereas Spark completes the same

Show create table on a Hive Table in Spark SQL - Treats CHAR, VARCHAR as STRING

2022-03-11 Thread Venkatesan Muniappan
Hi Spark Team, I have raised a question about Spark on Stack Overflow. When you get a chance, can you please take a look and help me? https://stackoverflow.com/q/71431757/5927843 Thanks, Venkat 2016173438

Re: [EXTERNAL] Re: Need to make WHERE clause compulsory in Spark SQL

2022-03-10 Thread Saurabh Gulati
clause compulsory in Spark SQL Hi, are all users using the same Dataproc cluster? Regards, Gourav On Mon, Mar 7, 2022 at 9:28 AM Saurabh Gulati <saurabh.gul...@fedex.com> wrote: Thanks for the response, Gourav. Queries range from simple to large joins. We expose the data to our ana

Re: [EXTERNAL] Re: Need to make WHERE clause compulsory in Spark SQL

2022-03-07 Thread Gourav Sengupta
.@gmail.com>; user@spark.apache.org > *Subject:* Re: [EXTERNAL] Re: Need to make WHERE clause compulsory in > Spark SQL > > Hi, > > I completely agree with Saurabh, the use of BQ with SPARK does not make > sense at all, if you are trying to cut down your costs. I think that

Re: [EXTERNAL] Re: Need to make WHERE clause compulsory in Spark SQL

2022-03-07 Thread Saurabh Gulati
Talebzadeh ; Kidong Lee ; user@spark.apache.org Subject: Re: [EXTERNAL] Re: Need to make WHERE clause compulsory in Spark SQL Hi, I completely agree with Saurabh, the use of BQ with SPARK does not make sense at all, if you are trying to cut down your costs. I think that costs do matter to a few

Re: [EXTERNAL] Re: Need to make WHERE clause compulsory in Spark SQL

2022-03-05 Thread Gourav Sengupta
> *Cc:* user@spark.apache.org > *Subject:* Re: [EXTERNAL] Re: Need to make WHERE clause compulsory in Spark SQL > Ok interesting. > I am surprised why you are not using BigQuery and using Hive. My assumption is that your Spark is version 3.1.1 with standard GK

Re: [EXTERNAL] Re: Need to make WHERE clause compulsory in Spark SQL

2022-02-23 Thread Mich Talebzadeh
.com> *Cc:* user@spark.apache.org *Subject:* Re: [EXTERNAL] Re: Need to make WHERE clause compulsory in Spark SQL > Ok interesting. > I am surprised why you are not using BigQuery and using Hive. My assumption is that your Spark is version 3.1.1 with standard GK

Re: [EXTERNAL] Re: Need to make WHERE clause compulsory in Spark SQL

2022-02-22 Thread Saurabh Gulati
Subject: Re: [EXTERNAL] Re: Need to make WHERE clause compulsory in Spark SQL Ok interesting. I am surprised why you are not using BigQuery and using Hive. My assumption is that your Spark is version 3.1.1 with standard GKE on auto-scaler. What benefits are you getting from Using Hive here? As you

Re: [EXTERNAL] Re: Need to make WHERE clause compulsory in Spark SQL

2022-02-22 Thread Mich Talebzadeh
-- > *From:* Mich Talebzadeh > *Sent:* 22 February 2022 16:05 > *To:* Saurabh Gulati > *Cc:* user@spark.apache.org > *Subject:* [EXTERNAL] Re: Need to make WHERE clause compulsory in Spark > SQL > > *Caution! This email originated outside of FedEx. Please do not open

Re: [EXTERNAL] Re: Need to make WHERE clause compulsory in Spark SQL

2022-02-22 Thread Saurabh Gulati
: [EXTERNAL] Re: Need to make WHERE clause compulsory in Spark SQL Thanks Sean for your response. @Mich Talebzadeh<mailto:mich.talebza...@gmail.com> We run all workloads on GKE as docker containers. So to answer your questions, Hive is running in a container as K8S service and spark thrift-

Re: [EXTERNAL] Re: Need to make WHERE clause compulsory in Spark SQL

2022-02-22 Thread Saurabh Gulati
: Need to make WHERE clause compulsory in Spark SQL Caution! This email originated outside of FedEx. Please do not open attachments or click links from an unknown or suspicious origin. Is your hive on prem with external tables in cloud storage? Where is your spark running from and what cloud b

Re: Need to make WHERE clause compulsory in Spark SQL

2022-02-22 Thread Mich Talebzadeh
Is your hive on prem with external tables in cloud storage? Where is your spark running from and what cloud buckets are you using? HTH On Tue, 22 Feb 2022 at 12:36, Saurabh Gulati wrote: > Hello, > We are trying to setup Spark as the execution engine for exposing our data > stored in lake. We

Re: Need to make WHERE clause compulsory in Spark SQL

2022-02-22 Thread Sean Owen
Spark does not use Hive for execution, so Hive params will not have an effect. I don't think you can enforce that in Spark. Typically you enforce things like that at a layer above your SQL engine, or can do so, because there is probably other access you need to lock down. On Tue, Feb 22, 2022 at
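Sean's "layer above the SQL engine" could be as small as a gatekeeper that rejects unfiltered queries before submission. A naive illustrative sketch; a real deployment should use a proper SQL parser, since a regex cannot tell a WHERE on the partition column from any other WHERE:

```python
import re

def require_where(sql: str) -> str:
    """Reject SQL that has no WHERE clause before it ever reaches Spark."""
    if not re.search(r"\bwhere\b", sql, flags=re.IGNORECASE):
        raise ValueError("rejected: query must contain a WHERE clause")
    return sql

require_where("SELECT * FROM t WHERE pt = '2022-02-22'")  # accepted
```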

Need to make WHERE clause compulsory in Spark SQL

2022-02-22 Thread Saurabh Gulati
Hello, We are trying to set up Spark as the execution engine for exposing our data stored in the lake. We have Hive metastore running along with Spark thrift server and are using Superset as the UI. We save all tables as external tables in the Hive metastore with storage being on cloud. We see that

Re: [Spark CORE][Spark SQL][Advanced]: Why dynamic partition pruning optimization does not work in this scenario?

2021-12-04 Thread Mohamadreza Rostami
Thank you for your response. It was a good point, "under the broadcast join threshold." We tested it on real data sets with tables of TB size, but instead Spark uses sort-merge join without DPP. Anyway, you said that DPP is not implemented for broadcast joins? So, I wonder how DPP can be

Re: [Spark CORE][Spark SQL][Advanced]: Why dynamic partition pruning optimization does not work in this scenario?

2021-12-04 Thread Russell Spitzer
This is probably because your data size is well under the broadcastJoin threshold so at the planning phase it decides to do a BroadcastJoin instead of a Join which could take advantage of dynamic partition pruning. For testing like this you can always disable that with
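The knobs Russell alludes to, as a sketch (these are standard Spark 3.x conf names):

```sql
-- Force a sort-merge join on small test data so the DPP path can be observed
SET spark.sql.autoBroadcastJoinThreshold=-1;
-- DPP itself is on by default in Spark 3.x
SET spark.sql.optimizer.dynamicPartitionPruning.enabled=true;
```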

Re: [apache-spark][Spark SQL][Debug] Maven Spark build fails while compiling spark-hive-thriftserver_2.12 for Hadoop 2.10.1

2021-09-17 Thread Sean Owen
I don't think that has ever showed up in the CI/CD builds and can't recall someone reporting this. What did you change? it may be some local env issue On Fri, Sep 17, 2021 at 7:09 AM Enrico Minardi wrote: > > Hello, > > > the Maven build of Apache Spark 3.1.2 for user-provided Hadoop 2.10.1

Register an Aggregator as an UDAF for Spark SQL 3

2021-08-11 Thread AlstonWilliams
Hi all, I use Spark 3.0.2. I have written an Aggregator function and I want to register it with Spark SQL, so I can call it via ThriftServer. In Spark 2.4, I could extend `UserDefinedAggregateFunction` and use the following statement to register it in the Spark SQL shell: ``` CREATE

[Spark SQL] Why doesn't Spark SQL use code generation for my custom expression?

2021-07-23 Thread Han You
Hello, I’m writing a custom Spark catalyst Expression with custom codegen, but it seems that Spark (3.0.0) doesn’t want to generate code, and falls back to interpreted mode. I created my SparkSession with spark.sql.codegen.factoryMode=CODEGEN_ONLY and spark.sql.codegen.fallback=false, hoping that

RE: DataSource API v2 & Spark-SQL

2021-07-02 Thread Lavelle, Shawn
Thanks for following up, I will give this a go! ~ Shawn -Original Message- From: roizaig Sent: Thursday, April 29, 2021 7:42 AM To: user@spark.apache.org Subject: Re: DataSource API v2 & Spark-SQL You can create a custom data source following this blog <http://roizaig.blogs

Re: DataSource API v2 & Spark-SQL

2021-04-29 Thread roizaig
You can create a custom data source by following this blog. It shows how to read a Java log file using the Spark v3 API as an example. -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

Re: [Spark SQL]: Does Spark SQL can have better performance?

2021-04-29 Thread Mich Talebzadeh
Hi, your query parquetFile = spark.read.parquet("path/to/hdfs") parquetFile.createOrReplaceTempView("parquetFile") spark.sql("SELECT * FROM parquetFile WHERE field1 = 'value' ORDER BY timestamp LIMIT 1") will be lazily evaluated and won't do anything until the sql statement is actioned with

[Spark SQL]: Does Spark SQL can have better performance?

2021-04-29 Thread Amin Borjian
Hi. We use spark 3.0.1 in an HDFS cluster and we store our files as parquet with snappy compression and dictionary enabled. We try to perform a simple query: parquetFile = spark.read.parquet("path/to/hdfs") parquetFile.createOrReplaceTempView("parquetFile") spark.sql("SELECT * FROM parquetFile

Re: Is a Hive installation necessary for Spark SQL?

2021-04-25 Thread Dennis Suhari
Hi, you can also load other data source without Hive using spark read format into a spark Dataframe . From there you can also combine the results using the Dataframe world. The use cases of hive is to have a common Abstraction layer when you want to do data tagging, access management under

Is a Hive installation necessary for Spark SQL?

2021-04-25 Thread krchia
Does it make sense to keep a Hive installation when your parquet files come with a transactional metadata layer like Delta Lake / Apache Iceberg? My understanding from this: https://github.com/delta-io/delta/issues/85 is that Hive is no longer necessary other than discovering where the table is

Re: [Spark SQL]: Can complex oracle views be created using Spark SQL

2021-03-23 Thread Mich Talebzadeh
ying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction. On Mon, 22 Mar 2021 at 05:38, Gaurav Singh wrote: > Hi Team, > We have lots of complex oracle views (containing multiple tables, joins, analytical and aggregate functions, sub queries etc) and we are wondering if Spark can help us execute those views faster. > Also we want to know if those complex views can be implemented using Spark SQL? > Thanks and regards, Gaurav Singh +91 8600852256

Re: [Spark SQL]: Can complex oracle views be created using Spark SQL

2021-03-22 Thread Mich Talebzadeh
nt to know if those complex views can be implemented using Spark SQL? > Thanks and regards, Gaurav Singh +91 8600852256

[Spark SQL]: Can complex oracle views be created using Spark SQL

2021-03-21 Thread Gaurav Singh
Hi Team, We have lots of complex oracle views ( containing multiple tables, joins, analytical and aggregate functions, sub queries etc) and we are wondering if Spark can help us execute those views faster. Also we want to know if those complex views can be implemented using Spark SQL? Thanks

Re: Column-level encryption in Spark SQL

2021-01-21 Thread Gourav Sengupta
use case). I'm curious how you'd like it to work? (no idea how Hive does this either) Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/>

Re: Column-level encryption in Spark SQL

2021-01-21 Thread Mich Talebzadeh
.com/jaceklaskowski <https://twitter.com/jaceklaskowski> On Sat, Dec 19, 2020 at 2:38 AM john washington wrote: > Dear Spark team members, > Can you please advise if Column-level encryption is available in Spark SQL? > I am aware that HIVE supports column level encryption. > Appreciate your response. > Thanks, John

Re: Column-level encryption in Spark SQL

2021-01-21 Thread Jacek Laskowski
l/> Follow me on https://twitter.com/jaceklaskowski On Sat, Dec 19, 2020 at 2:38 AM john washington wrote: > Dear Spark team members, > Can you please advise if Column-level encryption is available in Spark SQL? > I am aware that HIV

Re: [Spark SQL]HiveQL and Spark SQL producing different results

2021-01-12 Thread Terry Kim
Ying, Can you share a query that produces different results? Thanks, Terry On Sun, Jan 10, 2021 at 1:48 PM Ying Zhou wrote: > Hi, > > I run some SQL using both Hive and Spark. Usually we get the same results. > However when a window function is in the script Hive and Spark can produce >

[Spark SQL]HiveQL and Spark SQL producing different results

2021-01-10 Thread Ying Zhou
Hi, I run some SQL using both Hive and Spark. Usually we get the same results. However, when a window function is in the script, Hive and Spark can produce different results. Is this intended behavior, or does either Hive or Spark have a bug? Thanks, Ying

Re: Using UDF based on Numpy functions in Spark SQL

2020-12-24 Thread Sean Owen
Why not just use STDDEV_SAMP? it's probably more accurate than the differences-of-squares calculation. You can write an aggregate UDF that calls numpy and register it for SQL, but, it is already a built-in. On Thu, Dec 24, 2020 at 8:12 AM Mich Talebzadeh wrote: > Thanks for the feedback. > > I

Re: Using UDF based on Numpy functions in Spark SQL

2020-12-24 Thread Mich Talebzadeh
> On Thu, Dec 24, 2020 at 3:17 AM Mich Talebzadeh wrote: > Well the truth is that we had this discussion in 2016 :(. What Hive calls Standard Deviation Function STDDEV is a pointer to STDDEV_POP. This is incorrect and has not been rectified yet!

Re: Using UDF based on Numpy functions in Spark SQL

2020-12-24 Thread Sean Owen
, 2020 at 3:17 AM Mich Talebzadeh wrote: > > Well the truth is that we had this discussion in 2016 :(. what Hive calls > Standard Deviation Function STDDEV is a pointer to STDDEV_POP. This is > incorrect and has not been rectified yet! > > > Spark-sql, Oracle and Sybase point

Re: Using UDF based on Numpy functions in Spark SQL

2020-12-24 Thread Mich Talebzadeh
Well the truth is that we had this discussion in 2016 :(. what Hive calls Standard Deviation Function STDDEV is a pointer to STDDEV_POP. This is incorrect and has not been rectified yet! Spark-sql, Oracle and Sybase point STDDEV to STDDEV_SAMP and not STDDEV_POP. Run a test on *Hive* SELECT
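The STDDEV_POP vs STDDEV_SAMP discrepancy Mich describes is easy to check outside any SQL engine; a quick sketch with Python's statistics module (the data values are illustrative):

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

s_samp = statistics.stdev(data)   # sample stddev: divides by n-1 (STDDEV_SAMP)
s_pop = statistics.pstdev(data)   # population stddev: divides by n (STDDEV_POP)

print(round(s_pop, 4))  # -> 2.0
print(s_samp > s_pop)   # -> True: the n-1 divisor always yields the larger value
```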

Re: Using UDF based on Numpy functions in Spark SQL

2020-12-23 Thread Sean Owen
""" spark.sql(sqltext) Now if I wanted to use a UDF based on the numpy STD function, I can do: import numpy as np from pyspark.sql.functions import UserDefinedFunction from pyspark.sql.types import DoubleType udf = User

Re: Using UDF based on Numpy functions in Spark SQL

2020-12-23 Thread Mich Talebzadeh
OK, thanks for the tip. I found this link useful for Python from Databricks: User-defined functions - Python — Databricks Documentation <https://docs.databricks.com/spark/latest/spark-sql/udf-python.html> Linke

Re: Using UDF based on Numpy functions in Spark SQL

2020-12-23 Thread Peyman Mohajerian
3 DESC """ spark.sql(sqltext) Now if I wanted to use a UDF based on the numpy STD function, I can do: import numpy as np from pyspark.sql.functions import UserDefinedFunction from pyspark.sql.types import DoubleType udf = User

Using UDF based on Numpy functions in Spark SQL

2020-12-23 Thread Mich Talebzadeh
STD function, I can do import numpy as np from pyspark.sql.functions import UserDefinedFunction from pyspark.sql.types import DoubleType udf = UserDefinedFunction(np.std, DoubleType()) How can I use that udf with spark SQL? I gather this is only possible through functional programming?

Re: Re: Is Spark SQL able to auto update partition stats like hive by setting hive.stats.autogather=true

2020-12-19 Thread Mich Talebzadeh
which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction. On Sat, 19 Dec 2020 at 07:51, 疯狂的哈丘 wrote: > Thanks, but `hive.stats.autogather` does not work for Spark SQL.

Re: Re: Is Spark SQL able to auto update partition stats like hive by setting hive.stats.autogather=true

2020-12-18 Thread 疯狂的哈丘
Thanks, but `hive.stats.autogather` does not work for Spark SQL. ----- Original Message ----- From: Mich Talebzadeh To: kongt...@sina.com Cc: user Subject: Re: Is Spark SQL able to auto update partition stats like hive by setting hive.stats.autogather=true Date: 2020-12-19 06:45 Hi, A fellow forum member kindly spotted

Column-level encryption in Spark SQL

2020-12-18 Thread john washington
Dear Spark team members, Can you please advise if Column-level encryption is available in Spark SQL? I am aware that HIVE supports column level encryption. Appreciate your response. Thanks, John

Can all the parameters of hive be used on spark sql?

2020-11-17 Thread Gang Li
e.g.: set hive.merge.smallfiles.avgsize=1600; SET hive.auto.convert.join = true; SET hive.exec.compress.intermediate=true; SET hive.exec.compress.output=true; SET hive.exec.parallel=true; thank you very much!!! -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

Re: Integration testing Framework Spark SQL Scala

2020-11-02 Thread Lars Albertsson
u, Feb 20, 2020 at 6:09 PM Ruijing Li wrote: >> >> Hi all, >> >> I’m interested in hearing the community’s thoughts on best practices to do >> integration testing for spark sql jobs. We run a lot of our jobs with cloud >> infrastructure and hdfs - this

Multi insert with join in Spark SQL

2020-08-05 Thread moqi
Hi, I am trying to migrate Hive SQL to Spark SQL. When I execute the Multi insert with join statement, Spark SQL will scan the same table multiple times, while Hive SQL will only scan once. In the actual production environment, this table is relatively large, which causes the running time

RE: DataSource API v2 & Spark-SQL

2020-08-03 Thread Lavelle, Shawn
v2 & Spark-SQL That's a bad error message. Basically you can't make a spark native catalog

Re: DataSource API v2 & Spark-SQL

2020-08-03 Thread Russell Spitzer
or saying my class (package) > isn’t a valid data source Can you help me out? > > > > Spark versions are 3.0.0 w/scala 2.12, artifacts are Spark-core, > spark-sql, spark-hive, spark-hive-thriftserver, spark-catalyst > > > > Here’s what the dataSource definition: *public

DataSource API v2 & Spark-SQL

2020-08-03 Thread Lavelle, Shawn
) isn't a valid data source Can you help me out? Spark versions are 3.0.0 w/scala 2.12, artifacts are Spark-core, spark-sql, spark-hive, spark-hive-thriftserver, spark-catalyst Here's what the dataSource definition: public class LogTableSource implements TableProvider, SupportsRead

Re: How to populate all possible combination values in columns using Spark SQL

2020-05-09 Thread Edgardo Szrajber
Technologies  On Thu, May 7, 2020 at 10:26 AM Aakash Basu wrote: Hi, I've described the problem in Stack Overflow with a lot of detailing, can you kindly check and help if possible? https://stackoverflow.com/q/61643910/5536733 I'd be absolutely fine if someone solves it using Spark SQL APIs

Re: How to populate all possible combination values in columns using Spark SQL

2020-05-09 Thread Aakash Basu
> > I've described the problem in Stack Overflow with a lot of detailing, can > you kindly check and help if possible? > > https://stackoverflow.com/q/61643910/5536733 > > I'd be absolutely fine if someone solves it using Spark SQL APIs rather > than plain spark SQL query. > > Thanks, > Aakash. > >

Re: How to populate all possible combination values in columns using Spark SQL

2020-05-08 Thread Edgardo Szrajber
Basu wrote: Hi, I've described the problem in Stack Overflow with a lot of detailing, can you kindly check and help if possible? https://stackoverflow.com/q/61643910/5536733 I'd be absolutely fine if someone solves it using Spark SQL APIs rather than plain spark SQL query. Thanks,Aakash.

Re: How to populate all possible combination values in columns using Spark SQL

2020-05-07 Thread Aakash Basu
AM Aakash Basu > wrote: > >> Hi, >> >> I've described the problem in Stack Overflow with a lot of detailing, can >> you kindly check and help if possible? >> >> https://stackoverflow.com/q/61643910/5536733 >> >> I'd be absolutely fine if someone solves it using Spark SQL APIs rather >> than plain spark SQL query. >> >> Thanks, >> Aakash. >> >

Re: How to populate all possible combination values in columns using Spark SQL

2020-05-06 Thread Sonal Goyal
at 10:26 AM Aakash Basu wrote: > Hi, > > I've described the problem in Stack Overflow with a lot of detailing, can > you kindly check and help if possible? > > https://stackoverflow.com/q/61643910/5536733 > > I'd be absolutely fine if someone solves it using Spark SQL APIs

How to populate all possible combination values in columns using Spark SQL

2020-05-06 Thread Aakash Basu
Hi, I've described the problem in Stack Overflow with a lot of detailing, can you kindly check and help if possible? https://stackoverflow.com/q/61643910/5536733 I'd be absolutely fine if someone solves it using Spark SQL APIs rather than plain spark SQL query. Thanks, Aakash.

Re: Which SQL flavor does Spark SQL follow?

2020-05-06 Thread Mich Talebzadeh
It closely follows Hive SQL. For the analytical functions it is similar to Oracle. Anyway, if you know good SQL, as opposed to being a Java programmer turned SQL writer, you should be OK. HTH Dr Mich Talebzadeh LinkedIn *

Re: Which SQL flavor does Spark SQL follow?

2020-05-06 Thread Jeff Evans
https://docs.databricks.com/spark/latest/spark-sql/language-manual/index.html https://spark.apache.org/docs/latest/api/sql/index.html On Wed, May 6, 2020 at 3:35 PM Aakash Basu wrote: > Hi, > > Wish to know, which type of SQL syntax is followed when we write a plain > SQL
