Re: Validate spark sql

2023-12-26 Thread Gourav Sengupta
Dear friend,

Thanks a ton. I have been looking for SQL linting for a long time, and it
looks like https://sqlfluff.com/ is something that can be used :)

Thank you so much, and I wish you all a wonderful new year.

Regards,
Gourav


Re: Validate spark sql

2023-12-26 Thread Mich Talebzadeh
Worth trying the EXPLAIN statement
<https://spark.apache.org/docs/latest/sql-ref-syntax-qry-explain.html>,
as suggested by @tianlangstudio.
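
For illustration, a minimal sketch in a local PySpark shell; the query is a
toy one against the built-in range() table-valued function, so no real table
is assumed:

>>> # EXPLAIN returns the query plan as a DataFrame, without executing the query
>>> spark.sql("EXPLAIN EXTENDED SELECT id FROM range(10) WHERE id > 5").show(truncate=False)

Note that EXPLAIN still resolves table references during analysis, so a query
against a missing table fails rather than returning a plan.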
HTH
Mich Talebzadeh,
Dad | Technologist | Solutions Architect | Engineer
London
United Kingdom


   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Mon, 25 Dec 2023 at 11:15, tianlangstudio wrote:

>
> What about EXPLAIN?
> https://spark.apache.org/docs/3.5.0/sql-ref-syntax-qry-explain.html#content
>
> Fusion Zhu <https://www.tianlang.tech/>
>

Re: Validate spark sql

2023-12-25 Thread Bjørn Jørgensen
You can try sqlfluff <https://sqlfluff.com/>. It's a linter for SQL code, and
it seems to have support for sparksql.
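
As a sketch of how that might look from the command line (assuming sqlfluff is
installed from PyPI; the file name my_query.sql is made up):

pip install sqlfluff
sqlfluff lint my_query.sql --dialect sparksql

Being a static linter, sqlfluff checks syntax and style without touching any
catalog, which complements the table-resolution checks spark.sql(…) performs.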



Re: Validate spark sql

2023-12-25 Thread Bjørn Jørgensen
Mailing lists

For broad or opinion-based questions, requests for external resources,
debugging issues, bugs, contributing to the project, and similar scenarios,
it is recommended you use the user@spark.apache.org mailing list.

   - user@spark.apache.org is for usage questions, help, and announcements
   (subscribe, unsubscribe, archives).
   - d...@spark.apache.org is for people who want to contribute code to Spark
   (subscribe, unsubscribe, archives).



man. 25. des. 2023 kl. 04:58 skrev Mich Talebzadeh <
mich.talebza...@gmail.com>:

> Well not to put too finer point on it, in a public forum, one ought to
> respect the importance of open communication. Everyone has the right to ask
> questions, seek information, and engage in discussions without facing
> unnecessary patronization.
>
>
>
> Mich Talebzadeh,
> Dad | Technologist | Solutions Architect | Engineer
> London
> United Kingdom
>
>
>view my Linkedin profile
> 
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Sun, 24 Dec 2023 at 18:27, Nicholas Chammas 
> wrote:
>
>> This is a user-list question, not a dev-list question. Moving this
>> conversation to the user list and BCC-ing the dev list.
>>
>> Also, this statement
>>
>> > We are not validating against table or column existence.
>>
>> is not correct. When you call spark.sql(…), Spark will lookup the table
>> references and fail with TABLE_OR_VIEW_NOT_FOUND if it cannot find them.
>>
>> Also, when you run DDL via spark.sql(…), Spark will actually run it. So
>> spark.sql(“drop table my_table”) will actually drop my_table. It’s not a
>> validation-only operation.
>>
>> This question of validating SQL is already discussed on Stack Overflow
>> . You may find some useful
>> tips there.
>>
>> Nick
>>
>>
>> On Dec 24, 2023, at 4:52 AM, Mich Talebzadeh 
>> wrote:
>>
>>
>> Yes, you can validate the syntax of your PySpark SQL queries without
>> connecting to an actual dataset or running the queries on a cluster.
>> PySpark provides a method for syntax validation without executing the
>> query. Something like below
>>   __
>>  / __/__  ___ _/ /__
>> _\ \/ _ \/ _ `/ __/  '_/
>>/__ / .__/\_,_/_/ /_/\_\   version 3.4.0
>>   /_/
>>
>> Using Python version 3.9.16 (main, Apr 24 2023 10:36:11)
>> Spark context Web UI available at http://rhes75:4040
>> Spark context available as 'sc' (master = local[*], app id =
>> local-1703410019374).
>> SparkSession available as 'spark'.
>> >>> from pyspark.sql import SparkSession
>> >>> spark = SparkSession.builder.appName("validate").getOrCreate()
>> 23/12/24 09:28:02 WARN SparkSession: Using an existing Spark session;
>> only runtime SQL configurations will take effect.
>> >>> sql = "SELECT * FROM  WHERE  = some value"
>> >>> try:
>> ...   spark.sql(sql)
>> ...   print("is working")
>> ... except Exception as e:
>> ...   print(f"Syntax error: {e}")
>> ...
>> Syntax error:
>> [PARSE_SYNTAX_ERROR] Syntax error at or near '<'.(line 1, pos 14)
>>
>> == SQL ==
>> SELECT * FROM  WHERE  = some value
>> --^^^
>>
>> Here we only check for syntax errors and not the actual existence of
>> query semantics. We are not validating against table or column existence.
>>
>> This method is useful when you want to catch obvious syntax errors before
>> submitting your PySpark job to a cluster, especially when you don't have
>> access to the actual data.
>>
>> In summary
>>
>>- Theis method validates syntax but will not catch semantic errors
>>- If you need more comprehensive validation, consider using a testing
>>framework and a small dataset.
>>- For complex queries, using a linter or code analysis tool can help
>>identify potential issues.
>>
>> HTH
>>
>>
>> Mich Talebzadeh,
>> Dad | Technologist | Solutions Architect | Engineer
>> London
>> United Kingdom
>>
>>view my Linkedin profile
>> 
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from 

Re: Validate spark sql

2023-12-24 Thread ram manickam
Thanks Mich, Nicholas. I tried looking over the Stack Overflow posts, and
none of them seems to cover syntax validation. Do you know if it's even
possible to do syntax validation in Spark?

Thanks
Ram


Re: Validate spark sql

2023-12-24 Thread Mich Talebzadeh
Well, not to put too fine a point on it: in a public forum, one ought to
respect the importance of open communication. Everyone has the right to ask
questions, seek information, and engage in discussions without facing
unnecessary patronization.



Mich Talebzadeh,
Dad | Technologist | Solutions Architect | Engineer
London
United Kingdom


   view my Linkedin profile



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.






Re: Validate spark sql

2023-12-24 Thread Nicholas Chammas
This is a user-list question, not a dev-list question. Moving this conversation 
to the user list and BCC-ing the dev list.

Also, this statement

> We are not validating against table or column existence.

is not correct. When you call spark.sql(…), Spark will look up the table 
references and fail with TABLE_OR_VIEW_NOT_FOUND if it cannot find them.

Also, when you run DDL via spark.sql(…), Spark will actually run it. So 
spark.sql(“drop table my_table”) will actually drop my_table. It’s not a 
validation-only operation.
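
For example, a minimal sketch runnable in a local PySpark shell (the table
name no_such_table is assumed not to exist):

>>> from pyspark.sql.utils import AnalysisException
>>> try:
...     spark.sql("SELECT * FROM no_such_table")  # parses fine, fails analysis
... except AnalysisException as e:
...     print(e)  # reports TABLE_OR_VIEW_NOT_FOUND

The statement is syntactically valid, but spark.sql(…) analyzes it eagerly and
raises AnalysisException before anything executes.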

This question of validating SQL is already discussed on Stack Overflow 
<https://stackoverflow.com/q/46973729/877069>. You may find some useful tips 
there.

Nick


> On Dec 24, 2023, at 4:52 AM, Mich Talebzadeh wrote:
> 
>   
> Yes, you can validate the syntax of your PySpark SQL queries without 
> connecting to an actual dataset or running the queries on a cluster.
> PySpark provides a method for syntax validation without executing the query. 
> Something like below
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /__ / .__/\_,_/_/ /_/\_\   version 3.4.0
>       /_/
> 
> Using Python version 3.9.16 (main, Apr 24 2023 10:36:11)
> Spark context Web UI available at http://rhes75:4040 
> Spark context available as 'sc' (master = local[*], app id = 
> local-1703410019374).
> SparkSession available as 'spark'.
> >>> from pyspark.sql import SparkSession
> >>> spark = SparkSession.builder.appName("validate").getOrCreate()
> 23/12/24 09:28:02 WARN SparkSession: Using an existing Spark session; only 
> runtime SQL configurations will take effect.
> >>> sql = "SELECT * FROM <table> WHERE <column> = some value"
> >>> try:
> ...   spark.sql(sql)
> ...   print("is working")
> ... except Exception as e:
> ...   print(f"Syntax error: {e}")
> ...
> Syntax error:
> [PARSE_SYNTAX_ERROR] Syntax error at or near '<'.(line 1, pos 14)
> 
> == SQL ==
> SELECT * FROM <table> WHERE <column> = some value
> --------------^^^
> 
> Here we only check for syntax errors, not the actual query semantics. We are 
> not validating against table or column existence.
> 
> This method is useful when you want to catch obvious syntax errors before 
> submitting your PySpark job to a cluster, especially when you don't have 
> access to the actual data.
> In summary:
> - This method validates syntax but will not catch semantic errors.
> - If you need more comprehensive validation, consider using a testing
> framework and a small dataset.
> - For complex queries, a linter or code analysis tool can help identify
> potential issues.
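> 
> As a further sketch, syntax-only checking with no catalog lookups at all can
> be done through the session's SQL parser. This goes through Spark internals
> via Py4J (not a public API, and it may change between versions), so treat it
> as an assumption-laden workaround:
> 
> >>> parser = spark._jsparkSession.sessionState().sqlParser()
> >>> parser.parsePlan("SELECT 1 AS x")  # parses to an unresolved plan; no tables touched
> >>> parser.parsePlan("SELEC 1")        # misspelled keyword: raises a ParseException, surfaced through Py4J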
> HTH
> 
> Mich Talebzadeh,
> Dad | Technologist | Solutions Architect | Engineer
> London
> United Kingdom
> 
>view my Linkedin profile 
> 
> 
>  https://en.everybodywiki.com/Mich_Talebzadeh
> 
>  
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
> damage or destruction of data or any other property which may arise from 
> relying on this email's technical content is explicitly disclaimed. The 
> author will in no case be liable for any monetary damages arising from such 
> loss, damage or destruction.
>  
> 
> 
> On Sun, 24 Dec 2023 at 07:57, ram manickam wrote:
>> Hello,
>> Is there a way to validate PySpark SQL for syntax errors only? I cannot
>> connect to the actual data set to perform this validation. Any help
>> would be appreciated.
>> 
>> 
>> Thanks
>> Ram