You can try sqlfluff <https://sqlfluff.com/>. It's a linter for SQL code,
and it appears to support Spark SQL <https://pypi.org/project/sqlfluff/>.
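
A minimal sketch using sqlfluff's Python API (assuming a recent sqlfluff
is installed; "sparksql" is the dialect name listed in its docs):

import sqlfluff

# A deliberately malformed statement, to show what the linter reports.
sql = "SELECT * FROM my_table WHERE"
for violation in sqlfluff.lint(sql, dialect="sparksql"):
    print(violation["code"], violation["description"])

The same check is available from the command line:
sqlfluff lint --dialect sparksql my_query.sql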


On Mon, Dec 25, 2023 at 17:13, ram manickam <ramsidm...@gmail.com> wrote:

> Thanks Mich, Nicholas. I tried looking over the Stack Overflow post, and
> none of the answers seems to cover syntax validation. Do you know if it's
> even possible to do syntax validation in Spark?
>
> Thanks
> Ram
>
> On Sun, Dec 24, 2023 at 12:49 PM Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Well, not to put too fine a point on it: in a public forum, one ought to
>> respect the importance of open communication. Everyone has the right to ask
>> questions, seek information, and engage in discussions without facing
>> unnecessary patronization.
>>
>>
>>
>> Mich Talebzadeh,
>> Dad | Technologist | Solutions Architect | Engineer
>> London
>> United Kingdom
>>
>>
>>    view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>> On Sun, 24 Dec 2023 at 18:27, Nicholas Chammas <
>> nicholas.cham...@gmail.com> wrote:
>>
>>> This is a user-list question, not a dev-list question. Moving this
>>> conversation to the user list and BCC-ing the dev list.
>>>
>>> Also, this statement
>>>
>>> > We are not validating against table or column existence.
>>>
>>> is not correct. When you call spark.sql(…), Spark will look up the table
>>> references and fail with TABLE_OR_VIEW_NOT_FOUND if it cannot find them.
>>>
>>> Also, when you run DDL via spark.sql(…), Spark will actually run it. So
>>> spark.sql(“drop table my_table”) will actually drop my_table. It’s not a
>>> validation-only operation.
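>>>
>>> A quick way to see this (a sketch; the table name is made up and
>>> assumed not to exist in the catalog):
>>>
>>> from pyspark.sql.utils import AnalysisException
>>>
>>> try:
>>>     spark.sql("SELECT * FROM no_such_table")
>>> except AnalysisException as e:
>>>     # Analysis fails with TABLE_OR_VIEW_NOT_FOUND even though the
>>>     # statement is syntactically valid.
>>>     print(e)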
>>>
>>> This question of validating SQL is already discussed on Stack Overflow
>>> <https://stackoverflow.com/q/46973729/877069>. You may find some useful
>>> tips there.
>>>
>>> Nick
>>>
>>>
>>> On Dec 24, 2023, at 4:52 AM, Mich Talebzadeh <mich.talebza...@gmail.com>
>>> wrote:
>>>
>>>
>>> Yes, you can validate the syntax of your PySpark SQL queries without
>>> connecting to an actual dataset or running the queries on a cluster.
>>> spark.sql() will raise a parse error for malformed SQL without
>>> executing the query. Something like below:
>>> Welcome to
>>>       ____              __
>>>      / __/__  ___ _____/ /__
>>>     _\ \/ _ \/ _ `/ __/  '_/
>>>    /__ / .__/\_,_/_/ /_/\_\   version 3.4.0
>>>       /_/
>>>
>>> Using Python version 3.9.16 (main, Apr 24 2023 10:36:11)
>>> Spark context Web UI available at http://rhes75:4040
>>> Spark context available as 'sc' (master = local[*], app id =
>>> local-1703410019374).
>>> SparkSession available as 'spark'.
>>> >>> from pyspark.sql import SparkSession
>>> >>> spark = SparkSession.builder.appName("validate").getOrCreate()
>>> 23/12/24 09:28:02 WARN SparkSession: Using an existing Spark session;
>>> only runtime SQL configurations will take effect.
>>> >>> sql = "SELECT * FROM <TABLE> WHERE <COLUMN> = some value"
>>> >>> try:
>>> ...   spark.sql(sql)
>>> ...   print("is working")
>>> ... except Exception as e:
>>> ...   print(f"Syntax error: {e}")
>>> ...
>>> Syntax error:
>>> [PARSE_SYNTAX_ERROR] Syntax error at or near '<'.(line 1, pos 14)
>>>
>>> == SQL ==
>>> SELECT * FROM <TABLE> WHERE <COLUMN> = some value
>>> --------------^^^
>>>
>>> Here we only check for syntax errors, not the semantics of the query.
>>> We are not validating against table or column existence.
>>>
>>> This method is useful when you want to catch obvious syntax errors
>>> before submitting your PySpark job to a cluster, especially when you don't
>>> have access to the actual data.
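>>>
>>> If you want a pure grammar check with no table lookup at all, one
>>> common trick is to call Spark's SQL parser directly through the JVM
>>> gateway. Note this relies on an internal, non-public API that may
>>> change between versions. A sketch:
>>>
>>> def is_valid_syntax(spark, sql: str) -> bool:
>>>     try:
>>>         # Parse only: no analysis, no execution, no tables needed.
>>>         spark._jsparkSession.sessionState().sqlParser().parsePlan(sql)
>>>         return True
>>>     except Exception:
>>>         return False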
>>>
>>> In summary
>>>
>>>    - This method validates syntax but will not catch semantic errors
>>>    - If you need more comprehensive validation, consider using a
>>>    testing framework and a small dataset (see the sketch after this list).
>>>    - For complex queries, using a linter or code analysis tool can help
>>>    identify potential issues.
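>>>
>>> A minimal sketch of the small-dataset approach (the table and column
>>> names here are made up for illustration):
>>>
>>> # Register a tiny in-memory table so that table and column
>>> # references can be validated without touching real data.
>>> spark.createDataFrame([(1, "a")], ["id", "name"]) \
>>>     .createOrReplaceTempView("my_table")
>>> spark.sql("SELECT id, name FROM my_table").collect()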
>>>
>>> HTH
>>>
>>>
>>> Mich Talebzadeh,
>>> Dad | Technologist | Solutions Architect | Engineer
>>> London
>>> United Kingdom
>>>
>>>    view my Linkedin profile
>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>
>>>
>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>
>>>
>>> On Sun, 24 Dec 2023 at 07:57, ram manickam <ramsidm...@gmail.com> wrote:
>>>
>>>> Hello,
>>>> Is there a way to validate PySpark SQL for syntax errors only? I
>>>> cannot connect to the actual dataset to perform this validation.
>>>> Any help would be appreciated.
>>>>
>>>>
>>>> Thanks
>>>> Ram
>>>>
>>>
>>>

-- 
Bjørn Jørgensen
Vestre Aspehaug 4, 6010 Ålesund
Norge

+47 480 94 297
