You can try sqlfluff <https://sqlfluff.com/>. It's a linter for SQL code, and it appears to have support for SparkSQL <https://pypi.org/project/sqlfluff/>.
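For example, a quick offline syntax check from the command line might look like the sketch below. Assumptions: sqlfluff is installed via pip, and `query.sql` is a made-up file name; `sqlfluff parse` with `--dialect sparksql` should fail with a non-zero exit code when the statement does not parse, which is what makes it usable as a pure syntax gate:

```shell
# Assumption: a Python environment with pip is available
pip install sqlfluff

# Write a sample Spark SQL statement to a file (query.sql is a made-up name)
printf 'SELECT id, name FROM my_table WHERE id = 1\n' > query.sql

# `sqlfluff parse` checks syntax only (no catalog, no execution);
# it exits non-zero if the statement cannot be parsed
sqlfluff parse query.sql --dialect sparksql
```

Unlike `spark.sql(...)`, this never touches a cluster or a catalog, so it cannot catch missing tables or columns.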
Mon, Dec 25, 2023 at 17:13, ram manickam <ramsidm...@gmail.com> wrote:

> Thanks Mich, Nicholas. I tried looking over the Stack Overflow post, and
> none of the answers seems to cover syntax validation. Do you know if it's
> even possible to do syntax validation in Spark?
>
> Thanks
> Ram
>
> On Sun, Dec 24, 2023 at 12:49 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> Well, not to put too fine a point on it: in a public forum, one ought to
>> respect the importance of open communication. Everyone has the right to ask
>> questions, seek information, and engage in discussions without facing
>> unnecessary patronization.
>>
>> Mich Talebzadeh,
>> Dad | Technologist | Solutions Architect | Engineer
>> London
>> United Kingdom
>>
>> view my LinkedIn profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>> https://en.everybodywiki.com/Mich_Talebzadeh
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>> On Sun, 24 Dec 2023 at 18:27, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>>
>>> This is a user-list question, not a dev-list question. Moving this
>>> conversation to the user list and BCC-ing the dev list.
>>>
>>> Also, this statement:
>>>
>>> > We are not validating against table or column existence.
>>>
>>> is not correct. When you call spark.sql(...), Spark will look up the table
>>> references and fail with TABLE_OR_VIEW_NOT_FOUND if it cannot find them.
>>>
>>> Also, when you run DDL via spark.sql(...), Spark will actually run it. So
>>> spark.sql("drop table my_table") will actually drop my_table. It's not a
>>> validation-only operation.
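[Editor's note: if you want parsing with no execution and no catalog lookups at all, Spark's internal SQL parser can be reached through py4j. The sketch below uses an internal, version-dependent API (`spark._jsparkSession.sessionState().sqlParser().parsePlan`), not the public PySpark surface, so treat it as an assumption that may break between Spark releases; `parse_only` is a made-up helper name.]

```python
def parse_only(spark, sql: str) -> None:
    """Parse `sql` with Spark's internal SQL parser, without executing
    the statement or resolving table/column names.

    Caveat (assumption): this reaches through py4j to an internal API
    (sessionState().sqlParser().parsePlan), which is not part of the
    public PySpark interface and may change between Spark versions.
    Raises an exception if the statement does not parse.
    """
    spark._jsparkSession.sessionState().sqlParser().parsePlan(sql)
```

Because only the parser runs, `DROP TABLE my_table` would be parsed but not executed, avoiding the trap described above.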
>>> This question of validating SQL is already discussed on Stack Overflow
>>> <https://stackoverflow.com/q/46973729/877069>. You may find some useful
>>> tips there.
>>>
>>> Nick
>>>
>>> On Dec 24, 2023, at 4:52 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>
>>> Yes, you can validate the syntax of your PySpark SQL queries without
>>> connecting to an actual dataset or running the queries on a cluster.
>>> PySpark lets you check the syntax without executing the query.
>>> Something like below:
>>>
>>>       ____              __
>>>      / __/__  ___ _____/ /__
>>>     _\ \/ _ \/ _ `/ __/  '_/
>>>    /__ / .__/\_,_/_/ /_/\_\   version 3.4.0
>>>       /_/
>>>
>>> Using Python version 3.9.16 (main, Apr 24 2023 10:36:11)
>>> Spark context Web UI available at http://rhes75:4040
>>> Spark context available as 'sc' (master = local[*], app id = local-1703410019374).
>>> SparkSession available as 'spark'.
>>>
>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.appName("validate").getOrCreate()
>>> 23/12/24 09:28:02 WARN SparkSession: Using an existing Spark session;
>>> only runtime SQL configurations will take effect.
>>>
>>> sql = "SELECT * FROM <TABLE> WHERE <COLUMN> = some value"
>>> try:
>>> ...     spark.sql(sql)
>>> ...     print("is working")
>>> ... except Exception as e:
>>> ...     print(f"Syntax error: {e}")
>>> ...
>>> Syntax error:
>>> [PARSE_SYNTAX_ERROR] Syntax error at or near '<'.(line 1, pos 14)
>>>
>>> == SQL ==
>>> SELECT * FROM <TABLE> WHERE <COLUMN> = some value
>>> --------------^^^
>>>
>>> Here we only check for syntax errors, not query semantics. We are not
>>> validating against table or column existence.
>>>
>>> This method is useful when you want to catch obvious syntax errors
>>> before submitting your PySpark job to a cluster, especially when you
>>> don't have access to the actual data.
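[Editor's note: the interactive snippet above can be wrapped into a small reusable helper. A sketch; `validate_sql_syntax` is a made-up name, and per Nicholas's correction earlier in the thread, `spark.sql()` also resolves table/column references and actually executes DDL, so this is more than a pure syntax check.]

```python
def validate_sql_syntax(spark, sql: str):
    """Run `sql` through spark.sql() and report the first error.

    Returns (True, None) when the statement is accepted, otherwise
    (False, error_message). Two caveats from this thread:
    - spark.sql() also resolves table/column references, so a missing
      table surfaces as TABLE_OR_VIEW_NOT_FOUND, not just bad syntax;
    - DDL such as "drop table my_table" is actually executed, not
      merely validated.
    """
    try:
        spark.sql(sql)  # raises e.g. ParseException on a syntax error
        return True, None
    except Exception as e:
        return False, str(e)
```

Usage: `ok, err = validate_sql_syntax(spark, "SELECT * FROM <TABLE>")` would return `ok == False` with the PARSE_SYNTAX_ERROR message shown above.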
>>> In summary:
>>>
>>> - This method validates syntax but will not catch semantic errors.
>>> - If you need more comprehensive validation, consider using a
>>>   testing framework and a small dataset.
>>> - For complex queries, a linter or code analysis tool can help
>>>   identify potential issues.
>>>
>>> HTH
>>>
>>> Mich Talebzadeh
>>>
>>> On Sun, 24 Dec 2023 at 07:57, ram manickam <ramsidm...@gmail.com> wrote:
>>>
>>>> Hello,
>>>> Is there a way to validate PySpark SQL for syntax errors only? I
>>>> cannot connect to the actual data set to perform this validation.
>>>> Any help would be appreciated.
>>>>
>>>> Thanks
>>>> Ram

--
Bjørn Jørgensen
Vestre Aspehaug 4, 6010 Ålesund
Norge
+47 480 94 297