[ https://issues.apache.org/jira/browse/SPARK-41666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17650898#comment-17650898 ]
Apache Spark commented on SPARK-41666:
--------------------------------------

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/39159

> Support parameterized SQL in PySpark
> ------------------------------------
>
>                 Key: SPARK-41666
>                 URL: https://issues.apache.org/jira/browse/SPARK-41666
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Max Gekk
>            Assignee: Max Gekk
>            Priority: Major
>             Fix For: 3.4.0
>
>
> Enhance the PySpark SQL API with support for parameterized SQL statements to
> improve security and reusability. Application developers will be able to
> write SQL with parameter markers whose values are passed separately from the
> SQL code and interpreted as literals. This helps prevent SQL injection
> attacks in applications that generate SQL based on a user's selections,
> which is often done via a user interface.
> PySpark already supports formatting of sqlText using the {...} syntax. The
> API should stay the same:
> {code:python}
> def sql(self, sqlQuery: str, **kwargs: Any) -> DataFrame:
> {code}
> and support the new parameters through the same API. PySpark's *sql()*
> should pass unused parameters to the JVM side, where the Java sql() method
> handles them. For example:
> {code:python}
> >>> mydf = spark.range(10)
> >>> spark.sql("SELECT id FROM {mydf} WHERE id % @param1 = 0", mydf=mydf,
> ...           param1='3').show()
> +---+
> | id|
> +---+
> |  0|
> |  3|
> |  6|
> |  9|
> +---+
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
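The security argument in the description — values passed separately from the SQL text are bound as literals, never parsed as SQL — can be illustrated without a Spark cluster. The sketch below uses Python's stdlib sqlite3 (not PySpark, and not the API proposed in this issue) purely to show the injection scenario that parameter markers prevent; the table, column names, and input string are invented for the demonstration.

```python
# Minimal sketch of why separating values from SQL text blocks injection.
# Uses stdlib sqlite3; the principle is the same one SPARK-41666 describes.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute(
    "INSERT INTO users VALUES ('alice', 's3cr3t'), ('bob', 'hunter2')"
)

# Attacker-controlled input, e.g. from a UI text field.
malicious = "alice' OR '1'='1"

# Unsafe: the input is spliced into the SQL text, so the OR clause is
# parsed as SQL and every row's secret leaks.
leaked = conn.execute(
    f"SELECT secret FROM users WHERE name = '{malicious}'"
).fetchall()

# Safe: the value travels separately as a bind parameter and is treated
# as a single literal string, which matches no row.
safe = conn.execute(
    "SELECT secret FROM users WHERE name = ?", (malicious,)
).fetchall()

print(leaked)  # [('s3cr3t',), ('hunter2',)]
print(safe)    # []
```

The proposed PySpark feature applies the same binding model: the `@param1`-style markers in the sqlText are replaced with the separately-passed values interpreted as literals on the JVM side, rather than being concatenated into the query string.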