[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38723: [SPARK-41201][CONNECT][PYTHON] Implement `DataFrame.SelectExpr` in Python client
HyukjinKwon commented on code in PR #38723: URL: https://github.com/apache/spark/pull/38723#discussion_r1029950210 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -302,6 +301,31 @@ def test_to_pandas(self): self.spark.sql(query).toPandas(), ) +def test_select_expr(self): +# SPARK-41201: test selectExpr API. +self.assert_eq( +self.connect.read.table(self.tbl_name).selectExpr("id * 2").toPandas(), +self.spark.read.table(self.tbl_name).selectExpr("id * 2").toPandas(), +) +self.assert_eq( +self.connect.read.table(self.tbl_name) +.selectExpr(["id * 2", "cast(name as long) as name"]) +.toPandas(), +self.spark.read.table(self.tbl_name) +.selectExpr(["id * 2", "cast(name as long) as name"]) +.toPandas(), +) + +self.assert_eq( +self.connect.read.table(self.tbl_name) +.selectExpr("id * 2", "cast(name as long) as name") +.toPandas(), +self.spark.read.table(self.tbl_name) +.selectExpr("id * 2", "cast(name as long) as name") +.toPandas(), +) + +@unittest.skip("test_fill_na is flaky") Review Comment: ah .. I didn't notice this. Can we enable this back? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38723: [SPARK-41201][CONNECT][PYTHON] Implement `DataFrame.SelectExpr` in Python client
HyukjinKwon commented on code in PR #38723: URL: https://github.com/apache/spark/pull/38723#discussion_r1028931691 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -220,6 +220,29 @@ def test_create_global_temp_view(self): with self.assertRaises(_MultiThreadedRendezvous): self.connect.sql("SELECT 1 AS X LIMIT 0").createGlobalTempView("view_1") +def test_select_expr(self): +self.assert_eq( Review Comment: comment with a JIRA ID? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38723: [SPARK-41201][CONNECT][PYTHON] Implement `DataFrame.SelectExpr` in Python client
HyukjinKwon commented on code in PR #38723: URL: https://github.com/apache/spark/pull/38723#discussion_r1028931101 ## python/pyspark/sql/connect/column.py: ## @@ -263,6 +263,22 @@ def __str__(self) -> str: return f"Column({self._unparsed_identifier})" +class SQLExpression(Expression): Review Comment: I still don't like kind of naming .. but this is at least somewhat consistent with what we have in DSv2 so I am fine. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38723: [SPARK-41201][CONNECT][PYTHON] Implement `DataFrame.SelectExpr` in Python client
HyukjinKwon commented on code in PR #38723: URL: https://github.com/apache/spark/pull/38723#discussion_r1028930295 ## python/pyspark/sql/connect/column.py: ## @@ -263,6 +263,22 @@ def __str__(self) -> str: return f"Column({self._unparsed_identifier})" +class SQLExpression(Expression): +"""Returns Expression which contains a string which is a SQL expression +and server side will parse it by Catalyst +""" + +def __init__(self, expr: str) -> None: +super().__init__() Review Comment: I think that's fine. One point of having type hints is to avoid `assert`s on those types too. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org