[GitHub] [spark] zhengruifeng commented on a diff in pull request #43014: [SPARK-45235][CONNECT][PYTHON] Support map and array parameters by `sql()`

2023-09-20 Thread via GitHub


zhengruifeng commented on code in PR #43014:
URL: https://github.com/apache/spark/pull/43014#discussion_r1331636427


##
python/pyspark/sql/connect/plan.py:
##
@@ -1049,21 +1049,23 @@ def __init__(self, query: str, args: Optional[Union[Dict[str, Any], List]] = None
         self._query = query
         self._args = args
 
+    def __to_expr(self, session: "SparkConnectClient", v: Any) -> proto.Expression:
Review Comment:
   I think we'd better rename it `_to_expr`.
   
   [The Python standard](https://peps.python.org/pep-0008/#method-names-and-instance-variables) says:
   
   
   > Note: there is some controversy about the use of __names
   
   > In addition, the following special forms using leading or trailing underscores are recognized (these can generally be combined with any case convention):
   
   > `_single_leading_underscore`: weak “internal use” indicator. E.g. from M import * does not import objects whose names start with an underscore.
   > `single_trailing_underscore_`: used by convention to avoid conflicts with Python keyword, e.g. `tkinter.Toplevel(master, class_='ClassName')`
   > `__double_leading_underscore`: when naming a class attribute, invokes name mangling (inside class FooBar, __boo becomes _FooBar__boo; see below).
   > `__double_leading_and_trailing_underscore__`: “magic” objects or attributes that live in user-controlled namespaces. E.g. __init__, __import__ or __file__. Never invent such names; only use them as documented.
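   
   For illustration, a minimal sketch of the name mangling PEP 8 describes (the class names here are hypothetical, not from the PR): a `__name` attribute is rewritten against its *defining* class, which is why a single-underscore name is friendlier for an internal helper that subclasses or tests may need to reach.
   
   ```python
   class Base:
       def __helper(self):      # name-mangled to _Base__helper
           return "mangled"
   
       def _util(self):         # plain "internal use" name, no mangling
           return "internal"
   
   class Child(Base):
       def call_both(self):
           # self.__helper() here would be mangled to _Child__helper and
           # raise AttributeError; the single-underscore name just works.
           return self._util(), self._Base__helper()
   
   print(Child().call_both())   # ('internal', 'mangled')
   ```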
   



##
python/pyspark/sql/tests/connect/test_connect_basic.py:
##
@@ -1237,13 +1237,23 @@ def test_sql(self):
         self.assertEqual(1, len(pdf.index))
 
     def test_sql_with_named_args(self):
-        df = self.connect.sql("SELECT * FROM range(10) WHERE id > :minId", args={"minId": 7})
-        df2 = self.spark.sql("SELECT * FROM range(10) WHERE id > :minId", args={"minId": 7})
+        from pyspark.sql.functions import create_map, lit
+        from pyspark.sql.connect.functions import lit as clit
+        from pyspark.sql.connect.functions import create_map as ccreate_map

Review Comment:
   
https://github.com/apache/spark/blob/8c27de68756d4b0e5940211340a0b323d808aead/python/pyspark/sql/tests/connect/test_connect_basic.py#L78-L79
   
   I think you can use the already imported `SF` and `CF`, to be consistent with the other tests.
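   
   For reference, a hedged sketch of the aliasing pattern those linked lines appear to use (assuming the test file aliases the two functions modules roughly this way):
   
   ```python
   from pyspark.sql import functions as SF            # classic PySpark API
   from pyspark.sql.connect import functions as CF    # Spark Connect API
   
   # The test can then write SF.lit / CF.create_map etc. instead of
   # one-off renamed imports such as `lit as clit`.
   ```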





[GitHub] [spark] zhengruifeng commented on a diff in pull request #43014: [SPARK-45235][CONNECT][PYTHON] Support map and array parameters by `sql()`

2023-09-20 Thread via GitHub


zhengruifeng commented on code in PR #43014:
URL: https://github.com/apache/spark/pull/43014#discussion_r1332291118


##
python/pyspark/sql/tests/connect/test_connect_basic.py:
##
@@ -1237,13 +1237,23 @@ def test_sql(self):
         self.assertEqual(1, len(pdf.index))
 
     def test_sql_with_named_args(self):
-        df = self.connect.sql("SELECT * FROM range(10) WHERE id > :minId", args={"minId": 7})
-        df2 = self.spark.sql("SELECT * FROM range(10) WHERE id > :minId", args={"minId": 7})
+        from pyspark.sql.functions import create_map, lit
+        from pyspark.sql.connect.functions import lit as clit
+        from pyspark.sql.connect.functions import create_map as ccreate_map
Review Comment:
   @allisonwang-db Since they are only used in tests, I guess it doesn't matter.
   
   For new test files, I think we can start with `sf` and `cf`.
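   
   A hedged sketch of what that could look like with the lowercase aliases, exercising the map/array parameter support this PR adds (the `spark` session and the map/array keys are assumed for illustration):
   
   ```python
   from pyspark.sql import functions as sf
   
   df = spark.sql(
       "SELECT element_at(:m, 'a') AS a, element_at(:arr, 1) AS first",
       args={
           "m": sf.create_map(sf.lit("a"), sf.lit(1)),   # map parameter
           "arr": sf.array(sf.lit(10), sf.lit(20)),      # array parameter
       },
   )
   df.show()
   ```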


