grundprinzip commented on code in PR #38984:
URL: https://github.com/apache/spark/pull/38984#discussion_r1051222085


##########
python/pyspark/sql/tests/connect/test_connect_basic.py:
##########
@@ -829,6 +829,13 @@ def test_with_columns(self):
             .toPandas(),
         )
 
+    def test_hint(self):
+        # SPARK-41349: Test hint

Review Comment:
   Please add additional tests for:
   
     * unsupported param types
     * unsupported hint name
     * invalid combination of hint and param
   
   Please check whether any additional coverage is needed; a sketch of such tests follows below.
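
A minimal sketch of what such negative tests could look like inside the existing connect test class (the helper names `self.connect` / `self.tbl_name` and the expected exception types are assumptions, not the final implementation):

```python
    def test_hint_error_cases(self):
        # SPARK-41349: illustrative negative tests; adjust the expected
        # errors to whatever the client/server actually raise.
        df = self.connect.read.table(self.tbl_name)

        # Unsupported parameter type (e.g. float) -- ideally rejected
        # client-side with a clear error instead of a raw protobuf failure.
        with self.assertRaises(TypeError):
            df.hint("REPARTITION", 1.1)

        # Unsupported hint name -- depending on how the server treats
        # unresolved hints this may raise on execution or only log a warning.
        with self.assertRaises(Exception):
            df.hint("not_a_real_hint").toPandas()

        # Valid hint name combined with parameters it does not accept.
        with self.assertRaises(Exception):
            df.hint("broadcast", 42, "extra").toPandas()
```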



##########
python/pyspark/sql/connect/plan.py:
##########
@@ -343,6 +343,51 @@ def _repr_html_(self) -> str:
         """
 
 
+class Hint(LogicalPlan):
+    """Logical plan object for a Hint operation."""
+
+    def __init__(self, child: Optional["LogicalPlan"], name: str, params: List[Any]) -> None:
+        super().__init__(child)
+        self.name = name
+        self.params = params
+
+    def _convert_value(self, v: Any) -> proto.Expression.Literal:
+        value = proto.Expression.Literal()
+        if v is None:
+            value.null = True
+        elif isinstance(v, int):
+            value.integer = v
+        else:
+            value.string = v
+        return value

Review Comment:
   This code has surprising error behavior when `v` is neither `None` nor an `int`. If, for example, I assign a float, I get an error message from protobuf that is not actionable for the user. I think it would be good to either reuse the existing Python-to-Literal conversion code that we have or throw an explicit exception.
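
For the explicit-exception route, a minimal sketch of the conversion (keeping the proto field names from the snippet above; the `bool` exclusion and the `TypeError` wording are assumptions, and reusing the shared literal-conversion helper would be the other option):

```python
    def _convert_value(self, v: Any) -> proto.Expression.Literal:
        value = proto.Expression.Literal()
        if v is None:
            value.null = True
        elif isinstance(v, int) and not isinstance(v, bool):
            # bool is a subclass of int in Python, so exclude it explicitly
            value.integer = v
        elif isinstance(v, str):
            value.string = v
        else:
            raise TypeError(
                f"Unsupported hint parameter type: {type(v).__name__}; "
                "only str and int are supported."
            )
        return value
```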



##########
python/pyspark/sql/connect/dataframe.py:
##########
@@ -875,6 +875,30 @@ def to_jcols(
 
     melt = unpivot
 
+    def hint(self, name: str, *params: Any) -> "DataFrame":
+        """
+        Specifies some hint on the current DataFrame. As an example, the following code specifies
+        that one of the plan can be broadcasted: `df1.join(df2.hint("broadcast"))`
+
+        .. versionadded:: 3.4.0
+
+        Parameters
+        ----------
+        name: str
+            the name of the hint, for example, "broadcast", "SHUFFLE_MERGE" and "shuffle_hash".
+        params: tuple
+            the parameters of the hint

Review Comment:
   I know that the documentation is most likely copied directly from PySpark, but I'm wondering if we can add more context about which types the params can have. Reading through the code, it is `Any` here but later only `Optional[Union[str, int]]`?
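
One hedged way to address this would be to narrow the signature and spell the accepted types out in the docstring; treating the accepted set as `str`/`int` mirrors what `_convert_value` currently handles and is an assumption about the intended contract (assumes `Union` is imported from `typing`; body elided):

```python
    def hint(self, name: str, *params: Union[str, int]) -> "DataFrame":
        """
        Specifies some hint on the current DataFrame.

        .. versionadded:: 3.4.0

        Parameters
        ----------
        name : str
            The name of the hint, for example "broadcast", "SHUFFLE_MERGE"
            or "shuffle_hash".
        params : str or int
            Optional parameters of the hint. Only ``str`` and ``int`` values
            are accepted; other types should be rejected with a clear error
            before the plan is sent to the server.
        """
        ...
```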


