grundprinzip commented on code in PR #38984:
URL: https://github.com/apache/spark/pull/38984#discussion_r1051222085

##########
python/pyspark/sql/tests/connect/test_connect_basic.py:
##########
@@ -829,6 +829,13 @@ def test_with_columns(self):
                 .toPandas(),
             )
 
+    def test_hint(self):
+        # SPARK-41349: Test hint

Review Comment:
   Please add tests for:
   * unsupported param types
   * unsupported hint name
   * invalid combination of hint and param

   Please also check whether any additional coverage is needed.

##########
python/pyspark/sql/connect/plan.py:
##########
@@ -343,6 +343,51 @@ def _repr_html_(self) -> str:
         """
 
 
+class Hint(LogicalPlan):
+    """Logical plan object for a Hint operation."""
+
+    def __init__(self, child: Optional["LogicalPlan"], name: str, params: List[Any]) -> None:
+        super().__init__(child)
+        self.name = name
+        self.params = params
+
+    def _convert_value(self, v: Any) -> proto.Expression.Literal:
+        value = proto.Expression.Literal()
+        if v is None:
+            value.null = True
+        elif isinstance(v, int):
+            value.integer = v
+        else:
+            value.string = v
+        return value

Review Comment:
   This code has confusing error behavior if `v` is not `None` or an `int`, because everything else falls through to `value.string = v`. If, for example, I were to pass a float, I would receive an error message from protobuf that is not actionable for the user. I think it would be better to either reuse the existing Python-to-Literal conversion code that we have or raise a clear exception.

##########
python/pyspark/sql/connect/dataframe.py:
##########
@@ -875,6 +875,30 @@ def to_jcols(
 
     melt = unpivot
 
+    def hint(self, name: str, *params: Any) -> "DataFrame":
+        """
+        Specifies some hint on the current DataFrame. As an example, the following code specifies
+        that one of the plan can be broadcasted: `df1.join(df2.hint("broadcast"))`
+
+        .. versionadded:: 3.4.0
+
+        Parameters
+        ----------
+        name: str
+            the name of the hint, for example, "broadcast", "SHUFFLE_MERGE" and "shuffle_hash".
+        params: tuple
+            the parameters of the hint

Review Comment:
   I know the documentation is most likely copied directly from PySpark, but could we add more context about which types the params can have? Reading through the code, they are typed as `Any` here, but later only `Optional[Union[str, int]]` seems to be handled.
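A minimal sketch of the stricter conversion suggested in the `plan.py` comment, which would also pin down the parameter types the docstring should document. It assumes the supported types are exactly `None`, `int`, and `str` (what the reviewed `_convert_value` handles) and uses a hypothetical `Literal` dataclass in place of `proto.Expression.Literal` so it runs without Spark; `convert_hint_param`, `convert_hint_params`, and `HintParam` are likewise hypothetical names, not PySpark API. Rejecting `bool` is an added assumption: since `isinstance(True, int)` is true in Python, the reviewed code would route `True` into the `integer` branch.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Union


# Hypothetical stand-in for proto.Expression.Literal so the sketch runs
# without Spark; only the three fields the reviewed code sets are modeled.
@dataclass
class Literal:
    null: bool = False
    integer: Optional[int] = None
    string: Optional[str] = None


# The parameter types the reviewed _convert_value actually handles.
HintParam = Optional[Union[str, int]]


def convert_hint_param(v: Any) -> Literal:
    """Convert one hint parameter, failing early with an actionable message."""
    if v is None:
        return Literal(null=True)
    # bool is a subclass of int, so check it first (assumption: reject it
    # rather than silently sending True/False as an integer).
    if isinstance(v, bool):
        raise TypeError(f"hint parameter must be str, int or None, got bool: {v!r}")
    if isinstance(v, int):
        return Literal(integer=v)
    if isinstance(v, str):
        return Literal(string=v)
    # Anything else (float, list, Column, ...) gets a clear client-side error
    # instead of an opaque protobuf assignment error.
    raise TypeError(
        f"hint parameter must be str, int or None, got {type(v).__name__}: {v!r}"
    )


def convert_hint_params(params: List[Any]) -> List[Literal]:
    """Convert all parameters of a hint, in order."""
    return [convert_hint_param(p) for p in params]
```

If a check like this were wired into `Hint`, passing a float such as `1.5` would fail on the client with a message naming the accepted types rather than a protobuf error, and the same `Optional[Union[str, int]]` alias could back the `params` description in the docstring.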