zhengruifeng opened a new pull request, #46583:
URL: https://github.com/apache/spark/pull/46583
### What changes were proposed in this pull request?

1. Add the missing `__repr__` method for `SQLExpression`.
2. Adjust the output of `lit(None)` from `None` to `NULL`, to be more consistent with Spark Classic.

### Why are the changes needed?

Bug fix: each expression/plan should implement the `__repr__` method. Without it, rendering a `when(...)` column that contains a `lit(None)` fails:

```
In [2]: from pyspark.sql.functions import when, lit, expr

In [3]: expression = expr("foo")

In [4]: when(expression, lit(None))
Out[4]: ---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File ~/.dev/miniconda3/envs/spark_dev_312/lib/python3.12/site-packages/IPython/core/formatters.py:711, in PlainTextFormatter.__call__(self, obj)
    704 stream = StringIO()
    705 printer = pretty.RepresentationPrinter(stream, self.verbose,
    706                                        self.max_width, self.newline,
    707                                        max_seq_length=self.max_seq_length,
    708                                        singleton_pprinters=self.singleton_printers,
    709                                        type_pprinters=self.type_printers,
    710                                        deferred_pprinters=self.deferred_printers)
--> 711 printer.pretty(obj)
    712 printer.flush()
    713 return stream.getvalue()

File ~/.dev/miniconda3/envs/spark_dev_312/lib/python3.12/site-packages/IPython/lib/pretty.py:411, in RepresentationPrinter.pretty(self, obj)
    408         return meth(obj, self, cycle)
    409     if cls is not object \
    410             and callable(cls.__dict__.get('__repr__')):
--> 411         return _repr_pprint(obj, self, cycle)
    413 return _default_pprint(obj, self, cycle)
    414 finally:

File ~/.dev/miniconda3/envs/spark_dev_312/lib/python3.12/site-packages/IPython/lib/pretty.py:779, in _repr_pprint(obj, p, cycle)
    777 """A pprint that just redirects to the normal repr function."""
    778 # Find newlines and replace them with p.break_()
--> 779 output = repr(obj)
    780 lines = output.splitlines()
    781 with p.group():

File ~/Dev/spark/python/pyspark/sql/connect/column.py:441, in Column.__repr__(self)
    440 def __repr__(self) -> str:
--> 441     return "Column<'%s'>" % self._expr.__repr__()

File ~/Dev/spark/python/pyspark/sql/connect/expressions.py:148, in CaseWhen.__repr__(self)
    147 def __repr__(self) -> str:
--> 148     _cases = "".join([f" WHEN {c} THEN {v}" for c, v in self._branches])
    149     _else = f" ELSE {self._else_value}" if self._else_value is not None else ""
    150     return "CASE" + _cases + _else + " END"

TypeError: __str__ returned non-string (type NoneType)
```

### Does this PR introduce _any_ user-facing change?

Yes:

```
In [3]: from pyspark.sql.functions import when, lit, expr

In [4]: expression = expr("foo")

In [5]: when_cond = when(expression, lit(None))

In [6]: when_cond
Out[6]: Column<'CASE WHEN foo THEN NULL END'>

In [7]: str(when_cond)
Out[7]: "Column<'CASE WHEN foo THEN NULL END'>"
```

### How was this patch tested?

Added unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
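For readers following the thread: the repr convention the PR adopts can be sketched with two minimal stand-in classes (hypothetical simplifications for illustration, not the actual `pyspark.sql.connect.expressions` code). The key point is that a literal `None` must render as the SQL text `NULL`, so that `CaseWhen.__repr__`'s f-string always receives a real string:

```python
# Minimal sketch of the repr convention this PR adopts.
# These classes are simplified stand-ins, NOT the real
# pyspark.sql.connect.expressions implementations.


class LiteralExpression:
    def __init__(self, value):
        self._value = value

    def __repr__(self) -> str:
        # Render a Python None as the SQL literal NULL,
        # matching Spark Classic's column string output.
        if self._value is None:
            return "NULL"
        return str(self._value)


class CaseWhen:
    def __init__(self, branches, else_value=None):
        self._branches = branches  # list of (condition, value) pairs
        self._else_value = else_value

    def __repr__(self) -> str:
        # f"{v}" falls back to each expression's __repr__ (there is
        # no __str__ defined), so every branch must yield a string.
        _cases = "".join(f" WHEN {c} THEN {v}" for c, v in self._branches)
        _else = f" ELSE {self._else_value}" if self._else_value is not None else ""
        return "CASE" + _cases + _else + " END"


expr = CaseWhen([("foo", LiteralExpression(None))])
print(repr(expr))  # CASE WHEN foo THEN NULL END
```

With `NULL` returned instead of `None`, the f-string interpolation inside `CaseWhen.__repr__` no longer trips over a non-string value, which is what produced the `TypeError` above.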