zhengruifeng opened a new pull request, #46583:
URL: https://github.com/apache/spark/pull/46583

   ### What changes were proposed in this pull request?
   1. Add the missing `__repr__` method for `SQLExpression`.
   2. Adjust the output of `lit(None)` from `None` to `NULL`, to be more consistent with Spark Classic.
   
   ### Why are the changes needed?
   Bug fix: each expression/plan should implement the `__repr__` method. Without it, displaying the column below raises a `TypeError`:
   
   ```
   In [2]: from pyspark.sql.functions import when, lit, expr
   
   In [3]: expression = expr("foo")
   
   In [4]: when(expression, lit(None))
   Out[4]: 
---------------------------------------------------------------------------
   TypeError                                 Traceback (most recent call last)
   File ~/.dev/miniconda3/envs/spark_dev_312/lib/python3.12/site-packages/IPython/core/formatters.py:711, in PlainTextFormatter.__call__(self, obj)
       704 stream = StringIO()
       705 printer = pretty.RepresentationPrinter(stream, self.verbose,
       706     self.max_width, self.newline,
       707     max_seq_length=self.max_seq_length,
       708     singleton_pprinters=self.singleton_printers,
       709     type_pprinters=self.type_printers,
       710     deferred_pprinters=self.deferred_printers)
   --> 711 printer.pretty(obj)
       712 printer.flush()
       713 return stream.getvalue()
   
   File ~/.dev/miniconda3/envs/spark_dev_312/lib/python3.12/site-packages/IPython/lib/pretty.py:411, in RepresentationPrinter.pretty(self, obj)
       408                         return meth(obj, self, cycle)
       409                 if cls is not object \
       410                         and callable(cls.__dict__.get('__repr__')):
   --> 411                     return _repr_pprint(obj, self, cycle)
       413     return _default_pprint(obj, self, cycle)
       414 finally:
   
   File ~/.dev/miniconda3/envs/spark_dev_312/lib/python3.12/site-packages/IPython/lib/pretty.py:779, in _repr_pprint(obj, p, cycle)
       777 """A pprint that just redirects to the normal repr function."""
       778 # Find newlines and replace them with p.break_()
   --> 779 output = repr(obj)
       780 lines = output.splitlines()
       781 with p.group():
   
   File ~/Dev/spark/python/pyspark/sql/connect/column.py:441, in Column.__repr__(self)
       440 def __repr__(self) -> str:
   --> 441     return "Column<'%s'>" % self._expr.__repr__()
   
   File ~/Dev/spark/python/pyspark/sql/connect/expressions.py:148, in CaseWhen.__repr__(self)
       147 def __repr__(self) -> str:
   --> 148     _cases = "".join([f" WHEN {c} THEN {v}" for c, v in self._branches])
       149     _else = f" ELSE {self._else_value}" if self._else_value is not None else ""
       150     return "CASE" + _cases + _else + " END"
   
   TypeError: __str__ returned non-string (type NoneType)
   ```
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. The same column now has a readable representation, and `lit(None)` renders as `NULL`:
   
   ```
   In [3]: from pyspark.sql.functions import when, lit, expr
   
   In [4]: expression = expr("foo")
   
   In [5]: when_cond = when(expression, lit(None))
   
   In [6]: when_cond
   Out[6]: Column<'CASE WHEN foo THEN NULL END'>
   
   In [7]: str(when_cond)
   Out[7]: "Column<'CASE WHEN foo THEN NULL END'>"
   ```
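
The rendering shown above can be illustrated with a minimal, self-contained sketch. The `ToySQLExpression`, `ToyLiteral`, and `ToyCaseWhen` classes below are hypothetical stand-ins written for this illustration, not the actual Spark Connect classes; the sketch only assumes that every expression node returns a string from `__repr__` and that a `None` literal is rendered as `NULL`.

```python
from typing import List, Optional, Tuple


class ToyExpression:
    """Hypothetical base class: every subclass is expected to return a str from __repr__."""

    def __repr__(self) -> str:
        raise NotImplementedError


class ToySQLExpression(ToyExpression):
    """Stand-in for an expression built from a raw SQL string."""

    def __init__(self, expr: str) -> None:
        self._expr = expr

    def __repr__(self) -> str:
        # Render the raw SQL text as-is.
        return self._expr


class ToyLiteral(ToyExpression):
    """Stand-in for a literal value."""

    def __init__(self, value: object) -> None:
        self._value = value

    def __repr__(self) -> str:
        # Render a missing value as SQL NULL rather than Python's None.
        return "NULL" if self._value is None else str(self._value)


class ToyCaseWhen(ToyExpression):
    """Stand-in for a CASE WHEN expression with optional ELSE branch."""

    def __init__(
        self,
        branches: List[Tuple[ToyExpression, ToyExpression]],
        else_value: Optional[ToyExpression] = None,
    ) -> None:
        self._branches = branches
        self._else_value = else_value

    def __repr__(self) -> str:
        # f-string formatting falls back to each child's __repr__,
        # so every child must return a str.
        cases = "".join(f" WHEN {c} THEN {v}" for c, v in self._branches)
        else_part = f" ELSE {self._else_value}" if self._else_value is not None else ""
        return "CASE" + cases + else_part + " END"


rendered = repr(ToyCaseWhen([(ToySQLExpression("foo"), ToyLiteral(None))]))
print(rendered)
```

Because the parent node formats its children inside an f-string, a single child whose `__repr__` is missing or returns a non-string is enough to break the representation of the whole expression tree.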
   
   ### How was this patch tested?
   Added unit tests.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   no


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

