[ 
https://issues.apache.org/jira/browse/SPARK-54498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18040760#comment-18040760
 ] 

Tian Gao commented on SPARK-54498:
----------------------------------

I think as long as we can agree on a behavior and stick to it, we are good. The 
current problem is inconsistency. What I want is the consistency for all data 
types, and the consistency between run-time behavior and type annotation.

> `toInternal` and `fromInternal` in `types.py` has weird type annotations
> ------------------------------------------------------------------------
>
>                 Key: SPARK-54498
>                 URL: https://issues.apache.org/jira/browse/SPARK-54498
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 4.2.0
>            Reporter: Tian Gao
>            Priority: Minor
>
> All `toInternals` and `fromInternals` in `pyspark/sql/types.py` have really 
> weird type annotation and run-time behavior.
> For example:
> {code:java}
>     def toInternal(self, dt: datetime.datetime) -> int:
>         if dt is not None:
>             seconds = calendar.timegm(dt.timetuple())
>             return int(seconds) * 1000000 + dt.microsecond {code}
> Do we expect `dt` to be anything other than `datetime.datetime`? What is the 
> behavior when the user passes something that's not `datetime.datetime`? This 
> is a public API so user can do whatever they want with it.
> Now it could raise an exception if the user passes something weird, or 
> implicitly return a None if user passes None.
> We should have a consistent standard for all datatypes - do we return `None` 
> if we can't convert? Do we only return `None` when the input is `None` (which 
> is then a valid input) and raise exceptions when it's other types?
> My suggestion:
>  * Only deal with data when the data is exactly the type we expected
>  * Raise an exception if the type is wrong
> It's super clear and matches the current type hint - otherwise we need to 
> modify the type hint.
> Not sure if we have backward compatibility issues here (like do we really 
> need to support `None` as an input).
> [~gurwls223] and [~podongfeng] ?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to