[ 
https://issues.apache.org/jira/browse/SPARK-54498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18040759#comment-18040759
 ] 

Ruifeng Zheng commented on SPARK-54498:
---------------------------------------

The `toInternals` and `fromInternals` are (unintentionally?) exposed to users, 
but I think they should be rarely used by end users.

 

I think the behavior should be:

None -> None

Incorrect types -> Raise Error

 

We check the data types in some callsites like 
LocalDataToArrowConversion/LiteralExpression, probabaly we should move such 
type checks to `toInternal`

> `toInternal` and `fromInternal` in `types.py` has weird type annotations
> ------------------------------------------------------------------------
>
>                 Key: SPARK-54498
>                 URL: https://issues.apache.org/jira/browse/SPARK-54498
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 4.2.0
>            Reporter: Tian Gao
>            Priority: Minor
>
> All `toInternals` and `fromInternals` in `pyspark/sql/types.py` have really 
> weird type annotation and run-time behavior.
> For example:
> {code:java}
>     def toInternal(self, dt: datetime.datetime) -> int:
>         if dt is not None:
>             seconds = calendar.timegm(dt.timetuple())
>             return int(seconds) * 1000000 + dt.microsecond {code}
> Do we expect `dt` to be anything other than `datetime.datetime`? What is the 
> behavior when the user passes something that's not `datetime.datetime`? This 
> is a public API so user can do whatever they want with it.
> Now it could raise an exception if the user passes something weird, or 
> implicitly return a None if user passes None.
> We should have a consistent standard for all datatypes - do we return `None` 
> if we can't convert? Do we only return `None` when the input is `None` (which 
> is then a valid input) and raise exceptions when it's other types?
> My suggestion:
>  * Only deal with data when the data is exactly the type we expected
>  * Raise an exception if the type is wrong
> It's super clear and matches the current type hint - otherwise we need to 
> modify the type hint.
> Not sure if we have backward compatibility issues here (like do we really 
> need to support `None` as an input).
> [~gurwls223] and [~podongfeng] ?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to