[jira] [Updated] (SPARK-54936) Monitor behaviour changes from upstream

Ruifeng Zheng (Jira) Wed, 07 Jan 2026 00:14:05 -0800


     [ 
https://issues.apache.org/jira/browse/SPARK-54936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ruifeng Zheng updated SPARK-54936:
----------------------------------
    Summary: Monitor behaviour changes from upstream   (was: Monitor upstream 
behaviour changes)

> Monitor behaviour changes from upstream 
> ----------------------------------------
>
>                 Key: SPARK-54936
>                 URL: https://issues.apache.org/jira/browse/SPARK-54936
>             Project: Spark
>          Issue Type: Umbrella
>          Components: PySpark, Tests
>    Affects Versions: 4.2.0
>            Reporter: Ruifeng Zheng
>            Priority: Major
>
> PySpark is frequently affected by behaviour changes in its dependencies, such 
> as pandas, PyArrow, and NumPy.
> We should add tests to monitor the behaviour of key functions and features, like:
>  * pa.array
>  * pa.scalar
>  * pa.Array.from_pandas
>  * pa.Array.to_pandas
>  * pa.Array.cast
>  * pa.Table.from_pandas
>  * pa.Table.from_batches
>  * pa.Table.from_arrays
>  * pa.Table.from_pydict
>  * pa.Table.to_pandas
>  * pa.Table.cast
>  * pa.RecordBatch.from_arrays
>  * pa.RecordBatch.from_struct_array
>  * pa.RecordBatch.from_pylist
>  * pd.Series(data=arrow_data)
>  * time zone handling in pyarrow, pandas
>  * zero copy in pandas<->pyarrow data conversions
>  * etc
>  
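
A minimal sketch of what such a monitoring test could look like, using only
pandas, PyArrow and NumPy (no Spark imports). The class name
UpstreamBehaviourMonitor and the helper run_monitor are hypothetical, and the
asserted values are what recent releases are believed to return; they should
be checked against the versions actually pinned by the project.

```python
import unittest

try:
    import numpy as np
    import pandas as pd
    import pyarrow as pa

    HAVE_DEPS = True
except ImportError:  # skip the checks when the upstream libraries are absent
    HAVE_DEPS = False


@unittest.skipUnless(HAVE_DEPS, "pandas, pyarrow and numpy are required")
class UpstreamBehaviourMonitor(unittest.TestCase):
    """Pins a few upstream behaviours; hypothetical example, no Spark code."""

    def test_pa_array_type_inference(self):
        arr = pa.array([1, 2, None])
        self.assertEqual(arr.type, pa.int64())
        self.assertEqual(arr.null_count, 1)

    def test_pa_scalar_type_inference(self):
        self.assertEqual(pa.scalar(1).type, pa.int64())

    def test_nan_is_null_only_with_pandas_semantics(self):
        values = np.array([1.0, np.nan])
        # pa.array on a NumPy array keeps NaN as a regular float value.
        self.assertEqual(pa.array(values).null_count, 0)
        # pa.Array.from_pandas applies pandas semantics: NaN becomes null.
        self.assertEqual(pa.Array.from_pandas(pd.Series(values)).null_count, 1)

    def test_table_from_pydict_schema(self):
        table = pa.Table.from_pydict({"a": [1, 2], "b": ["x", None]})
        self.assertEqual(table.schema.names, ["a", "b"])
        self.assertEqual(table.schema.field("a").type, pa.int64())
        self.assertEqual(table.schema.field("b").type, pa.string())

    def test_to_pandas_with_nulls_falls_back_to_float64(self):
        series = pa.array([1, None]).to_pandas()
        self.assertEqual(series.dtype, np.dtype("float64"))


def run_monitor():
    """Run the checks and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(UpstreamBehaviourMonitor)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

A failure in any of these checks after a dependency upgrade would point at an
upstream behaviour change rather than at a Spark regression.
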
> The new tests should be dedicated to the upstream libraries; Spark code should 
> not be involved.
>  
> The test data should cover:
> 1, Missing values, such as nullable data, NaN, pd.NaT, etc.;
> 2, Empty instances (e.g. an empty list);
> 3, Invalid values, to check what is supported and which errors are raised;
> 4, Python instances (list, tuple, array, etc.);
> 5, pandas instances;
> 6, NumPy instances.
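
The test data categories above could be exercised roughly as follows. This is
only a sketch: the names MonitorTestData and run_test_data_checks are
hypothetical, the exact exception class for invalid input is deliberately left
loose, and the expected values are assumptions to verify against the installed
versions.

```python
import unittest

try:
    import numpy as np
    import pandas as pd
    import pyarrow as pa

    HAVE_DEPS = True
except ImportError:  # skip the checks when the upstream libraries are absent
    HAVE_DEPS = False


@unittest.skipUnless(HAVE_DEPS, "pandas, pyarrow and numpy are required")
class MonitorTestData(unittest.TestCase):
    """One check per test data category; hypothetical example."""

    def test_missing_values(self):
        # None in a Python list becomes an Arrow null.
        self.assertEqual(pa.array([None, 1]).null_count, 1)
        # pd.NaT in a datetime Series becomes an Arrow null.
        series = pd.Series([pd.Timestamp("2024-01-01"), pd.NaT])
        self.assertEqual(pa.Array.from_pandas(series).null_count, 1)

    def test_empty_instance(self):
        untyped = pa.array([])
        self.assertEqual(len(untyped), 0)
        self.assertEqual(untyped.type, pa.null())
        typed = pa.array([], type=pa.int64())
        self.assertEqual(len(typed), 0)
        self.assertEqual(typed.type, pa.int64())

    def test_invalid_values_raise(self):
        # Mixed int/str input cannot be converted to a single Arrow type.
        # Either Arrow error class is accepted here (an assumption made to
        # keep the sketch robust across versions).
        with self.assertRaises((pa.ArrowInvalid, pa.ArrowTypeError)):
            pa.array([1, "a"])

    def test_python_pandas_numpy_inputs_agree(self):
        expected = pa.array([1, 2, 3], type=pa.int64())
        inputs = [
            [1, 2, 3],
            (1, 2, 3),
            np.array([1, 2, 3], dtype="int64"),
            pd.Series([1, 2, 3], dtype="int64"),
        ]
        for data in inputs:
            with self.subTest(kind=type(data).__name__):
                self.assertTrue(pa.array(data).equals(expected))


def run_test_data_checks():
    """Run the checks and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(MonitorTestData)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```
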



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
