Ruifeng Zheng created SPARK-54936:
-------------------------------------

             Summary: Monitor upstream behaviour changes
                 Key: SPARK-54936
                 URL: https://issues.apache.org/jira/browse/SPARK-54936
             Project: Spark
          Issue Type: Umbrella
          Components: PySpark, Tests
    Affects Versions: 4.2.0
            Reporter: Ruifeng Zheng


PySpark is frequently broken by behaviour changes in upstream libraries such as 
pandas, PyArrow, and NumPy.

We should add tests to monitor the behaviour of key functions and features, such as:
 * pa.Array.to_pandas
 * pa.Array.from_pandas
 * pa.Array.cast
 * pa.array
 * pa.scalar
 * pd.Series(data=array_data)
 * time zone handling in PyArrow and pandas
 * zero-copy behaviour in pandas<->PyArrow data conversions
 * etc

The new tests should be dedicated to upstream behaviour only; Spark code should 
not be involved.
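
One possible shape for such an upstream-only test module is sketched below. It imports nothing from Spark and skips itself when pandas or PyArrow is missing. The class name `UpstreamConversionTests` and the runner `run_upstream_tests` are hypothetical, and the asserted behaviours (time zone preserved by `to_pandas`, zero-copy conversion refused when nulls are present) are assumptions about current releases rather than documented guarantees.

```python
import unittest

try:
    import pandas as pd
    import pyarrow as pa

    have_upstream = True
except ImportError:
    have_upstream = False


@unittest.skipUnless(have_upstream, "pandas and pyarrow are required")
class UpstreamConversionTests(unittest.TestCase):  # hypothetical name
    def test_timezone_survives_to_pandas(self):
        # One timestamp at the Unix epoch, stored as UTC microseconds.
        arr = pa.array([0], type=pa.timestamp("us", tz="UTC"))
        series = arr.to_pandas()
        self.assertIsInstance(series.dtype, pd.DatetimeTZDtype)
        self.assertEqual(str(series.dtype.tz), "UTC")
        self.assertEqual(series.iloc[0], pd.Timestamp("1970-01-01", tz="UTC"))

    def test_zero_copy_without_nulls(self):
        # Should not raise: a null-free int64 array can be shared as-is.
        pa.array([1, 2, 3]).to_pandas(zero_copy_only=True)

    def test_zero_copy_rejected_with_nulls(self):
        # Nulls force a copy, so zero_copy_only=True is expected to raise.
        with self.assertRaises(pa.ArrowInvalid):
            pa.array([1, None, 3]).to_pandas(zero_copy_only=True)


def run_upstream_tests():
    """Run the upstream-only checks and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(UpstreamConversionTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the module has no Spark dependency, a failure here points directly at an upstream change rather than at PySpark itself.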



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

