[ https://issues.apache.org/jira/browse/SPARK-42647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695520#comment-17695520 ]

Apache Spark commented on SPARK-42647:
--------------------------------------

User 'aimtsou' has created a pull request for this issue:
https://github.com/apache/spark/pull/40220

> Remove aliases from deprecated numpy data types
> -----------------------------------------------
>
>                 Key: SPARK-42647
>                 URL: https://issues.apache.org/jira/browse/SPARK-42647
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.3.0, 3.3.1, 3.3.2, 3.3.3, 3.4.0, 3.4.1
>            Reporter: Aimilios Tsouvelekakis
>            Priority: Major
>
> Numpy has been removing the aliases for some of its data types. This means 
> that users on recent versions of numpy will face either warnings or errors, 
> depending on the type they are using. This affects all users running 
> numpy > 1.20.0. One of the types was already fixed back in September with 
> this [pull request|https://github.com/apache/spark/pull/37817].
> The problem can be split into two types:
> [numpy 1.24.0|https://github.com/numpy/numpy/pull/22607]: The scalar type 
> aliases ending in a 0 bit size: np.object0, np.str0, np.bytes0, np.void0, 
> np.int0, np.uint0, as well as np.bool8, are now deprecated and will 
> eventually be removed. As of numpy 1.25.0 they still only emit a warning.
> [numpy 1.20.0|https://github.com/numpy/numpy/pull/14882]: The aliases of 
> builtin types such as np.int have been deprecated since numpy 1.20.0 and 
> were removed in numpy 1.24.0.
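> For illustration, a minimal reproduction of both behaviours (a sketch, 
> assuming numpy 1.24.x is installed):
> {code:python}
> import warnings
> import numpy as np
>
> # Builtin-type aliases: deprecated in 1.20.0, removed in 1.24.0.
> try:
>     np.int
> except AttributeError as exc:
>     print(exc)  # module 'numpy' has no attribute 'int'
>
> # 0-bit-size scalar aliases: deprecated in 1.24.0, still usable with a warning.
> with warnings.catch_warnings(record=True) as caught:
>     warnings.simplefilter("always")
>     np.bool8
>     print(caught[0].category.__name__)  # DeprecationWarning
> {code}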
> The changes are needed so that pyspark stays compatible with the latest 
> numpy and avoids
>  * attribute errors on the data types deprecated in version 1.20.0: 
> [https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations]
>  * warnings on the data types deprecated in version 1.24.0: 
> [https://numpy.org/devdocs/release/1.24.0-notes.html#deprecations]
>  
> From my research so far I see the following:
> The only functional changes are the ones in the conversion.py file. The rest 
> of the changes are inside tests, in the user guide, or in docstrings 
> describing specific functions. Since I am not an expert in these tests, I 
> defer to the reviewers and to people with more experience in the pyspark 
> code.
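> As a sketch of the kind of substitution involved (the dictionary below is 
> illustrative only, not the actual conversion.py code), each deprecated 
> alias has a direct supported replacement:
> {code:python}
> import numpy as np
>
> # Hypothetical before/after table; the real fix replaces literal
> # uses of the deprecated aliases inside the pyspark sources.
> DEPRECATED_TO_SUPPORTED = {
>     "np.object0": np.object_,
>     "np.str0": np.str_,
>     "np.bytes0": np.bytes_,
>     "np.void0": np.void,
>     "np.int0": np.intp,
>     "np.uint0": np.uintp,
>     "np.bool8": np.bool_,
>     "np.int": int,      # the removed builtin aliases map to the builtins
>     "np.float": float,
>     "np.object": object,
> }
> {code}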
> These types are aliases for classic python types, so they should work with 
> all numpy versions 
> [1|https://numpy.org/devdocs/release/1.20.0-notes.html], 
> [2|https://stackoverflow.com/questions/74844262/how-can-i-solve-error-module-numpy-has-no-attribute-float-in-python]. 
> The error or warning comes from the call into numpy itself.
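> A quick check (nothing pyspark-specific) that the builtin replacements 
> resolve to the same dtypes on any supported numpy:
> {code:python}
> import numpy as np
>
> # Builtins have always been valid dtype arguments, so swapping the
> # removed aliases for them does not change any dtype resolution.
> assert np.dtype(int) == np.dtype(np.int_)
> assert np.dtype(float) == np.dtype(np.float64)
> assert np.dtype(bool) == np.dtype(np.bool_)
> assert np.dtype(object) == np.dtype(np.object_)
> {code}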
>  
> For the affected versions I chose to include 3.3 and onwards, but I see that 
> 3.2 is also still within the 18-month maintenance cadence, as it was 
> released in October 2021.
>  
> The pull request: [https://github.com/apache/spark/pull/40220]
> Best Regards,
> Aimilios


