[ 
https://issues.apache.org/jira/browse/SPARK-42647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aimilios Tsouvelekakis updated SPARK-42647:
-------------------------------------------
    Description: 
NumPy has started removing the aliases for some of its data types. This means that
users on the latest version of numpy will face either warnings or errors,
depending on the type they are using. This affects all users on
numpy > 1.20.0. One of the types was already fixed back in September with this
[pull request|https://github.com/apache/spark/pull/37817].

The problem can be split into two cases (a minimal sketch illustrating both follows the list):

 * [numpy 1.24.0|https://github.com/numpy/numpy/pull/22607]: The scalar type
aliases ending in a 0 bit size (np.object0, np.str0, np.bytes0, np.void0,
np.int0, np.uint0) as well as np.bool8 are now deprecated and will eventually be
removed. As of numpy 1.25.0 they emit a warning.
 * [numpy 1.20.0|https://github.com/numpy/numpy/pull/14882]: Using the aliases of
builtin types like np.int was deprecated in 1.20.0, and the aliases were removed in numpy 1.24.0.
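
The sketch below (my own illustration, not code from the PR) shows what each case looks like on numpy 1.24/1.25:

{code:python}
import numpy as np

# Case 1 (numpy >= 1.24): the 0-bit-size aliases still exist but warn:
#   DeprecationWarning: `np.str0` is a deprecated alias for `np.str_`.
a = np.array(["x"], dtype=np.str0)       # replacement: np.str_

# Case 2 (numpy >= 1.24): the builtin-type aliases are gone entirely:
#   AttributeError: module 'numpy' has no attribute 'object'
b = np.array([1, "x"], dtype=np.object)  # replacement: plain builtin object
{code}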

The changes are needed so PySpark stays compatible with the latest numpy and
avoids:
 * attribute errors on data types deprecated in version 1.20.0 (and since removed):
[https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations]
 * warnings on data types deprecated in version 1.24.0:
[https://numpy.org/devdocs/release/1.24.0-notes.html#deprecations]
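
One way to surface the 1.24.0 deprecations as hard failures while auditing the code base (a sketch assuming a plain warnings setup, not the actual Spark test configuration):

{code:python}
import warnings
import numpy as np

# Escalate DeprecationWarning to an error so a deprecated alias fails fast
# instead of only printing a warning that is easy to miss in test output.
with warnings.catch_warnings():
    warnings.simplefilter("error", DeprecationWarning)
    np.bool8(True)  # numpy >= 1.24: raises instead of warning
{code}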

 

From my research so far, I see the following:

The only functional changes are in the conversion.py file.
The rest of the changes are in tests in the user guide or in some
docstrings describing specific functions. Since I am not an expert in these
tests, I defer to the reviewers and to people with more experience in the
PySpark code.
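
To make the shape of such a change concrete, here is a hypothetical helper (is_string_dtype is my own illustration, not the actual conversion.py code):

{code:python}
import numpy as np

def is_string_dtype(dtype):
    # before: np.dtype(dtype).type is np.str0  -> warns on 1.24, breaks later
    # after:  np.str_ is the canonical scalar type and exists unchanged on
    #         every supported numpy version
    return np.dtype(dtype).type is np.str_

print(is_string_dtype("U10"))     # True
print(is_string_dtype(np.int64))  # False
{code}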

These types are aliases for builtin Python types, so yes, the replacements should work with
all numpy versions [1|https://numpy.org/devdocs/release/1.20.0-notes.html],
[2|https://stackoverflow.com/questions/74844262/how-can-i-solve-error-module-numpy-has-no-attribute-float-in-python].
The error or warning only comes from accessing the alias through the numpy namespace.
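
A quick check of that equivalence (these assertions hold at least from numpy 1.20 onward, as far as I can tell):

{code:python}
import numpy as np

# The removed aliases were plain re-exports of the builtins, so the builtin
# types produce identical dtypes on every supported numpy version.
assert np.dtype(object) == np.dtype("O")
assert np.dtype(float) == np.dtype(np.float64)
assert np.dtype(int) == np.dtype(np.int_)  # platform default integer
{code}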

 

For the affected versions I chose to include 3.3 and onwards, but I see that 3.2
is also still within the 18-month maintenance cadence, as it was released in October
2021.

 

The pull request: [https://github.com/apache/spark/pull/40220]

Best Regards,
Aimilios

> Remove aliases from deprecated numpy data types
> -----------------------------------------------
>
>                 Key: SPARK-42647
>                 URL: https://issues.apache.org/jira/browse/SPARK-42647
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.3.0, 3.3.1, 3.3.2, 3.3.3, 3.4.0, 3.4.1
>            Reporter: Aimilios Tsouvelekakis
>            Priority: Major
>


