[ 
https://issues.apache.org/jira/browse/SPARK-39405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627984#comment-17627984
 ] 

Xinrong Meng commented on SPARK-39405:
--------------------------------------

Hi [~douglas.mo...@databricks.com], the commit is in.

> NumPy input support in PySpark SQL
> ----------------------------------
>
>                 Key: SPARK-39405
>                 URL: https://issues.apache.org/jira/browse/SPARK-39405
>             Project: Spark
>          Issue Type: Umbrella
>          Components: PySpark
>    Affects Versions: 3.4.0
>            Reporter: Xinrong Meng
>            Assignee: Xinrong Meng
>            Priority: Major
>
> NumPy is the fundamental package for scientific computing with Python. It is 
> very commonly used, especially in the data science world. For example, Pandas 
> is backed by NumPy, and Tensors also support interchangeable conversion 
> from/to NumPy arrays. 
>  
> However, PySpark only supports Python built-in types with the exception of 
> “SparkSession.createDataFrame(pandas.DataFrame)” and “DataFrame.toPandas”. 
>  
> This issue has been raised multiple times internally and externally, see also 
> SPARK-2012, SPARK-37697, SPARK-31776, and SPARK-6857.
>  
> With NumPy support in SQL, we expect more adoption from naive data 
> scientists and newcomers, who can leverage their existing background and 
> codebase with NumPy.
>  
> See more at
> [https://docs.google.com/document/d/1WsBiHoQB3UWERP47C47n_frffxZ9YIoGRwXSwIeMank/edit#].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
