[ https://issues.apache.org/jira/browse/SPARK-39405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627984#comment-17627984 ]
Xinrong Meng commented on SPARK-39405:
--------------------------------------

Hi [~douglas.mo...@databricks.com] the commit is in.

> NumPy input support in PySpark SQL
> ----------------------------------
>
>                 Key: SPARK-39405
>                 URL: https://issues.apache.org/jira/browse/SPARK-39405
>             Project: Spark
>          Issue Type: Umbrella
>          Components: PySpark
>    Affects Versions: 3.4.0
>            Reporter: Xinrong Meng
>            Assignee: Xinrong Meng
>            Priority: Major
>
> NumPy is the fundamental package for scientific computing with Python. It is
> very commonly used, especially in the data science world. For example, Pandas
> is backed by NumPy, and tensors also support conversion to and from NumPy
> arrays.
>
> However, PySpark only supports Python built-in types, with the exception of
> “SparkSession.createDataFrame(pandas.DataFrame)” and “DataFrame.toPandas”.
>
> This issue has been raised multiple times internally and externally; see also
> SPARK-2012, SPARK-37697, SPARK-31776, and SPARK-6857.
>
> With NumPy support in SQL, we expect more adoption from naive data
> scientists and newcomers, who can leverage their existing background and
> codebase with NumPy.
>
> See more:
> [https://docs.google.com/document/d/1WsBiHoQB3UWERP47C47n_frffxZ9YIoGRwXSwIeMank/edit#]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)