Xinrong Meng created SPARK-39405:
------------------------------------

             Summary: NumPy support in SQL
                 Key: SPARK-39405
                 URL: https://issues.apache.org/jira/browse/SPARK-39405
             Project: Spark
          Issue Type: Umbrella
          Components: PySpark
    Affects Versions: 3.4.0
            Reporter: Xinrong Meng

NumPy is the fundamental package for scientific computing with Python. It is very commonly used, especially in the data science world. For example, pandas is backed by NumPy, and Tensors also support interchangeable conversion from/to NumPy arrays.

However, PySpark only supports Python built-in types, with the exception of “SparkSession.createDataFrame(pandas.DataFrame)” and “DataFrame.toPandas”.

This issue has been raised multiple times, both internally and externally; see also SPARK-2012, SPARK-37697, SPARK-31776, and SPARK-6857.

With NumPy support in SQL, we expect more adoption among data scientists and newcomers, who can leverage their existing background and codebase with NumPy.

See more at [].