[ https://issues.apache.org/jira/browse/SPARK-39494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xinrong Meng updated SPARK-39494:
---------------------------------
Description:

Currently, creating a DataFrame from a list of scalars is unsupported, as shown below:

>>> spark.createDataFrame([1, 2])
Traceback (most recent call last):
  ...
    raise TypeError("Can not infer schema for type: %s" % type(row))
TypeError: Can not infer schema for type: <class 'int'>

However, the cases below are supported:

>>> spark.createDataFrame([(1,), (2,)]).collect()
[Row(_1=1), Row(_1=2)]

>>> schema
StructType([StructField('_1', LongType(), True)])
>>> spark.createDataFrame([1, 2], schema=schema).collect()
[Row(_1=1), Row(_1=2)]

In addition, the Spark DataFrame Scala API supports creating a DataFrame from a list of scalars:

scala> Seq(1, 2).toDF().collect()
res6: Array[org.apache.spark.sql.Row] = Array([1], [2])

To maintain API consistency, we propose to support DataFrame creation from a list of scalars. See more at https://docs.google.com/document/d/1Rd20PVbVxNrLfOmDtetVRxkgJQhgAAtJp6XAAZfGQgc/edit?usp=sharing
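
Until such support lands, a minimal user-side workaround is to wrap each scalar in a one-element tuple so that the existing tuple-based schema inference applies. The sketch below is illustrative only; the helper name and the default column name are hypothetical and not part of the PySpark API:

# Hypothetical user-side helper (not part of PySpark) that emulates the
# proposed behavior with today's API.
from pyspark.sql import SparkSession

def create_dataframe_from_scalars(spark, values, column="_1"):
    # Wrap each scalar in a one-element tuple so that the existing
    # tuple-based schema inference applies, then name the single column.
    return spark.createDataFrame([(v,) for v in values]).toDF(column)

spark = SparkSession.builder.getOrCreate()
print(create_dataframe_from_scalars(spark, [1, 2]).collect())
# [Row(_1=1), Row(_1=2)]

The same result can be obtained by passing an explicit schema, as shown above; the wrapper only removes the need to spell out the StructType for the single-column case.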
> Support `createDataFrame` from a list of scalars
> ------------------------------------------------
>
>                 Key: SPARK-39494
>                 URL: https://issues.apache.org/jira/browse/SPARK-39494
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.4.0
>            Reporter: Xinrong Meng
>            Priority: Major