[ https://issues.apache.org/jira/browse/SPARK-39494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-39494:
---------------------------------
    Description: 
Currently, DataFrame creation from a list of scalars is unsupported, as shown below:

{code}
>>> spark.createDataFrame([1, 2])
Traceback (most recent call last):
...
    raise TypeError("Can not infer schema for type: %s" % type(row))
TypeError: Can not infer schema for type: <class 'int'>
{code}

 

However, the cases below are supported.

{code}
>>> spark.createDataFrame([(1,), (2,)]).collect()
[Row(_1=1), Row(_1=2)]
{code}
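This tuple form gives a practical workaround today: wrap each scalar in a one-element tuple so that schema inference sees a struct with a single field. A minimal sketch:

{code}
data = [1, 2]
# Wrapping each scalar in a 1-tuple lets inference produce a single column `_1`.
spark.createDataFrame([(x,) for x in data]).collect()
# [Row(_1=1), Row(_1=2)]
{code}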

 
{code}
>>> schema
StructType([StructField('_1', LongType(), True)])
>>> spark.createDataFrame([1, 2], schema=schema).collect()
[Row(_1=1), Row(_1=2)]
{code}
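Similarly, passing an atomic type as the schema should work, since PySpark wraps each datum into a single-field row in that case (a sketch based on my understanding of current behavior; the `value` column name is an assumption):

{code}
from pyspark.sql.types import IntegerType

# Assumption: with an atomic schema, each list element becomes one single-column row.
spark.createDataFrame([1, 2], schema=IntegerType()).collect()
# [Row(value=1), Row(value=2)]
{code}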

 

In addition, the Spark DataFrame Scala API supports creating a DataFrame from a
list of scalars:

{code}
scala> Seq(1, 2).toDF().collect()
res6: Array[org.apache.spark.sql.Row] = Array([1], [2])
{code}

 

To maintain API consistency, we propose to support DataFrame creation from a
list of scalars. See more at
https://docs.google.com/document/d/1Rd20PVbVxNrLfOmDtetVRxkgJQhgAAtJp6XAAZfGQgc/edit?usp=sharing.
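Concretely, the proposal would make the failing example at the top behave like the tuple case (a sketch of the intended behavior; the inferred column name is an assumption, not something settled by this ticket):

{code}
# Proposed behavior: infer a one-field schema from plain scalars.
>>> spark.createDataFrame([1, 2]).collect()
[Row(_1=1), Row(_1=2)]
{code}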

 



> Support `createDataFrame` from a list of scalars
> ------------------------------------------------
>
>                 Key: SPARK-39494
>                 URL: https://issues.apache.org/jira/browse/SPARK-39494
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.4.0
>            Reporter: Xinrong Meng
>            Priority: Major
>


