[ 
https://issues.apache.org/jira/browse/SPARK-38839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel deCordoba updated SPARK-38839:
-------------------------------------
    Description: 
When createDataFrame builds a DataFrame from data that contains a float inside 
a struct, the float is set to null. This only happens when the input is a list 
of dictionaries; with a list of Rows it works fine:
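{code:python}
# Row is needed by the second example; `spark` is an existing SparkSession.
from pyspark.sql import Row

data = [{"MyStruct": {"MyInt": 10, "MyFloat": 10.1}, "MyFloat": 10.1}]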

spark.createDataFrame(data).show()
# +-------+------------------------------+
# |MyFloat|MyStruct                      |
# +-------+------------------------------+
# |10.1   |{MyInt -> 10, MyFloat -> null}|
# +-------+------------------------------+ 


data = [Row(MyStruct=Row(MyInt=10, MyFloat=10.1), MyFloat=10.1)]

spark.createDataFrame(data).show()
# +-------+------------------------------+
# |MyFloat|MyStruct                      |
# +-------+------------------------------+
# |10.1   |{MyInt -> 10, MyFloat -> 10.1}|
# +-------+------------------------------+ {code}

Note that MyFloat inside MyStruct is set to null in the first example. When I 
do the same with Row (second example), or if I specify the schema, this does 
not happen.

  was:
When creating a dataframe using createDataFrame that contains a float inside a 
struct, the float is set to null. This only happens if using a list of 
dictionaries as data type, if I use a list of Rows it works fine:
{code:java}
data = [{"MyStruct": {"MyInt": 10, "MyFloat": 10.1}, "MyFloat": 10.1}]

spark.createDataFrame(data).show()
# +-------+------------------------------+
# |MyFloat|MyStruct                      |
# +-------+------------------------------+
# |10.1   |{MyInt -> 10, MyFloat -> null}|
# +-------+------------------------------+ 


data = [Row(MyStruct=Row(MyInt=10, MyFloat=10.1), MyFloat=10.1)]

spark.createDataFrame(data).show()
# +-------+------------------------------+
# |MyFloat|MyStruct                      |
# +-------+------------------------------+
# |10.1   |{MyInt -> 10, MyFloat -> 10.1}|
# +-------+------------------------------+ {code}
Note MyFloat inside MyStruct is set to null in the first example. Interestingly 
enough, when I do the same with Row, or if I specify the schema, then this does 
not happen (second example).


> Creating a struct with a float inside 
> --------------------------------------
>
>                 Key: SPARK-38839
>                 URL: https://issues.apache.org/jira/browse/SPARK-38839
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.2.1
>            Reporter: Daniel deCordoba
>            Priority: Minor
>
> When createDataFrame builds a DataFrame from data that contains a float 
> inside a struct, the float is set to null. This only happens when the input 
> is a list of dictionaries; with a list of Rows it works fine:
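> {code:python}
> # Row is needed by the second example; `spark` is an existing SparkSession.
> from pyspark.sql import Row
> data = [{"MyStruct": {"MyInt": 10, "MyFloat": 10.1}, "MyFloat": 10.1}]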
> spark.createDataFrame(data).show()
> # +-------+------------------------------+
> # |MyFloat|MyStruct                      |
> # +-------+------------------------------+
> # |10.1   |{MyInt -> 10, MyFloat -> null}|
> # +-------+------------------------------+ 
> data = [Row(MyStruct=Row(MyInt=10, MyFloat=10.1), MyFloat=10.1)]
> spark.createDataFrame(data).show()
> # +-------+------------------------------+
> # |MyFloat|MyStruct                      |
> # +-------+------------------------------+
> # |10.1   |{MyInt -> 10, MyFloat -> 10.1}|
> # +-------+------------------------------+ {code}
> Note that MyFloat inside MyStruct is set to null in the first example. 
> When I do the same with Row (second example), or if I specify the 
> schema, this does not happen.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
