[ 
https://issues.apache.org/jira/browse/SPARK-27612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832712#comment-16832712
 ] 

Bryan Cutler commented on SPARK-27612:
--------------------------------------

Thanks for checking this out, [~viirya] and [~hyukjin.kwon]. I agree that
fixing it in cloudpickle and doing another upgrade before 3.0.0 would be
best. The last upgrade, to 0.6.2, has not been in any released version of
Spark, right?

> Creating a DataFrame in PySpark with ArrayType produces some Rows with Arrays 
> of None
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-27612
>                 URL: https://issues.apache.org/jira/browse/SPARK-27612
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 3.0.0
>            Reporter: Bryan Cutler
>            Assignee: Hyukjin Kwon
>            Priority: Blocker
>              Labels: correctness
>             Fix For: 3.0.0
>
>
> This seems to only affect Python 3.
> When creating a DataFrame with type {{ArrayType(IntegerType(), True)}}, some
> rows end up filled with None.
>  
> {code:python}
> In [1]: from pyspark.sql.types import ArrayType, IntegerType
> In [2]: df = spark.createDataFrame([[1, 2, 3, 4]] * 100, ArrayType(IntegerType(), True))
> In [3]: df.distinct().collect()
> Out[3]: [Row(value=[None, None, None, None]), Row(value=[1, 2, 3, 4])]
> {code}
>  
> In this example, the None-filled rows are consistently at elements 97 and 98:
> {code}
> In [5]: df.collect()[-5:]
> Out[5]: 
> [Row(value=[1, 2, 3, 4]),
>  Row(value=[1, 2, 3, 4]),
>  Row(value=[None, None, None, None]),
>  Row(value=[None, None, None, None]),
>  Row(value=[1, 2, 3, 4])]
> {code}
> This also happens with type {{ArrayType(ArrayType(IntegerType(), True))}}.


