[ https://issues.apache.org/jira/browse/SPARK-39605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yuming Wang updated SPARK-39605:
--------------------------------
    Fix Version/s:     (was: 3.0.1)

> PySpark df.count() operation works fine on DBR 7.3 LTS but fails in DBR 10.4 LTS
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-39605
>                 URL: https://issues.apache.org/jira/browse/SPARK-39605
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.2.1
>            Reporter: Manoj Chandrashekar
>            Priority: Major
>
> I have a job that infers the schema from MongoDB and performs operations such as
> flattening and unwinding because there are nested fields. After the various
> transformations, the final count() works fine on Databricks Runtime 7.3 LTS but
> fails on 10.4 LTS.
> *Below is the image that shows a successful run in 7.3 LTS:*
> !https://docs.microsoft.com/answers/storage/attachments/215035-image.png|width=672,height=80!
> *Below is the image that shows the failure in 10.4 LTS:*
> !https://docs.microsoft.com/answers/storage/attachments/215026-image.png|width=668,height=69!
> I have also validated that no field in our schema has NullType. In fact, when the
> schema was inferred there were null and void type fields, which were converted to
> string using my UDF. The issue persists even when I infer the schema on the
> complete dataset, that is, with samplePoolSize covering the full data set.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
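
For anyone reproducing the reporter's check that no NullType fields remain: one way is to walk the schema in the JSON form returned by `df.schema.jsonValue()` and collect the paths of any null/void-typed fields. Below is a minimal, stdlib-only sketch (no Spark session needed); the sample schema and field names are hypothetical, and it assumes Spark serialises NullType as either "null" or "void" depending on version.

```python
def find_null_fields(dtype, path=""):
    """Recursively collect dotted paths of fields typed as null/void
    in a schema dict shaped like DataFrame.schema.jsonValue()."""
    found = []
    if isinstance(dtype, str):
        # Leaf type: NullType appears as "null" or "void" depending on Spark version.
        if dtype in ("null", "void"):
            found.append(path or "<root>")
    elif isinstance(dtype, dict):
        kind = dtype.get("type")
        if kind == "struct":
            for f in dtype.get("fields", []):
                child = f"{path}.{f['name']}" if path else f["name"]
                found += find_null_fields(f["type"], child)
        elif kind == "array":
            found += find_null_fields(dtype["elementType"], path + "[]")
        elif kind == "map":
            found += find_null_fields(dtype["valueType"], path + "{}")
    return found

# Hypothetical schema fragment in the jsonValue() layout:
schema = {
    "type": "struct",
    "fields": [
        {"name": "id", "type": "string", "nullable": True, "metadata": {}},
        {"name": "meta", "type": {
            "type": "struct",
            "fields": [
                {"name": "tag", "type": "null", "nullable": True, "metadata": {}},
            ]}, "nullable": True, "metadata": {}},
        {"name": "tags", "type": {"type": "array", "elementType": "void",
                                  "containsNull": True},
         "nullable": True, "metadata": {}},
    ],
}

print(find_null_fields(schema))  # → ['meta.tag', 'tags[]']
```

Any paths this reports would then need a cast (e.g. to string, as the reporter's UDF does) before actions like count() on the newer runtime.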