[ https://issues.apache.org/jira/browse/SPARK-39605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yuming Wang updated SPARK-39605:
--------------------------------
    Fix Version/s:     (was: 3.0.1)

> PySpark df.count() operation works fine on DBR 7.3 LTS but fails in DBR 10.4 LTS
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-39605
>                 URL: https://issues.apache.org/jira/browse/SPARK-39605
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.2.1
>            Reporter: Manoj Chandrashekar
>            Priority: Major
>
> I have a job that infers the schema from MongoDB and performs operations such as
> flattening and unwinding because there are nested fields. After the various
> transformations, the final count() works fine on Databricks Runtime 7.3 LTS but
> fails on 10.4 LTS.
> *Below is the image that shows a successful run in 7.3 LTS:*
> !https://docs.microsoft.com/answers/storage/attachments/215035-image.png|width=672,height=80!
> *Below is the image that shows the failure in 10.4 LTS:*
> !https://docs.microsoft.com/answers/storage/attachments/215026-image.png|width=668,height=69!
> I have also validated that no field in our schema has NullType. In fact, when the
> schema was inferred there were null and void type fields, which were converted to
> string using my UDF. The issue persists even when I infer the schema on the
> complete dataset, that is, with samplePoolSize covering the full data set.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
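
For anyone reproducing the reporter's check that no NullType fields remain: one way is to walk the schema in the JSON form returned by `df.schema.jsonValue()` and collect the paths of any null/void-typed fields. Below is a minimal, stdlib-only sketch (no Spark session needed); the sample schema and field names are hypothetical, and it assumes Spark serialises NullType as either "null" or "void" depending on version.

```python
def find_null_fields(dtype, path=""):
    """Recursively collect dotted paths of fields typed as null/void
    in a schema dict shaped like DataFrame.schema.jsonValue()."""
    found = []
    if isinstance(dtype, str):
        # Leaf type: NullType appears as "null" or "void" depending on Spark version.
        if dtype in ("null", "void"):
            found.append(path or "<root>")
    elif isinstance(dtype, dict):
        kind = dtype.get("type")
        if kind == "struct":
            for f in dtype.get("fields", []):
                child = f"{path}.{f['name']}" if path else f["name"]
                found += find_null_fields(f["type"], child)
        elif kind == "array":
            found += find_null_fields(dtype["elementType"], path + "[]")
        elif kind == "map":
            found += find_null_fields(dtype["valueType"], path + "{}")
    return found

# Hypothetical schema fragment in the jsonValue() layout:
schema = {
    "type": "struct",
    "fields": [
        {"name": "id", "type": "string", "nullable": True, "metadata": {}},
        {"name": "meta", "type": {
            "type": "struct",
            "fields": [
                {"name": "tag", "type": "null", "nullable": True, "metadata": {}},
            ]}, "nullable": True, "metadata": {}},
        {"name": "tags", "type": {"type": "array", "elementType": "void",
                                  "containsNull": True},
         "nullable": True, "metadata": {}},
    ],
}

print(find_null_fields(schema))  # → ['meta.tag', 'tags[]']
```

Any paths this reports would then need a cast (e.g. to string, as the reporter's UDF does) before actions like count() on the newer runtime.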