GitHub user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22104

I mean, the current code will still break partitioned tables:

```
== Physical Plan ==
*(3) Project [_c0#223, pythonUDF0#231 AS v1#226]
+- BatchEvalPython [<lambda>(0)], [_c0#223, pythonUDF0#231]
   +- *(2) Project [_c0#223]
      +- *(2) Filter (pythonUDF0#230 = 0)
         +- BatchEvalPython [<lambda>(0)], [_c0#223, pythonUDF0#230]
            +- *(1) FileScan csv [_c0#223] Batched: false, Format: CSV, Location: InMemoryFileIndex[file:/tmp/tab3], PartitionFilters: [(<lambda>(0) = 0)], PushedFilters: [], ReadSchema: struct<_c0:string>
```

For instance:

```python
from pyspark.sql.functions import udf, lit, col

spark.range(1).selectExpr("id", "id as value").write.mode("overwrite").format('csv').partitionBy("id").save("/tmp/tab3")

df = spark.read.csv('/tmp/tab3')
df2 = df.withColumn('v1', udf(lambda x: x, 'int')(lit(0)))
df2 = df2.filter(df2['v1'] == 0)
df2.explain()
```