GitHub user iddoav commented on the issue:

https://github.com/apache/spark/pull/21070

Our R&D team at SimilarWeb is having a hard time with PARQUET-686, and merging this PR would help us a lot. Note that, unlike Spark 2.1+ readers, which have read-time mitigations (SPARK-17213 et al.), other systems such as CDH 5.x's Spark and AWS Athena (and probably Presto as well) apply predicate pushdown to Parquet files written by Spark 2.3 and return wrong answers when string columns are involved.

@gatorsmile @rdblue