Github user alexbaretta commented on the pull request: https://github.com/apache/spark/pull/4039#issuecomment-70019159

@squito This is not new functionality for which it would make sense to write a unit test; it is a hotfix for a bug. I am completely unfamiliar with this code, but the type issue is clear: although Int is a subtype of Any, MutableInt is not a subtype of MutableAny. A val declared as Any can be cast to Int--a type-unsafe operation that can fail hard at runtime, but that succeeds when the payload of the Any val really is an Int--whereas a cast from MutableAny to MutableInt is simply impossible and will always fail, even when the payload of the MutableAny is in fact an Int. If you look at the JIRA you will see exactly this as the cause of failure:

Caused by: java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.MutableAny cannot be cast to org.apache.spark.sql.catalyst.expressions.MutableInt

A question worth asking the author of this class is why SparkSQL relies on this type-casting mechanism to parse Parquet files; I am inclined to believe there is a deeper issue here. That said, my patch does allow my SQL queries against my Parquet dataset to complete successfully instead of failing.
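To make the distinction concrete, here is a minimal JVM sketch (hypothetical classes mirroring the shape of Catalyst's mutable value holders, not the actual Catalyst code): casting a boxed Any payload to Integer can succeed when the payload matches, but casting between two sibling holder classes throws ClassCastException no matter what payload they hold.

```java
// Hypothetical stand-ins for Catalyst's mutable value holders.
// MutableAny and MutableInt are siblings under a common parent, so a
// cast from one to the other can never succeed at runtime.
class MutableValue {}

class MutableAny extends MutableValue {
    Object value;                         // arbitrary boxed payload
}

class MutableInt extends MutableValue {
    int value;                            // unboxed int payload
}

public class CastDemo {
    public static void main(String[] args) {
        // Case 1: casting a value typed as Object (Any) to Integer.
        // Type-unsafe, but succeeds because the payload really is an Integer.
        Object any = 42;
        int ok = (Integer) any;
        System.out.println("unboxed: " + ok);

        // Case 2: casting between the sibling holder classes.
        // The payload is an int, yet the cast is structurally impossible.
        MutableAny holder = new MutableAny();
        holder.value = 42;
        MutableValue upcast = holder;
        try {
            MutableInt bad = (MutableInt) upcast;  // always throws
            System.out.println("cast succeeded: " + bad.value);
        } catch (ClassCastException e) {
            System.out.println("caught ClassCastException");
        }
    }
}
```

The point of the sketch: subtyping of the payload (Int <: Any) does not transfer to the containers (MutableInt is not <: MutableAny), which is exactly the failure mode in the stack trace above.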