SPARK-20364 <https://issues.apache.org/jira/browse/SPARK-20364> describes a bug, but I am not sure we should call it a regression that blocks the release.
The current master produces a wrong result in some cases when there are dots in Parquet column names, but this case did not even work in past releases: they threw an exception instead. So this does not look like a regression to me, although it is definitely a bug we should fix. In more detail, I tested this case as below:

Spark 1.6.3

    val path = "/tmp/foo"
    Seq(Tuple1(Some(1)), Tuple1(None)).toDF("col.dots").write.parquet(path)

    sqlContext.read.parquet(path).where("`col.dots` IS NOT NULL").show()

    java.lang.IllegalArgumentException: Column [col, dots] was not found in schema!
      at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
      ...

    sqlContext.read.parquet(path).where("`col.dots` IS NULL").show()

    java.lang.IllegalArgumentException: Column [col, dots] was not found in schema!
      at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
      ...

Spark 2.0.2

    val path = "/tmp/foo"
    Seq(Some(1), None).toDF("col.dots").write.parquet(path)

    spark.read.parquet(path).where("`col.dots` IS NOT NULL").show()

    java.lang.IllegalArgumentException: Column [col, dots] was not found in schema!
      at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
      ...

    spark.read.parquet(path).where("`col.dots` IS NULL").show()

    java.lang.IllegalArgumentException: Column [col, dots] was not found in schema!
      at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
      ...

Spark 2.1.0

    val path = "/tmp/foo"
    Seq(Some(1), None).toDF("col.dots").write.parquet(path)

    spark.read.parquet(path).where("`col.dots` IS NOT NULL").show()

    java.lang.IllegalArgumentException: Column [col, dots] was not found in schema!
      at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
      ...

    spark.read.parquet(path).where("`col.dots` IS NULL").show()

    java.lang.IllegalArgumentException: Column [col, dots] was not found in schema!
      at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
      ...

Spark 2.1.1 RC4

    val path = "/tmp/foo"
    Seq(Some(1), None).toDF("col.dots").write.parquet(path)

    spark.read.parquet(path).where("`col.dots` IS NOT NULL").show()

    java.lang.IllegalArgumentException: Column [col, dots] was not found in schema!
      at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
      ...

    spark.read.parquet(path).where("`col.dots` IS NULL").show()

    java.lang.IllegalArgumentException: Column [col, dots] was not found in schema!
      at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
      ...

Current master

    val path = "/tmp/foo"
    Seq(Some(1), None).toDF("col.dots").write.parquet(path)

    spark.read.parquet(path).where("`col.dots` IS NOT NULL").show()

    +--------+
    |col.dots|
    +--------+
    +--------+

    spark.read.parquet(path).where("`col.dots` IS NULL").show()

    +--------+
    |col.dots|
    +--------+
    |    null|
    +--------+

2017-04-29 2:57 GMT+09:00 Koert Kuipers <ko...@tresata.com>:

> we have been testing the 2.2.0 snapshots in the last few weeks for
> in-house unit tests, integration tests and real workloads and we are very
> happy with it. the only issue i had so far (some encoders not being
> serializable anymore) has already been dealt with by wenchen.
>
> On Thu, Apr 27, 2017 at 6:49 PM, Sean Owen <so...@cloudera.com> wrote:
>
>> By the way the RC looks good. Sigs and license are OK, tests pass with
>> -Phive -Pyarn -Phadoop-2.7. +1 from me.
>>
>> On Thu, Apr 27, 2017 at 7:31 PM Michael Armbrust <mich...@databricks.com>
>> wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark
>>> version 2.2.0. The vote is open until Tues, May 2nd, 2017 at 12:00 PST
>>> and passes if a majority of at least 3 +1 PMC votes are cast.
>>>
>>> [ ] +1 Release this package as Apache Spark 2.2.0
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is v2.2.0-rc1
>>> <https://github.com/apache/spark/tree/v2.2.0-rc1>
>>> (8ccb4a57c82146c1a8f8966c7e64010cf5632cb6)
>>>
>>> The list of JIRA tickets resolved can be found with this filter
>>> <https://issues.apache.org/jira/browse/SPARK-20134?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.1.1>.
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc1-bin/
>>>
>>> Release artifacts are signed with the following key:
>>> https://people.apache.org/keys/committer/pwendell.asc
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1235/
>>>
>>> The documentation corresponding to this release can be found at:
>>> http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc1-docs/
>>>
>>> *FAQ*
>>>
>>> *How can I help test this release?*
>>>
>>> If you are a Spark user, you can help us test this release by taking an
>>> existing Spark workload and running it on this release candidate, then
>>> reporting any regressions.
>>>
>>> *What should happen to JIRA tickets still targeting 2.2.0?*
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks that impact compatibility should be
>>> worked on immediately. Everything else please retarget to 2.3.0 or 2.2.1.
>>>
>>> *But my bug isn't fixed!??!*
>>>
>>> In order to make timely releases, we will typically not hold the release
>>> unless the bug in question is a regression from 2.1.1.
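
Coming back to SPARK-20364: for anyone who needs dotted Parquet column names before a fix lands, one possible workaround is to disable Parquet filter pushdown so the predicate is evaluated by Spark after the scan rather than inside the Parquet reader (where the column lookup fails). This is a sketch I have not verified against every release above; it assumes the exception really does originate from pushdown, and it uses the existing `spark.sql.parquet.filterPushdown` config:

```scala
// Possible (unverified) workaround: turn off Parquet filter pushdown so the
// IS NULL / IS NOT NULL predicate is evaluated by Spark itself instead of
// being pushed into the Parquet reader. Assumes an active SparkSession
// named `spark` and the `path` from the repro above.
spark.conf.set("spark.sql.parquet.filterPushdown", "false")
spark.read.parquet(path).where("`col.dots` IS NOT NULL").show()
```

Note this trades away the scan-time filtering benefit of pushdown for the whole session, so it is only a stopgap until the bug is fixed.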