Seems like there aren't any objections. I'll pick this thread back up when a Parquet maintenance release has happened.
Henry

On 11 April 2018 at 14:00, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

> Great.
>
> If we can upgrade the parquet dependency from 1.8.2 to 1.8.3 in Apache
> Spark 2.3.1, let's upgrade the orc dependency from 1.4.1 to 1.4.3 together.
>
> Currently, the patch is only merged into the master branch. 1.4.1 has the
> following issue:
>
> https://issues.apache.org/jira/browse/SPARK-23340
>
> Bests,
> Dongjoon.
>
> On Wed, Apr 11, 2018 at 1:23 PM, Reynold Xin <r...@databricks.com> wrote:
>
>> Seems like this would make sense... we usually make maintenance releases
>> for bug fixes after a month anyway.
>>
>> On Wed, Apr 11, 2018 at 12:52 PM, Henry Robinson <he...@apache.org>
>> wrote:
>>
>>> On 11 April 2018 at 12:47, Ryan Blue <rb...@netflix.com.invalid> wrote:
>>>
>>>> I think a 1.8.3 Parquet release makes sense for the 2.3.x releases of
>>>> Spark.
>>>>
>>>> To be clear though, this only affects Spark when reading data written
>>>> by Impala, right? Or does Parquet CPP also produce data like this?
>>>
>>> I don't know about parquet-cpp, but yeah, the only implementation I've
>>> seen writing the half-completed stats is Impala. (As you know, that's
>>> compliant with the spec, just an unusual choice.)
>>>
>>>> On Wed, Apr 11, 2018 at 12:35 PM, Henry Robinson <he...@apache.org>
>>>> wrote:
>>>>
>>>>> Hi all -
>>>>>
>>>>> SPARK-23852 (where a query can silently give wrong results thanks to a
>>>>> predicate pushdown bug in Parquet) is a fairly bad bug. In other
>>>>> projects I've been involved with, we've released maintenance releases
>>>>> for bugs of this severity.
>>>>>
>>>>> Since Spark 2.4.0 is probably a while away, I wanted to see if there
>>>>> was any consensus over whether we should consider (at least) a 2.3.1.
>>>>>
>>>>> The reason this particular issue is a bit tricky is that the Parquet
>>>>> community haven't yet produced a maintenance release that fixes the
>>>>> underlying bug, but they are in the process of releasing a new minor
>>>>> version, 1.10, which includes a fix. Having spoken to a couple of
>>>>> Parquet developers, they'd be willing to consider a maintenance
>>>>> release, but would probably only bother if we (or another affected
>>>>> project) asked them to.
>>>>>
>>>>> My guess is that we wouldn't want to upgrade to a new minor version of
>>>>> Parquet for a Spark maintenance release, so asking for a Parquet
>>>>> maintenance release makes sense.
>>>>>
>>>>> What does everyone think?
>>>>>
>>>>> Best,
>>>>> Henry
>>>>
>>>> --
>>>> Ryan Blue
>>>> Software Engineer
>>>> Netflix
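For anyone bitten by this before a fixed release ships, one possible session-level mitigation (a sketch, not part of the thread above; it trades scan performance for correctness by skipping the buggy row-group filtering entirely) is to turn off Parquet predicate pushdown via the standard Spark SQL conf:

```sql
-- Mitigation sketch: disable Parquet filter pushdown so row groups with
-- half-completed statistics are not incorrectly skipped. Filters are then
-- applied after the scan, so results stay correct at some performance cost.
SET spark.sql.parquet.filterPushdown=false;
```

The same conf can be set in spark-defaults.conf or on the SparkSession; it defaults to true, so it would need to be re-enabled once the patched Parquet release is picked up.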