Seems like there aren't any objections. I'll pick this thread back up when
a Parquet maintenance release has happened.

Henry

On 11 April 2018 at 14:00, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

> Great.
>
> If we can upgrade the Parquet dependency from 1.8.2 to 1.8.3 in Apache
> Spark 2.3.1, let's upgrade the ORC dependency from 1.4.1 to 1.4.3 as well.
>
> Currently, the patch is only merged into the master branch. 1.4.1 has the
> following issue.
>
> https://issues.apache.org/jira/browse/SPARK-23340
>
> Bests,
> Dongjoon.
>
>
>
> On Wed, Apr 11, 2018 at 1:23 PM, Reynold Xin <r...@databricks.com> wrote:
>
>> Seems like this would make sense... we usually make maintenance releases
>> for bug fixes after a month anyway.
>>
>>
>> On Wed, Apr 11, 2018 at 12:52 PM, Henry Robinson <he...@apache.org>
>> wrote:
>>
>>>
>>>
>>> On 11 April 2018 at 12:47, Ryan Blue <rb...@netflix.com.invalid> wrote:
>>>
>>>> I think a 1.8.3 Parquet release makes sense for the 2.3.x releases of
>>>> Spark.
>>>>
>>>> To be clear though, this only affects Spark when reading data written
>>>> by Impala, right? Or does Parquet CPP also produce data like this?
>>>>
>>>
>>> I don't know about parquet-cpp, but yeah, the only implementation I've
>>> seen writing the half-completed stats is Impala. (as you know, that's
>>> compliant with the spec, just an unusual choice).
>>>
>>>
>>>>
>>>> On Wed, Apr 11, 2018 at 12:35 PM, Henry Robinson <he...@apache.org>
>>>> wrote:
>>>>
>>>>> Hi all -
>>>>>
>>>>> SPARK-23852 (where a query can silently give wrong results thanks to a
>>>>> predicate pushdown bug in Parquet) is a fairly bad bug. In other projects
>>>>> I've been involved with, we've released maintenance releases for bugs of
>>>>> this severity.
>>>>>
>>>>> Since Spark 2.4.0 is probably a while away, I wanted to see if there
>>>>> was any consensus over whether we should consider (at least) a 2.3.1.
>>>>>
>>>>> The reason this particular issue is a bit tricky is that the Parquet
>>>>> community haven't yet produced a maintenance release that fixes the
>>>>> underlying bug, but they are in the process of releasing a new minor
>>>>> version, 1.10, which includes a fix. Having spoken to a couple of Parquet
>>>>> developers, they'd be willing to consider a maintenance release, but would
>>>>> probably only bother if we (or another affected project) asked them to.
>>>>>
>>>>> My guess is that we wouldn't want to upgrade to a new minor version of
>>>>> Parquet for a Spark maintenance release, so asking for a Parquet
>>>>> maintenance release makes sense.
>>>>>
>>>>> What does everyone think?
>>>>>
>>>>> Best,
>>>>> Henry
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Software Engineer
>>>> Netflix
>>>>
>>>
>>>
>>
>
