I was trying to explain my earlier comment, in which I stated that
"changing the default implementation to return false would be an
incompatible change".  The patch was added 6 months after that comment,
so the comment didn't address the patch.

The patch does not appear to change the default implementation to
return false unless the suffix of the file name is that of a known
unsplittable compression format.  So the folks who'd be harmed by this
are those who used a suffix like ".gz" for an Avro, Parquet or
other-format file.  Their applications might suddenly run much slower
and it would be difficult for them to determine why.  Such folks are
probably few, but perhaps exist.  I'd prefer a change that avoided
that possibility entirely.
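
To illustrate the case I have in mind, here is a minimal sketch of a
suffix-based default.  It is not the patch itself and does not use Hadoop's
classes; the class name, the method signature and the suffix list are all
assumptions made up for this example.

```java
// Hypothetical stand-in for a suffix-based default; NOT Hadoop's FileInputFormat.
class SuffixSplitCheck {
    // Assumed list of suffixes belonging to unsplittable compression formats.
    static final String[] UNSPLITTABLE_SUFFIXES = { ".gz", ".deflate" };

    // Returns false only when the file name ends with a listed suffix.
    // The decision looks at the name alone, never at the file's contents.
    static boolean isSplitable(String fileName) {
        for (String suffix : UNSPLITTABLE_SUFFIXES) {
            if (fileName.endsWith(suffix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Plain Avro file: still treated as splittable.
        System.out.println(isSplitable("events.avro"));
        // Gzipped text: treated as unsplittable, which is the intended fix.
        System.out.println(isSplitable("events.txt.gz"));
        // Avro file that merely carries a ".gz" suffix: also treated as
        // unsplittable, so it would be read as a single split.
        System.out.println(isSplitable("events.avro.gz"));
    }
}
```

The last call is the problem case: the suffix alone decides the answer, so
a container file with a misleading name loses its parallelism without any
error being reported.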

Doug

On Fri, May 30, 2014 at 3:02 PM, Niels Basjes <ni...@basjes.nl> wrote:
> Hi,
>
> The way I see the effects of the original patch on existing subclasses:
> - implemented isSplitable
>    --> no performance difference.
> - did not implement isSplitable
>    --> then there is no performance difference if the container is either
> not compressed or uses a splittable compression.
>    --> If it uses a common non splittable compression (like gzip) then the
> output will suddenly be different (which is the correct answer) and the
> jobs will finish sooner because the input is not processed multiple times.
>
> Where do you see a performance impact?
>
> Niels
> On May 30, 2014 8:06 PM, "Doug Cutting" <cutt...@apache.org> wrote:
>
>> On Thu, May 29, 2014 at 2:47 AM, Niels Basjes <ni...@basjes.nl> wrote:
>> > For arguments I still do not fully understand this was rejected by
>> > Todd and Doug.
>>
>> Performance is a part of compatibility.
>>
>> Doug
>>
