I was trying to explain my comment, in which I stated that "changing the default implementation to return false would be an incompatible change". The patch was added six months after that comment, so the comment did not address the patch.
The patch does not appear to change the default implementation to return false unless the suffix of the file name is that of a known unsplittable compression format. So the folks who'd be harmed by this are those who used a suffix like ".gz" for an Avro, Parquet or other-format file. Their applications might suddenly run much slower and it would be difficult for them to determine why. Such folks are probably few, but perhaps exist. I'd prefer a change that avoided that possibility entirely.

Doug

On Fri, May 30, 2014 at 3:02 PM, Niels Basjes <ni...@basjes.nl> wrote:
> Hi,
>
> The way I see the effects of the original patch on existing subclasses:
> - implemented isSplitable
>    --> no performance difference.
> - did not implement isSplitable
>    --> then there is no performance difference if the container is either
>        not compressed or uses a splittable compression.
>    --> If it uses a common non splittable compression (like gzip) then the
>        output will suddenly be different (which is the correct answer) and the
>        jobs will finish sooner because the input is not processed multiple times.
>
> Where do you see a performance impact?
>
> Niels
> On May 30, 2014 8:06 PM, "Doug Cutting" <cutt...@apache.org> wrote:
>
>> On Thu, May 29, 2014 at 2:47 AM, Niels Basjes <ni...@basjes.nl> wrote:
>> > For arguments I still do not fully understand this was rejected by Todd and
>> > Doug.
>>
>> Performance is a part of compatibility.
>>
>> Doug
>>
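
To make the concern about suffixes concrete, here is a minimal sketch of a suffix-based default. It is not the patch and not Hadoop code: the class name SuffixSplitSketch, the static isSplitable(String) helper, and the one-entry suffix list are all assumptions made up for illustration. It only shows why a file named with ".gz" would stop being split regardless of what container format it actually holds.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical stand-in for a suffix-based default; not Hadoop's FileInputFormat. */
final class SuffixSplitSketch {

    // Assumed, illustrative list of suffixes treated as unsplittable compression.
    private static final Set<String> UNSPLITTABLE_SUFFIXES =
            new HashSet<>(Arrays.asList(".gz"));

    /** Returns false only when the file name ends in a listed suffix. */
    static boolean isSplitable(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (String suffix : UNSPLITTABLE_SUFFIXES) {
            if (lower.endsWith(suffix)) {
                // The decision is made on the name alone; the file's real
                // format (Avro, Parquet, ...) is never inspected.
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // An Avro container that someone happened to name with ".gz".
        System.out.println("events.gz   -> " + isSplitable("events.gz"));
        System.out.println("events.avro -> " + isSplitable("events.avro"));
    }
}
```

Under this sketch, an input format that never overrode isSplitable and was fed "events.gz" would go from many splits to one, which is the slowdown case described above.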