Ryan, want to submit a pull request?
On Tue, Nov 1, 2016 at 9:05 AM, Ryan Blue <rb...@netflix.com.invalid> wrote:

> 1.9.0 includes some fixes intended specifically for Spark:
>
> * PARQUET-389: Evaluates push-down predicates for missing columns as
>   though they are null. This is to address Spark's work-around that
>   requires reading and merging file schemas, even for metastore tables.
> * PARQUET-654: Adds an option to disable record-level predicate
>   push-down, but keep row group evaluation. This allows Spark to skip
>   row groups based on stats and dictionaries, but implement its own
>   vectorized record filtering.
>
> The Parquet community also evaluated performance to ensure no
> performance regressions from moving to the ByteBuffer read path.
>
> There is one concern about 1.9.0 that will be addressed in 1.9.1, which
> is that stats calculations were incorrectly using unsigned byte order
> for string comparison. This means that min/max stats can't be used if
> the data contains (or may contain) UTF8 characters with the msb set.
> 1.9.0 won't return the bad min/max values for correctness, but there is
> a property to override this behavior for data that doesn't use the
> affected code points.
>
> Upgrading to 1.9.0 depends on how the community wants to handle the
> sort order bug: whether correctness or performance should be the
> default.
>
> rb
>
> On Tue, Nov 1, 2016 at 2:22 AM, Sean Owen <so...@cloudera.com> wrote:
>
>> Yes, this came up from a different direction:
>> https://issues.apache.org/jira/browse/SPARK-18140
>>
>> I think it's fine to pursue an upgrade to fix these several issues.
>> The question is just how well it will play with other components, so
>> it bears some testing and evaluation of the changes from 1.8, but yes,
>> this would be good.
>>
>> On Mon, Oct 31, 2016 at 9:07 PM, Michael Allman <mich...@videoamp.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> Is anyone working on updating Spark's Parquet library dep to 1.9? If
>>> not, I can at least get started on it and publish a PR.
>>>
>>> Cheers,
>>>
>>> Michael
>
> --
> Ryan Blue
> Software Engineer
> Netflix
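The sort-order concern Ryan raises comes down to signed versus unsigned byte comparison of UTF-8 encoded strings. The two orderings disagree exactly when a byte has its most significant bit set, i.e. for any non-ASCII UTF-8 byte, which is why min/max stats from the affected versions can't be trusted for such data. A standalone sketch (illustrative only, not Parquet code):

```python
# Python's bytes compare by unsigned byte value; Java's byte type is signed.
# signed_key reinterprets each byte as a signed 8-bit value so we can show
# how the two orderings diverge once the msb is set.

def signed_key(b: bytes):
    # bytes >= 0x80 become negative, as they would in Java
    return [x - 256 if x >= 0x80 else x for x in b]

a = "a".encode("utf-8")   # b'a' -> 0x61, msb clear
e = "é".encode("utf-8")   # b'\xc3\xa9', msb set on both bytes

assert max([a, e]) == e                   # unsigned order: 0xC3 > 0x61
assert max([a, e], key=signed_key) == a   # signed order: 0xC3 -> -61 < 0x61
```

For pure-ASCII data the two orderings agree, which is why the override property Ryan mentions is safe for data known not to use the affected code points.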