I am putting together a comprehensive test plan specifically covering the parquet reader. One part of the plan is to create a tool that can generate parquet files of all flavors. This will considerably increase our coverage and hopefully prevent this type of issue. I am looking at parquet-mr and parquet-compatibility for ideas. Hopefully I am on the right track. Pointers are welcome.
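
To make "all flavors" a bit more concrete, here is a rough sketch (Python, standard library only) of how such a generator could enumerate the combinations to produce. The dimension names and values are assumptions for illustration, not an agreed list, and the helper names are hypothetical. The sketch only plans one file name per combination; actually writing each file would go through a real Parquet writer such as parquet-mr, which is not shown.

```python
from itertools import product

# Assumed dimensions of a "flavor"; the real list would come out of the
# survey of parquet-mr and parquet-compatibility.
FLAVOR_DIMENSIONS = {
    "compression": ["UNCOMPRESSED", "SNAPPY", "GZIP"],
    "writer_version": ["PARQUET_1_0", "PARQUET_2_0"],
    "dictionary_encoding": [True, False],
    # Hypothetical labels for who produced the file.
    "created_by": ["drill-1.2", "drill-1.3", "other-tool"],
}


def enumerate_flavors(dimensions):
    """Yield one dict per combination of dimension values."""
    names = list(dimensions)
    for values in product(*(dimensions[name] for name in names)):
        yield dict(zip(names, values))


def flavor_file_name(flavor):
    """Build a deterministic file name that encodes the flavor."""
    parts = [f"{key}={str(value).lower()}" for key, value in sorted(flavor.items())]
    return "__".join(parts) + ".parquet"


if __name__ == "__main__":
    for flavor in enumerate_flavors(FLAVOR_DIMENSIONS):
        print(flavor_file_name(flavor))
```

Encoding the flavor in the file name is a design choice so that a failing reader test points directly at the combination that broke it.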

Thanks,
Chun

On Thu, Nov 12, 2015 at 10:57 AM, Jacques Nadeau <jacq...@dremio.com> wrote:

> Hey Guys,
>
> It sounds like the Parquet upgrade in 1.3 have fixed an incorrect result
> problem with externally generated files. This has unfortunately resulted in
> a performance regression in the context of partition pruning. I'm neutral
> on whether this is a release stopper but it sounds like we have some strong
> opinions from Aman, Jinfeng and Rahul. As such, I think this kills the
> release.
>
> It seems like there are at least two options for resolution:
>
> - give people a migration tool for their previous Drill-created Parquet
>   files
> - provide people a switch to enable the old behavior. (This will possibly
>   give users incorrect results if they use this in the wrong context--ick...)
>
> Let's move the discussion of the potential fix approaches to the DRILL-4070
> that Rahul filed.
>
> Two other questions that we should probably figure out answers to:
> - How can we make sure this gets caught by testing in the future?
> - Who wants to work on the fix?
>
> How does that sound?
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Thu, Nov 12, 2015 at 10:48 AM, rahul challapalli <challapallira...@gmail.com> wrote:
>
> > While breaking backward compatibility could be justified in cases like
> > this, doing this without providing a tested upgrade process is
> > unacceptable.
> >
> > - Rahul
> >
> > On Thu, Nov 12, 2015 at 10:43 AM, Steven Phillips <ste...@dremio.com> wrote:
> >
> > > Does DRILL-4070 cause incorrect results? Or just prevent partition pruning?
> > >
> > > On Thu, Nov 12, 2015 at 10:32 AM, Jason Altekruse <altekruseja...@gmail.com> wrote:
> > >
> > > > I just commented on the JIRA, we are behaving correctly for newly created
> > > > parquet files. I did confirm the failure to prune on auto-partitioned files
> > > > created by 1.2. I do not think this is a release blocker, because I do not
> > > > think we can solve this in Drill code without risking wrong results over
> > > > parquet files written by other tools. I do support the creation of a
> > > > migration utility for existing files written by Drill 1.2, but this can be
> > > > released independent of 1.3.
> > > >
> > > > On Thu, Nov 12, 2015 at 10:26 AM, Jinfeng Ni <jinfengn...@gmail.com> wrote:
> > > >
> > > > > Agree with Aman that DRILL-4070 is a show stopper. Parquet is the
> > > > > major data source Drill uses. If this release candidate breaks the
> > > > > backward compatibility of partitioning pruning for the parquet files
> > > > > created with prior release of Drill, it could cause serious problem
> > > > > for the current Drill user.
> > > > >
> > > > > -1
> > > > >
> > > > > On Thu, Nov 12, 2015 at 10:10 AM, rahul challapalli
> > > > > <challapallira...@gmail.com> wrote:
> > > > > > -1 (non-binding)
> > > > > > The nature of the issue (DRILL-4070) demands adequate testing even with a
> > > > > > workaround in place.
> > > > > >
> > > > > > On Thu, Nov 12, 2015 at 9:32 AM, Aman Sinha <amansi...@apache.org> wrote:
> > > > > >
> > > > > >> Given this issue, I would be a -1 unfortunately.
> > > > > >>
> > > > > >> On Thu, Nov 12, 2015 at 8:42 AM, Aman Sinha <amansi...@apache.org> wrote:
> > > > > >>
> > > > > >> > Can someone familiar with the parquet changes take a look at DRILL-4070 ?
> > > > > >> > It seems to break backward compatibility.
> > > > > >> >
> > > > > >> > On Tue, Nov 10, 2015 at 9:51 PM, Jacques Nadeau <jacq...@dremio.com> wrote:
> > > > > >> >
> > > > > >> >> Hey Everybody,
> > > > > >> >>
> > > > > >> >> I'd like to propose a new release candidate of Apache Drill, version
> > > > > >> >> 1.3.0. This is the third release candidate (rc2). This addresses some
> > > > > >> >> issues identified in the the second release candidate including some
> > > > > >> >> test issues & rpc concurrency issues.
> > > > > >> >>
> > > > > >> >> The tarball artifacts are hosted at [2] and the maven artifacts are
> > > > > >> >> hosted at [3]. This release candidate is based on commit
> > > > > >> >> 13ab6b1f9897ebcf9179407ffaf84b79b0ee95a1 located at [4].
> > > > > >> >> The vote will be open for 72 hours ending at 10PM Pacific, November 13,
> > > > > >> >> 2015.
> > > > > >> >>
> > > > > >> >> [ ] +1
> > > > > >> >> [ ] +0
> > > > > >> >> [ ] -1
> > > > > >> >>
> > > > > >> >> thanks,
> > > > > >> >> Jacques
> > > > > >> >>
> > > > > >> >> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12332946
> > > > > >> >> [2] http://people.apache.org/~jacques/apache-drill-1.3.0.rc2/
> > > > > >> >> [3] https://repository.apache.org/content/repositories/orgapachedrill-1013/
> > > > > >> >> [4] https://github.com/jacques-n/drill/tree/drill-1.3.0
> > > > > >> >>
> > > > > >> >> --
> > > > > >> >> Jacques Nadeau
> > > > > >> >> CTO and Co-Founder, Dremio