I heard that there are some issues between filter pushdown and Parquet metadata caching, but I'm not clear on what exactly the problem is or whether we have a plan to resolve it. Can you elaborate on what the open questions are and how they conflict with metadata caching?
The reason I'm looking at filter pushdown is that one query posted on the user list a couple of days ago performed really badly on Drill 1.1 compared with another system. We did some comparison analysis and concluded that the difference mainly comes from the fact that Drill lacks Parquet filter pushdown. At least for now, the only way for Drill to match the other system's performance is to enable filter pushdown for that query.

In the meantime, we also identified some room for improvement in Drill's run-time generated code when it is used for filter evaluation. I'll submit a patch for review shortly.

Regards,

Jinfeng

On Mon, Aug 31, 2015 at 8:13 PM, Jacques Nadeau <[email protected]> wrote:

> Given that Julien and Jason are working heavily on a merge into Parquet, I
> strongly suggest waiting on merging other patches around that code (or at
> least working on top of the changes they are doing).
>
> I thought there were a number of open questions around the filter pushdown
> and how it related to the metadata caching stuff. Have those been resolved?
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Mon, Aug 31, 2015 at 3:25 PM, Jinfeng Ni <[email protected]> wrote:
>
> > I'm actually trying Adam's parquet filter pushdown patch (DRILL-1950).
> > That's why I happened to click one parquet class and hit the above
> > "source code not found" error.
> >
> > Thanks!
> >
> > On Mon, Aug 31, 2015 at 3:20 PM, Jason Altekruse <[email protected]>
> > wrote:
> >
> > > https://github.com/mapr/incubator-parquet-mr/tree/1.6.0rc3-drill-r0.3
> > >
> > > I am working with Julien Le Dem on getting us off of the fork, but for
> > > now the source code is accessible here. Let me know if you need any
> > > help looking through the parquet code. Is there a particular JIRA you
> > > are trying to address?
> > >
> > > On Mon, Aug 31, 2015 at 3:15 PM, Jinfeng Ni <[email protected]>
> > > wrote:
> > >
> > > > It seems we are using a forked parquet library. Can someone point me
> > > > to the source code for the forked parquet?
> > > >
> > > > I tried to download the source code within IDE, and it complains the
> > > > following:
> > > >
> > > > "*Cannot download sources*
> > > >
> > > > Sources not found for:
> > > > com.twitter:parquet-column:1.6.0rc3-drill-r0.3
> > > >
> > > > "
> > > >
> > > > So, looks like only the compiled code jar is published, but not the
> > > > source code jar file.
