[ https://issues.apache.org/jira/browse/DRILL-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16360194#comment-16360194 ]
Paul Rogers commented on DRILL-6147:
------------------------------------

Salim says:

Predicate Pushdown
- The reason I invoked predicate pushdown within the document is to help the analysis:
o Notice how record batch materialization could involve many more pages.
o A solution that relies mainly on the current set of pages (one per column) might pay a heavy IO price without much to show for it:
+ It waits for all columns to have at least one page loaded so that up-front stats are gathered.
+ Batch memory is then divided optimally across columns, and the current batch size is computed.
+ Unfortunately, such logic will fail if more pages are involved than the ones taken into consideration.
o Example (a sketch of this arithmetic follows below):
+ Two variable-length columns, c1 and c2.
+ The reader waits for two pages, P1-1 and P2-1, so that we a) allocate memory optimally across c1 and c2 and b) compute a batch size that will minimize overflow logic.
+ Assume that, because of data-length skew or predicate pushdown, more pages are involved in loading the batch: for c1: {P1-1, P1-2, P1-3, P1-4}, for c2: {P2-1, P2-2}.
+ It is now quite possible that the overflow logic is suboptimal, since only two pages' statistics were considered instead of six.
- I have added new logic to ScanBatch to log (on demand) extra batch statistics that will help us assess the efficiency of the batch-sizing strategy; I will add this information to the document when that sub-task is done.

Paul's comment:

Perhaps we are conflating a number of things here, and things are not, in fact, this simple. Predicates must be applied per row, not per column or per page. If the predicate is "X = 5", then we must exclude all column values where X != 5. This means we cannot do independent bulk operations if we are also doing per-row predicate filtering. Said another way, predicate push-down forces row-by-row processing, even though the underlying storage format is columnar. (This is why the Filter operator works row by row.)

The result set loader has built-in support for predicate push-down. Once a row is loaded, before it is saved, we can apply a predicate to that row. If we don't want the row, we don't save it, and the result set loader overwrites that row with a new value. That code works and is tested. (The predicate evaluation logic would have to be added; the row-handling parts are done. Here again, why not build on the existing work, adding additional value such as that filter evaluation logic? A sketch of this save-or-discard pattern follows below.)

The discussion above is about optimizing I/O. But predicates work row by row. I am not sure how page buffering impacts per-row predicates: one has to load all pages for a batch, and decode them, if we need even a single value from a page. (Parquet does not allow random access to values within a page, due to encoding, if I understand correctly.)

One optimization would be to read just the predicate column and apply the predicate to those values (X in my example above). Then, if no X value matches the predicate, we can skip reading the other pages. But I'd have thought we could get the same result from the footer metadata...
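To make Salim's example concrete, below is a minimal, self-contained sketch of the batch-sizing arithmetic described above. This is not Drill's actual reader code: the names (PageStats, computeRowLimit), the 1 MiB budget, and the page sizes are all hypothetical. It only assumes, per the example, that the reader estimates per-row width from the first loaded page of each column.

{code:java}
// Hypothetical sketch of sizing a batch from first-page stats only.
public final class BatchSizeSketch {

  /** Stats visible once a column's first page is loaded. */
  static final class PageStats {
    final long uncompressedBytes;
    final long valueCount;

    PageStats(long uncompressedBytes, long valueCount) {
      this.uncompressedBytes = uncompressedBytes;
      this.valueCount = valueCount;
    }

    double avgValueWidth() {
      return (double) uncompressedBytes / valueCount;
    }
  }

  /**
   * Every column must hold the same number of rows, so the row limit is
   * the budget divided by the estimated bytes per row (the sum of the
   * per-column average value widths taken from the first pages).
   */
  static int computeRowLimit(long batchBudgetBytes, PageStats[] firstPages) {
    double bytesPerRow = 0;
    for (PageStats p : firstPages) {
      bytesPerRow += p.avgValueWidth();
    }
    return (int) Math.max(1, batchBudgetBytes / bytesPerRow);
  }

  public static void main(String[] args) {
    // c1's first page P1-1 averages 8 bytes/value; c2's first page P2-1
    // averages 16 bytes/value. With a 1 MiB budget the limit is ~43k rows.
    PageStats p11 = new PageStats(8 * 10_000, 10_000);
    PageStats p21 = new PageStats(16 * 10_000, 10_000);
    int limit = computeRowLimit(1 << 20, new PageStats[] { p11, p21 });
    System.out.println("row limit = " + limit);

    // Failure mode from the example: if pages P1-2..P1-4 average, say,
    // 64 bytes/value (data skew), the actual memory needed for c1 is far
    // larger than estimated, and the batch overflows the budget despite
    // the "optimal" up-front split computed from P1-1 and P2-1 alone.
  }
}
{code}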
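And here is a toy illustration of the row-by-row filtering Paul describes. The class and method names (IntRowLoader, writeRow, saveRow) are invented stand-ins, not Drill's actual ResultSetLoader API; the point is only the save-or-discard pattern: a row is materialized, the pushed-down predicate is evaluated, and an unwanted row is dropped simply by never saving it, so the next row overwrites it.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public final class RowFilterSketch {

  /** Minimal stand-in for a result set loader holding one int column. */
  static final class IntRowLoader {
    private final List<Integer> saved = new ArrayList<>();
    private int pending; // the row currently being written

    void writeRow(int x) { pending = x; }          // load the row
    void saveRow()       { saved.add(pending); }   // commit it to the batch
    // Not calling saveRow() means the next writeRow() overwrites 'pending'.
    List<Integer> batch() { return saved; }
  }

  public static void main(String[] args) {
    IntRowLoader loader = new IntRowLoader();
    IntPredicate pushedDown = x -> x == 5;   // the predicate "X = 5"

    for (int x : new int[] { 3, 5, 7, 5, 9 }) {
      loader.writeRow(x);          // row is fully materialized first...
      if (pushedDown.test(x)) {    // ...then filtered, row by row
        loader.saveRow();
      }                            // else: discarded by overwrite
    }
    System.out.println(loader.batch()); // prints [5, 5]
  }
}
{code}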
> Limit batch size for Flat Parquet Reader
> ----------------------------------------
>
>                 Key: DRILL-6147
>                 URL: https://issues.apache.org/jira/browse/DRILL-6147
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Parquet
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Major
>             Fix For: 1.13.0
>
>
> The Parquet reader currently uses a hard-coded batch size limit (32k rows) when creating scan batches; there is no parameter nor any logic for controlling the amount of memory used. This enhancement will allow Drill to take an extra input parameter to control direct memory usage.