[ https://issues.apache.org/jira/browse/DRILL-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
salim achouche resolved DRILL-6301.
-----------------------------------
    Resolution: Fixed
      Reviewer: Pritesh Maker

This is an analytical task.

> Parquet Performance Analysis
> ----------------------------
>
>                 Key: DRILL-6301
>                 URL: https://issues.apache.org/jira/browse/DRILL-6301
>             Project: Apache Drill
>          Issue Type: Task
>          Components: Storage - Parquet
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Major
>             Fix For: 1.14.0
>
>
> _*Description -*_
> * DRILL-5846 is meant to improve the Flat Parquet reader performance
> * The associated implementation resulted in a 2x - 4x performance improvement
> * However, during the review process ([pull
> request|https://github.com/apache/drill/pull/1060]), a few key questions arose
>
> *_Intermediary Processing via Direct Memory vs Byte Arrays_*
> * The main reasons for using byte arrays for intermediary processing are to
> a) avoid the high cost of the DrillBuf checks (especially the reference
> counting) and b) benefit from some observed Java optimizations when accessing
> byte arrays
> * Starting with version 1.12.0, the DrillBuf enablement checks have been
> refined so that memory access and reference counting checks can be enabled
> independently
> * Benchmarking of Java's Direct Memory unsafe methods using JMH indicates that the
> performance gap between heap and direct memory is very narrow except for a few
> use cases
> * There are also concerns that the extra copy step (from direct memory into
> byte arrays) will have a negative effect on performance; note that this
> overhead was not observed using Intel VTune, as the intermediary buffers were
> a) pinned to a single CPU, b) reused, and c) small enough to remain in the L1
> cache during columnar processing.
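The two access patterns discussed in the bullets above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Drill code: a plain java.nio direct ByteBuffer stands in for DrillBuf, and the class name, method names, and the 4 KB chunk size are all hypothetical. The sketch shows the shape of each pattern (per-value reads from direct memory versus one bulk copy into a small, reused byte array); it measures nothing, and any timing comparison would need a proper harness such as JMH.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class IntermediaryBufferSketch {
    // Hypothetical chunk size: a multiple of 4, kept small so the scratch
    // array has a chance of staying cache-resident while it is reused.
    static final int CHUNK_BYTES = 4096;

    /** Sums little-endian ints by reading each value straight from the direct buffer. */
    static long sumDirect(ByteBuffer direct, int count) {
        long sum = 0;
        for (int i = 0; i < count; i++) {
            // Absolute read; the buffer performs its own checks on every call.
            sum += direct.getInt(i * Integer.BYTES);
        }
        return sum;
    }

    /** Sums the same ints after bulk-copying them, chunk by chunk, into a reused byte array. */
    static long sumViaByteArray(ByteBuffer direct, int count, byte[] scratch) {
        ByteBuffer src = direct.duplicate(); // own position/limit, same memory
        src.clear();
        src.limit(count * Integer.BYTES);
        long sum = 0;
        while (src.hasRemaining()) {
            int n = Math.min(src.remaining(), scratch.length);
            src.get(scratch, 0, n); // the extra copy step: direct memory -> byte[]
            // Decode little-endian ints from the heap array (n is a multiple of 4).
            for (int off = 0; off < n; off += Integer.BYTES) {
                sum += (scratch[off] & 0xFF)
                     | (scratch[off + 1] & 0xFF) << 8
                     | (scratch[off + 2] & 0xFF) << 16
                     | (scratch[off + 3] & 0xFF) << 24;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int count = 3000;
        ByteBuffer direct = ByteBuffer.allocateDirect(count * Integer.BYTES)
                                      .order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < count; i++) {
            direct.putInt(i * Integer.BYTES, i - 1000);
        }
        byte[] scratch = new byte[CHUNK_BYTES]; // allocated once, reused for every chunk
        System.out.println("direct reads:   " + sumDirect(direct, count));
        System.out.println("via byte array: " + sumViaByteArray(direct, count, scratch));
    }
}
```

Both methods return the same sum; the only difference is where the per-value reads happen. In the second pattern the scratch array is allocated once by the caller and reused across chunks, which mirrors conditions b) and c) in the last bullet.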
>
> _*Goal*_
> * The Flat Parquet reader is among the few Drill columnar operators
> * It is imperative that we agree on the optimal processing pattern so
> that the decisions we take within this Jira are applied not only to
> Parquet but to all Drill columnar operators
>
> _*Methodology*_
> # Assess the performance impact of using intermediary byte arrays (as
> described above)
> # Prototype a solution using Direct Memory under three DrillBuf
> configurations: checks off, access checks on, and all checks on
> # Make an educated decision on which processing pattern should be adopted
> # Decide whether it is ok to use Java's unsafe API (and through what
> mechanism) on byte arrays (when the use of byte arrays is a necessity)
>
>
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)