[jira] [Commented] (DRILL-5846) Improve Parquet Reader Performance for Flat Data types

ASF GitHub Bot (JIRA) Wed, 20 Dec 2017 21:14:20 -0800

    [ https://issues.apache.org/jira/browse/DRILL-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16299565#comment-16299565 ]


ASF GitHub Bot commented on DRILL-5846:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1060#discussion_r158196333
  
    --- Diff: exec/vector/src/main/codegen/templates/NullableValueVectors.java ---
    @@ -51,6 +57,17 @@
     public final class ${className} extends BaseDataValueVector implements <#if type.major == "VarLen">VariableWidth<#else>FixedWidth</#if>Vector, NullableVector {
       private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(${className}.class);
     
    +  /**
    +   * Optimization to set contiguous values nullable state in a bulk manner; cannot define this array
    +   * within the Mutator class as Java doesn't allow static initialization within a non static inner class.
    +   */
    +  private static final int DEFINED_VALUES_ARRAY_LEN = 1 << 10;
    +  private static final byte[] DEFINED_VALUES_ARRAY  = new byte[DEFINED_VALUES_ARRAY_LEN];
    --- End diff --
    
    This variable is static in every generated class, so there is one copy per vector type. Would it be better to define it in the base class so that there is only one copy in the system, rather than one copy per type?
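
To make the suggestion concrete, here is a minimal sketch of what a single shared copy might look like. The class names (`SharedDefinedValuesBase`, `NullableIntVectorSketch`, `NullableVarCharVectorSketch`) and the `setDefined` helper are hypothetical stand-ins, not Drill's actual classes, and filling the array with 1 to mean "value is set" is an assumption, since the diff shows only the declaration.

```java
import java.util.Arrays;

// Hypothetical stand-in for a common base class; NOT Drill's BaseDataValueVector.
abstract class SharedDefinedValuesBase {
  static final int DEFINED_VALUES_ARRAY_LEN = 1 << 10;

  // Allocated once for the whole JVM, however many subclasses are generated.
  static final byte[] DEFINED_VALUES_ARRAY = new byte[DEFINED_VALUES_ARRAY_LEN];

  static {
    // Assumption: 1 marks a value as defined (non-null).
    Arrays.fill(DEFINED_VALUES_ARRAY, (byte) 1);
  }

  // Marks `count` contiguous entries of `bits`, starting at `start`, as defined
  // by copying from the shared array in chunks of at most DEFINED_VALUES_ARRAY_LEN.
  static void setDefined(byte[] bits, int start, int count) {
    int offset = start;
    int remaining = count;
    while (remaining > 0) {
      int chunk = Math.min(remaining, DEFINED_VALUES_ARRAY_LEN);
      System.arraycopy(DEFINED_VALUES_ARRAY, 0, bits, offset, chunk);
      offset += chunk;
      remaining -= chunk;
    }
  }
}

// Two hypothetical generated vector types; both see the same array instance.
final class NullableIntVectorSketch extends SharedDefinedValuesBase {
  byte[] sharedArray() { return DEFINED_VALUES_ARRAY; }
}

final class NullableVarCharVectorSketch extends SharedDefinedValuesBase {
  byte[] sharedArray() { return DEFINED_VALUES_ARRAY; }
}
```

With this layout the 1 KB array exists once rather than once per generated type, and a non-static inner class (such as a Mutator) can still reach it through the enclosing class's inherited static field.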


> Improve Parquet Reader Performance for Flat Data types 
> -------------------------------------------------------
>
>                 Key: DRILL-5846
>                 URL: https://issues.apache.org/jira/browse/DRILL-5846
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Parquet
>    Affects Versions: 1.11.0
>            Reporter: salim achouche
>            Assignee: salim achouche
>              Labels: performance
>             Fix For: 1.13.0
>
>
> The Parquet Reader is a key use-case for Drill. This JIRA is an attempt to 
> further improve the Parquet Reader performance, as several users reported that 
> Parquet parsing represents the lion's share of the overall query execution time. It 
> tracks flat data types only, as nested data types might involve functional and 
> processing enhancements (e.g., a nested column can be seen as a document; 
> a user might want to perform operations scoped at the document level that do 
> not need to span all rows). Another JIRA will be created to handle the nested 
> columns use-case.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
