Joe McDonnell created IMPALA-6054:
-------------------------------------

             Summary: Parquet dictionary pages should be freed on dictionary construction
                 Key: IMPALA-6054
                 URL: https://issues.apache.org/jira/browse/IMPALA-6054
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
    Affects Versions: Impala 2.10.0
            Reporter: Joe McDonnell


The Parquet scanner uses the dictionary_pool_ to allocate memory for the 
dictionary page (see BaseScalarColumnReader::InitDictionary()). This dictionary 
page is used to initialize the dictionary in CreateDictionaryDecoder(). The 
resulting dictionary is a vector of values. For some datatypes, such as 
strings, the resulting dictionary is an array of StringValues that contain 
pointers into the dictionary page (see the StringValue specialization in 
ParquetPlainEncoder::Decode()). In this case, the dictionary page must be kept 
and attached to the last row batch that references it. However, for other 
datatypes, the values are copied into the dictionary and the dictionary page is 
no longer needed after the dictionary is constructed.
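
To illustrate the difference, here is a minimal standalone sketch (not Impala's
actual code). StringRef is a hypothetical stand-in for StringValue, and the two
decode functions are hypothetical simplifications of what
ParquetPlainEncoder::Decode() does for PLAIN-encoded INT32 and BYTE_ARRAY
values. It assumes a little-endian host.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for Impala's StringValue: a non-owning pointer + length.
struct StringRef {
  const char* ptr;
  int32_t len;
};

// PLAIN-encoded INT32: each 4-byte value is copied out of the page, so the
// resulting dictionary does not reference the page afterwards.
std::vector<int32_t> DecodeInt32Dict(const uint8_t* page, size_t page_len) {
  std::vector<int32_t> dict;
  for (size_t off = 0; off + sizeof(int32_t) <= page_len; off += sizeof(int32_t)) {
    int32_t v;
    std::memcpy(&v, page + off, sizeof(v));  // assumes little-endian host
    dict.push_back(v);
  }
  return dict;
}

// PLAIN-encoded BYTE_ARRAY: a 4-byte length prefix followed by the bytes.
// Only the pointer and length are stored, so every entry points into the page
// and the page must outlive the dictionary.
std::vector<StringRef> DecodeStringDict(const uint8_t* page, size_t page_len) {
  std::vector<StringRef> dict;
  size_t off = 0;
  while (off + sizeof(int32_t) <= page_len) {
    int32_t len;
    std::memcpy(&len, page + off, sizeof(len));  // assumes little-endian host
    off += sizeof(len);
    if (len < 0 || off + static_cast<size_t>(len) > page_len) break;  // truncated page
    dict.push_back({reinterpret_cast<const char*>(page + off), len});
    off += static_cast<size_t>(len);
  }
  return dict;
}
```

After DecodeInt32Dict() returns, the page buffer could be released without
affecting the dictionary. After DecodeStringDict() returns, releasing it would
leave every StringRef dangling.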

Currently, these dictionary pages remain in the dictionary_pool_ and are 
attached to the last row batch to be passed to other ExecNodes (see 
FlushRowGroupResources()). Only dictionary pages for StringValue (or other 
types whose values point to data in the page) should be attached to the row 
batch. Dictionary pages for all other types should be freed as soon as the 
dictionary has been constructed.
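
One possible shape for this policy is sketched below. This is only an
illustration under stated assumptions, not a proposed patch: PointsIntoPage,
PageBuffer, RowBatchResources and ReleaseDictionaryPage are all hypothetical
names, and a std::unique_ptr stands in for memory that Impala actually
allocates from dictionary_pool_.

```cpp
#include <cstdint>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for Impala's StringValue.
struct StringRef {
  const char* ptr;
  int32_t len;
};

// Hypothetical trait: true when decoded values point into the dictionary page.
template <typename T>
struct PointsIntoPage : std::false_type {};
template <>
struct PointsIntoPage<StringRef> : std::true_type {};

// Hypothetical owner of one dictionary page's memory.
using PageBuffer = std::unique_ptr<uint8_t[]>;

// Hypothetical stand-in for the buffers attached to the last row batch.
struct RowBatchResources {
  std::vector<PageBuffer> attached_pages;
};

// Call once the dictionary has been constructed from `page`.
// Returns true if the page was freed immediately, false if it was attached
// to the row batch because the dictionary still references it.
template <typename T>
bool ReleaseDictionaryPage(PageBuffer page, RowBatchResources* batch) {
  if (PointsIntoPage<T>::value) {
    batch->attached_pages.push_back(std::move(page));
    return false;
  }
  page.reset();  // values were copied; the page is no longer needed
  return true;
}
```

With this shape, ReleaseDictionaryPage<int32_t>() frees the page right away,
while ReleaseDictionaryPage<StringRef>() keeps it alive alongside the row batch.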



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
