Paul Rogers created DRILL-5267:
----------------------------------

             Summary: Managed external sort spills too often with Parquet data
                 Key: DRILL-5267
                 URL: https://issues.apache.org/jira/browse/DRILL-5267
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.10
            Reporter: Paul Rogers
            Assignee: Paul Rogers
             Fix For: 1.10


DRILL-5266 describes how Parquet produces low-density record batches. Because 
of these batches, the external sort spills more often than it should: it sizes 
spill files by the overall size of each batch, not by the data the batch 
actually contains. Since Parquet batches are 95% empty space, the spill files 
end up far too small.

Adjust the spill calculations based on actual data content, not the size of the 
overall record batch.
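
To illustrate the difference, here is a rough sketch of the arithmetic, not 
the actual Drill code. The class name, method names, the 200 MB target spill 
file size and the 20 MB batch size are all assumptions made up for the 
example; only the 95% empty-space figure comes from the description above.

```java
/** Hypothetical sketch; none of these names are taken from Drill's source. */
public final class SpillSizing {
    private SpillSizing() {}

    /** Fraction of a batch's overall size that holds real data. */
    public static double density(long dataBytes, long batchBytes) {
        if (batchBytes <= 0) {
            return 0.0;
        }
        return (double) dataBytes / batchBytes;
    }

    /**
     * Number of batches to accumulate before writing one spill file,
     * given the size charged to each batch. Always at least one batch.
     */
    public static long batchesPerSpill(long targetSpillFileBytes, long perBatchBytes) {
        if (perBatchBytes <= 0) {
            throw new IllegalArgumentException("perBatchBytes must be positive");
        }
        return Math.max(1L, targetSpillFileBytes / perBatchBytes);
    }

    public static void main(String[] args) {
        long targetSpillFileBytes = 200_000_000L; // assumed target spill file size
        long batchBytes = 20_000_000L;            // assumed overall batch size
        long dataBytes = 1_000_000L;              // 5% data, 95% empty space

        // Sizing by overall batch size: few batches per file, so many tiny files.
        long byBatchSize = batchesPerSpill(targetSpillFileBytes, batchBytes);
        // Sizing by data content: the file actually approaches the target size.
        long byDataSize = batchesPerSpill(targetSpillFileBytes, dataBytes);

        System.out.println("density = " + density(dataBytes, batchBytes));
        System.out.println("batches per spill (batch size) = " + byBatchSize);
        System.out.println("batches per spill (data size)  = " + byDataSize);
        System.out.println("data written per spill file (batch size) = "
                + byBatchSize * dataBytes + " bytes");
    }
}
```

With these assumed numbers, charging each batch its overall size yields a 
spill file that holds only a small fraction of the target in real data, while 
charging each batch its data content lets the file grow to the intended size.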



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
