Paul Rogers created DRILL-5484:
----------------------------------

             Summary: easy.text.compliant.RepeatedVarCharOutput creates 
unnecessary 64K byte field
                 Key: DRILL-5484
                 URL: https://issues.apache.org/jira/browse/DRILL-5484
             Project: Apache Drill
          Issue Type: Improvement
    Affects Versions: 1.10.0
            Reporter: Paul Rogers
            Priority: Minor


The "Easy" text readers include a "compliant" reader for reading things like 
CSV. That mechanism includes a class, {{RepeatedVarCharOutput}}, which gathers 
field data into a single array, "columns".

Part of the work is to implement projection by reading only the needed 
columns. This is done with a {{fields}} array. Since the constructor that sets 
up the array does not know the number of fields, it assumes the maximum: 64K.

{code}
  public static final int MAXIMUM_NUMBER_COLUMNS = 64 * 1024;
  ...
      boolean[] fields = new boolean[MAXIMUM_NUMBER_COLUMNS];
{code}

This is, of course, a quick & dirty solution, but it is a heavy price to pay 
for what amounts to a single bit indicating that we want to read all fields. 
It is not clear that the performance advantage of a flag check is worth the 
cost of allocating many 64K heap blocks: we need one per file per reader.
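
One possible alternative, as a minimal sketch only: keep the "read all fields" 
case as a single boolean flag, and size the {{fields}} array from the highest 
projected column index when an explicit projection is given. The class name 
{{ProjectionMask}} and its methods are hypothetical and are not part of Drill; 
only {{MAXIMUM_NUMBER_COLUMNS}} and the {{fields}} array come from the code 
quoted above.

```java
/**
 * Hypothetical sketch, not the actual Drill class: tracks which columns
 * are projected without always allocating a 64K-entry boolean array.
 */
final class ProjectionMask {
  // Same limit as the constant quoted in the issue.
  public static final int MAXIMUM_NUMBER_COLUMNS = 64 * 1024;

  // True when every column is wanted; no array is needed in that case.
  private final boolean allColumns;
  // Sized to the highest projected index + 1; null when allColumns is true.
  private final boolean[] fields;

  private ProjectionMask(boolean allColumns, boolean[] fields) {
    this.allColumns = allColumns;
    this.fields = fields;
  }

  /** Mask for the "read all fields" case: one flag, no array. */
  public static ProjectionMask all() {
    return new ProjectionMask(true, null);
  }

  /** Mask for an explicit list of zero-based column indexes. */
  public static ProjectionMask of(int... columnIndexes) {
    int max = -1;
    for (int index : columnIndexes) {
      if (index < 0 || index >= MAXIMUM_NUMBER_COLUMNS) {
        throw new IllegalArgumentException("Column index out of range: " + index);
      }
      max = Math.max(max, index);
    }
    boolean[] fields = new boolean[max + 1];
    for (int index : columnIndexes) {
      fields[index] = true;
    }
    return new ProjectionMask(false, fields);
  }

  /** Returns true if the given column should be read. */
  public boolean isProjected(int columnIndex) {
    if (allColumns) {
      return true;
    }
    return columnIndex >= 0 && columnIndex < fields.length && fields[columnIndex];
  }

  /** Number of array entries actually allocated (0 for the all-columns case). */
  public int allocatedEntries() {
    return allColumns ? 0 : fields.length;
  }
}
```

With this shape, {{ProjectionMask.all()}} allocates no array, and 
{{ProjectionMask.of(1, 3)}} allocates four entries rather than 64K. The cost 
is one extra flag check and a bounds check per lookup; whether that matters on 
the reader's hot path would need to be measured.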



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
