Hyunsik Choi created TAJO-561:
---------------------------------

             Summary: PhysicalPlanner::createBestSortPlan should consider input size of leaf tasks
                 Key: TAJO-561
                 URL: https://issues.apache.org/jira/browse/TAJO-561
             Project: Tajo
          Issue Type: Bug
          Components: physical operator
            Reporter: Hyunsik Choi
             Fix For: 0.8-incubating


Please take a look at the following code, which chooses a sort operator 
according to the input volume. It has two problems. First, the threshold is a 
hard-coded constant; it should be configurable. Second, estimateSizeRecursive 
does not obtain the input volume when the task is a leaf task. We should fix 
both. In addition, I think estimateSizeRecursive should be renamed to something 
more appropriate: the current name is vague and does not follow our naming 
convention.

{code:java}
public SortExec createBestSortPlan(TaskAttemptContext context, SortNode sortNode,
                                   PhysicalExec child) throws IOException {
  String[] outerLineage = PlannerUtil.getRelationLineage(sortNode.getChild());
  long estimatedSize = estimateSizeRecursive(context, outerLineage);
  final long threshold = 1048576 * 2000;

  // If the estimated relation size does not exceed the threshold,
  // the in-memory sort is used; otherwise the external sort is used.
  if (estimatedSize <= threshold) {
    return new MemSortExec(context, sortNode, child);
  } else {
    return new ExternalSortExec(context, sm, sortNode, child);
  }
}
{code}
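
To illustrate the two fixes proposed above, here is a minimal, self-contained 
sketch; it is not actual Tajo code. It assumes a plain Map as the configuration 
source and a list of fragment byte lengths as the input of a leaf task. The 
class name, the configuration key, and the helper methods are all hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class SortChoiceSketch {
  // Hypothetical configuration key; not an existing Tajo setting.
  static final String THRESHOLD_KEY = "sort.in-memory.threshold.bytes";
  // Default mirrors the constant in the snippet above (2000 MiB).
  static final long DEFAULT_THRESHOLD = 1048576L * 2000;

  enum SortKind { IN_MEMORY, EXTERNAL }

  // Reads the threshold from the configuration, falling back to the default.
  static long threshold(Map<String, String> conf) {
    String value = conf.get(THRESHOLD_KEY);
    return value == null ? DEFAULT_THRESHOLD : Long.parseLong(value);
  }

  // Assumption: the input volume of a leaf task is the sum of the byte
  // lengths of the fragments assigned to it.
  static long estimateInputVolume(List<Long> fragmentLengths) {
    long total = 0;
    for (long length : fragmentLengths) {
      total += length;
    }
    return total;
  }

  // Picks the in-memory sort only if the estimated input fits the threshold.
  static SortKind chooseSort(Map<String, String> conf, List<Long> fragmentLengths) {
    return estimateInputVolume(fragmentLengths) <= threshold(conf)
        ? SortKind.IN_MEMORY
        : SortKind.EXTERNAL;
  }

  public static void main(String[] args) {
    // Two 64 MiB fragments assigned to one leaf task.
    List<Long> fragments = Arrays.asList(64L * 1048576, 64L * 1048576);

    // No override: the 2000 MiB default applies.
    Map<String, String> defaults = Collections.<String, String>emptyMap();
    System.out.println(chooseSort(defaults, fragments));

    // Threshold lowered to 1 MiB through the configuration.
    Map<String, String> lowered = Collections.singletonMap(THRESHOLD_KEY, "1048576");
    System.out.println(chooseSort(lowered, fragments));
  }
}
```

With the default threshold the 128 MiB input selects the in-memory sort; with 
the threshold lowered to 1 MiB the same input selects the external sort. In the 
real planner the fragment lengths would have to come from the task context, 
which this sketch does not model.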



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
