Gopal V created HIVE-17124:
------------------------------

             Summary: PlanUtils: Rand() is not a failure-tolerant distribution column
                 Key: HIVE-17124
                 URL: https://issues.apache.org/jira/browse/HIVE-17124
             Project: Hive
          Issue Type: Bug
          Components: Query Planning
    Affects Versions: 2.3.0, 3.0.0
            Reporter: Gopal V


{code}
    else {
      // numPartitionFields = -1 means random partitioning
      partitionCols.add(TypeCheckProcFactory.DefaultExprProcessor.getFuncExprNodeDesc("rand"));
    }
{code}

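This is a known cause of data corruption when tasks are re-executed for failure tolerance: rand() returns different values on each attempt, so a re-run task can send the same rows to different reducers than the original attempt did.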

There is a failure-tolerant distribution function inside ReduceSinkOperator, 
which kicks in automatically when there are no partition columns:

{code}
    if (partitionEval.length == 0) {
      // If no partition cols, just distribute the data uniformly
      // to provide better load balance. If the requirement is to have a single
      // reducer, we should set the number of reducers to 1. Use a constant seed
      // to make the code deterministic.
      if (random == null) {
        random = new Random(12345);
      }
      keyHashCode = random.nextInt();
    }
{code}
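
To illustrate why the constant seed matters when a task is re-executed, here is a minimal Java sketch. It is not Hive code: the class and method names are hypothetical, and the failure model is a simplifying assumption (reducer 0 fetches its rows from the first attempt, the attempt's output is then lost, and all other reducers fetch from the re-run). Unseeded generators stand in for rand().

```java
import java.util.Random;

public class RetryPartitioningSketch {

    /** One map-task attempt: assign each row to a reducer using the given generator. */
    public static int[] assign(int numRows, int numReducers, Random rng) {
        int[] reducerOfRow = new int[numRows];
        for (int row = 0; row < numRows; row++) {
            // Mask the sign bit so the modulo result is non-negative.
            reducerOfRow[row] = (rng.nextInt() & Integer.MAX_VALUE) % numReducers;
        }
        return reducerOfRow;
    }

    /**
     * Assumed failure model: reducer 0 reads from the first attempt, every
     * other reducer reads from the re-run. Returns per-row delivery counts.
     */
    public static int[] deliveries(int[] firstAttempt, int[] rerun) {
        int[] count = new int[firstAttempt.length];
        for (int row = 0; row < count.length; row++) {
            if (firstAttempt[row] == 0) count[row]++; // fetched by reducer 0 before the failure
            if (rerun[row] != 0) count[row]++;        // fetched by the other reducers after the re-run
        }
        return count;
    }

    /** Number of rows not delivered exactly once (lost or duplicated). */
    public static int corrupted(int[] count) {
        int bad = 0;
        for (int c : count) {
            if (c != 1) bad++;
        }
        return bad;
    }

    public static void main(String[] args) {
        int rows = 1000, reducers = 4;

        // rand()-style: each attempt draws a different sequence.
        int[] a1 = assign(rows, reducers, new Random());
        int[] a2 = assign(rows, reducers, new Random());
        System.out.println("unseeded: corrupted rows = " + corrupted(deliveries(a1, a2)));

        // Constant seed, as in the ReduceSinkOperator snippet: the re-run repeats the first attempt.
        int[] b1 = assign(rows, reducers, new Random(12345));
        int[] b2 = assign(rows, reducers, new Random(12345));
        System.out.println("seeded:   corrupted rows = " + corrupted(deliveries(b1, b2)));
    }
}
```

In this model the seeded case always reports 0 corrupted rows, because both attempts produce the same assignment. In the unseeded case, any row whose assignment moves into or out of reducer 0 between attempts is either lost or delivered twice.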




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
