Gopal V created HIVE-17124:
------------------------------

             Summary: PlanUtils: Rand() is not a failure-tolerant distribution column
                 Key: HIVE-17124
                 URL: https://issues.apache.org/jira/browse/HIVE-17124
             Project: Hive
          Issue Type: Bug
          Components: Query Planning
    Affects Versions: 2.3.0, 3.0.0
            Reporter: Gopal V
{code}
else {
  // numPartitionFields = -1 means random partitioning
  partitionCols.add(TypeCheckProcFactory.DefaultExprProcessor.getFuncExprNodeDesc("rand"));
}
{code}

This causes known data corruption during fault-tolerance operations such as task retries. rand() with no argument is unseeded, so a retried task can route the same rows to different reducers than the original attempt did, and rows can end up duplicated or lost.

There is already a failure-tolerant distribution function inside ReduceSinkOperator, which kicks in automatically when no partition columns are used. Because it seeds Random with a constant, a retried task regenerates the same sequence of hash codes:

{code}
if (partitionEval.length == 0) {
  // If no partition cols, just distribute the data uniformly
  // to provide better load balance. If the requirement is to have a single reducer, we should
  // set the number of reducers to 1. Use a constant seed to make the code deterministic.
  if (random == null) {
    random = new Random(12345);
  }
  keyHashCode = random.nextInt();
}
{code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
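To illustrate the failure mode described in the issue above, here is a minimal Java sketch. It is not Hive code. It assumes that rows reach the operator in the same order on every task attempt, and it maps hash codes to reducers the way Hadoop's HashPartitioner does, `(hash & Integer.MAX_VALUE) % numReducers`. The class and method names (`RandomPartitioningDemo`, `route`, `deliveries`, `exactlyOnce`) are hypothetical. The sketch simulates a partial failure in which reducer 0 has already fetched the output of the first attempt, while reducer 1 fetches the output of the retry.

```java
import java.util.Arrays;
import java.util.Random;

public class RandomPartitioningDemo {

    // Hypothetical helper: assigns each row (in input order) to a reducer,
    // mimicking keyHashCode = random.nextInt() followed by a
    // HashPartitioner-style non-negative modulo over the reducer count.
    static int[] route(Random random, int numRows, int numReducers) {
        int[] reducerOf = new int[numRows];
        for (int row = 0; row < numRows; row++) {
            int keyHashCode = random.nextInt();
            reducerOf[row] = (keyHashCode & Integer.MAX_VALUE) % numReducers;
        }
        return reducerOf;
    }

    // Counts how often each row is delivered when reducer 0 reads the
    // first attempt's output and reducer 1 reads the retry's output.
    static int[] deliveries(int[] firstAttempt, int[] retryAttempt) {
        int[] count = new int[firstAttempt.length];
        for (int row = 0; row < firstAttempt.length; row++) {
            if (firstAttempt[row] == 0) count[row]++;
            if (retryAttempt[row] == 1) count[row]++;
        }
        return count;
    }

    // True only if every row reached exactly one reducer.
    static boolean exactlyOnce(int[] count) {
        return Arrays.stream(count).allMatch(c -> c == 1);
    }

    public static void main(String[] args) {
        int numRows = 1000;
        int numReducers = 2;

        // Constant seed, as in ReduceSinkOperator: both attempts route identically.
        int[] seededFirst = route(new Random(12345), numRows, numReducers);
        int[] seededRetry = route(new Random(12345), numRows, numReducers);

        // Unseeded, like rand() with no argument: each attempt routes differently.
        int[] unseededFirst = route(new Random(), numRows, numReducers);
        int[] unseededRetry = route(new Random(), numRows, numReducers);

        System.out.println("constant seed, exactly-once: "
                + exactlyOnce(deliveries(seededFirst, seededRetry)));
        System.out.println("unseeded, exactly-once: "
                + exactlyOnce(deliveries(unseededFirst, unseededRetry)));
    }
}
```

With the constant seed the first line prints `true`. The unseeded line almost certainly prints `false`: a row that the first attempt sent to reducer 1 and the retry sent to reducer 0 reaches neither reducer, and a row routed the opposite way is delivered twice.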