Prasanth Jayachandran created HIVE-21496:
--------------------------------------------
Summary: Automatic sizing of unordered buffer can overflow
Key: HIVE-21496
URL: https://issues.apache.org/jira/browse/HIVE-21496
Project: Hive
Issue Type: Bug
Components: Physical Optimizer
Affects Versions: 4.0.0
Reporter: Prasanth Jayachandran
Attachments: hive.log
HIVE-21329 added automatic sizing of the Tez unordered partitioned KV buffer
based on group-by statistics. However, in some corner cases the group-by
statistics set the data size to Long.MAX_VALUE, which results in the unordered
KV buffer size being set to Integer.MAX_VALUE. This buffer size is expected to
be in MB, so converting Integer.MAX_VALUE MB to bytes overflows and the
following exception is thrown:
{code:java}
2019-03-23T01:35:17,760 INFO [Dispatcher thread {Central}]
HistoryEventHandler.criticalEvents:
[HISTORY][DAG:dag_1553330105749_0001_1][Event:TASK_ATTEMPT_FINISHED]:
vertexName=Map 1, taskAttemptId=attempt_1553330105749_0001_1_00_000000_0,
creationTime=1553330117468, allocationTime=1553330117524,
startTime=1553330117562, finishTime=1553330117755, timeTaken=193,
status=FAILED, taskFailureType=NON_FATAL, errorEnum=FRAMEWORK_ERROR,
diagnostics=Error: Error while running task ( failure ) :
attempt_1553330105749_0001_1_00_000000_0:java.lang.IllegalArgumentException
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:108)
at org.apache.tez.runtime.common.resources.MemoryDistributor.registerRequest(MemoryDistributor.java:177)
at org.apache.tez.runtime.common.resources.MemoryDistributor.requestMemory(MemoryDistributor.java:110)
at org.apache.tez.runtime.api.impl.TezTaskContextImpl.requestInitialMemory(TezTaskContextImpl.java:214)
at org.apache.tez.runtime.library.output.UnorderedPartitionedKVOutput.initialize(UnorderedPartitionedKVOutput.java:76)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable._callInternal(LogicalIOProcessorRuntimeTask.java:537)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:520)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:505)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745){code}
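The MB-to-bytes overflow can be reproduced in isolation. The following is a
minimal sketch, not the Tez code: it assumes the conversion is done in 32-bit
int arithmetic (the actual code path may differ), and the class and method
names are hypothetical.

```java
final class MbToBytesOverflow {
    // Hypothetical helper: converts MB to bytes in 32-bit int arithmetic.
    // For large inputs the product silently wraps around.
    static int mbToBytesInt(int mb) {
        return mb * 1024 * 1024;
    }

    // Same conversion, but Math.multiplyExact throws ArithmeticException
    // instead of wrapping when the int result overflows.
    static int mbToBytesChecked(int mb) {
        return Math.multiplyExact(mb, 1024 * 1024);
    }

    public static void main(String[] args) {
        // A typical buffer size converts correctly.
        System.out.println(mbToBytesInt(100));
        // Integer.MAX_VALUE MB wraps around to a negative byte count.
        System.out.println(mbToBytesInt(Integer.MAX_VALUE));
        try {
            mbToBytesChecked(Integer.MAX_VALUE);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}
```

A negative or otherwise invalid byte count of this kind would be rejected by an
argument precondition, which is consistent with the IllegalArgumentException in
the trace above.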
The data size in the stats for the GBY operator is set to Long.MAX_VALUE
(9223372036854775807), as seen below:
{code:java}
2019-03-23T01:35:16,466 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
annotation.StatsRulesProcFactory: [0] STATS-TS[0] (logs): numRows: 1795
dataSize: 4443078 basicStatsState: PARTIAL colStatsState: NONE colStats:
{severity= colName: severity colType: string countDistincts: 359 numNulls: 89
avgColLen: 100.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated: true}
2019-03-23T01:35:16,466 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
annotation.StatsRulesProcFactory: Estimating row count for
GenericUDFOPEqual(Column[severity], Const string ERROR) Original num rows: 1795
New num rows: 5
2019-03-23T01:35:16,467 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
annotation.StatsRulesProcFactory: [1] STATS-FIL[8]: numRows: 5 dataSize: 12376
basicStatsState: PARTIAL colStatsState: NONE colStats: {severity= colName:
severity colType: string countDistincts: 359 numNulls: 89 avgColLen: 100.0
numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated: true}
2019-03-23T01:35:16,467 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
exec.FilterOperator: Setting stats (Num rows: 5 Data size: 12376 Basic stats:
PARTIAL Column stats: NONE) on: FIL[8]
2019-03-23T01:35:16,468 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
exec.SelectOperator: Setting stats (Num rows: 5 Data size: 12376 Basic stats:
PARTIAL Column stats: NONE) on: SEL[2]
2019-03-23T01:35:16,468 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
annotation.StatsRulesProcFactory: [1] STATS-SEL[2]: numRows: 5 dataSize: 12376
basicStatsState: PARTIAL colStatsState: NONE colStats: {severity= colName:
severity colType: string countDistincts: 359 numNulls: 89 avgColLen: 100.0
numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated: true}
2019-03-23T01:35:16,471 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
annotation.StatsRulesProcFactory: STATS-GBY[3]: inputSize: 4443078
maxSplitSize: 256000000 parallelism: 1 containsGroupingSet: false
sizeOfGroupingSet: 1
2019-03-23T01:35:16,471 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
annotation.StatsRulesProcFactory: [Case 1] STATS-GBY[3]: cardinality: 5
2019-03-23T01:35:16,472 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
exec.GroupByOperator: Setting stats (Num rows: 1 Data size: 9223372036854775807
Basic stats: PARTIAL Column stats: NONE) on: GBY[3]
2019-03-23T01:35:16,472 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
annotation.StatsRulesProcFactory: [0] STATS-GBY[3]: numRows: 1 dataSize:
9223372036854775807 basicStatsState: PARTIAL colStatsState: NONE colStats:
{severity= colName: severity colType: string countDistincts: 1 numNulls: 18
avgColLen: 100.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated:
true, _col0= colName: _col0 colType: bigint countDistincts: 1 numNulls: 0
avgColLen: 8.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated: false}
2019-03-23T01:35:16,473 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
exec.ReduceSinkOperator: Setting stats (Num rows: 1 Data size:
9223372036854775807 Basic stats: PARTIAL Column stats: NONE) on: RS[4]
2019-03-23T01:35:16,474 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
annotation.StatsRulesProcFactory: [0] STATS-RS[4]: numRows: 1 dataSize:
9223372036854775807 basicStatsState: PARTIAL colStatsState: NONE colStats:
{severity= colName: severity colType: string countDistincts: 1 numNulls: 18
avgColLen: 100.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated:
true, _col0= colName: _col0 colType: bigint countDistincts: 1 numNulls: 0
avgColLen: 8.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated: false}
2019-03-23T01:35:16,474 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
annotation.StatsRulesProcFactory: STATS-GBY[5]: inputSize: 1 maxSplitSize:
256000000 parallelism: 1 containsGroupingSet: false sizeOfGroupingSet: 1
2019-03-23T01:35:16,474 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
annotation.StatsRulesProcFactory: [Case 7] STATS-GBY[5]: cardinality: 0
2019-03-23T01:35:16,474 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
stats.StatsUtils: STATS-GBY[5]: Equals 0 in number of rows. 0 rows will be set
to 1
2019-03-23T01:35:16,474 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
exec.GroupByOperator: Setting stats (Num rows: 1 Data size: 9223372036854775807
Basic stats: PARTIAL Column stats: NONE) on: GBY[5]
2019-03-23T01:35:16,474 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
annotation.StatsRulesProcFactory: [0] STATS-GBY[5]: numRows: 1 dataSize:
9223372036854775807 basicStatsState: PARTIAL colStatsState: NONE colStats:
{severity= colName: severity colType: string countDistincts: 1 numNulls: 18
avgColLen: 100.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated:
true, _col0= colName: _col0 colType: bigint countDistincts: 1 numNulls: 0
avgColLen: 8.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated: false}
2019-03-23T01:35:16,474 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
annotation.StatsRulesProcFactory: [0] STATS-FS[7]: numRows: 1 dataSize:
9223372036854775807 basicStatsState: PARTIAL colStatsState: NONE colStats:
{severity= colName: severity colType: string countDistincts: 1 numNulls: 36
avgColLen: 100.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated:
true, _col0= colName: _col0 colType: bigint countDistincts: 1 numNulls: 0
avgColLen: 8.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated:
false}{code}
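The step from a Long.MAX_VALUE data size to an Integer.MAX_VALUE buffer size
can be sketched as follows. This is an illustrative assumption about how a byte
estimate might be narrowed to an int MB setting, not the code added by
HIVE-21329; the class, the method names, and the cap value are all
hypothetical.

```java
final class BufferSizeEstimate {
    static final long BYTES_PER_MB = 1024L * 1024L;

    // Hypothetical narrowing: converts a byte estimate to whole MB and clamps
    // it to the int range. A Long.MAX_VALUE estimate becomes Integer.MAX_VALUE.
    static int toBufferSizeMb(long dataSizeBytes) {
        long mb = dataSizeBytes / BYTES_PER_MB;
        return (int) Math.min(mb, Integer.MAX_VALUE);
    }

    // Hypothetical guard: additionally bounds the result to [1, maxMb] so that
    // a saturated estimate cannot produce an unusable buffer size.
    static int toBufferSizeMbCapped(long dataSizeBytes, int maxMb) {
        long mb = dataSizeBytes / BYTES_PER_MB;
        return (int) Math.max(1L, Math.min(mb, (long) maxMb));
    }

    public static void main(String[] args) {
        // Data size reported for TS[0] in the log above.
        System.out.println(toBufferSizeMb(4443078L));
        // Data size reported for GBY[3] in the log above.
        System.out.println(toBufferSizeMb(Long.MAX_VALUE));
        // Same saturated estimate with an assumed cap of 100 MB.
        System.out.println(toBufferSizeMbCapped(Long.MAX_VALUE, 100));
    }
}
```

Bounding the estimate before it is used as a buffer size is one possible way to
keep the later MB-to-bytes conversion within range; the cap of 100 MB here is
only an example value.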