[jira] [Commented] (TEZ-4542) Tez application may fail due to int overflow when record size is large and sort memory is low.
[ https://issues.apache.org/jira/browse/TEZ-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846208#comment-17846208 ] Chenyu Zheng commented on TEZ-4542: --- Thanks [~abstractdog] and [~rbalamohan] for the review! [~abstractdog] BTW, do you mind taking a look at HIVE-27985 ? > Tez application may fail due to int overflow when record size is large and > sort memory is low. > -- > > Key: TEZ-4542 > URL: https://issues.apache.org/jira/browse/TEZ-4542 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.9.2 >Reporter: Chenyu Zheng >Assignee: Chenyu Zheng >Priority: Major > Fix For: 0.10.4 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Tez application application fail, then found this error stack: > {code:java} > at > org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:370) > at > org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:292) > ... 18 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.IllegalArgumentException > at > org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:402) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:907) > at > org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:643) > at > org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:675) > at > org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:753) > at > org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinObject(CommonMergeJoinOperator.java:314) > at > org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:277) > at > org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:270) > at > org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:256) > at > org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:361) > ... 19 more > Caused by: java.lang.IllegalArgumentException > at java.nio.Buffer.position(Buffer.java:244) > at > org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.(PipelinedSorter.java:936) > at > org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.sort(PipelinedSorter.java:350) > at > org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.collect(PipelinedSorter.java:406) > at > org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.write(PipelinedSorter.java:379) > at > org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput$1.write(OrderedPartitionedKVOutput.java:167) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor$TezKVOutputCollector.collect(TezProcessor.java:204) > at > org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.collect(ReduceSinkOperator.java:541) > at > org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:385) > ... 28 more {code} > After adding the debug log, it is easy to find this problem. The variable > `dataSize` in {{{}PipelinedSorter::{}}}SortSpan is overflow. > This problem will be triggered if the following two conditions are met at the > same time: > * Too many IO for vertex, causing the memory allocated to each I/O for > sorting to be too small. > * When average record size is larger than 2K, `dataSize` in > {{{}PipelinedSorter::{}}}SortSpan is overflow will be overflow, will not > try to allocate less meta space. Then raise exception. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (TEZ-4542) Tez application may fail due to int overflow when record size is large and sort memory is low.
[ https://issues.apache.org/jira/browse/TEZ-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chenyu Zheng updated TEZ-4542: -- Description: Tez application application fail, then found this error stack: {code:java} at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:370) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:292) ... 18 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:402) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:907) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:643) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:675) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:753) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinObject(CommonMergeJoinOperator.java:314) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:277) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:270) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:256) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:361) ... 19 more Caused by: java.lang.IllegalArgumentException at java.nio.Buffer.position(Buffer.java:244) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.(PipelinedSorter.java:936) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.sort(PipelinedSorter.java:350) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.collect(PipelinedSorter.java:406) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.write(PipelinedSorter.java:379) at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput$1.write(OrderedPartitionedKVOutput.java:167) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor$TezKVOutputCollector.collect(TezProcessor.java:204) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.collect(ReduceSinkOperator.java:541) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:385) ... 28 more {code} After adding the debug log, it is easy to find this problem. The variable `dataSize` in {{{}PipelinedSorter::{}}}SortSpan is overflow. This problem will be triggered if the following two conditions are met at the same time: * Too many IO for vertex, causing the memory allocated to each I/O for sorting to be too small. * When average record size is larger than 2K, `dataSize` in {{{}PipelinedSorter::{}}}SortSpan is overflow will be overflow, will not try to allocate less meta space. Then raise exception. was: Tez application application fail, then found this error stack: {code:java} at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:370) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:292) ... 18 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:402) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:907) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:643) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:675) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:753) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinObject(CommonMergeJoinOperator.java:314) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:277) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:270) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:256) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:361) ... 19 more Caused by: java.lang.IllegalArgumentException at java.nio.Buffer.position(Buffer.java:244) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.(PipelinedSorter.java:936) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.sort(PipelinedSorter.java:350) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.collect(PipelinedSorter.java:406) at
[jira] [Updated] (TEZ-4542) Tez application may fail due to int overflow when record size is large and sort memory is low.
[ https://issues.apache.org/jira/browse/TEZ-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chenyu Zheng updated TEZ-4542: -- Description: Tez application application fail, then found this error stack: {code:java} at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:370) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:292) ... 18 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:402) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:907) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:643) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:675) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:753) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinObject(CommonMergeJoinOperator.java:314) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:277) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:270) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:256) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:361) ... 19 more Caused by: java.lang.IllegalArgumentException at java.nio.Buffer.position(Buffer.java:244) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.(PipelinedSorter.java:936) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.sort(PipelinedSorter.java:350) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.collect(PipelinedSorter.java:406) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.write(PipelinedSorter.java:379) at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput$1.write(OrderedPartitionedKVOutput.java:167) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor$TezKVOutputCollector.collect(TezProcessor.java:204) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.collect(ReduceSinkOperator.java:541) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:385) ... 28 more {code} After adding the debug log, it is easy to find this problem. The variable `dataSize` in {{{}PipelinedSorter::{}}}SortSpan is overflow. This problem will be triggered if the following two conditions are met at the same time: * Too many IO for vertex, causing the memory allocated to each I/O for sorting to be too small. * When average record size is larger than 2K, `dataSize` in {{{}PipelinedSorter::{}}}SortSpan is overflow will be overflow, will not try to allocate less meta space. Then raise exception. Solution: change dataSize to long was: Tez application application fail, then found this error stack: {code:java} at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:370) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:292) ... 18 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:402) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:907) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:643) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:675) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:753) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinObject(CommonMergeJoinOperator.java:314) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:277) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:270) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:256) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:361) ... 19 more Caused by: java.lang.IllegalArgumentException at java.nio.Buffer.position(Buffer.java:244) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.(PipelinedSorter.java:936) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.sort(PipelinedSorter.java:350) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.collect(PipelinedSorter.java:406) at
[jira] [Created] (TEZ-4542) Tez application may fail due to int overflow when record size is large and sort memory is low.
Chenyu Zheng created TEZ-4542: - Summary: Tez application may fail due to int overflow when record size is large and sort memory is low. Key: TEZ-4542 URL: https://issues.apache.org/jira/browse/TEZ-4542 Project: Apache Tez Issue Type: Bug Affects Versions: 0.9.2 Reporter: Chenyu Zheng Assignee: Chenyu Zheng Tez application application fail, then found this error stack: {code:java} at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:370) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:292) ... 18 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:402) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:907) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:643) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:675) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:753) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinObject(CommonMergeJoinOperator.java:314) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:277) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:270) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:256) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:361) ... 19 more Caused by: java.lang.IllegalArgumentException at java.nio.Buffer.position(Buffer.java:244) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.(PipelinedSorter.java:936) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.sort(PipelinedSorter.java:350) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.collect(PipelinedSorter.java:406) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.write(PipelinedSorter.java:379) at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput$1.write(OrderedPartitionedKVOutput.java:167) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor$TezKVOutputCollector.collect(TezProcessor.java:204) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.collect(ReduceSinkOperator.java:541) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:385) ... 28 more {code} After adding the debug log, it is easy to find this problem. The variable `dataSize` in {{{}PipelinedSorter::{}}}SortSpan is overflow. This problem will be triggered if the following two conditions are met at the same time: * Too many IO for vertex, causing the memory allocated to each I/O for sorting to be too small. * When average record size is larger than 2K, `dataSize` in {{{}PipelinedSorter::{}}}SortSpan is overflow will be overflow, will not try to allocate less meta space. then raise exception. {{}} Solution: change dataSize to long -- This message was sent by Atlassian Jira (v8.20.10#820010)