[jira] [Commented] (TEZ-4542) Tez application may fail due to int overflow when record size is large and sort memory is low.

2024-05-14 Thread Chenyu Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846208#comment-17846208
 ] 

Chenyu Zheng commented on TEZ-4542:
---

Thanks [~abstractdog] and [~rbalamohan] for the review!

[~abstractdog] BTW, would you mind taking a look at HIVE-27985?

> Tez application may fail due to int overflow when record size is large and 
> sort memory is low.
> --
>
> Key: TEZ-4542
> URL: https://issues.apache.org/jira/browse/TEZ-4542
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.9.2
> Reporter: Chenyu Zheng
> Assignee: Chenyu Zheng
> Priority: Major
> Fix For: 0.10.4
>
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> A Tez application failed, and the following error stack was found:
> {code:java}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:370)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:292)
>   ... 18 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.IllegalArgumentException
>   at 
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:402)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:907)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:643)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:675)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:753)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinObject(CommonMergeJoinOperator.java:314)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:277)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:270)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:256)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:361)
>   ... 19 more
> Caused by: java.lang.IllegalArgumentException
>   at java.nio.Buffer.position(Buffer.java:244)
>   at 
> org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.<init>(PipelinedSorter.java:936)
>   at 
> org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.sort(PipelinedSorter.java:350)
>   at 
> org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.collect(PipelinedSorter.java:406)
>   at 
> org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.write(PipelinedSorter.java:379)
>   at 
> org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput$1.write(OrderedPartitionedKVOutput.java:167)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor$TezKVOutputCollector.collect(TezProcessor.java:204)
>   at 
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.collect(ReduceSinkOperator.java:541)
>   at 
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:385)
>   ... 28 more {code}
> After adding debug logging, the problem is easy to find: the variable 
> `dataSize` in {{PipelinedSorter::SortSpan}} overflows. 
> The problem is triggered when the following two conditions are met at the 
> same time:
>  * The vertex has too many I/Os, so the memory allocated to each I/O for 
> sorting is too small.
>  * The average record size is larger than 2K. In that case `dataSize` in 
> {{PipelinedSorter::SortSpan}} overflows, so the sorter does not 
> try to allocate less meta space, and an exception is raised.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (TEZ-4542) Tez application may fail due to int overflow when record size is large and sort memory is low.

2024-02-22 Thread Chenyu Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/TEZ-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chenyu Zheng updated TEZ-4542:
--
Description: 
A Tez application failed, and the following error stack was found:
{code:java}
  at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:370)
  at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:292)
  ... 18 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.IllegalArgumentException
  at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:402)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:907)
  at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:643)
  at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:675)
  at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:753)
  at 
org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinObject(CommonMergeJoinOperator.java:314)
  at 
org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:277)
  at 
org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:270)
  at 
org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:256)
  at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:361)
  ... 19 more
Caused by: java.lang.IllegalArgumentException
  at java.nio.Buffer.position(Buffer.java:244)
  at 
org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.<init>(PipelinedSorter.java:936)
  at 
org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.sort(PipelinedSorter.java:350)
  at 
org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.collect(PipelinedSorter.java:406)
  at 
org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.write(PipelinedSorter.java:379)
  at 
org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput$1.write(OrderedPartitionedKVOutput.java:167)
  at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor$TezKVOutputCollector.collect(TezProcessor.java:204)
  at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.collect(ReduceSinkOperator.java:541)
  at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:385)
  ... 28 more {code}
After adding debug logging, the problem is easy to find: the variable 
`dataSize` in {{PipelinedSorter::SortSpan}} overflows. 

The problem is triggered when the following two conditions are met at the 
same time:
 * The vertex has too many I/Os, so the memory allocated to each I/O for sorting 
is too small.
 * The average record size is larger than 2K. In that case `dataSize` in 
{{PipelinedSorter::SortSpan}} overflows, so the sorter does not 
try to allocate less meta space, and an exception is raised.
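
The innermost frame of the trace above is java.nio.Buffer.position, which throws IllegalArgumentException when the requested position is negative or larger than the buffer's limit. A minimal sketch of that JDK behaviour follows; the class and method names (BufferPositionSketch, positionRejected) and the buffer sizes are hypothetical and only illustrate the check, they are not taken from Tez.

```java
import java.nio.ByteBuffer;

class BufferPositionSketch {
    /**
     * Returns true if a buffer of the given capacity refuses to move its
     * position to newPosition (hypothetical helper, for illustration only).
     */
    static boolean positionRejected(int capacity, int newPosition) {
        ByteBuffer buffer = ByteBuffer.allocate(capacity);
        try {
            // Valid positions are 0..limit; the limit equals capacity here.
            buffer.position(newPosition);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(positionRejected(1024, 512));  // prints "false"
        System.out.println(positionRejected(1024, 2048)); // prints "true"
        System.out.println(positionRejected(1024, -1));   // prints "true"
    }
}
```

So a meta-space size that was not reduced to fit a small sort buffer would be rejected in exactly this way when it is used as the new position.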


[jira] [Updated] (TEZ-4542) Tez application may fail due to int overflow when record size is large and sort memory is low.

2024-02-22 Thread Chenyu Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/TEZ-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chenyu Zheng updated TEZ-4542:
--
Description: 
A Tez application failed, and the following error stack was found:
{code:java}
  at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:370)
  at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:292)
  ... 18 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.IllegalArgumentException
  at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:402)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:907)
  at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:643)
  at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:675)
  at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:753)
  at 
org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinObject(CommonMergeJoinOperator.java:314)
  at 
org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:277)
  at 
org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:270)
  at 
org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:256)
  at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:361)
  ... 19 more
Caused by: java.lang.IllegalArgumentException
  at java.nio.Buffer.position(Buffer.java:244)
  at 
org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.<init>(PipelinedSorter.java:936)
  at 
org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.sort(PipelinedSorter.java:350)
  at 
org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.collect(PipelinedSorter.java:406)
  at 
org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.write(PipelinedSorter.java:379)
  at 
org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput$1.write(OrderedPartitionedKVOutput.java:167)
  at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor$TezKVOutputCollector.collect(TezProcessor.java:204)
  at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.collect(ReduceSinkOperator.java:541)
  at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:385)
  ... 28 more {code}
After adding debug logging, the problem is easy to find: the variable 
`dataSize` in {{PipelinedSorter::SortSpan}} overflows. 

The problem is triggered when the following two conditions are met at the 
same time:
 * The vertex has too many I/Os, so the memory allocated to each I/O for sorting 
is too small.
 * The average record size is larger than 2K. In that case `dataSize` in 
{{PipelinedSorter::SortSpan}} overflows, so the sorter does not 
try to allocate less meta space, and an exception is raised.

Solution: change the type of `dataSize` to long.


[jira] [Created] (TEZ-4542) Tez application may fail due to int overflow when record size is large and sort memory is low.

2024-02-22 Thread Chenyu Zheng (Jira)
Chenyu Zheng created TEZ-4542:
-

 Summary: Tez application may fail due to int overflow when record size is large and sort memory is low.
 Key: TEZ-4542
 URL: https://issues.apache.org/jira/browse/TEZ-4542
 Project: Apache Tez
 Issue Type: Bug
 Affects Versions: 0.9.2
 Reporter: Chenyu Zheng
 Assignee: Chenyu Zheng


A Tez application failed, and the following error stack was found:
{code:java}
  at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:370)
  at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:292)
  ... 18 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.IllegalArgumentException
  at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:402)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:907)
  at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:643)
  at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:675)
  at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:753)
  at 
org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinObject(CommonMergeJoinOperator.java:314)
  at 
org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:277)
  at 
org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:270)
  at 
org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:256)
  at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:361)
  ... 19 more
Caused by: java.lang.IllegalArgumentException
  at java.nio.Buffer.position(Buffer.java:244)
  at 
org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.<init>(PipelinedSorter.java:936)
  at 
org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.sort(PipelinedSorter.java:350)
  at 
org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.collect(PipelinedSorter.java:406)
  at 
org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.write(PipelinedSorter.java:379)
  at 
org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput$1.write(OrderedPartitionedKVOutput.java:167)
  at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor$TezKVOutputCollector.collect(TezProcessor.java:204)
  at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.collect(ReduceSinkOperator.java:541)
  at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:385)
  ... 28 more {code}
After adding debug logging, the problem is easy to find: the variable 
`dataSize` in {{PipelinedSorter::SortSpan}} overflows. 

The problem is triggered when the following two conditions are met at the 
same time:
 * The vertex has too many I/Os, so the memory allocated to each I/O for sorting 
is too small.
 * The average record size is larger than 2K. In that case `dataSize` in 
{{PipelinedSorter::SortSpan}} overflows, so the sorter does not 
try to allocate less meta space, and an exception is raised.

Solution: change the type of `dataSize` to long.
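
The overflow and the effect of the proposed fix can be sketched as follows. This is not the Tez source: the class DataSizeOverflowSketch, its method names, the 16-byte metadata size per record and the cap of 1024 * 1024 records per span are assumptions chosen so that a 2K record size reaches the int limit, as the report describes.

```java
class DataSizeOverflowSketch {
    // Assumed values, for illustration only.
    static final int META_BYTES_PER_ITEM = 16;
    static final int MAX_ITEMS = 1024 * 1024;

    /** Meta-space size when dataSize is an int (overflow-prone). */
    static int metaSizeWithInt(int capacity, int perItem) {
        int metaSize = META_BYTES_PER_ITEM * MAX_ITEMS;
        // Wraps to a negative value once perItem reaches 2048 (2^20 * 2^11 = 2^31).
        int dataSize = MAX_ITEMS * perItem;
        if (capacity < metaSize + dataSize) {
            // Shrink the meta space so that meta and data both fit in the buffer.
            metaSize = META_BYTES_PER_ITEM * (capacity / (perItem + META_BYTES_PER_ITEM));
        }
        return metaSize;
    }

    /** Same computation with dataSize widened to long, as the issue proposes. */
    static int metaSizeWithLong(int capacity, int perItem) {
        int metaSize = META_BYTES_PER_ITEM * MAX_ITEMS;
        long dataSize = (long) MAX_ITEMS * perItem;
        if (capacity < metaSize + dataSize) {
            metaSize = META_BYTES_PER_ITEM * (capacity / (perItem + META_BYTES_PER_ITEM));
        }
        return metaSize;
    }

    public static void main(String[] args) {
        int capacity = 8 * 1024 * 1024; // a small 8 MB sort buffer
        int perItem = 2048;             // 2K average record size
        System.out.println(metaSizeWithInt(capacity, perItem));  // prints 16777216
        System.out.println(metaSizeWithLong(capacity, perItem)); // prints 65024
    }
}
```

With the int version the comparison sees a negative sum, the shrink branch is skipped, and the meta space (16 MB) stays larger than the 8 MB buffer. With the long version the meta space is reduced to a size that fits.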


