Yi Zhang created TEZ-4577:
-----------------------------
Summary: SortSpan could be created real small, resulting in
eventual job failure
Key: TEZ-4577
URL: https://issues.apache.org/jira/browse/TEZ-4577
Project: Apache Tez
Issue Type: Bug
Affects Versions: 0.10.4
Reporter: Yi Zhang
We ran into an overflow issue like the one in TEZ-4542. With the TEZ-4542 fix
applied, we then ran into an issue of very small sort spans (one record per
span in this case), and the job eventually failed due to a timeout.
From the sample logs, it looks like once
SortSpan(ByteBuffer source, int maxItems, int perItem, RawComparator comparator)
gets into a situation where maxItems=1, every subsequent span is also created
with maxItems=1.
Sample logs:
2024-08-19 19:02:28,157 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 ->
scope-308: Span260.length = 1, perItem = 139
2024-08-19 19:02:28,157 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 ->
scope-308: reserved.remaining()=268396925, reserved.metasize=16
2024-08-19 19:02:28,157 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 ->
scope-308: New Span261.length = 1, perItem = 139, counter:5307003
2024-08-19 19:02:28,157 [INFO] [Sorter {scope_302 -> scope_308} #1]
|impl.PipelinedSorter|: scope-302 -> scope-308: done sorting span=260,
length=1, time=0
2024-08-19 19:02:28,157 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 ->
scope-308: Span261.length = 1, perItem = 128
2024-08-19 19:02:28,157 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 ->
scope-308: reserved.remaining()=268396781, reserved.metasize=16
2024-08-19 19:02:28,157 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 ->
scope-308: New Span262.length = 1, perItem = 128, counter:5307004
2024-08-19 19:02:28,158 [INFO] [Sorter {scope_302 -> scope_308} #0]
|impl.PipelinedSorter|: scope-302 -> scope-308: done sorting span=261,
length=1, time=0
2024-08-19 19:02:28,158 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 ->
scope-308: Span262.length = 1, perItem = 145
2024-08-19 19:02:28,158 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 ->
scope-308: reserved.remaining()=268396620, reserved.metasize=16
2024-08-19 19:02:28,158 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 ->
scope-308: New Span263.length = 1, perItem = 145, counter:5307005
2024-08-19 19:02:28,158 [INFO] [Sorter {scope_302 -> scope_308} #1]
|impl.PipelinedSorter|: scope-302 -> scope-308: done sorting span=262,
length=1, time=0
2024-08-19 19:02:28,158 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 ->
scope-308: Span263.length = 1, perItem = 139
2024-08-19 19:02:28,158 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 ->
scope-308: reserved.remaining()=268396465, reserved.metasize=16
2024-08-19 19:02:28,158 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 ->
scope-308: New Span264.length = 1, perItem = 139, counter:5307006
2024-08-19 19:02:28,158 [INFO] [Sorter {scope_302 -> scope_308} #0]
|impl.PipelinedSorter|: scope-302 -> scope-308: done sorting span=263,
length=1, time=0
2024-08-19 19:02:28,158 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 ->
scope-308: Span264.length = 1, perItem = 129
2024-08-19 19:02:28,158 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 ->
scope-308: reserved.remaining()=268396320, reserved.metasize=16
2024-08-19 19:02:28,158 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 ->
scope-308: New Span265.length = 1, perItem = 129, counter:5307007
2024-08-19 19:02:28,158 [INFO] [Sorter {scope_302 -> scope_308} #1]
|impl.PipelinedSorter|: scope-302 -> scope-308: done sorting span=264,
length=1, time=0
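
To make the suspected behaviour concrete, here is a minimal sketch of it. This is a simplified model and not the PipelinedSorter code: the class and method names are hypothetical. It assumes that each record costs perItem data bytes plus 16 bytes of metadata, which is consistent with the logs above (for example, reserved.remaining() drops from 268396925 to 268396781, i.e. by 144 = 128 + 16, for the span with perItem = 128). It also assumes that a new span takes its maxItems from the previous span's length, as the matching "SpanN.length" and "New SpanN+1.length" values suggest.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the suspected span-sizing behaviour.
// NOT Tez code: every name here is hypothetical and for illustration only.
class SpanSizingSketch {

    // Assumption: a record needs perItem data bytes plus metaPerItem metadata bytes.
    static long itemsThatFit(long remainingBytes, int perItem, int metaPerItem) {
        return remainingBytes / (perItem + metaPerItem);
    }

    // Assumption: the next span's maxItems is the previous span's length,
    // and each span fills up to exactly maxItems records before it is closed.
    static List<Integer> simulate(int firstSpanLength, int spanCount) {
        List<Integer> lengths = new ArrayList<>();
        int length = firstSpanLength;
        for (int i = 0; i < spanCount; i++) {
            lengths.add(length);
            int maxItems = length; // carried over from the previous span
            length = maxItems;     // the new span can never grow past maxItems
        }
        return lengths;
    }

    public static void main(String[] args) {
        // Values taken from the log line for Span260: remaining=268396925, perItem=139.
        long fit = itemsThatFit(268396925L, 139, 16);
        System.out.println("records that would fit in the remaining buffer: " + fit);
        System.out.println("span lengths once length=1 is reached: " + simulate(1, 5));
    }
}
```

Under these assumptions the remaining buffer could hold roughly 1.7 million records of this size, yet each new span is sized for a single record, so the sorter creates one span per record.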