[
https://issues.apache.org/jira/browse/HIVE-28165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
László Bodor updated HIVE-28165:
--------------------------------
Description:
While investigating Hive Iceberg issues, it turned out that in the presence of
delete files the serialized payload can be huge, around 1-4 MB per split. This
can put extreme memory pressure on the Tez AM, and it gets worse as the number
of splits grows.
Optimizing the payload itself would be the best option, but it is not
straightforward; instead, Hive and Tez should cooperate to handle such
situations without running into OOMs like the one below:
{code}
ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1711290808080_0000_4_00, diagnostics=[Vertex vertex_1711290808080_0000_4_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: web_sales_1 initializer failed, vertex=vertex_1711290808080_0000_4_00 [Map 1], java.lang.OutOfMemoryError: Java heap space
    at com.google.protobuf.ByteString$CodedBuilder.<init>(ByteString.java:907)
    at com.google.protobuf.ByteString$CodedBuilder.<init>(ByteString.java:902)
    at com.google.protobuf.ByteString.newCodedBuilder(ByteString.java:898)
    at com.google.protobuf.AbstractMessageLite.toByteString(AbstractMessageLite.java:49)
    at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.createEventList(HiveSplitGenerator.java:378)
    at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:337)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager.lambda$runInitializer$3(RootInputInitializerManager.java:199)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$$Lambda$319/0x0000000840942440.run(Unknown Source)
    at java.base/java.security.AccessController.doPrivileged(Native Method)
    at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializer(RootInputInitializerManager.java:192)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializerAndProcessResult(RootInputInitializerManager.java:173)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager.lambda$createAndStartInitializing$2(RootInputInitializerManager.java:167)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$$Lambda$318/0x0000000840942040.run(Unknown Source)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
{code}
An example where a large number of delete files causes such issues:
{code}
<14>1 2025-04-02T10:55:31.462Z query-coordinator-0-7 query-coordinator 1
9b117466-88fb-48c9-898c-66a4e050d012 [mdc@38374
class="metrics.LoggingMetricsReporter" level="INFO" thread="App Shared Pool -
#54"] Received metrics report:
ScanReport{tableName=default_iceberg.adp_staging.fct_position_security_level,
snapshotId=1684724242029111928, filter=(is_null(ref(name="dw_delete_ind")) or
not(ref(name="dw_delete_ind") == "(hash-5d29f651)")), schemaId=0,
projectedFieldIds=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37,
38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57,
58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77,
78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97,
98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113,
114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129,
130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145,
146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161,
162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172],
projectedFieldNames=[...],
scanMetrics=ScanMetricsResult{totalPlanningDuration=TimerResult{timeUnit=NANOSECONDS,
totalDuration=PT1.034713356S, count=1},
resultDataFiles=CounterResult{unit=COUNT, value=3295},
resultDeleteFiles=CounterResult{unit=COUNT, value=82557},
totalDataManifests=CounterResult{unit=COUNT, value=28},
totalDeleteManifests=CounterResult{unit=COUNT, value=52},
scannedDataManifests=CounterResult{unit=COUNT, value=28},
skippedDataManifests=CounterResult{unit=COUNT, value=0},
totalFileSizeInBytes=CounterResult{unit=BYTES, value=721915830265},
totalDeleteFileSizeInBytes=CounterResult{unit=BYTES, value=3022228880},
skippedDataFiles=CounterResult{unit=COUNT, value=0},
skippedDeleteFiles=CounterResult{unit=COUNT, value=0},
scannedDeleteManifests=CounterResult{unit=COUNT, value=52},
skippedDeleteManifests=CounterResult{unit=COUNT, value=0},
indexedDeleteFiles=CounterResult{unit=COUNT, value=12211},
equalityDeleteFiles=CounterResult{unit=COUNT, value=0},
positionalDeleteFiles=CounterResult{unit=COUNT, value=12211}},
metadata={read.split.open-file-cost=134217728, iceberg-version=Apache Iceberg
1.4.3.2024.0.18.4-15 (commit 8c21337a4c0d7045926323d0ce3172ebb7123a2c)}}
{code}
The important part, extracted:
{code}
totalDeleteFileSizeInBytes=CounterResult{unit=BYTES, value=3022228880} -> 2.8G
indexedDeleteFiles=CounterResult{unit=COUNT, value=12211}
equalityDeleteFiles=CounterResult{unit=COUNT, value=0},
positionalDeleteFiles=CounterResult{unit=COUNT, value=12211}
{code}
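The direction explored in this ticket is to keep small split payloads on the regular Tez event/RPC path and to spill oversized ones to the filesystem, so that only a short file path has to be held and shipped by the AM. Below is a minimal sketch of that idea; the property name, default threshold and helper class are assumptions for illustration, not the actual patch:
{code}
// Hypothetical sketch of "send splits through the filesystem instead of RPC".
// The property name, default threshold and file layout are assumptions,
// not the code of the actual patch.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SplitPayloadSpiller {

  // assumed knob: serialized splits larger than this are written to a file
  private static final String THRESHOLD_PROP = "hive.tez.input.fs.serialization.threshold";
  private static final long DEFAULT_THRESHOLD = 512 * 1024; // 512 KB, assumption

  private final Configuration conf;
  private final Path scratchDir; // e.g. a query scratch dir readable by all tasks

  public SplitPayloadSpiller(Configuration conf, Path scratchDir) {
    this.conf = conf;
    this.scratchDir = scratchDir;
  }

  /**
   * Returns null when the payload is small enough to stay inline in the event,
   * otherwise writes it to the filesystem and returns the path the task reads back.
   */
  public Path maybeSpill(int splitIndex, byte[] serializedSplit) throws IOException {
    long threshold = conf.getLong(THRESHOLD_PROP, DEFAULT_THRESHOLD);
    if (serializedSplit.length <= threshold) {
      return null; // small enough: keep sending it through the regular RPC path
    }
    Path splitFile = new Path(scratchDir, "split_payload_" + splitIndex);
    FileSystem fs = splitFile.getFileSystem(conf);
    try (FSDataOutputStream out = fs.create(splitFile, false)) {
      out.write(serializedSplit);
    }
    return splitFile; // only this short path travels through the AM
  }
}
{code}
With something along these lines, the per-split cost in the AM becomes a short path string instead of a multi-MB serialized payload, so memory usage no longer grows with the amount of delete-file metadata carried by each split.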
was:
While investigating Hive Iceberg issues, it turned out that in the presence of
delete files the serialized payload can be huge, around 1-4 MB per split. This
can put extreme memory pressure on the Tez AM, and it gets worse as the number
of splits grows.
Optimizing the payload itself would be the best option, but it is not
straightforward; instead, Hive and Tez should cooperate to handle such
situations without running into OOMs like the one below:
{code}
ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1711290808080_0000_4_00, diagnostics=[Vertex vertex_1711290808080_0000_4_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: web_sales_1 initializer failed, vertex=vertex_1711290808080_0000_4_00 [Map 1], java.lang.OutOfMemoryError: Java heap space
    at com.google.protobuf.ByteString$CodedBuilder.<init>(ByteString.java:907)
    at com.google.protobuf.ByteString$CodedBuilder.<init>(ByteString.java:902)
    at com.google.protobuf.ByteString.newCodedBuilder(ByteString.java:898)
    at com.google.protobuf.AbstractMessageLite.toByteString(AbstractMessageLite.java:49)
    at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.createEventList(HiveSplitGenerator.java:378)
    at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:337)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager.lambda$runInitializer$3(RootInputInitializerManager.java:199)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$$Lambda$319/0x0000000840942440.run(Unknown Source)
    at java.base/java.security.AccessController.doPrivileged(Native Method)
    at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializer(RootInputInitializerManager.java:192)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializerAndProcessResult(RootInputInitializerManager.java:173)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager.lambda$createAndStartInitializing$2(RootInputInitializerManager.java:167)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$$Lambda$318/0x0000000840942040.run(Unknown Source)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
{code}
> HiveSplitGenerator: send splits through filesystem instead of RPC in case of
> big payload
> ----------------------------------------------------------------------------------------
>
> Key: HIVE-28165
> URL: https://issues.apache.org/jira/browse/HIVE-28165
> Project: Hive
> Issue Type: Improvement
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.1.0
>
>
> While investigating Hive Iceberg issues, it turned out that in the presence
> of delete files the serialized payload can be huge, around 1-4 MB per split.
> This can put extreme memory pressure on the Tez AM, and it gets worse as the
> number of splits grows.
> Optimizing the payload itself would be the best option, but it is not
> straightforward; instead, Hive and Tez should cooperate to handle such
> situations without running into OOMs like the one below:
> {code}
> ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1711290808080_0000_4_00, diagnostics=[Vertex vertex_1711290808080_0000_4_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: web_sales_1 initializer failed, vertex=vertex_1711290808080_0000_4_00 [Map 1], java.lang.OutOfMemoryError: Java heap space
>     at com.google.protobuf.ByteString$CodedBuilder.<init>(ByteString.java:907)
>     at com.google.protobuf.ByteString$CodedBuilder.<init>(ByteString.java:902)
>     at com.google.protobuf.ByteString.newCodedBuilder(ByteString.java:898)
>     at com.google.protobuf.AbstractMessageLite.toByteString(AbstractMessageLite.java:49)
>     at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.createEventList(HiveSplitGenerator.java:378)
>     at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:337)
>     at org.apache.tez.dag.app.dag.RootInputInitializerManager.lambda$runInitializer$3(RootInputInitializerManager.java:199)
>     at org.apache.tez.dag.app.dag.RootInputInitializerManager$$Lambda$319/0x0000000840942440.run(Unknown Source)
>     at java.base/java.security.AccessController.doPrivileged(Native Method)
>     at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>     at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializer(RootInputInitializerManager.java:192)
>     at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializerAndProcessResult(RootInputInitializerManager.java:173)
>     at org.apache.tez.dag.app.dag.RootInputInitializerManager.lambda$createAndStartInitializing$2(RootInputInitializerManager.java:167)
>     at org.apache.tez.dag.app.dag.RootInputInitializerManager$$Lambda$318/0x0000000840942040.run(Unknown Source)
>     at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>     at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
>     at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
>     at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:829)
> {code}
> An example where a large number of delete files causes such issues:
> {code}
> <14>1 2025-04-02T10:55:31.462Z query-coordinator-0-7 query-coordinator 1
> 9b117466-88fb-48c9-898c-66a4e050d012 [mdc@38374
> class="metrics.LoggingMetricsReporter" level="INFO" thread="App Shared Pool -
> #54"] Received metrics report:
> ScanReport{tableName=default_iceberg.adp_staging.fct_position_security_level,
> snapshotId=1684724242029111928, filter=(is_null(ref(name="dw_delete_ind")) or
> not(ref(name="dw_delete_ind") == "(hash-5d29f651)")), schemaId=0,
> projectedFieldIds=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
> 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
> 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55,
> 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
> 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93,
> 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109,
> 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124,
> 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139,
> 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154,
> 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169,
> 170, 171, 172], projectedFieldNames=[...],
> scanMetrics=ScanMetricsResult{totalPlanningDuration=TimerResult{timeUnit=NANOSECONDS,
> totalDuration=PT1.034713356S, count=1},
> resultDataFiles=CounterResult{unit=COUNT, value=3295},
> resultDeleteFiles=CounterResult{unit=COUNT, value=82557},
> totalDataManifests=CounterResult{unit=COUNT, value=28},
> totalDeleteManifests=CounterResult{unit=COUNT, value=52},
> scannedDataManifests=CounterResult{unit=COUNT, value=28},
> skippedDataManifests=CounterResult{unit=COUNT, value=0},
> totalFileSizeInBytes=CounterResult{unit=BYTES, value=721915830265},
> totalDeleteFileSizeInBytes=CounterResult{unit=BYTES, value=3022228880},
> skippedDataFiles=CounterResult{unit=COUNT, value=0},
> skippedDeleteFiles=CounterResult{unit=COUNT, value=0},
> scannedDeleteManifests=CounterResult{unit=COUNT, value=52},
> skippedDeleteManifests=CounterResult{unit=COUNT, value=0},
> indexedDeleteFiles=CounterResult{unit=COUNT, value=12211},
> equalityDeleteFiles=CounterResult{unit=COUNT, value=0},
> positionalDeleteFiles=CounterResult{unit=COUNT, value=12211}},
> metadata={read.split.open-file-cost=134217728, iceberg-version=Apache Iceberg
> 1.4.3.2024.0.18.4-15 (commit 8c21337a4c0d7045926323d0ce3172ebb7123a2c)}}
> {code}
> The important part, extracted:
> {code}
> totalDeleteFileSizeInBytes=CounterResult{unit=BYTES, value=3022228880} -> 2.8G
> indexedDeleteFiles=CounterResult{unit=COUNT, value=12211}
> equalityDeleteFiles=CounterResult{unit=COUNT, value=0},
> positionalDeleteFiles=CounterResult{unit=COUNT, value=12211}
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)