[
https://issues.apache.org/jira/browse/TEZ-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Attila Magyar resolved TEZ-4271.
--------------------------------
Resolution: Won't Fix
Instead of limiting it on the Tez side, we'll increase the range in Hive as part
of HIVE-24715.
> Add config to limit desiredNumSplits
> ------------------------------------
>
> Key: TEZ-4271
> URL: https://issues.apache.org/jira/browse/TEZ-4271
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Attila Magyar
> Assignee: Attila Magyar
> Priority: Major
>
> There are multiple config parameters (like tez.grouping.min/max-size,
> tez.grouping.by-length, tez.grouping.by-count,
> tez.grouping.node.local.only) that impact the number of grouped input splits,
> but there is no single property for setting an exact upper limit on the desired
> count.
> In Hive the maximum number of buckets is 4095. During an insert overwrite each
> task writes its own bucket, and when Tez runs more than 4095 tasks Hive fails
> with a "bucketId out of range" exception.
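> The 4095 ceiling comes from Hive's ACID bucket encoding: as far as I can tell,
> BucketCodec V1 packs the bucket id into a 12-bit field, so valid ids are
> 0..4095 (2^12 - 1). A minimal sketch of just the range check follows; the field
> width is an assumption here, and this is not the actual Hive code.

```java
public class BucketRangeSketch {
    // Assumption: the bucket id occupies 12 bits, hence 0..4095.
    static final int MAX_BUCKET_ID = (1 << 12) - 1; // 4095

    static int checkBucketId(int bucketId) {
        if (bucketId < 0 || bucketId > MAX_BUCKET_ID) {
            // Mirrors the message seen in the stack trace below.
            throw new IllegalArgumentException("bucketId out of range: " + bucketId);
        }
        return bucketId;
    }

    public static void main(String[] args) {
        System.out.println(checkBucketId(4095)); // highest encodable bucket id
        try {
            checkBucketId(4098); // task index past the bucket range, as in the report
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```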
>
> When "tez.grouping.by-count" is used, clamping the desiredNumSplits would
> be easy. However, when "tez.grouping.by-length" is enabled (which is the
> default), clamping desiredNumSplits is not enough, since Tez might generate a
> few more splits than desired.
> For example:
> * originalSplits: [10, 10, 10, 10, 10, 10, 10, 10, 10, 10], where the first 5
> are on node0 and the other 5 are on node1
> * desiredNumSplits: 4
> * Total size: 100
> * lengthPerGroup: 100 / 4 = 25
> * group0: [node0=>10, node0=>10]
> * group1: [node1=>10, node1=>10]
> * group2: [node0=>10, node0=>10]
> * group3: [node1=>10, node1=>10]
> * group4: default-rack=>[node0=>10, node1=>10]
>
> The lengthPerGroup limit prevents adding more than 2 splits to a group,
> resulting in 5 groups instead of the 4 desired.
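> The walk-through above can be reproduced with a simplified model of
> length-based grouping (a sketch under the assumptions of this example, not the
> actual TezSplitGrouper logic): pack each node's splits greedily up to
> lengthPerGroup, and merge per-node leftovers into rack-local groups.

```java
import java.util.*;

public class GroupingSketch {
    // Simplified model of length-based split grouping; not the actual
    // TezSplitGrouper implementation.
    static List<List<Long>> group(Map<String, List<Long>> splitsByNode, long lengthPerGroup) {
        List<List<Long>> groups = new ArrayList<>();
        List<Long> leftovers = new ArrayList<>();
        for (List<Long> splits : splitsByNode.values()) {
            List<Long> current = new ArrayList<>();
            long size = 0;
            for (long s : splits) {
                if (!current.isEmpty() && size + s > lengthPerGroup) {
                    groups.add(current); // node-local group is full
                    current = new ArrayList<>();
                    size = 0;
                }
                current.add(s);
                size += s;
            }
            // A partial trailing group is held back for rack-local grouping.
            if (size >= lengthPerGroup) groups.add(current);
            else leftovers.addAll(current);
        }
        // Pack the leftovers the same way, ignoring node locality (default-rack).
        List<Long> current = new ArrayList<>();
        long size = 0;
        for (long s : leftovers) {
            if (!current.isEmpty() && size + s > lengthPerGroup) {
                groups.add(current);
                current = new ArrayList<>();
                size = 0;
            }
            current.add(s);
            size += s;
        }
        if (!current.isEmpty()) groups.add(current);
        return groups;
    }

    public static void main(String[] args) {
        Map<String, List<Long>> byNode = new LinkedHashMap<>();
        byNode.put("node0", Arrays.asList(10L, 10L, 10L, 10L, 10L));
        byNode.put("node1", Arrays.asList(10L, 10L, 10L, 10L, 10L));
        // desiredNumSplits = 4, total = 100, so lengthPerGroup = 25.
        System.out.println(group(byNode, 25).size()); // 5 groups, one more than desired
    }
}
```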
>
> If 25 were rounded up to 30 (lengthPerGroup = ceil(25 / 10) * 10), it would
> generate 3. But we can't assume all splits have the same size (?)
> We might need to detect in the loop when groupedSplits.size() exceeds the
> desired count, and redistribute the remaining splits across the existing groups
> (either in a round-robin fashion or by selecting the smallest), instead of
> creating new groups. This might cause existing groups to be converted to
> rack-local groups if the node locality of the remaining splits differs from the
> locality of the existing groups.
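> To make the smallest-group idea above concrete, a hypothetical post-processing
> step (capGroups is an invented name, not a proposed API) could fold surplus
> groups back into the smallest existing one once the desired count is exceeded:

```java
import java.util.*;

public class RedistributeSketch {
    // Hypothetical post-processing: cap the number of groups at `desired`
    // by merging surplus groups into the currently smallest group.
    // Note: a merged group may lose node locality and become rack-local.
    static List<List<Long>> capGroups(List<List<Long>> groups, int desired) {
        while (groups.size() > desired) {
            List<Long> surplus = groups.remove(groups.size() - 1);
            List<Long> smallest = Collections.min(groups,
                    Comparator.comparingLong((List<Long> g) ->
                            g.stream().mapToLong(Long::longValue).sum()));
            smallest.addAll(surplus);
        }
        return groups;
    }

    public static void main(String[] args) {
        // The five groups from the example above, each holding two 10-byte splits.
        List<List<Long>> groups = new ArrayList<>();
        for (int i = 0; i < 5; i++) groups.add(new ArrayList<>(Arrays.asList(10L, 10L)));
        System.out.println(capGroups(groups, 4).size()); // 4
    }
}
```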
> Alternatively, we could do a second pass after groupedSplits is fully
> calculated and try to merge existing groups. Either way, this complicates the
> logic even further. At this point I'm not sure what would be best.
> [~rajesh.balamohan], [~t3rmin4t0r], do you have any suggestions?
> {code:java}
> Error while compiling statement: FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1,
> vertexId=vertex_1610498854304_0004_1_00, diagnostics=[Task failed,
> taskId=task_1610498854304_0004_1_00_004098, diagnostics=[TaskAttempt 0
> failed, info=[Error: Error while running task ( failure ) :
> attempt_1610498854304_0004_1_00_004098_0:java.lang.RuntimeException:
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException:
> Hive Runtime Error while processing row at
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
> at
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) at
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
> at
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
> at
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
> at java.security.AccessController.doPrivileged(Native Method) at
> javax.security.auth.Subject.doAs(Subject.java:422) at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
> at
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62)
> at
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748) Caused by:
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException:
> Hive Runtime Error while processing row at
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
> at
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
> at
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:437)
> at
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
> ... 15 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
> Hive Runtime Error while processing row at
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:573) at
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
> ... 18 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
> java.lang.IllegalArgumentException: bucketId out of range: 4098 at
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:820)
> at
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:995)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938) at
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938) at
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:174)
> at
> org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:152)
> at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:552)
> ... 19 more Caused by: java.lang.IllegalArgumentException: bucketId out of
> range: 4098 at
> org.apache.hadoop.hive.ql.io.BucketCodec$2.encode(BucketCodec.java:94) at
> org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.<init>(OrcRecordUpdater.java:270)
> at
> org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getRecordUpdater(OrcOutputFormat.java:289)
> at
> org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordUpdater(HiveFileFormatUtils.java:352)
> at
> org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getAcidRecordUpdater(HiveFileFormatUtils.java:338)
> at
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:883)
> at
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:814)
> ... 26 more ], TaskAttempt 1, TaskAttempt 2 and TaskAttempt 3 failed with
> identical "bucketId out of range: 4098" stack traces (elided)], Vertex did not
> succeed due to OWN_TASK_FAILURE,
> failedTasks:1 killedTasks:3645, Vertex vertex_1610498854304_0004_1_00 [Map 1]
> killed/failed due to:OWN_TASK_FAILURE]Vertex killed, vertexName=Reducer 2,
> vertexId=vertex_1610498854304_0004_1_01, diagnostics=[Vertex received Kill
> while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE,
> failedTasks:0 killedTasks:1, Vertex vertex_1610498854304_0004_1_01 [Reducer
> 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to
> VERTEX_FAILURE. failedVertices:1 killedVertices:1 {code}
>
> cc: [~abstractdog], [~ashutoshc]
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)