konstantinb commented on PR #6484:
URL: https://github.com/apache/hive/pull/6484#issuecomment-4723665150
@armitage420 this optimization seems to put too much trust in the accuracy
of statistics _estimates_, which aren't always accurate. For example, for
variable-length / UDF output like `repeat()`, the width is capped at
`hive.stats.max.variable.length` (100B/row), so `computeOnlineDataSize` can
fall far below the actual build size, and the gate then approves a broadcast
well over `hive.auto.convert.join.noconditionaltask.size`.
The following test file succeeds on current master but OOMs on this branch
under `TestMiniTezCliDriver`:
```sql
--! qt:dataset:src
set hive.auto.convert.join=true;
set hive.vectorized.execution.enabled=false;
set hive.tez.container.size=512;
set tez.cartesian-product.max-parallelism=32;
set tez.cartesian-product.min-ops-per-worker=10000;
create table build_small_side stored as orc as
select s1.value as v from src s1 cross join src s2 limit 2000;
create table probe_big stored as orc as
select s1.value as pk from src s1 cross join src s2 limit 2001;
select length(max(b.w)) as ml
from probe_big p
cross join (select repeat(z.v, 31500) as w from build_small_side z) b;
```
On master the cross product runs as a distributed shuffle (`XPROD_EDGE`) and
completes. On this branch the byte-fallback converts it to a broadcast
map-join, and the build hashtable OOMs during load, with ~429MB of build data
(`SHUFFLE_BYTES`) broadcast into a ~416MB task heap (`COMMITTED_HEAP_BYTES`):
```
java.lang.RuntimeException: Map operator initialization failed
...
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.hive.serde2.WriteBuffers.nextBufferToWrite(WriteBuffers.java:261)
    at org.apache.hadoop.hive.serde2.WriteBuffers.write(WriteBuffers.java:237)
    at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer$LazyBinaryKvWriter.writeValue(MapJoinBytesTableContainer.java:333)
    at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.writeValueAndLength(BytesBytesMultiHashMap.java:923)
    at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.put(BytesBytesMultiHashMap.java:448)
    at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.putRow(MapJoinBytesTableContainer.java:460)
    at org.apache.hadoop.hive.ql.exec.tez.HashTableLoader.load(HashTableLoader.java:261)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTableInternal(MapJoinOperator.java:381)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:448)
```
(`TestMiniTezCliDriver` is used deliberately: its per-container memory
isolation is what lets master's distributed cross product survive while the
broadcast concentrates the whole build into one container. Under the default
`MiniLlapLocal` driver everything shares one JVM heap, so the contrast doesn't
surface.)
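For a rough sense of scale, here is a back-of-the-envelope sketch of the gap between the capped estimate and the raw build payload. It is not Hive code: the helper names are made up for illustration, the estimate is simplified to rows times the 100B cap, and the ~7-byte average length of `src` values (e.g. `val_123`) is an assumption.
```python
# Back-of-the-envelope comparison; helper names are illustrative, not Hive APIs.

MAX_VARIABLE_LENGTH = 100  # hive.stats.max.variable.length cap, bytes per row


def estimated_build_bytes(rows: int, cap: int = MAX_VARIABLE_LENGTH) -> int:
    """Simplified estimate: every variable-length value is assumed to be `cap` bytes."""
    return rows * cap


def actual_build_bytes(rows: int, avg_value_len: int, repeat_count: int) -> int:
    """Raw payload of repeat(v, n) over all rows, ignoring hashtable overhead."""
    return rows * avg_value_len * repeat_count


rows = 2000           # build_small_side row count in the test
repeat_count = 31500  # repeat(z.v, 31500)
avg_value_len = 7     # ASSUMPTION: `src` values look like "val_123"

est = estimated_build_bytes(rows)
act = actual_build_bytes(rows, avg_value_len, repeat_count)

print(f"estimated: {est / 1e6:.1f} MB")
print(f"actual:    {act / 1e6:.1f} MB")
print(f"ratio:     {act // est}x")
```
Under these assumptions the estimate is about 0.2 MB against roughly 441 MB of raw payload, which is in the same range as the ~429MB `SHUFFLE_BYTES` observed above.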