[ https://issues.apache.org/jira/browse/HIVE-13755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vikram Dixit K updated HIVE-13755:
----------------------------------
    Priority: Critical  (was: Major)

> Hybrid mapjoin allocates memory the same for multi broadcast
> ------------------------------------------------------------
>
>                 Key: HIVE-13755
>                 URL: https://issues.apache.org/jira/browse/HIVE-13755
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.1.0
>            Reporter: Wei Zheng
>            Assignee: Wei Zheng
>            Priority: Critical
>
> PROBLEM:
> When hybrid mapjoin requests memory, it estimates the same amount for every
> hashtable. This can cause problems when there are multiple broadcast inputs,
> because the combined estimates may exceed the memory intended for the mapjoin.
> Example reducer task log attached. This task has 5 broadcast inputs:
> Reducer 3 <- Map 10 (BROADCAST_EDGE), Map 11 (BROADCAST_EDGE), Map 12 (BROADCAST_EDGE), Map 8 (SIMPLE_EDGE), Map 9 (BROADCAST_EDGE), Reducer 2 (SIMPLE_EDGE)
> Excerpt:
> {code}
> 2016-03-15 19:23:50,811 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Memory manager allocates 0 bytes for the loading hashtable.
> 2016-03-15 19:23:50,811 [INFO] [pool-47-thread-1] |persistence.HashMapWrapper|: Key count from statistics is 210; setting map size to 280
> 2016-03-15 19:23:50,811 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Total available memory: 1968177152
> 2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Estimated small table size: 155190
> 2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Number of hash partitions to be created: 16
> 2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Write buffer size: 524288
> 2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Number of partitions created: 16
> 2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Number of partitions spilled directly to disk on creation: 0
> 2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Using tableContainer HybridHashTableContainer
> 2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Initializing container with org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe and org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
> 2016-03-15 19:23:50,814 [INFO] [pool-47-thread-1] |readers.UnorderedKVReader|: Num Records read: 20
> 2016-03-15 19:23:50,814 [INFO] [pool-47-thread-1] |log.PerfLogger|: </PERFLOG method=LoadHashtable start=1458069830811 end=1458069830814 duration=3 from=org.apache.hadoop.hive.ql.exec.MapJoinOperator>
> 2016-03-15 19:23:50,814 [INFO] [pool-47-thread-1] |tez.ObjectCache|: Caching key: svc-phx-efmhadoop_20160315191303_8c53ce88-e64f-4d36-bad0-846bbf096f57__HASH_MAP_MAPJOIN_126_container
> 2016-03-15 19:23:50,814 [INFO] [TezChild] |exec.HashTableDummyOperator|: Initializing operator HASHTABLEDUMMY[32]
> 2016-03-15 19:23:50,814 [INFO] [TezChild] |exec.MapJoinOperator|: Initializing operator MAPJOIN[26]
> 2016-03-15 19:23:50,816 [INFO] [TezChild] |exec.CommonJoinOperator|: JOIN struct<_col3:string,_col4:decimal(5,0),_col5:char(1),_col6:char(1),_col7:date,_col8:string,_col9:string,_col12:string,_col13:string,_col14:string,_col15:string,_col16:string,_col19:decimal(13,3),_col20:string,_col22:decimal(5,0),_col23:decimal(5,0),_col24:decimal(5,0),_col25:decimal(5,0),_col26:decimal(13,2),_col27:decimal(5,0),_col28:decimal(15,2),_col29:decimal(15,2),_col31:decimal(3,0),_col33:char(1),_col41:decimal(3,1),_col42:char(1),_col43:decimal(3,1),_col44:string,_col45:char(1),_col48:char(1),_col55:char(1),_col57:char(1),_col59:char(1),_col60:string,_col64:string,_col65:string,_col67:decimal(15,2),_col76:decimal(3,0),_col81:char(1),_col98:string,_col99:string,_col105:string,_col108:string,_col122:string,_col123:decimal(5,0),_col127:string,_col128:decimal(5,0),_col129:string,_col137:char(1),_col139:string,_col145:string,_col151:string,_col152:string,_col154:string,_col158:char(1),_col164:char(1),_col204:string,_col213:string,_col214:char(1),_col215:string,_col218:char(1),_col219:date,_col220:string,_col221:decimal(5,0),_col222:decimal(5,0),_col223:string,_col224:char(1),_col225:string,_col226:decimal(3,0),_col231:string,_col232:string,_col233:string,_col234:decimal(9,5),_col236:date,_col240:date,_col256:string,_col257:string,_col268:string,_col269:string,_col270:char(1),_col271:string,_col272:char(1),_col324:string,_col344:string,_col464:string,_col478:decimal(5,0),_col479:decimal(5,0),_col519:string,_col532:string,_col534:char(1),_col540:decimal(13,3),_col541:decimal(13,3),_col561:string,_col568:char(1),_col570:string> totalsz = 95
> 2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |log.PerfLogger|: <PERFLOG method=LoadHashtable from=org.apache.hadoop.hive.ql.exec.MapJoinOperator>
> 2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Memory manager allocates 0 bytes for the loading hashtable.
> 2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |persistence.HashMapWrapper|: Key count from statistics is 5942112; setting map size to 7922816
> 2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Total available memory: 1968177152
> 2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Estimated small table size: 1324101915
> 2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Number of hash partitions to be created: 16
> 2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Write buffer size: 8388608
> 2016-03-15 19:23:50,831 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Number of partitions created: 16
> 2016-03-15 19:23:50,831 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Number of partitions spilled directly to disk on creation: 0
> 2016-03-15 19:23:50,831 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Using tableContainer HybridHashTableContainer
> 2016-03-15 19:23:50,831 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Initializing container with org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe and org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
> 2016-03-15 19:23:51,543 [INFO] [pool-47-thread-1] |readers.UnorderedKVReader|: Num Records read: 852596
> 2016-03-15 19:23:51,563 [INFO] [pool-47-thread-1] |log.PerfLogger|: </PERFLOG method=LoadHashtable start=1458069830817 end=1458069831563 duration=746 from=org.apache.hadoop.hive.ql.exec.MapJoinOperator>
> 2016-03-15 19:23:51,563 [INFO] [pool-47-thread-1] |tez.ObjectCache|: Caching key: svc-phx-efmhadoop_20160315191303_8c53ce88-e64f-4d36-bad0-846bbf096f57__HASH_MAP_MAPJOIN_127_container
> 2016-03-15 19:23:51,563 [INFO] [TezChild] |exec.HashTableDummyOperator|: Initializing operator HASHTABLEDUMMY[31]
> 2016-03-15 19:23:51,564 [INFO] [TezChild] |exec.MapJoinOperator|: Initializing operator MAPJOIN[27]
> 2016-03-15 19:23:51,566 [INFO] [TezChild] |exec.CommonJoinOperator|: JOIN struct<_col3:string,_col4:decimal(5,0),_col5:char(1),_col6:char(1),_col7:date,_col8:string,_col9:string,_col12:string,_col13:string,_col14:string,_col15:string,_col16:string,_col19:decimal(13,3),_col20:string,_col22:decimal(5,0),_col23:decimal(5,0),_col24:decimal(5,0),_col25:decimal(5,0),_col26:decimal(13,2),_col27:decimal(5,0),_col28:decimal(15,2),_col29:decimal(15,2),_col31:decimal(3,0),_col33:char(1),_col41:decimal(3,1),_col42:char(1),_col43:decimal(3,1),_col44:string,_col45:char(1),_col48:char(1),_col55:char(1),_col57:char(1),_col59:char(1),_col60:string,_col64:string,_col65:string,_col67:decimal(15,2),_col76:decimal(3,0),_col81:char(1),_col98:string,_col99:string,_col105:string,_col108:string,_col122:string,_col123:decimal(5,0),_col127:string,_col128:decimal(5,0),_col129:string,_col137:char(1),_col139:string,_col145:string,_col151:string,_col152:string,_col154:string,_col158:char(1),_col164:char(1),_col204:string,_col213:string,_col214:char(1),_col215:string,_col218:char(1),_col219:date,_col220:string,_col221:decimal(5,0),_col222:decimal(5,0),_col223:string,_col224:char(1),_col225:string,_col226:decimal(3,0),_col231:string,_col232:string,_col233:string,_col234:decimal(9,5),_col236:date,_col240:date,_col256:string,_col257:string,_col268:string,_col269:string,_col270:char(1),_col271:string,_col272:char(1),_col324:string,_col344:string,_col464:string,_col478:decimal(5,0),_col479:decimal(5,0),_col519:string,_col532:string,_col534:char(1),_col540:decimal(13,3),_col541:decimal(13,3),_col561:string> totalsz = 93
> 2016-03-15 19:23:51,566 [INFO] [pool-47-thread-1] |log.PerfLogger|: <PERFLOG method=LoadHashtable from=org.apache.hadoop.hive.ql.exec.MapJoinOperator>
> 2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Memory manager allocates 0 bytes for the loading hashtable.
> 2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1] |persistence.HashMapWrapper|: Key count from statistics is 293380; setting map size to 391174
> 2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Total available memory: 1968177152
> 2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Estimated small table size: 69929471
> 2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Number of hash partitions to be created: 16
> 2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Write buffer size: 4194304
> 2016-03-15 19:23:51,568 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Number of partitions created: 16
> 2016-03-15 19:23:51,568 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Number of partitions spilled directly to disk on creation: 0
> 2016-03-15 19:23:51,568 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Using tableContainer HybridHashTableContainer
> 2016-03-15 19:23:51,569 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Initializing container with org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe and org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
> 2016-03-15 19:23:51,980 [INFO] [pool-47-thread-1] |readers.UnorderedKVReader|: Num Records read: 586760
> {code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
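
The figures in the log excerpt can be checked with a little arithmetic. The sketch below is illustrative only and is not Hive code. `equal_view` mirrors what the log reports: every hashtable loader sees the full 1968177152 bytes as available. `proportional_split` is a hypothetical alternative that divides one shared budget in proportion to each table's estimated size. Both function names are assumptions made for this example, and only the three hashtables visible in the excerpt are included.

```python
# Figures copied from the reducer log excerpt in this message.
TOTAL_AVAILABLE = 1_968_177_152
ESTIMATED_SIZES = [155_190, 1_324_101_915, 69_929_471]


def equal_view(total, sizes):
    """Every hashtable is offered the whole budget, as the log shows."""
    return [total for _ in sizes]


def proportional_split(total, sizes):
    """Hypothetical alternative: share one budget by estimated table size."""
    grand_total = sum(sizes)
    return [total * size // grand_total for size in sizes]


if __name__ == "__main__":
    offered = equal_view(TOTAL_AVAILABLE, ESTIMATED_SIZES)
    shared = proportional_split(TOTAL_AVAILABLE, ESTIMATED_SIZES)
    print("sum of budgets offered per table:", sum(offered))
    print("sum of proportional shares:      ", sum(shared))
    print("actual budget:                   ", TOTAL_AVAILABLE)
```

With equal treatment, the budgets offered to the loaders add up to a multiple of the real limit, so no loader has a reason to spill even when the tables together would not fit. A shared split keeps the sum at or below the limit by construction.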