[jira] [Commented] (HIVE-10773) MapJoinOperator times out on loading HashTable
[ https://issues.apache.org/jira/browse/HIVE-10773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755701#comment-16755701 ]

Zhihua Deng commented on HIVE-10773:

Another case: if the map join key is of type double, a high hash collision ratio is seen when putting these keys into HashMapWrapper. This was fixed by HIVE-12354.

> MapJoinOperator times out on loading HashTable
> ----------------------------------------------
>
>                 Key: HIVE-10773
>                 URL: https://issues.apache.org/jira/browse/HIVE-10773
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>   Affects Versions: 0.14.0
>            Reporter: frank luo
>            Priority: Major
>
> When running a map join, depending on the data, it might time out, with the last two lines in the log as below. When I run "set mapreduce.task.timeout=60;", which is defaulted to 30, the query goes through fine. The size of the hashtable file is roughly 400M.
>
> 2015-05-20 13:27:03,237 INFO [main] org.apache.hadoop.hive.ql.exec.MapJoinOperator: *** Load from HashTable for input file: hdfs://nameservice1/tmp/hive/jluo/2ee8914d-1cef-4af4-aac6-51f64d630346/hive_2015-05-20_13-13-35_335_1565066409090716856-1/-mr-10007/00_0
> 2015-05-20 13:27:03,237 INFO [main] org.apache.hadoop.hive.ql.exec.MapJoinOperator: Load back 1 hashtable file from tmp file uri:file:/data/12/hadoop/yarn/local/usercache/xxy/appcache/application_1430337284339_2087/container_1430337284339_2087_01_03/Stage-3.tar.gz/MapJoin-mapfile31--.hashtable

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
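To illustrate how a double-typed key can produce the high collision ratio described in the comment above, here is a minimal, self-contained Java sketch. It is not Hive code: the class `DoubleKeyHashDemo` and the method `truncatedHash` are hypothetical, and the truncating hash is only an assumed example of a weak hash function, not a statement about what HashMapWrapper or HIVE-12354 actually did. It compares that weak hash against the standard `Double.hashCode` on whole-number keys.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative only, not Hive code: compares two ways of hashing a double key. */
final class DoubleKeyHashDemo {

    /** Hypothetical weak hash: keeps only the low 32 bits of the IEEE 754 bit pattern. */
    static int truncatedHash(double value) {
        return (int) Double.doubleToLongBits(value);
    }

    /** Standard Java hash for a double: XORs the high 32 bits into the low 32 bits. */
    static int foldedHash(double value) {
        return Double.hashCode(value);
    }

    /** Counts distinct hash codes for the whole-number keys 1.0, 2.0, ..., keyCount. */
    static int distinctHashes(int keyCount, boolean truncated) {
        Set<Integer> seen = new HashSet<>();
        for (int i = 1; i <= keyCount; i++) {
            double key = i; // whole-number double, e.g. an ID column stored as double
            seen.add(truncated ? truncatedHash(key) : foldedHash(key));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int keyCount = 100_000; // arbitrary sample size chosen for the demo
        System.out.println("distinct truncated hashes: " + distinctHashes(keyCount, true));
        System.out.println("distinct folded hashes:    " + distinctHashes(keyCount, false));
    }
}
```

Whole-number doubles below 2^21 carry all of their significant bits in the upper half of the 64-bit pattern, so the truncating hash maps every one of them to 0 and a hash map degenerates into a single long chain. Under that assumption, inserting n keys costs on the order of n^2 comparisons, which is one way a small hashtable file could still take a very long time to load.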
[jira] [Commented] (HIVE-10773) MapJoinOperator times out on loading HashTable
[ https://issues.apache.org/jira/browse/HIVE-10773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751954#comment-16751954 ]

Zhihua Deng commented on HIVE-10773:

We hit the same issue running a job on MapReduce. In one of our cases, 99% of the entries stored in the dumped hashtable are one-to-one key-value mappings. Even though the file is no larger than 10 MB, the mapper takes more than half an hour to load the table, which holds about 200,000 keys.
[jira] [Commented] (HIVE-10773) MapJoinOperator times out on loading HashTable
[ https://issues.apache.org/jira/browse/HIVE-10773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15322183#comment-15322183 ]

VARIDH BHARGAVA commented on HIVE-10773:

Is this expected behavior? We are running into a similar issue: with "hive.auto.convert.join=true" and tables in the join clause that are no larger than 100 MB, we are seeing execution times of almost 5-6 hours!

We are using the following settings:

    set HADOOP_HEAPSIZE=5120;
    set hive.auto.convert.join=true;
    set hive.auto.convert.join.noconditionaltask=true;
    set hive.auto.convert.join.noconditionaltask.size=10;
    set mapred.task.timeout=60;

Or is there a workaround for this?