Re: Hive Join Running Out of Memory
Clay,

Keep in mind that setting this to false in the global hive-site.xml means you will not do any client-side hash table generation, and you will miss out on that optimization for other joins. You should set it in your query directly instead. Another option is to increase the client-side heap to allow for larger in-memory tables.

On Fri, Jul 18, 2014 at 11:12 AM, Clay McDonald <stuart.mcdon...@bateswhite.com> wrote:
> I changed hive.auto.convert.join.noconditionaltask to false in hive-site.xml, and that seemed to do the trick. Thanks!
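Here is a minimal sketch of the per-query approach. It assumes the Hive CLI or Beeline, where `SET` changes a property only for the current session. The override sits directly before the join. The last line restores what I understand to be the default (`true` since Hive 0.11), so later joins in the same session can still be converted to map joins.

```sql
-- Session-scoped override: affects only this Hive session, not hive-site.xml.
SET hive.auto.convert.join.noconditionaltask=false;

SELECT B.CARD_NBR AS CNT
FROM TENDER_TABLE A
JOIN LOYALTY_CARDS B ON A.CARD_NBR = B.CARD_NBR
LIMIT 10;

-- Restore the (assumed) default so later joins in this session are unaffected.
SET hive.auto.convert.join.noconditionaltask=true;
```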
Re: Hive Join Running Out of Memory
This is a failed optimization. Hive is trying to build the lookup table locally, put it in the distributed cache, and then do a map join. Look through your hive-site.xml for the configuration to turn these automatic map joins off. Depending on your version, the variable names have changed or been deprecated, so I cannot tell you the exact ones.

On Fri, Jul 18, 2014 at 10:35 AM, Clay McDonald <stuart.mcdon...@bateswhite.com> wrote:
> Hello everyone. I need some assistance. I have a join that fails with return code 3. The query is:
>
> SELECT B.CARD_NBR AS CNT FROM TENDER_TABLE A JOIN LOYALTY_CARDS B ON A.CARD_NBR = B.CARD_NBR LIMIT 10;
>
> -- Row Counts
> -- LOYALTY_CARDS = 43,876,938
> -- TENDER_TABLE = 1,412,228,333
>
> The query execution output starts with:
>
> 2014-07-18 10:30:17 Starting to launch local task to process map join; maximum memory = 1065484288
>
> The last output is as follows:
>
> 2014-07-18 10:30:44 Processing rows: 380 Hashtable size: 379 Memory usage: 969531248 percentage: 0.91
>
> I ran SET mapred.child.java.opts=-Xmx4G; before the query, but that did not change the maximum memory. What am I not understanding, and how should I troubleshoot this issue?
>
> hive> SELECT B.CARD_NBR AS CNT FROM TENDER_TABLE A JOIN LOYALTY_CARDS B ON A.CARD_NBR = B.CARD_NBR LIMIT 10;
> Query ID = root_20140718103030_df1e7af9-7d66-4ba5-8d73-2d0bf58bb474
> Total jobs = 1
> 14/07/18 10:30:17 WARN conf.Configuration: file:/tmp/root/hive_2014-07-18_10-30-15_081_1503496466695602651-1/-local-10006/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
> 14/07/18 10:30:17 WARN conf.Configuration: file:/tmp/root/hive_2014-07-18_10-30-15_081_1503496466695602651-1/-local-10006/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
> 14/07/18 10:30:17 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
> 14/07/18 10:30:17 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
> 14/07/18 10:30:17 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
> 14/07/18 10:30:17 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
> 14/07/18 10:30:17 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
> 14/07/18 10:30:17 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
> 14/07/18 10:30:17 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
> 14/07/18 10:30:17 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
> Execution log at: /tmp/root/root_20140718103030_df1e7af9-7d66-4ba5-8d73-2d0bf58bb474.log
> 2014-07-18 10:30:17 Starting to launch local task to process map join; maximum memory = 1065484288
> 2014-07-18 10:30:20 Processing rows: 20 Hashtable size: 19 Memory usage: 53829960 percentage: 0.051
> 2014-07-18 10:30:21 Processing rows: 30 Hashtable size: 29 Memory usage: 76926312 percentage: 0.072
> 2014-07-18 10:30:22 Processing rows: 40 Hashtable size: 39 Memory usage: 105119456 percentage: 0.099
> 2014-07-18 10:30:23 Processing rows: 50 Hashtable size: 49 Memory usage: 129079592 percentage: 0.121
> 2014-07-18 10:30:24 Processing rows: 60 Hashtable size: 59 Memory usage: 151469744 percentage: 0.142
> 2014-07-18 10:30:24 Processing rows: 70 Hashtable size: 69 Memory usage: 174968512 percentage: 0.164
> 2014-07-18 10:30:25 Processing rows: 80 Hashtable size: 79 Memory usage: 207735176 percentage: 0.195
> 2014-07-18 10:30:25 Processing rows: 90 Hashtable size: 89 Memory usage: 232306976 percentage: 0.218
> 2014-07-18 10:30:26 Processing rows: 100 Hashtable size: 99 Memory usage: 255813784 percentage: 0.24
> 2014-07-18 10:30:27 Processing rows: 110 Hashtable size: 109 Memory usage: 280781448 percentage: 0.264
> 2014-07-18 10:30:27 Processing rows: 120 Hashtable size: 119 Memory usage: 305606024 percentage: 0.287
> 2014-07-18 10:30:28 Processing rows: 130 Hashtable size: 129 Memory usage: 323502504 percentage: 0.304
> 2014-07-18 10:30:28
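The numbers in the log can be checked directly. The last line reports 969,531,248 bytes used out of a maximum of 1,065,484,288 bytes (about 1 GB), and 969531248 / 1065484288 ≈ 0.91.

My understanding is that the local task aborts once this ratio exceeds hive.mapjoin.localtask.max.memory.usage, which defaults to 0.90. That is consistent with the failure here.

The maximum is the heap of the local JVM that the Hive client launches. mapred.child.java.opts only configures the MapReduce task JVMs on the cluster, which would explain why -Xmx4G did not change it.

The sketch below shows how to inspect and raise the relevant limits. It assumes a Hive version that supports hive.mapred.local.mem, which I understand sets the local child JVM heap in MB, with 0 meaning the Hadoop client default. Treat that as an assumption to verify against your version.

```sql
-- With no "=value", SET prints the current value of a property.
SET hive.mapjoin.localtask.max.memory.usage;
SET hive.mapred.local.mem;

-- Give the local hash-table JVM a larger heap (value in MB; assumed semantics).
-- This only helps if the smaller join side actually fits in that much memory.
SET hive.mapred.local.mem=4096;
```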
RE: Hive Join Running Out of Memory
Thank you. Would it be acceptable to use the following?

SET hive.exec.mode.local.auto=false;

From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Friday, July 18, 2014 10:45 AM
To: user@hive.apache.org
Subject: Re: Hive Join Running Out of Memory

> This is a failed optimization. Hive is trying to build the lookup table locally, put it in the distributed cache, and then do a map join. Look through your hive-site.xml for the configuration to turn these automatic map joins off. Depending on your version, the variable names have changed or been deprecated, so I cannot tell you the exact ones.
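As far as I know, hive.exec.mode.local.auto and the map-join local task are separate mechanisms:

- hive.exec.mode.local.auto lets Hive run an entire small job on the client instead of the cluster. It defaults to false.
- The "Starting to launch local task to process map join" line comes from automatic map-join conversion, which is governed by hive.auto.convert.join.

The sketch below contrasts the two, assuming Hive 0.11 or later.

```sql
-- Whole-job local mode: run small jobs on the client instead of the cluster.
-- It defaults to false, so this line may not change anything on its own.
SET hive.exec.mode.local.auto=false;

-- Automatic map-join conversion: this is what builds the hash table locally.
-- Turning it off makes the join run as an ordinary shuffle (common) join.
SET hive.auto.convert.join=false;
```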
Re: Hive Join Running Out of Memory
I believe that would be the one.

On Fri, Jul 18, 2014 at 10:54 AM, Clay McDonald <stuart.mcdon...@bateswhite.com> wrote:
> Thank you. Would it be acceptable to use the following?
> SET hive.exec.mode.local.auto=false;
RE: Hive Join Running Out of Memory
I changed hive.auto.convert.join.noconditionaltask to false in hive-site.xml, and that seemed to do the trick. Thanks!

From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Friday, July 18, 2014 10:57 AM
To: user@hive.apache.org
Subject: Re: Hive Join Running Out of Memory

> I believe that would be the one.
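Here is one plausible reading of why this worked, based on my understanding of Hive 0.11+ join planning:

- With hive.auto.convert.join.noconditionaltask=true, Hive turns the join straight into a map join, with no fallback, when the estimated small-table size is under hive.auto.convert.join.noconditionaltask.size.
- With it set to false, the map join is planned as a conditional task. That task keeps a common join as a backup if the local hash-table task fails.

The session-level sketch below shows both knobs. The 10000000-byte value is what I believe the default threshold to be; treat it as illustrative.

```sql
-- Plan the join as a conditional task: try the map join, keep a common join as backup.
SET hive.auto.convert.join.noconditionaltask=false;

-- Alternative: keep direct conversion and tune the size limit (in bytes) under which
-- Hive converts a join into a map join without a backup plan.
SET hive.auto.convert.join.noconditionaltask=true;
SET hive.auto.convert.join.noconditionaltask.size=10000000;
```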