Re: Problems with 0.11, count(DISTINCT), and NPE
Doing: set hive.auto.convert.join.noconditionaltask=false; makes it work (though it does way more map reduce jobs than it should). When I get some time I will test against the latest trunk. Thanks, Nate On Sep 3, 2013, at 6:09 PM, Yin Huai huaiyin@gmail.com wrote: Based on the log, it may be also related to https://issues.apache.org/jira/browse/HIVE-4927. To make it work (in a not very optimized way), can you try set hive.auto.convert.join.noconditionaltask=false; ? If you still get the error, give set hive.auto.convert.join=false; a try (it will turn off map join auto convert, so you will use reduce-side join). Thanks, Yin On Tue, Sep 3, 2013 at 6:03 PM, Ashutosh Chauhan hashut...@apache.org wrote: Not sure about EMR. Your best bet is to ask on EMR forums. Thanks, Ashutosh On Tue, Sep 3, 2013 at 2:18 PM, Nathanial Thelen n...@natethelen.com wrote: Is there a way to run a patch on EMR? Thanks, Nate On Sep 3, 2013, at 2:14 PM, Ashutosh Chauhan hashut...@apache.org wrote: Fix in very related area has been checked in trunk today : https://issues.apache.org/jira/browse/HIVE-5129 Likely that will fix your issue. Can you try latest trunk? Ashutosh On Tue, Sep 3, 2013 at 2:03 PM, Nathanial Thelen n...@natethelen.com wrote: I am running Hive in EMR and since upgrading to 0.11 from 0.8.1.8 I have been getting NullPointerExceptions (NPE) for certain queries in our staging environment. Only difference between stage and production is the amount of traffic we get so the data set is much smaller. We are not using any custom code. I have greatly simplified the query down to the bare minimum that will cause the error: SELECT count(DISTINCT ag.adGroupGuid) as groups, count(DISTINCT av.adViewGuid) as ads, count(DISTINCT ac.adViewGuid) as uniqueClicks FROM adgroup ag INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid This will return the following before any Map Reduce jobs start: FAILED: NullPointerException null Looking in the hive log at /mnt/var/log/apps/hive_0110.log and scanning, I see this error: 2013-09-03 18:09:19,796 INFO org.apache.hadoop.hive.ql.exec.Utilities (Utilities.java:getInputSummary(1889)) - Cache Content Summary for s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=29 length: 94324 file count: 20 directory count: 1 2013-09-03 18:09:19,796 INFO org.apache.hadoop.hive.ql.exec.Utilities (Utilities.java:getInputSummary(1889)) - Cache Content Summary for s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=30 length: 142609 file count: 21 directory count: 1 2013-09-03 18:09:19,796 INFO org.apache.hadoop.hive.ql.exec.Utilities (Utilities.java:getInputSummary(1889)) - Cache Content Summary for s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=30 length: 65519 file count: 21 directory count: 1 2013-09-03 18:09:19,796 INFO org.apache.hadoop.hive.ql.exec.Utilities (Utilities.java:getInputSummary(1889)) - Cache Content Summary for s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=29 length: 205096 file count: 20 directory count: 1 2013-09-03 18:09:19,800 INFO org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where optimization is applicable 2013-09-03 18:09:19,801 INFO org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer (MetadataOnlyOptimizer.java:dispatch(301)) - Found 0 metadata only table scans 2013-09-03 18:09:19,801 INFO org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where optimization is applicable 2013-09-03 18:09:19,801 INFO org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer (MetadataOnlyOptimizer.java:dispatch(301)) - Found 1 metadata only table scans 2013-09-03 18:09:19,801 ERROR org.apache.hadoop.hive.ql.Driver (SessionState.java:printError(386)) - FAILED: NullPointerException null java.lang.NullPointerException at org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer$MetadataOnlyTaskDispatcher.dispatch(MetadataOnlyOptimizer.java:308) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:124) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101) at org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer.resolve(MetadataOnlyOptimizer.java:175) at org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:79) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8426
Re: Problems with 0.11, count(DISTINCT), and NPE
Is there a way to run a patch on EMR? Thanks, Nate On Sep 3, 2013, at 2:14 PM, Ashutosh Chauhan hashut...@apache.org wrote: Fix in very related area has been checked in trunk today : https://issues.apache.org/jira/browse/HIVE-5129 Likely that will fix your issue. Can you try latest trunk? Ashutosh On Tue, Sep 3, 2013 at 2:03 PM, Nathanial Thelen n...@natethelen.com wrote: I am running Hive in EMR and since upgrading to 0.11 from 0.8.1.8 I have been getting NullPointerExceptions (NPE) for certain queries in our staging environment. Only difference between stage and production is the amount of traffic we get so the data set is much smaller. We are not using any custom code. I have greatly simplified the query down to the bare minimum that will cause the error: SELECT count(DISTINCT ag.adGroupGuid) as groups, count(DISTINCT av.adViewGuid) as ads, count(DISTINCT ac.adViewGuid) as uniqueClicks FROM adgroup ag INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid This will return the following before any Map Reduce jobs start: FAILED: NullPointerException null Looking in the hive log at /mnt/var/log/apps/hive_0110.log and scanning, I see this error: 2013-09-03 18:09:19,796 INFO org.apache.hadoop.hive.ql.exec.Utilities (Utilities.java:getInputSummary(1889)) - Cache Content Summary for s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=29 length: 94324 file count: 20 directory count: 1 2013-09-03 18:09:19,796 INFO org.apache.hadoop.hive.ql.exec.Utilities (Utilities.java:getInputSummary(1889)) - Cache Content Summary for s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=30 length: 142609 file count: 21 directory count: 1 2013-09-03 18:09:19,796 INFO org.apache.hadoop.hive.ql.exec.Utilities (Utilities.java:getInputSummary(1889)) - Cache Content Summary for s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=30 length: 65519 file count: 21 directory count: 1 2013-09-03 18:09:19,796 INFO org.apache.hadoop.hive.ql.exec.Utilities (Utilities.java:getInputSummary(1889)) - Cache Content Summary for s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=29 length: 205096 file count: 20 directory count: 1 2013-09-03 18:09:19,800 INFO org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where optimization is applicable 2013-09-03 18:09:19,801 INFO org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer (MetadataOnlyOptimizer.java:dispatch(301)) - Found 0 metadata only table scans 2013-09-03 18:09:19,801 INFO org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where optimization is applicable 2013-09-03 18:09:19,801 INFO org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer (MetadataOnlyOptimizer.java:dispatch(301)) - Found 1 metadata only table scans 2013-09-03 18:09:19,801 ERROR org.apache.hadoop.hive.ql.Driver (SessionState.java:printError(386)) - FAILED: NullPointerException null java.lang.NullPointerException at org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer$MetadataOnlyTaskDispatcher.dispatch(MetadataOnlyOptimizer.java:308) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:124) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101) at org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer.resolve(MetadataOnlyOptimizer.java:175) at org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:79) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8426) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8789) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:278) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:310) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:231) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:466) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:819) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:674) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke