Re: Problems with 0.11, count(DISTINCT), and NPE

2013-09-04 Thread Nathanial Thelen
Doing:

 set hive.auto.convert.join.noconditionaltask=false;

makes it work (though it does way more map reduce jobs than it should).  When I 
get some time I will test against the latest trunk.

Thanks,
Nate


On Sep 3, 2013, at 6:09 PM, Yin Huai huaiyin@gmail.com wrote:

 Based on the log, it may be also related to 
 https://issues.apache.org/jira/browse/HIVE-4927. To make it work (in a not 
 very optimized way), can you try set 
 hive.auto.convert.join.noconditionaltask=false; ? If you still get the 
 error, give set hive.auto.convert.join=false; a try (it will turn off map 
 join auto convert, so you will use reduce-side join). 
 
 Thanks,
 
 Yin
 
 
 On Tue, Sep 3, 2013 at 6:03 PM, Ashutosh Chauhan hashut...@apache.org wrote:
 Not sure about EMR. Your best bet is to ask on EMR forums.
 
 Thanks,
 Ashutosh
 
 
 On Tue, Sep 3, 2013 at 2:18 PM, Nathanial Thelen n...@natethelen.com wrote:
 Is there a way to run a patch on EMR?
 
 Thanks,
 Nate
 
 On Sep 3, 2013, at 2:14 PM, Ashutosh Chauhan hashut...@apache.org wrote:
 
 Fix in very related area has been checked in trunk today : 
 https://issues.apache.org/jira/browse/HIVE-5129 Likely that will fix your 
 issue. 
 Can you try latest trunk?
 
 Ashutosh
 
 
 On Tue, Sep 3, 2013 at 2:03 PM, Nathanial Thelen n...@natethelen.com wrote:
 I am running Hive in EMR and since upgrading to 0.11 from 0.8.1.8 I have 
 been getting NullPointerExceptions (NPE) for certain queries in our staging 
 environment.  Only difference between stage and production is the amount of 
 traffic we get so the data set is much smaller.  We are not using any custom 
 code.
 
 I have greatly simplified the query down to the bare minimum that will cause 
 the error:
 
 SELECT
 count(DISTINCT ag.adGroupGuid) as groups,
 count(DISTINCT av.adViewGuid) as ads,
 count(DISTINCT ac.adViewGuid) as uniqueClicks
 FROM
 adgroup ag
 INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
 LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid
 
 This will return the following before any Map Reduce jobs start:
 
 FAILED: NullPointerException null
 
 Looking in the hive log at /mnt/var/log/apps/hive_0110.log and scanning, I 
 see this error:
 
 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities 
 (Utilities.java:getInputSummary(1889)) - Cache Content Summary for 
 s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=29 length: 
 94324 file count: 20 directory count: 1
 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities 
 (Utilities.java:getInputSummary(1889)) - Cache Content Summary for 
 s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=30 length: 
 142609 file count: 21 directory count: 1
 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities 
 (Utilities.java:getInputSummary(1889)) - Cache Content Summary for 
 s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=30 length: 
 65519 file count: 21 directory count: 1
 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities 
 (Utilities.java:getInputSummary(1889)) - Cache Content Summary for 
 s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=29 length: 
 205096 file count: 20 directory count: 1
 2013-09-03 18:09:19,800 INFO  
 org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer 
 (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where 
 optimization is applicable
 2013-09-03 18:09:19,801 INFO  
 org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer 
 (MetadataOnlyOptimizer.java:dispatch(301)) - Found 0 metadata only table 
 scans
 2013-09-03 18:09:19,801 INFO  
 org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer 
 (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where 
 optimization is applicable
 2013-09-03 18:09:19,801 INFO  
 org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer 
 (MetadataOnlyOptimizer.java:dispatch(301)) - Found 1 metadata only table 
 scans
 2013-09-03 18:09:19,801 ERROR org.apache.hadoop.hive.ql.Driver 
 (SessionState.java:printError(386)) - FAILED: NullPointerException null
 java.lang.NullPointerException
  at 
 org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer$MetadataOnlyTaskDispatcher.dispatch(MetadataOnlyOptimizer.java:308)
  at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87)
  at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:124)
  at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101)
  at 
 org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer.resolve(MetadataOnlyOptimizer.java:175)
  at 
 org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:79)
  at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8426

Re: Problems with 0.11, count(DISTINCT), and NPE

2013-09-03 Thread Nathanial Thelen
Is there a way to run a patch on EMR?

Thanks,
Nate

On Sep 3, 2013, at 2:14 PM, Ashutosh Chauhan hashut...@apache.org wrote:

 Fix in very related area has been checked in trunk today : 
 https://issues.apache.org/jira/browse/HIVE-5129 Likely that will fix your 
 issue. 
 Can you try latest trunk?
 
 Ashutosh
 
 
 On Tue, Sep 3, 2013 at 2:03 PM, Nathanial Thelen n...@natethelen.com wrote:
 I am running Hive in EMR and since upgrading to 0.11 from 0.8.1.8 I have been 
 getting NullPointerExceptions (NPE) for certain queries in our staging 
 environment.  Only difference between stage and production is the amount of 
 traffic we get so the data set is much smaller.  We are not using any custom 
 code.
 
 I have greatly simplified the query down to the bare minimum that will cause 
 the error:
 
 SELECT
 count(DISTINCT ag.adGroupGuid) as groups,
 count(DISTINCT av.adViewGuid) as ads,
 count(DISTINCT ac.adViewGuid) as uniqueClicks
 FROM
 adgroup ag
 INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
 LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid
 
 This will return the following before any Map Reduce jobs start:
 
 FAILED: NullPointerException null
 
 Looking in the hive log at /mnt/var/log/apps/hive_0110.log and scanning, I 
 see this error:
 
 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities 
 (Utilities.java:getInputSummary(1889)) - Cache Content Summary for 
 s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=29 length: 
 94324 file count: 20 directory count: 1
 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities 
 (Utilities.java:getInputSummary(1889)) - Cache Content Summary for 
 s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=30 length: 
 142609 file count: 21 directory count: 1
 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities 
 (Utilities.java:getInputSummary(1889)) - Cache Content Summary for 
 s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=30 length: 
 65519 file count: 21 directory count: 1
 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities 
 (Utilities.java:getInputSummary(1889)) - Cache Content Summary for 
 s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=29 length: 
 205096 file count: 20 directory count: 1
 2013-09-03 18:09:19,800 INFO  
 org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer 
 (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where 
 optimization is applicable
 2013-09-03 18:09:19,801 INFO  
 org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer 
 (MetadataOnlyOptimizer.java:dispatch(301)) - Found 0 metadata only table scans
 2013-09-03 18:09:19,801 INFO  
 org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer 
 (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where 
 optimization is applicable
 2013-09-03 18:09:19,801 INFO  
 org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer 
 (MetadataOnlyOptimizer.java:dispatch(301)) - Found 1 metadata only table scans
 2013-09-03 18:09:19,801 ERROR org.apache.hadoop.hive.ql.Driver 
 (SessionState.java:printError(386)) - FAILED: NullPointerException null
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer$MetadataOnlyTaskDispatcher.dispatch(MetadataOnlyOptimizer.java:308)
   at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87)
   at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:124)
   at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101)
   at 
 org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer.resolve(MetadataOnlyOptimizer.java:175)
   at 
 org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:79)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8426)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8789)
   at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:278)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:310)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:231)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:466)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:819)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:674)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke