Re: Problems with 0.11, count(DISTINCT), and NPE

2013-09-04 Thread Nathanial Thelen
Doing:

> set hive.auto.convert.join.noconditionaltask=false;

makes it work (though it does way more map reduce jobs than it should).  When I 
get some time I will test against the latest trunk.

Thanks,
Nate


On Sep 3, 2013, at 6:09 PM, Yin Huai  wrote:

> Based on the log, it may be also related to 
> https://issues.apache.org/jira/browse/HIVE-4927. To make it work (in a not 
> very optimized way), can you try "set 
> hive.auto.convert.join.noconditionaltask=false;" ? If you still get the 
> error, give "set hive.auto.convert.join=false;" a try (it will turn off map 
> join auto convert, so you will use reduce-side join). 
> 
> Thanks,
> 
> Yin
> 
> 
> On Tue, Sep 3, 2013 at 6:03 PM, Ashutosh Chauhan  wrote:
> Not sure about EMR. Your best bet is to ask on EMR forums.
> 
> Thanks,
> Ashutosh
> 
> 
> On Tue, Sep 3, 2013 at 2:18 PM, Nathanial Thelen  wrote:
> Is there a way to run a patch on EMR?
> 
> Thanks,
> Nate
> 
> On Sep 3, 2013, at 2:14 PM, Ashutosh Chauhan  wrote:
> 
>> Fix in very related area has been checked in trunk today : 
>> https://issues.apache.org/jira/browse/HIVE-5129 Likely that will fix your 
>> issue. 
>> Can you try latest trunk?
>> 
>> Ashutosh
>> 
>> 
>> On Tue, Sep 3, 2013 at 2:03 PM, Nathanial Thelen  wrote:
>> I am running Hive in EMR and since upgrading to 0.11 from 0.8.1.8 I have 
>> been getting NullPointerExceptions (NPE) for certain queries in our staging 
>> environment.  Only difference between stage and production is the amount of 
>> traffic we get so the data set is much smaller.  We are not using any custom 
>> code.
>> 
>> I have greatly simplified the query down to the bare minimum that will cause 
>> the error:
>> 
>> SELECT
>> count(DISTINCT ag.adGroupGuid) as groups,
>> count(DISTINCT av.adViewGuid) as ads,
>> count(DISTINCT ac.adViewGuid) as uniqueClicks
>> FROM
>> adgroup ag
>> INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
>> LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid
>> 
>> This will return the following before any Map Reduce jobs start:
>> 
>> FAILED: NullPointerException null
>> 
>> Looking in the hive log at /mnt/var/log/apps/hive_0110.log and scanning, I 
>> see this error:
>> 
>> 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities 
>> (Utilities.java:getInputSummary(1889)) - Cache Content Summary for 
>> s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=29 length: 
>> 94324 file count: 20 directory count: 1
>> 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities 
>> (Utilities.java:getInputSummary(1889)) - Cache Content Summary for 
>> s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=30 length: 
>> 142609 file count: 21 directory count: 1
>> 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities 
>> (Utilities.java:getInputSummary(1889)) - Cache Content Summary for 
>> s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=30 length: 
>> 65519 file count: 21 directory count: 1
>> 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities 
>> (Utilities.java:getInputSummary(1889)) - Cache Content Summary for 
>> s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=29 length: 
>> 205096 file count: 20 directory count: 1
>> 2013-09-03 18:09:19,800 INFO  
>> org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer 
>> (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where 
>> optimization is applicable
>> 2013-09-03 18:09:19,801 INFO  
>> org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer 
>> (MetadataOnlyOptimizer.java:dispatch(301)) - Found 0 metadata only table 
>> scans
>> 2013-09-03 18:09:19,801 INFO  
>> org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer 
>> (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where 
>> optimization is applicable
>> 2013-09-03 18:09:19,801 INFO  
>> org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer 
>> (MetadataOnlyOptimizer.java:dispatch(301)) - Found 1 metadata only table 
>> scans
>> 2013-09-03 18:09:19,801 ERROR org.apache.hadoop.hive.ql.Driver 
>> (SessionState.java:printError(386)) - FAILED: NullPointerException null
>> java.lang.NullPointerException
>>  at 
>> org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer$MetadataOnlyTaskDispatcher.dispatch(MetadataOnlyOptimizer.java:308)
>>  at 
>

Re: Problems with 0.11, count(DISTINCT), and NPE

2013-09-03 Thread Nathanial Thelen
Is there a way to run a patch on EMR?

Thanks,
Nate

On Sep 3, 2013, at 2:14 PM, Ashutosh Chauhan  wrote:

> Fix in very related area has been checked in trunk today : 
> https://issues.apache.org/jira/browse/HIVE-5129 Likely that will fix your 
> issue. 
> Can you try latest trunk?
> 
> Ashutosh
> 
> 
> On Tue, Sep 3, 2013 at 2:03 PM, Nathanial Thelen  wrote:
> I am running Hive in EMR and since upgrading to 0.11 from 0.8.1.8 I have been 
> getting NullPointerExceptions (NPE) for certain queries in our staging 
> environment.  Only difference between stage and production is the amount of 
> traffic we get so the data set is much smaller.  We are not using any custom 
> code.
> 
> I have greatly simplified the query down to the bare minimum that will cause 
> the error:
> 
> SELECT
> count(DISTINCT ag.adGroupGuid) as groups,
> count(DISTINCT av.adViewGuid) as ads,
> count(DISTINCT ac.adViewGuid) as uniqueClicks
> FROM
> adgroup ag
> INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
> LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid
> 
> This will return the following before any Map Reduce jobs start:
> 
> FAILED: NullPointerException null
> 
> Looking in the hive log at /mnt/var/log/apps/hive_0110.log and scanning, I 
> see this error:
> 
> 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities 
> (Utilities.java:getInputSummary(1889)) - Cache Content Summary for 
> s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=29 length: 
> 94324 file count: 20 directory count: 1
> 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities 
> (Utilities.java:getInputSummary(1889)) - Cache Content Summary for 
> s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=30 length: 
> 142609 file count: 21 directory count: 1
> 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities 
> (Utilities.java:getInputSummary(1889)) - Cache Content Summary for 
> s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=30 length: 
> 65519 file count: 21 directory count: 1
> 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities 
> (Utilities.java:getInputSummary(1889)) - Cache Content Summary for 
> s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=29 length: 
> 205096 file count: 20 directory count: 1
> 2013-09-03 18:09:19,800 INFO  
> org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer 
> (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where 
> optimization is applicable
> 2013-09-03 18:09:19,801 INFO  
> org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer 
> (MetadataOnlyOptimizer.java:dispatch(301)) - Found 0 metadata only table scans
> 2013-09-03 18:09:19,801 INFO  
> org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer 
> (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where 
> optimization is applicable
> 2013-09-03 18:09:19,801 INFO  
> org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer 
> (MetadataOnlyOptimizer.java:dispatch(301)) - Found 1 metadata only table scans
> 2013-09-03 18:09:19,801 ERROR org.apache.hadoop.hive.ql.Driver 
> (SessionState.java:printError(386)) - FAILED: NullPointerException null
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer$MetadataOnlyTaskDispatcher.dispatch(MetadataOnlyOptimizer.java:308)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:124)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101)
>   at 
> org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer.resolve(MetadataOnlyOptimizer.java:175)
>   at 
> org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:79)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8426)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8789)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:278)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:310)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:231)
>   at org.apa

Problems with 0.11, count(DISTINCT), and NPE

2013-09-03 Thread Nathanial Thelen
I am running Hive in EMR and since upgrading to 0.11 from 0.8.1.8 I have been 
getting NullPointerExceptions (NPE) for certain queries in our staging 
environment.  Only difference between stage and production is the amount of 
traffic we get so the data set is much smaller.  We are not using any custom 
code.

I have greatly simplified the query down to the bare minimum that will cause 
the error:

SELECT
count(DISTINCT ag.adGroupGuid) as groups,
count(DISTINCT av.adViewGuid) as ads,
count(DISTINCT ac.adViewGuid) as uniqueClicks
FROM
adgroup ag
INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid

This will return the following before any Map Reduce jobs start:

FAILED: NullPointerException null

Looking in the hive log at /mnt/var/log/apps/hive_0110.log and scanning, I see 
this error:

2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities 
(Utilities.java:getInputSummary(1889)) - Cache Content Summary for 
s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=29 length: 
94324 file count: 20 directory count: 1
2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities 
(Utilities.java:getInputSummary(1889)) - Cache Content Summary for 
s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=30 length: 
142609 file count: 21 directory count: 1
2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities 
(Utilities.java:getInputSummary(1889)) - Cache Content Summary for 
s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=30 length: 
65519 file count: 21 directory count: 1
2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities 
(Utilities.java:getInputSummary(1889)) - Cache Content Summary for 
s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=29 length: 
205096 file count: 20 directory count: 1
2013-09-03 18:09:19,800 INFO  
org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer 
(MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where 
optimization is applicable
2013-09-03 18:09:19,801 INFO  
org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer 
(MetadataOnlyOptimizer.java:dispatch(301)) - Found 0 metadata only table scans
2013-09-03 18:09:19,801 INFO  
org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer 
(MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where 
optimization is applicable
2013-09-03 18:09:19,801 INFO  
org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer 
(MetadataOnlyOptimizer.java:dispatch(301)) - Found 1 metadata only table scans
2013-09-03 18:09:19,801 ERROR org.apache.hadoop.hive.ql.Driver 
(SessionState.java:printError(386)) - FAILED: NullPointerException null
java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer$MetadataOnlyTaskDispatcher.dispatch(MetadataOnlyOptimizer.java:308)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:124)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101)
at 
org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer.resolve(MetadataOnlyOptimizer.java:175)
at 
org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:79)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8426)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8789)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:278)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:310)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:231)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:466)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:819)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:674)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:187)

The same error also happens if I do an INNER JOIN to adclick, FYI.

I have checked that there are not any null values for any of the columns 
referenced in the query. 

Making almost any chang