Re: Semantics of Rank.

2013-09-03 Thread Lefty Leverenz
Another email thread led me to HIVE-5038 (
https://issues.apache.org/jira/browse/HIVE-5038 , "rank operator is
case-sensitive and has odd semantics") -- it's resolved as invalid, but is
that only for the odd semantics?

Perhaps this issue is clarified in more recent emails.  I'm catching up on
a huge backlog.

-- Lefty


On Tue, Sep 3, 2013 at 4:03 AM, Lefty Leverenz leftylever...@gmail.com wrote:

 What's the answer -- does the rank keyword have to be lowercase?

 If lowercase is obligatory we need to revise the wiki, which shows all
 uppercase (
 https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics
 ).

 In the test files it's lowercase (windowing_rank.q, 
 ptf_negative_WhereWithRankCond.q).
  The patch for HIVE-896 shows a lowercase name in GenericUDAFRank.java but
 I don't know if that means lowercase is required:

 @WindowFunctionDescription
 (
   description = @Description(
     name = "rank",
     value = "_FUNC_(x)"
   ),
   supportsWindow = false,
   pivotResult = true
 )



 And what about the other keywords in the wikidoc?  Same lowercase
 requirement?

 -- Lefty


 On Fri, Jul 26, 2013 at 5:30 PM, saurabh mpp.databa...@gmail.com wrote:

 Hi all,

 Below are some of observations based on the on-going rank function
 discussion.

 1. I executed the queries mentioned below, and only the query with rank
 (lowercase) executed successfully; the rest threw the exception "FAILED:
 SemanticException Failed to breakup Windowing invocations into Groups."

 -  select cust_id, ord_dt, RANK() w from cust_ord window w as (partition
 by cust_id order by ord_dt);

 -  select cust_id, ord_dt, Rank() w from cust_ord window w as (partition
 by cust_id order by ord_dt);

 -   select cust_id, ord_dt, rank() w from cust_ord window w as (partition
 by cust_id order by ord_dt);

 It seems the rank keyword is case-sensitive. Attached is the screenshot
 for reference.

 2. I created a dummy table with the data provided in the mail thread below
 and achieved the expected output using the query mentioned below.

 *select cust_id, ord_dt, rank() over (partition by cust_id order by
 ord_dt) from cust_ord;*

  Request all to kindly review these details and suggest if it was of any
 help!

 Thanks.


 On Sat, Jul 27, 2013 at 12:07 AM, j.barrett Strausser 
 j.barrett.straus...@gmail.com wrote:

 Any further help on this? Otherwise I'll file a JIRA.


 On Wed, Jul 24, 2013 at 11:32 PM, j.barrett Strausser 
 j.barrett.straus...@gmail.com wrote:

 As an example: if I run my query above removing the arg, the following
 is thrown.

 FAILED: SemanticException Failed to breakup Windowing invocations into
 Groups. At least 1 group must only depend on input columns. Also check for
 circular dependencies.
 Underlying error:
 org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more
 arguments are expected.


 Similar issue and fix here:

 http://www.marshut.com/rqvpz/use-rank-over-partition-function-in-hive-11.html

 Even if it didn't require an arg it still doesn't explain my anomalous
 output.



 On Wed, Jul 24, 2013 at 11:28 PM, j.barrett Strausser 
 j.barrett.straus...@gmail.com wrote:

 That isn't true. If you try to run the above in Hive without an argument,
 it will throw an exception. I have seen other users replicate this problem
 as well.

 I can file a JIRA if someone can confirm that my query should work.


 On Wed, Jul 24, 2013 at 11:02 PM, manishbh...@rocketmail.com 
 manishbh...@rocketmail.com wrote:

 An analytical function doesn't expect any argument. Rank() by itself is
 enough to sequence rows based on the window you have defined in the
 partition by clause. So

 Rank() over (partition by cmscustid order by orderdate)

 should work, as long as I have written the right syntax for Hive.

 Sent via Rocket from my HTC

 - Reply message -
 From: j.barrett Strausser j.barrett.straus...@gmail.com
 To: user@hive.apache.org
 Subject: Semantics of Rank.
 Date: Thu, Jul 25, 2013 1:08 AM


 Thanks for the reply. Perhaps my understanding of the relation between
 rank and the windowing function is wrong.

 What I want to achieve is the following: for a given customer id, sort
 his orders. I thought the query below would work.

 SELECT eh.cmsorderid, eh.orderdate, RANK(orderdate) w FROM order_data eh
 window w as (partition by cmscustid order by orderdate);

 The rank function instead returns the rank of the order date over all
 order dates.

 Example snippet from above

 Actual :

 6758783  27APR2012  94
 6758783  23JUN2012  95
 6758785  14DEC2012  96
 6758795  18DEC2011  97
 6758796  06MAY2012  98
 6758798  24MAR2013  99
 6758799  23NOV2012  100


 Expected :

 6758783  27APR2012  1
 6758783  23JUN2012  2
 6758785  14DEC2012  1
 6758795  18DEC2011  1
 6758796  06MAY2012  1
 6758798  24MAR2013  1
 6758799  23NOV2012  1


 -b
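 [Editorial note: a minimal Python sketch of what rank() over (partition by
 cust_id order by ord_dt) computes on the sample above -- the numbering
 restarts at 1 for each customer. Names and date parsing are illustrative,
 not from the thread; this is row_number-style, which coincides with RANK
 here only because the sample has no ties within a customer.]

```python
from datetime import datetime
from itertools import groupby

# (cust_id, ord_dt) pairs taken from the "Actual" snippet above.
rows = [
    ("6758783", "27APR2012"), ("6758783", "23JUN2012"),
    ("6758785", "14DEC2012"), ("6758795", "18DEC2011"),
    ("6758796", "06MAY2012"), ("6758798", "24MAR2013"),
    ("6758799", "23NOV2012"),
]

def rank_per_partition(rows):
    """Emulate rank() over (partition by cust_id order by ord_dt):
    restart the numbering at 1 for every customer."""
    out = []
    # groupby needs its input sorted by the grouping key (cust_id).
    for cust, grp in groupby(sorted(rows), key=lambda r: r[0]):
        by_date = sorted(grp, key=lambda r: datetime.strptime(r[1], "%d%b%Y"))
        out.extend((c, d, i) for i, (c, d) in enumerate(by_date, start=1))
    return out

for cust, dt, rk in rank_per_partition(rows):
    print(cust, dt, rk)
```

 This reproduces the "Expected" column: ranks 1 and 2 for customer 6758783,
 and rank 1 for each single-order customer.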




 On Wed, Jul 24, 2013 at 3:17 PM, Shahar Glixman sglix...@outbrain.com wrote:

  the argument to rank is simply some value, whereas the rank function
  

Re: Hive JDBC Server: java.lang.IllegalStateException: Shutdown in progress

2013-09-03 Thread Guy Doulberg

Hi Nitin,

We are using Cdh4.2.1 for the Hadoop and for the hive,

I think that I understand the problem: when the hive process is
stopping, the filesystem is closed before some of the threads.


I still need to figure out why the hive-server is restarting



On 09/02/2013 02:56 PM, Nitin Pawar wrote:

Can you share which versions of Hadoop and Hive you are using?

This looks similar to HDFS-4841 
https://issues.apache.org/jira/browse/HDFS-4841



On Mon, Sep 2, 2013 at 4:20 PM, Guy Doulberg guy.doulb...@conduit.com wrote:


Hi guys,

I have a hive JDBC server in production,
It started lately to fail.

In the log files I can see the following:

2013-09-02_10:42:53.13215 java.lang.IllegalStateException:
Shutdown in progress, cannot add a shutdownHook
2013-09-02_10:42:53.13215   at

org.apache.hadoop.util.ShutdownHookManager.addShutdownHook(ShutdownHookManager.java:152)
2013-09-02_10:42:53.13216   at
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2341)
2013-09-02_10:42:53.13216   at
org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2313)
2013-09-02_10:42:53.13217   at
org.apache.hadoop.fs.FileSystem.get(FileSystem.java:351)
2013-09-02_10:42:53.13217   at
org.apache.hadoop.fs.Path.getFileSystem(Path.java:194)
2013-09-02_10:42:53.13219   at
org.apache.hadoop.hive.ql.exec.Utilities.realFile(Utilities.java:1027)
2013-09-02_10:42:53.13219   at

org.apache.hadoop.hive.ql.exec.Utilities.getResourceFiles(Utilities.java:1551)
2013-09-02_10:42:53.13220   at
org.apache.hadoop.hive.ql.exec.ExecDriver.initialize(ExecDriver.java:152)
2013-09-02_10:42:53.13220   at
org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1343)
2013-09-02_10:42:53.13221   at
org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1138)
2013-09-02_10:42:53.13221   at
org.apache.hadoop.hive.ql.Driver.run(Driver.java:951)
2013-09-02_10:42:53.13222   at

org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:198)
2013-09-02_10:42:53.13224   at

org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:644)
2013-09-02_10:42:53.13224   at

org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:628)
2013-09-02_10:42:53.13225   at
org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
2013-09-02_10:42:53.13225   at
org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
2013-09-02_10:42:53.13226   at

org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
2013-09-02_10:42:53.13226   at

java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
2013-09-02_10:42:53.13227   at

java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
2013-09-02_10:42:53.13227   at
java.lang.Thread.run(Thread.java:662)
2013-09-02_10:42:53.13228
2013-09-02_10:42:53.13761 FAILED: Execution Error, return code 3
from org.apache.hadoop.hive.ql.exec.MapRedTask
2013-09-02_10:42:53.13763 FAILED: Execution Error, return code 3
from org.apache.hadoop.hive.ql.exec.MapRedTask


Has someone encountered this problem and knows why it is happening?

The jmx doesn't expose anything interesting.


Guy




--
Nitin Pawar




Re: Hive with Kerberos and a Remote Metastore

2013-09-03 Thread Subroto
I am also facing the same problem... Any ideas?

Cheers,
Subroto Sanyal
On Sep 3, 2013, at 3:04 PM, Christopher Penney wrote:

 I'm new to hive and trying to set it up in a relatively secure manner for a 
 test environment.  I want to use a remote metastore so MR jobs can access the 
 DB.  I seem to have things almost working, but when a user with a credential 
 tries to create a database I get:
 
  hive> show databases;
  OK
  default
  hive> create database testdb;
 FAILED: Error in metadata: MetaException(message:Got exception: 
 org.apache.hadoop.ipc.RemoteException User: 
 hdfs/hadoopserver.sub.dom@sub.dom.com is not allowed to impersonate 
 myuse...@sub.dom.com)
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.DDLTask
 
 I have hive --service metastore running as hdfs with 
 hdfs/hadoopserver.sub.dom@sub.dom.com as the principal.  I'm running hive 
 as myuserid on the same box.  I don't know if it's related, but if I try to 
 run hive from another system I get a GSS Initiate error unless I use the same 
 principal (hdfs/hadoopserver.sub.dom@sub.dom.com) for 
 hive.metastore.kerberos.principal.  Is that expected?
 
 When I try googling this I see similar issues, but the message about not 
 being able to impersonate only shows the single part user name where for me 
 it's showing the realm.  I tried playing with the auth_to_local property, but 
 it didn't help.  Map Reduce and HDFS operations are working fine otherwise.
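  [Editorial note: for reference only -- the rule below is an illustrative
  assumption about the intended realm-stripping mapping, not taken from the
  thread. A hadoop.security.auth_to_local rule that maps user principals in
  the realm to their short names would look something like this in
  core-site.xml:]

```xml
<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[1:$1@$0](.*@SUB\.DOM\.COM)s/@.*//
    DEFAULT
  </value>
</property>
```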
 
 In core-site.xml I have:
 
  <property>
    <name>hadoop.proxyuser.hdfs.hosts</name>
    <value>*</value>
  </property>
  
  <property>
    <name>hadoop.proxyuser.hdfs.groups</name>
    <value>*</value>
  </property>
 
 In hive-site.xml I have:
 
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost/metastore</value>
    <description>the URL of the MySQL database</description>
  </property>
  
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>password</value>
  </property>
  
  <property>
    <name>datanucleus.autoCreateSchema</name>
    <value>false</value>
  </property>
  
  <property>
    <name>datanucleus.fixedDatastore</name>
    <value>true</value>
  </property>
  
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoopserver.sub.dom.com:9083</value>
  </property>
  
  <property>
    <name>hive.security.authorization.enabled</name>
    <value>true</value>
  </property>
  
  <property>
    <name>hive.metastore.sasl.enabled</name>
    <value>true</value>
  </property>
  
  <property>
    <name>hive.metastore.kerberos.keytab.file</name>
    <value>/etc/hadoop/hdfs.keytab</value>
  </property>
  
  <property>
    <name>hive.metastore.kerberos.principal</name>
    <value>hdfs/hadoopserver.sub.dom@sub.dom.com</value>
  </property>
  
  <property>
    <name>hive.metastore.execute.setugi</name>
    <value>true</value>
  </property>
 
 Any ideas?
 



Re: Hive Statistics information

2013-09-03 Thread Sanjay Subramanian
Thanks Ravi let me give this a shot
Regards
sanjay

From: Ravi Kiran maghamraviki...@gmail.com
Reply-To: user@hive.apache.org
Date: Friday, August 30, 2013 10:53 PM
To: user@hive.apache.org
Subject: Re: Hive Statistics information

Hi Sanjay,

   What do the logs say when you fire the ANALYZE TABLE ... statement on a
table?
   One minor correction to the db connection string would be to use &amp; for
the query parameters:
hive.stats.dbconnectionstring=jdbc:mysql://v-so1.nextagqa.com/hive_vso1_tempstatsstore?user=hive_user_vso1&amp;password=hive_user_vso1

I hope the database hive_vso1_tempstatsstore exists in your MySQL?

Regards
Ravi Magham


On Sat, Aug 31, 2013 at 6:15 AM, Sanjay Subramanian
sanjay.subraman...@wizecommerce.com wrote:
Hi guys

I have configured Hive to use MySQL for all statistics

hive.stats.atomic=false
hive.stats.autogather=true
hive.stats.collect.rawdatasize=true
hive.stats.dbclass=jdbc:mysql
hive.stats.dbconnectionstring=jdbc:mysql://v-so1.nextagqa.com/hive_vso1_tempstatsstore?user=hive_user_vso1password=hive_user_vso1
hive.stats.jdbc.timeout=30
hive.stats.jdbcdriver=com.mysql.jdbc.Driver
hive.stats.retries.max=0
hive.stats.retries.wait=3000

However, the MySQL hive statistics tables don't seem to have any data?

Where does Hive store the statistics information?

sanjay

CONFIDENTIALITY NOTICE
==
This email message and any attachments are for the exclusive use of the 
intended recipient(s) and may contain confidential and privileged information. 
Any unauthorized review, use, disclosure or distribution is prohibited. If you 
are not the intended recipient, please contact the sender by reply email and 
destroy all copies of the original message along with any attachments, from 
your computer system. If you are the intended recipient, please be advised that 
the content of this message is subject to access, review and disclosure by the 
sender's Email System Administrator.




Re: Hive Query - Issue

2013-09-03 Thread Sanjay Subramanian
Hi

When you do a SELECT *, the partition columns are returned as the last N
columns (if you have N partition columns).

In this case the 63rd column in SELECT * is the partition column

Instead of SELECT *, do a

SELECT
 col1,
 col2,
 col3,
 ...


Not to show a candle to the sun if u r an AWK/SED ninja :-) but to get all
columns from hive u can do this:

hive -e "describe ur_table_name" | awk '{print $1","}' | sed '1i SELECT' | less

Thanks

sanjay
From: Manickam P manicka...@outlook.com
Reply-To: user@hive.apache.org
Date: Monday, September 2, 2013 4:32 AM
To: user@hive.apache.org
Subject: Hive Query - Issue

Hello Experts,

When I try to execute the below query I'm getting an error. Please help me
to correct it.

insert overwrite table table_baseline partition (sourcedate='base_2013_08') 
select * from (select * from table_a where sourcedate='tablea_2013_08' union 
all select * from table_b where sourcedate='tableb_2013_08') final

My intention here is to populate table_baseline using all the records from
table_a and table_b, with partitions. I am getting the below error.

Error in semantic analysis: Line 1:23 Cannot insert into target table because 
column number/types are different ''BASE_2013_08'': Table insclause-0 has 62 
columns, but query has 63 columns.

I verified the column count and the types; everything is the same, but here
it reports a difference. The same query works fine without partitions in all
three tables, but fails while executing with partitions.


please help.



Thanks
Manickam P



Re: Problems with 0.11, count(DISTINCT), and NPE

2013-09-03 Thread Ashutosh Chauhan
A fix in a very related area was checked into trunk today:
https://issues.apache.org/jira/browse/HIVE-5129 It will likely fix your
issue.
Can you try the latest trunk?

Ashutosh


On Tue, Sep 3, 2013 at 2:03 PM, Nathanial Thelen n...@natethelen.com wrote:

 I am running Hive in EMR and since upgrading to 0.11 from 0.8.1.8 I have
 been getting NullPointerExceptions (NPE) for certain queries in our staging
 environment.  Only difference between stage and production is the amount of
 traffic we get so the data set is much smaller.  We are not using any
 custom code.

 I have greatly simplified the query down to the bare minimum that will
 cause the error:

 SELECT
 count(DISTINCT ag.adGroupGuid) as groups,
 count(DISTINCT av.adViewGuid) as ads,
 count(DISTINCT ac.adViewGuid) as uniqueClicks
 FROM
 adgroup ag
 INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
 LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid

 This will return the following before any Map Reduce jobs start:

 FAILED: NullPointerException null

 Looking in the hive log at /mnt/var/log/apps/hive_0110.log and scanning, I
 see this error:

 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities
 (Utilities.java:getInputSummary(1889)) - Cache Content Summary for
 s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=29
 length: 94324 file count: 20 directory count: 1
 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities
 (Utilities.java:getInputSummary(1889)) - Cache Content Summary for
 s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=30 length:
 142609 file count: 21 directory count: 1
 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities
 (Utilities.java:getInputSummary(1889)) - Cache Content Summary for
 s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=30
 length: 65519 file count: 21 directory count: 1
 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities
 (Utilities.java:getInputSummary(1889)) - Cache Content Summary for
 s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=29 length:
 205096 file count: 20 directory count: 1
 2013-09-03 18:09:19,800 INFO
  org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer
 (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where
 optimization is applicable
 2013-09-03 18:09:19,801 INFO
  org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer
 (MetadataOnlyOptimizer.java:dispatch(301)) - Found 0 metadata only table
 scans
 2013-09-03 18:09:19,801 INFO
  org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer
 (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where
 optimization is applicable
 2013-09-03 18:09:19,801 INFO
  org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer
 (MetadataOnlyOptimizer.java:dispatch(301)) - Found 1 metadata only table
 scans
 2013-09-03 18:09:19,801 ERROR org.apache.hadoop.hive.ql.Driver
 (SessionState.java:printError(386)) - FAILED: NullPointerException null
 java.lang.NullPointerException
 at
 org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer$MetadataOnlyTaskDispatcher.dispatch(MetadataOnlyOptimizer.java:308)
  at
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87)
 at
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:124)
  at
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101)
 at
 org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer.resolve(MetadataOnlyOptimizer.java:175)
  at
 org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:79)
 at
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8426)
  at
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8789)
 at
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:278)
  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
 at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:310)
  at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:231)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:466)
  at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:819)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:674)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:187)

 The same error also happens if I do an INNER JOIN to 

Re: Problems with 0.11, count(DISTINCT), and NPE

2013-09-03 Thread Nathanial Thelen
Is there a way to run a patch on EMR?

Thanks,
Nate

On Sep 3, 2013, at 2:14 PM, Ashutosh Chauhan hashut...@apache.org wrote:

 Fix in very related area has been checked in trunk today : 
 https://issues.apache.org/jira/browse/HIVE-5129 Likely that will fix your 
 issue. 
 Can you try latest trunk?
 
 Ashutosh
 
 
 On Tue, Sep 3, 2013 at 2:03 PM, Nathanial Thelen n...@natethelen.com wrote:
 I am running Hive in EMR and since upgrading to 0.11 from 0.8.1.8 I have been 
 getting NullPointerExceptions (NPE) for certain queries in our staging 
 environment.  Only difference between stage and production is the amount of 
 traffic we get so the data set is much smaller.  We are not using any custom 
 code.
 
 I have greatly simplified the query down to the bare minimum that will cause 
 the error:
 
 SELECT
 count(DISTINCT ag.adGroupGuid) as groups,
 count(DISTINCT av.adViewGuid) as ads,
 count(DISTINCT ac.adViewGuid) as uniqueClicks
 FROM
 adgroup ag
 INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
 LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid
 
 This will return the following before any Map Reduce jobs start:
 
 FAILED: NullPointerException null
 
 Looking in the hive log at /mnt/var/log/apps/hive_0110.log and scanning, I 
 see this error:
 
 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities 
 (Utilities.java:getInputSummary(1889)) - Cache Content Summary for 
 s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=29 length: 
 94324 file count: 20 directory count: 1
 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities 
 (Utilities.java:getInputSummary(1889)) - Cache Content Summary for 
 s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=30 length: 
 142609 file count: 21 directory count: 1
 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities 
 (Utilities.java:getInputSummary(1889)) - Cache Content Summary for 
 s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=30 length: 
 65519 file count: 21 directory count: 1
 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities 
 (Utilities.java:getInputSummary(1889)) - Cache Content Summary for 
 s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=29 length: 
 205096 file count: 20 directory count: 1
 2013-09-03 18:09:19,800 INFO  
 org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer 
 (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where 
 optimization is applicable
 2013-09-03 18:09:19,801 INFO  
 org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer 
 (MetadataOnlyOptimizer.java:dispatch(301)) - Found 0 metadata only table scans
 2013-09-03 18:09:19,801 INFO  
 org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer 
 (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where 
 optimization is applicable
 2013-09-03 18:09:19,801 INFO  
 org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer 
 (MetadataOnlyOptimizer.java:dispatch(301)) - Found 1 metadata only table scans
 2013-09-03 18:09:19,801 ERROR org.apache.hadoop.hive.ql.Driver 
 (SessionState.java:printError(386)) - FAILED: NullPointerException null
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer$MetadataOnlyTaskDispatcher.dispatch(MetadataOnlyOptimizer.java:308)
   at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87)
   at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:124)
   at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101)
   at 
 org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer.resolve(MetadataOnlyOptimizer.java:175)
   at 
 org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:79)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8426)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8789)
   at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:278)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:310)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:231)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:466)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:819)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:674)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 

Re: Problems with 0.11, count(DISTINCT), and NPE

2013-09-03 Thread Ashutosh Chauhan
Not sure about EMR. Your best bet is to ask on EMR forums.

Thanks,
Ashutosh


On Tue, Sep 3, 2013 at 2:18 PM, Nathanial Thelen n...@natethelen.com wrote:

 Is there a way to run a patch on EMR?

 Thanks,
 Nate

 On Sep 3, 2013, at 2:14 PM, Ashutosh Chauhan hashut...@apache.org wrote:

 Fix in very related area has been checked in trunk today :
 https://issues.apache.org/jira/browse/HIVE-5129 Likely that will fix your
 issue.
 Can you try latest trunk?

 Ashutosh


 On Tue, Sep 3, 2013 at 2:03 PM, Nathanial Thelen n...@natethelen.com wrote:

 I am running Hive in EMR and since upgrading to 0.11 from 0.8.1.8 I have
 been getting NullPointerExceptions (NPE) for certain queries in our staging
 environment.  Only difference between stage and production is the amount of
 traffic we get so the data set is much smaller.  We are not using any
 custom code.

 I have greatly simplified the query down to the bare minimum that will
 cause the error:

 SELECT
 count(DISTINCT ag.adGroupGuid) as groups,
 count(DISTINCT av.adViewGuid) as ads,
 count(DISTINCT ac.adViewGuid) as uniqueClicks
 FROM
 adgroup ag
 INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
 LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid

 This will return the following before any Map Reduce jobs start:

 FAILED: NullPointerException null

 Looking in the hive log at /mnt/var/log/apps/hive_0110.log and scanning,
 I see this error:

 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities
 (Utilities.java:getInputSummary(1889)) - Cache Content Summary for
 s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=29
 length: 94324 file count: 20 directory count: 1
 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities
 (Utilities.java:getInputSummary(1889)) - Cache Content Summary for
 s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=30 length:
 142609 file count: 21 directory count: 1
 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities
 (Utilities.java:getInputSummary(1889)) - Cache Content Summary for
 s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=30
 length: 65519 file count: 21 directory count: 1
 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities
 (Utilities.java:getInputSummary(1889)) - Cache Content Summary for
 s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=29 length:
 205096 file count: 20 directory count: 1
 2013-09-03 18:09:19,800 INFO
  org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer
 (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where
 optimization is applicable
 2013-09-03 18:09:19,801 INFO
  org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer
 (MetadataOnlyOptimizer.java:dispatch(301)) - Found 0 metadata only table
 scans
 2013-09-03 18:09:19,801 INFO
  org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer
 (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where
 optimization is applicable
 2013-09-03 18:09:19,801 INFO
  org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer
 (MetadataOnlyOptimizer.java:dispatch(301)) - Found 1 metadata only table
 scans
 2013-09-03 18:09:19,801 ERROR org.apache.hadoop.hive.ql.Driver
 (SessionState.java:printError(386)) - FAILED: NullPointerException null
 java.lang.NullPointerException
 at
 org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer$MetadataOnlyTaskDispatcher.dispatch(MetadataOnlyOptimizer.java:308)
  at
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87)
 at
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:124)
  at
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101)
 at
 org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer.resolve(MetadataOnlyOptimizer.java:175)
  at
 org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:79)
 at
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8426)
  at
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8789)
 at
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:278)
  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
 at
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:310)
  at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:231)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:466)
  at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:819)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:674)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 

Re: Problems with 0.11, count(DISTINCT), and NPE

2013-09-03 Thread Yin Huai
Based on the log, it may also be related to
https://issues.apache.org/jira/browse/HIVE-4927. To make it work (in a not
very optimized way), can you try set
hive.auto.convert.join.noconditionaltask=false; ? If you still get the
error, try set hive.auto.convert.join=false; (it turns off automatic
map-join conversion, so reduce-side joins will be used instead).
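The two workarounds suggested above can be tried per-session before rerunning the failing query. A minimal sketch (these are standard Hive settings; whether they actually avoid this particular NPE depends on the query plan):

```sql
-- First attempt: keep map-join auto-conversion, but disable the
-- "no conditional task" optimization that merges map-joins into one task.
SET hive.auto.convert.join.noconditionaltask=false;

-- If the NPE persists: disable map-join auto-conversion entirely,
-- forcing reduce-side joins (slower, but bypasses the optimizer path).
SET hive.auto.convert.join=false;
```

The same properties can also be set globally in hive-site.xml, or passed on the command line, e.g. hive --hiveconf hive.auto.convert.join=false.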

Thanks,

Yin


On Tue, Sep 3, 2013 at 6:03 PM, Ashutosh Chauhan hashut...@apache.org wrote:

 Not sure about EMR. Your best bet is to ask on EMR forums.

 Thanks,
 Ashutosh


 On Tue, Sep 3, 2013 at 2:18 PM, Nathanial Thelen n...@natethelen.com wrote:

 Is there a way to run a patch on EMR?

 Thanks,
 Nate

 On Sep 3, 2013, at 2:14 PM, Ashutosh Chauhan hashut...@apache.org
 wrote:

 A fix in a closely related area was checked into trunk today:
 https://issues.apache.org/jira/browse/HIVE-5129. It will likely fix
 your issue. Can you try the latest trunk?

 Ashutosh


 On Tue, Sep 3, 2013 at 2:03 PM, Nathanial Thelen n...@natethelen.com wrote:

 I am running Hive on EMR, and since upgrading from 0.8.1.8 to 0.11 I have
 been getting NullPointerExceptions (NPE) for certain queries in our staging
 environment. The only difference between staging and production is the
 amount of traffic we get, so the data set is much smaller. We are not using
 any custom code.

 I have greatly simplified the query down to the bare minimum that will
 cause the error:

 SELECT
   count(DISTINCT ag.adGroupGuid) as groups,
   count(DISTINCT av.adViewGuid) as ads,
   count(DISTINCT ac.adViewGuid) as uniqueClicks
 FROM adgroup ag
 INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
 LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid

 This will return the following before any Map Reduce jobs start:

 FAILED: NullPointerException null

 Looking in the hive log at /mnt/var/log/apps/hive_0110.log and scanning,
 I see this error:

 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities
 (Utilities.java:getInputSummary(1889)) - Cache Content Summary for
 s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=29
 length: 94324 file count: 20 directory count: 1
 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities
 (Utilities.java:getInputSummary(1889)) - Cache Content Summary for
 s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=30 length:
 142609 file count: 21 directory count: 1
 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities
 (Utilities.java:getInputSummary(1889)) - Cache Content Summary for
 s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=30
 length: 65519 file count: 21 directory count: 1
 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities
 (Utilities.java:getInputSummary(1889)) - Cache Content Summary for
 s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=29 length:
 205096 file count: 20 directory count: 1
 2013-09-03 18:09:19,800 INFO
  org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer
 (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where
 optimization is applicable
 2013-09-03 18:09:19,801 INFO
  org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer
 (MetadataOnlyOptimizer.java:dispatch(301)) - Found 0 metadata only table
 scans
 2013-09-03 18:09:19,801 INFO
  org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer
 (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where
 optimization is applicable
 2013-09-03 18:09:19,801 INFO
  org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer
 (MetadataOnlyOptimizer.java:dispatch(301)) - Found 1 metadata only table
 scans
 2013-09-03 18:09:19,801 ERROR org.apache.hadoop.hive.ql.Driver
 (SessionState.java:printError(386)) - FAILED: NullPointerException null
 java.lang.NullPointerException
 at
 org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer$MetadataOnlyTaskDispatcher.dispatch(MetadataOnlyOptimizer.java:308)
  at
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87)
 at
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:124)
  at
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101)
 at
 org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer.resolve(MetadataOnlyOptimizer.java:175)
  at
 org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:79)
 at
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8426)
  at
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8789)
 at
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:278)
  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
 at
 

[ANNOUNCE] New Hive Committer - Yin Huai

2013-09-03 Thread Carl Steinbach
The Apache Hive PMC has voted to make Yin Huai a committer on the Apache
Hive project.

Please join me in congratulating Yin!

Thanks.

Carl


Re: [ANNOUNCE] New Hive Committer - Yin Huai

2013-09-03 Thread Jov
congratulations!

Jov
blog: http://amutu.com/blog


2013/9/4 Carl Steinbach c...@apache.org

 The Apache Hive PMC has voted to make Yin Huai a committer on the Apache
 Hive project.

 Please join me in congratulating Yin!

 Thanks.

 Carl



Re: [ANNOUNCE] New Hive Committer - Yin Huai

2013-09-03 Thread Prasanth Jayachandran
Congratulations Yin!!!

On Tuesday, September 3, 2013, Jov am...@amutu.com wrote:
 congratulations!

 Jov
 blog: http://amutu.com/blog


 2013/9/4 Carl Steinbach c...@apache.org

 The Apache Hive PMC has voted to make Yin Huai a committer on the Apache
 Hive project.

 Please join me in congratulating Yin!

 Thanks.

 Carl


