Re: Semantics of Rank.
Another email thread led me to HIVE-5038 (https://issues.apache.org/jira/browse/HIVE-5038, "rank operator is case-sensitive and has odd semantics") -- it's resolved as invalid, but is that only for the odd semantics? Perhaps this issue is clarified in more recent emails. I'm catching up on a huge backlog. -- Lefty

On Tue, Sep 3, 2013 at 4:03 AM, Lefty Leverenz <leftylever...@gmail.com> wrote:

What's the answer -- does the rank keyword have to be lowercase? If lowercase is obligatory we need to revise the wiki, which shows it all uppercase (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics). In the test files it's lowercase (windowing_rank.q, ptf_negative_WhereWithRankCond.q). The patch for HIVE-896 shows a lowercase name in GenericUDAFRank.java, but I don't know if that means lowercase is required:

@WindowFunctionDescription(
    description = @Description(
        name = "rank",
        value = "_FUNC_(x)"
    ),
    supportsWindow = false,
    pivotResult = true
)

And what about the other keywords in the wikidoc? Same lowercase requirement? -- Lefty

On Fri, Jul 26, 2013 at 5:30 PM, saurabh <mpp.databa...@gmail.com> wrote:

Hi all, Below are some observations based on the ongoing rank function discussion.

1. I executed the queries below, and only the query with rank (lowercase) executed successfully; the rest threw "FAILED: SemanticException Failed to breakup Windowing invocations into Groups."

- select cust_id, ord_dt, RANK() w from cust_ord window w as (partition by cust_id order by ord_dt);
- select cust_id, ord_dt, Rank() w from cust_ord window w as (partition by cust_id order by ord_dt);
- select cust_id, ord_dt, rank() w from cust_ord window w as (partition by cust_id order by ord_dt);

It seems the rank keyword is case-sensitive. Attached is a screenshot for reference.

2. I created a dummy table with the data provided in the mail trail below and achieved the expected output, using the query below.
select cust_id, ord_dt, rank() over (partition by cust_id order by ord_dt) from cust_ord;

Request all to kindly review these details and suggest if they are of any help! Thanks.

On Sat, Jul 27, 2013 at 12:07 AM, j.barrett Strausser <j.barrett.straus...@gmail.com> wrote:

Any further help on this? Otherwise I'll file a JIRA.

On Wed, Jul 24, 2013 at 11:32 PM, j.barrett Strausser <j.barrett.straus...@gmail.com> wrote:

As an example: if I run my query above removing the arg, the following is thrown:

FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies. Underlying error: org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more arguments are expected.

Similar issue and fix here: http://www.marshut.com/rqvpz/use-rank-over-partition-function-in-hive-11.html

Even if it didn't require an arg, it still doesn't explain my anomalous output.

On Wed, Jul 24, 2013 at 11:28 PM, j.barrett Strausser <j.barrett.straus...@gmail.com> wrote:

That isn't true. If you try to run the above in Hive without an argument, it will throw an exception. I have seen other users replicate this problem as well. I can file a JIRA if someone can confirm that my query should work.

On Wed, Jul 24, 2013 at 11:02 PM, manishbh...@rocketmail.com wrote:

An analytical function doesn't expect any argument. Rank() by itself is enough to sequence based on the window you have defined in partition by. So

Rank() over (partition by cmscustid order by orderdate)

should work, as long as I have written the right syntax for Hive. Sent via Rocket from my HTC

- Reply message - From: j.barrett Strausser <j.barrett.straus...@gmail.com> To: user@hive.apache.org Subject: Semantics of Rank. Date: Thu, Jul 25, 2013 1:08 AM

Thanks for the reply. Perhaps my understanding of the relation between rank and the windowing function is wrong.
What I want to achieve is: for a given customer id, sort his orders. I thought the below would work:

SELECT eh.cmsorderid, eh.orderdate, RANK(orderdate) w FROM order_data eh window w as (partition by cmscustid order by orderdate);

The rank function instead returns the rank of the order date over all order dates. Example snippet from above:

Actual:
6758783  27APR2012  94
6758783  23JUN2012  95
6758785  14DEC2012  96
6758795  18DEC2011  97
6758796  06MAY2012  98
6758798  24MAR2013  99
6758799  23NOV2012  100

Expected:
6758783  27APR2012  1
6758783  23JUN2012  2
6758785  14DEC2012  1
6758795  18DEC2011  1
6758796  06MAY2012  1
6758798  24MAR2013  1
6758799  23NOV2012  1

-b

On Wed, Jul 24, 2013 at 3:17 PM, Shahar Glixman <sglix...@outbrain.com> wrote: the argument to rank is simply some value, whereas the rank function
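The per-partition numbering in the "Expected" column is what RANK() OVER (PARTITION BY cmscustid ORDER BY orderdate) computes. As a sanity check on the semantics, here is a minimal Python model of standard SQL RANK (the sample rows are a cut-down, hypothetical version of the thread's data, and rank_over_partition is an illustrative helper, not Hive code):

```python
from collections import defaultdict
from datetime import date

# Sample orders as (cust_id, ord_dt) pairs, loosely following the thread's data.
rows = [
    ("6758783", date(2012, 4, 27)),
    ("6758783", date(2012, 6, 23)),
    ("6758785", date(2012, 12, 14)),
    ("6758795", date(2011, 12, 18)),
]

def rank_over_partition(rows):
    """Model of RANK() OVER (PARTITION BY cust_id ORDER BY ord_dt)."""
    parts = defaultdict(list)
    for cust, dt in rows:
        parts[cust].append(dt)
    out = []
    for cust, dts in parts.items():
        dts.sort()
        for dt in dts:
            # SQL RANK: 1 + number of rows in the same partition that sort
            # strictly before this one (so ties share a rank, with gaps after).
            r = 1 + sum(1 for other in dts if other < dt)
            out.append((cust, dt, r))
    return out

for cust, dt, r in rank_over_partition(rows):
    print(cust, dt.isoformat(), r)
```

Ranking over all rows (what the poster observed) would be the same computation with a single global partition instead of one per cust_id.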
Re: Hive JDBC Server: java.lang.IllegalStateException: Shutdown in progress
Hi Nitin, We are using CDH 4.2.1 for Hadoop and for Hive. I think I understand the problem: when the Hive process is stopping, the filesystem is closed before some of the threads. I still need to figure out why the hive-server is restarting.

On 09/02/2013 02:56 PM, Nitin Pawar wrote:

Can you share what version of Hadoop and Hive you are using? This looks similar to HDFS-4841 (https://issues.apache.org/jira/browse/HDFS-4841).

On Mon, Sep 2, 2013 at 4:20 PM, Guy Doulberg <guy.doulb...@conduit.com> wrote:

Hi guys, I have a Hive JDBC server in production. It started lately to fail. In the log files I can see the following:

2013-09-02_10:42:53 java.lang.IllegalStateException: Shutdown in progress, cannot add a shutdownHook
    at org.apache.hadoop.util.ShutdownHookManager.addShutdownHook(ShutdownHookManager.java:152)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2341)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2313)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:351)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:194)
    at org.apache.hadoop.hive.ql.exec.Utilities.realFile(Utilities.java:1027)
    at org.apache.hadoop.hive.ql.exec.Utilities.getResourceFiles(Utilities.java:1551)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.initialize(ExecDriver.java:152)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1343)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1138)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951)
    at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:198)
    at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:644)
    at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:628)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

2013-09-02_10:42:53 FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.MapRedTask
2013-09-02_10:42:53 FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.MapRedTask

Has someone encountered this problem and knows why it is happening? The JMX doesn't expose anything interesting. Guy

-- Nitin Pawar
Re: Hive with Kerberos and a Remote Metastore
I am also facing the same problem. Any idea? Cheers, Subroto Sanyal

On Sep 3, 2013, at 3:04 PM, Christopher Penney wrote:

I'm new to Hive and trying to set it up in a relatively secure manner for a test environment. I want to use a remote metastore so MR jobs can access the DB. I seem to have things almost working, but when a user with a credential tries to create a database I get:

hive> show databases;
OK
default
hive> create database testdb;
FAILED: Error in metadata: MetaException(message:Got exception: org.apache.hadoop.ipc.RemoteException User: hdfs/hadoopserver.sub.dom@sub.dom.com is not allowed to impersonate myuse...@sub.dom.com)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

I have "hive --service metastore" running as hdfs with hdfs/hadoopserver.sub.dom@sub.dom.com as the principal. I'm running hive as myuserid on the same box. I don't know if it's related, but if I try to run hive from another system I get a GSS Initiate error unless I use the same principal (hdfs/hadoopserver.sub.dom@sub.dom.com) for hive.metastore.kerberos.principal. Is that expected? When I try googling this I see similar issues, but the message about not being able to impersonate only shows the single-part user name, whereas for me it's showing the realm. I tried playing with the auth_to_local property, but it didn't help. Map Reduce and HDFS operations are working fine otherwise.
In core-site.xml I have:

<property>
  <name>hadoop.proxyuser.hdfs.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hdfs.groups</name>
  <value>*</value>
</property>

In hive-site.xml I have:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost/metastore</value>
  <description>the URL of the MySQL database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>password</value>
</property>
<property>
  <name>datanucleus.autoCreateSchema</name>
  <value>false</value>
</property>
<property>
  <name>datanucleus.fixedDatastore</name>
  <value>true</value>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://hadoopserver.sub.dom.com:9083</value>
</property>
<property>
  <name>hive.security.authorization.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hive.metastore.sasl.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hive.metastore.kerberos.keytab.file</name>
  <value>/etc/hadoop/hdfs.keytab</value>
</property>
<property>
  <name>hive.metastore.kerberos.principal</name>
  <value>hdfs/hadoopserver.sub.dom@sub.dom.com</value>
</property>
<property>
  <name>hive.metastore.execute.setugi</name>
  <value>true</value>
</property>

Any ideas?
Re: Hive Statistics information
Thanks Ravi, let me give this a shot. Regards, sanjay

From: Ravi Kiran <maghamraviki...@gmail.com> Reply-To: user@hive.apache.org Date: Friday, August 30, 2013 10:53 PM To: user@hive.apache.org Subject: Re: Hive Statistics information

Hi Sanjay, What do the logs say when you fire the ANALYZE TABLE ... statement on a table? One minor correction to the db connection string would be to use &amp; for the query parameters:

hive.stats.dbconnectionstring=jdbc:mysql://v-so1.nextagqa.com/hive_vso1_tempstatsstore?user=hive_user_vso1&amp;password=hive_user_vso1

I hope the database hive_vso1_tempstatsstore exists in your MySQL? Regards, Ravi Magham

On Sat, Aug 31, 2013 at 6:15 AM, Sanjay Subramanian <sanjay.subraman...@wizecommerce.com> wrote:

Hi guys, I have configured Hive to use MySQL for all statistics:

hive.stats.atomic=false
hive.stats.autogather=true
hive.stats.collect.rawdatasize=true
hive.stats.dbclass=jdbc:mysql
hive.stats.dbconnectionstring=jdbc:mysql://v-so1.nextagqa.com/hive_vso1_tempstatsstore?user=hive_user_vso1&password=hive_user_vso1
hive.stats.jdbc.timeout=30
hive.stats.jdbcdriver=com.mysql.jdbc.Driver
hive.stats.retries.max=0
hive.stats.retries.wait=3000

However, the MySQL hive statistics tables don't seem to have any data. Where does Hive store the statistics information? sanjay

CONFIDENTIALITY NOTICE
======================
This email message and any attachments are for the exclusive use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message along with any attachments, from your computer system. If you are the intended recipient, please be advised that the content of this message is subject to access, review and disclosure by the sender's Email System Administrator.
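Ravi's correction is plain XML escaping: inside a hive-site.xml value, a raw "&" between query parameters must be written as "&amp;", which the XML parser turns back into "&" when Hive reads the file. A minimal Python sketch of the round trip (the host and credential values below are made up for illustration):

```python
from xml.sax.saxutils import escape, unescape

# A raw JDBC stats connection string with two query parameters (example values).
raw = "jdbc:mysql://dbhost/hive_stats?user=hive_user&password=hive_pw"

# Inside an XML property value, the ampersand must be escaped as &amp;.
xml_value = escape(raw)
print(xml_value)

# The XML parser restores the raw string when the config file is read.
assert unescape(xml_value) == raw
```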
Re: Hive Query - Issue
Hi, When you do a SELECT *, the partition columns are returned as the last N columns (if you have N partition columns). In this case the 63rd column in SELECT * is the partition column. Instead of SELECT *, do a SELECT col1, col2, col3, ...

Not to show the candle to the sun if you are an AWK/SED ninja :-) but to get all columns from Hive you can do this:

hive -e "describe ur_table_name" | awk '{print $1","}' | sed '1i SELECT' | less

Thanks, sanjay

From: Manickam P <manicka...@outlook.com> Reply-To: user@hive.apache.org Date: Monday, September 2, 2013 4:32 AM To: user@hive.apache.org Subject: Hive Query - Issue

Hello Experts, When I try to execute the below query I'm getting an error. Please help me to correct this.

insert overwrite table table_baseline partition (sourcedate='base_2013_08')
select * from (select * from table_a where sourcedate='tablea_2013_08'
union all
select * from table_b where sourcedate='tableb_2013_08') final

My intention is to populate table_baseline using all the records from table_a and table_b, with the partition. I am getting the error below:

Error in semantic analysis: Line 1:23 Cannot insert into target table because column number/types are different ''BASE_2013_08'': Table insclause-0 has 62 columns, but query has 63 columns.

I verified the column count and types; everything is the same, but here it says there is a difference. The same query works fine without any partitions in all three tables but fails when executed with partitions. Please help. Thanks, Manickam P

CONFIDENTIALITY NOTICE
======================
This email message and any attachments are for the exclusive use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message along with any attachments, from your computer system. If you are the intended recipient, please be advised that the content of this message is subject to access, review and disclosure by the sender's Email System Administrator.
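Sanjay's awk/sed trick amounts to building an explicit SELECT list from DESCRIBE output. The same idea can be sketched in Python (the column and partition names below are hypothetical), which also makes it easy to drop the trailing partition columns so the SELECT's column count matches the target table:

```python
# Build an explicit SELECT list from `DESCRIBE table`-style output,
# dropping partition columns so an INSERT ... SELECT matches the
# target table's column count.
describe_rows = [  # (name, type) pairs as DESCRIBE would report them
    ("col1", "string"),
    ("col2", "int"),
    ("sourcedate", "string"),  # partition column, reported last
]
partition_cols = {"sourcedate"}

def select_list(rows, partition_cols):
    cols = [name for name, _ in rows if name not in partition_cols]
    return "SELECT " + ", ".join(cols)

print(select_list(describe_rows, partition_cols))
```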
Re: Problems with 0.11, count(DISTINCT), and NPE
Fix in a very related area was checked into trunk today: https://issues.apache.org/jira/browse/HIVE-5129. Likely that will fix your issue. Can you try latest trunk? Ashutosh

On Tue, Sep 3, 2013 at 2:03 PM, Nathanial Thelen <n...@natethelen.com> wrote:

I am running Hive in EMR and since upgrading to 0.11 from 0.8.1.8 I have been getting NullPointerExceptions (NPE) for certain queries in our staging environment. The only difference between stage and production is the amount of traffic we get, so the data set is much smaller. We are not using any custom code. I have greatly simplified the query down to the bare minimum that will cause the error:

SELECT count(DISTINCT ag.adGroupGuid) as groups,
       count(DISTINCT av.adViewGuid) as ads,
       count(DISTINCT ac.adViewGuid) as uniqueClicks
FROM adgroup ag
INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid

This will return the following before any Map Reduce jobs start:

FAILED: NullPointerException null

Looking in the hive log at /mnt/var/log/apps/hive_0110.log and scanning, I see this error:

2013-09-03 18:09:19,796 INFO org.apache.hadoop.hive.ql.exec.Utilities (Utilities.java:getInputSummary(1889)) - Cache Content Summary for s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=29 length: 94324 file count: 20 directory count: 1
2013-09-03 18:09:19,796 INFO org.apache.hadoop.hive.ql.exec.Utilities (Utilities.java:getInputSummary(1889)) - Cache Content Summary for s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=30 length: 142609 file count: 21 directory count: 1
2013-09-03 18:09:19,796 INFO org.apache.hadoop.hive.ql.exec.Utilities (Utilities.java:getInputSummary(1889)) - Cache Content Summary for s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=30 length: 65519 file count: 21 directory count: 1
2013-09-03 18:09:19,796 INFO org.apache.hadoop.hive.ql.exec.Utilities (Utilities.java:getInputSummary(1889)) - Cache Content Summary for s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=29 length: 205096 file count: 20 directory count: 1
2013-09-03 18:09:19,800 INFO org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where optimization is applicable
2013-09-03 18:09:19,801 INFO org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer (MetadataOnlyOptimizer.java:dispatch(301)) - Found 0 metadata only table scans
2013-09-03 18:09:19,801 INFO org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where optimization is applicable
2013-09-03 18:09:19,801 INFO org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer (MetadataOnlyOptimizer.java:dispatch(301)) - Found 1 metadata only table scans
2013-09-03 18:09:19,801 ERROR org.apache.hadoop.hive.ql.Driver (SessionState.java:printError(386)) - FAILED: NullPointerException null
java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer$MetadataOnlyTaskDispatcher.dispatch(MetadataOnlyOptimizer.java:308)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:124)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101)
    at org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer.resolve(MetadataOnlyOptimizer.java:175)
    at org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:79)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8426)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8789)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:278)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:310)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:231)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:466)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:819)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:674)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:187)

The same error also happens if I do an INNER JOIN to
Re: Problems with 0.11, count(DISTINCT), and NPE
Is there a way to run a patch on EMR? Thanks, Nate

On Sep 3, 2013, at 2:14 PM, Ashutosh Chauhan <hashut...@apache.org> wrote:

Fix in a very related area was checked into trunk today: https://issues.apache.org/jira/browse/HIVE-5129. Likely that will fix your issue. Can you try latest trunk? Ashutosh
Re: Problems with 0.11, count(DISTINCT), and NPE
Not sure about EMR. Your best bet is to ask on EMR forums. Thanks, Ashutosh

On Tue, Sep 3, 2013 at 2:18 PM, Nathanial Thelen <n...@natethelen.com> wrote:

Is there a way to run a patch on EMR? Thanks, Nate

On Sep 3, 2013, at 2:14 PM, Ashutosh Chauhan <hashut...@apache.org> wrote:

Fix in a very related area was checked into trunk today: https://issues.apache.org/jira/browse/HIVE-5129. Likely that will fix your issue. Can you try latest trunk? Ashutosh
Re: Problems with 0.11, count(DISTINCT), and NPE
Based on the log, it may be also related to https://issues.apache.org/jira/browse/HIVE-4927. To make it work (in a not very optimized way), can you try set hive.auto.convert.join.noconditionaltask=false; ? If you still get the error, give set hive.auto.convert.join=false; a try (it will turn off map join auto convert, so you will use reduce-side join). Thanks, Yin On Tue, Sep 3, 2013 at 6:03 PM, Ashutosh Chauhan hashut...@apache.orgwrote: Not sure about EMR. Your best bet is to ask on EMR forums. Thanks, Ashutosh On Tue, Sep 3, 2013 at 2:18 PM, Nathanial Thelen n...@natethelen.comwrote: Is there a way to run a patch on EMR? Thanks, Nate On Sep 3, 2013, at 2:14 PM, Ashutosh Chauhan hashut...@apache.org wrote: Fix in very related area has been checked in trunk today : https://issues.apache.org/jira/browse/HIVE-5129 Likely that will fix your issue. Can you try latest trunk? Ashutosh On Tue, Sep 3, 2013 at 2:03 PM, Nathanial Thelen n...@natethelen.comwrote: I am running Hive in EMR and since upgrading to 0.11 from 0.8.1.8 I have been getting NullPointerExceptions (NPE) for certain queries in our staging environment. Only difference between stage and production is the amount of traffic we get so the data set is much smaller. We are not using any custom code. 
I have greatly simplified the query down to the bare minimum that will cause the error:

    SELECT count(DISTINCT ag.adGroupGuid) as groups,
           count(DISTINCT av.adViewGuid) as ads,
           count(DISTINCT ac.adViewGuid) as uniqueClicks
    FROM adgroup ag
    INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
    LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid

This will return the following before any Map Reduce jobs start:

    FAILED: NullPointerException null

Looking in the Hive log at /mnt/var/log/apps/hive_0110.log and scanning, I see this error:

    2013-09-03 18:09:19,796 INFO org.apache.hadoop.hive.ql.exec.Utilities (Utilities.java:getInputSummary(1889)) - Cache Content Summary for s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=29 length: 94324 file count: 20 directory count: 1
    2013-09-03 18:09:19,796 INFO org.apache.hadoop.hive.ql.exec.Utilities (Utilities.java:getInputSummary(1889)) - Cache Content Summary for s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=30 length: 142609 file count: 21 directory count: 1
    2013-09-03 18:09:19,796 INFO org.apache.hadoop.hive.ql.exec.Utilities (Utilities.java:getInputSummary(1889)) - Cache Content Summary for s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=30 length: 65519 file count: 21 directory count: 1
    2013-09-03 18:09:19,796 INFO org.apache.hadoop.hive.ql.exec.Utilities (Utilities.java:getInputSummary(1889)) - Cache Content Summary for s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=29 length: 205096 file count: 20 directory count: 1
    2013-09-03 18:09:19,800 INFO org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where optimization is applicable
    2013-09-03 18:09:19,801 INFO org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer (MetadataOnlyOptimizer.java:dispatch(301)) - Found 0 metadata only table scans
    2013-09-03 18:09:19,801 INFO org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where optimization is applicable
    2013-09-03 18:09:19,801 INFO org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer (MetadataOnlyOptimizer.java:dispatch(301)) - Found 1 metadata only table scans
    2013-09-03 18:09:19,801 ERROR org.apache.hadoop.hive.ql.Driver (SessionState.java:printError(386)) - FAILED: NullPointerException null
    java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer$MetadataOnlyTaskDispatcher.dispatch(MetadataOnlyOptimizer.java:308)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:124)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101)
        at org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer.resolve(MetadataOnlyOptimizer.java:175)
        at org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:79)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8426)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8789)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:278)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
        at
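The session-level workarounds Yin suggests earlier in the thread can be collected into one sketch. The first property keeps map-join auto conversion but disables its no-conditional-task variant; the second disables map-join auto conversion entirely, falling back to reduce-side joins. Note these only sidestep the bug (the real fix is HIVE-5129 on trunk); the commented third property is my own assumption, not mentioned anywhere in the thread, based on the fact that the NPE is thrown inside MetadataOnlyOptimizer.

```sql
-- Try first: disable only the no-conditional-task variant of
-- map-join auto conversion (Yin's first suggestion).
set hive.auto.convert.join.noconditionaltask=false;

-- If the NPE persists: disable map-join auto conversion entirely,
-- so joins run on the reduce side (Yin's second suggestion).
set hive.auto.convert.join=false;

-- Assumption, not from the thread: since the stack trace points at
-- MetadataOnlyOptimizer, disabling the metadata-only optimization may
-- also avoid the NPE, if your Hive build exposes this property.
-- set hive.optimize.metadataonly=false;
```

These take effect for the current session only; re-run the failing SELECT in the same session afterwards.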
[ANNOUNCE] New Hive Committer - Yin Huai
The Apache Hive PMC has voted to make Yin Huai a committer on the Apache Hive project. Please join me in congratulating Yin! Thanks. Carl
Re: [ANNOUNCE] New Hive Committer - Yin Huai
congratulations! Jov blog: http://amutu.com/blog 2013/9/4 Carl Steinbach c...@apache.org The Apache Hive PMC has voted to make Yin Huai a committer on the Apache Hive project. Please join me in congratulating Yin! Thanks. Carl
Re: [ANNOUNCE] New Hive Committer - Yin Huai
Congratulations Yin!!! On Tuesday, September 3, 2013, Jov am...@amutu.com wrote: congratulations! Jov blog: http://amutu.com/blog 2013/9/4 Carl Steinbach c...@apache.org The Apache Hive PMC has voted to make Yin Huai a committer on the Apache Hive project. Please join me in congratulating Yin! Thanks. Carl