Re: Duplicate rows when using group by in subquery
Hello again. I have now checked out the latest code from trunk and built it as per the instructions. However, this query:

select a.Symbol, count(*) from (select Symbol, catid from cat group by Symbol, catid) a group by a.Symbol;

still returns an incorrect number of rows for this table:

create table cat(CATID bigint, CUSTOMERID int, FILLPRICE double, FILLSIZE int, INSTRUMENTTYPE int, ORDERACTION int, ORDERSTATUS int, ORDERTYPE int, ORDID string, PRICE double, RECORDTYPE int, SIZE int, SRCORDID string, SRCREPID int, TIMESTAMP timestamp) PARTITIONED BY (SYMBOL string, REPID int) row format delimited fields terminated by ',' stored as ORC;

Here is the result of EXPLAIN:

hive> EXPLAIN select a.Symbol, count(*) from (select Symbol, catid from cat group by Symbol, catid) a group by a.Symbol;
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME cat))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL Symbol)) (TOK_SELEXPR (TOK_TABLE_OR_COL catid))) (TOK_GROUPBY (TOK_TABLE_OR_COL Symbol) (TOK_TABLE_OR_COL catid)))) a)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) Symbol)) (TOK_SELEXPR (TOK_FUNCTIONSTAR count))) (TOK_GROUPBY (. (TOK_TABLE_OR_COL a) Symbol))))

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        a:cat
          TableScan
            alias: cat
            Select Operator
              expressions:
                    expr: symbol
                    type: string
                    expr: catid
                    type: bigint
              outputColumnNames: symbol, catid
              Group By Operator
                bucketGroup: false
                keys:
                      expr: symbol
                      type: string
                      expr: catid
                      type: bigint
                mode: hash
                outputColumnNames: _col0, _col1
                Reduce Output Operator
                  key expressions:
                        expr: _col0
                        type: string
                        expr: _col1
                        type: bigint
                  sort order: ++
                  Map-reduce partition columns:
                        expr: _col0
                        type: string
                        expr: _col1
                        type: bigint
                  tag: -1
      Reduce Operator Tree:
        Group By Operator
          bucketGroup: false
          keys:
                expr: KEY._col0
                type: string
                expr: KEY._col1
                type: bigint
          mode: mergepartial
          outputColumnNames: _col0, _col1
          Select Operator
            expressions:
                  expr: _col0
                  type: string
            outputColumnNames: _col0
            Group By Operator
              aggregations:
                    expr: count()
              bucketGroup: false
              keys:
                    expr: _col0
                    type: string
              mode: complete
              outputColumnNames: _col0, _col1
              Select Operator
                expressions:
                      expr: _col0
                      type: string
                      expr: _col1
                      type: bigint
                outputColumnNames: _col0, _col1
                File Output Operator
                  compressed: false
                  GlobalTableId: 0
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
    Fetch Operator
      limit: -1

Using set hive.optimize.reducededuplication=false; I get 2 MapReduce jobs and the correct number of rows (24). Can I verify somehow, maybe by looking in the source code, that I indeed have the correct version? Or execute a command from the hive cli that shows the version, etc.? I just built from source this morning, so it seems strange that the bug would still persist :(.

From: Yin Huai huaiyin@gmail.com
To: user@hive.apache.org; Mikael Öhman mikael_u...@yahoo.se
Sent: Tuesday, 17 September 2013 15:30
Subject: Re: Duplicate rows when using group by in subquery

Hello Mikael, ReduceSinkDeduplication automatically kicked in because it is enabled by default. The original plan tries to shuffle your data twice; ReduceSinkDeduplication then finds that the original plan can be optimized to shuffle your data once. But when picking the partitioning columns, this optimizer picked the wrong columns because of the bug.
Also, you can try your query with and without ReduceSinkDeduplication (use set hive.optimize.reducededuplication=false; to turn this optimization off).
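For reference, the comparison Yin suggests can be run directly from the cli. A minimal sketch using the query from this thread; with the optimization off, the EXPLAIN output should show two MapReduce stages, and with it on, only one:

hive> set hive.optimize.reducededuplication=false;
hive> EXPLAIN select a.Symbol, count(*) from (select Symbol, catid from cat group by Symbol, catid) a group by a.Symbol;
hive> set hive.optimize.reducededuplication=true;
hive> EXPLAIN select a.Symbol, count(*) from (select Symbol, catid from cat group by Symbol, catid) a group by a.Symbol;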
Re: User accounts to execute hive queries
Thanks Nitin for the help, I will try it. Thanks and Regards, Rudra

On Wed, Sep 18, 2013 at 5:14 PM, Thejas Nair the...@hortonworks.com wrote:
You might find my slides on this topic useful - http://www.slideshare.net/thejasmn/hive-authorization-models
Also linked from the last slide - https://cwiki.apache.org/confluence/display/HCATALOG/Storage+Based+Authorization

On Tue, Sep 17, 2013 at 11:46 PM, Nitin Pawar nitinpawar...@gmail.com wrote:
The link I gave in the previous mail explains how you can do user-level authorization in hive.

On Mon, Sep 16, 2013 at 7:57 PM, shouvanik.hal...@accenture.com wrote:
Hi Nitin, I want it secured. Yes, I would like to give specific access to specific users, e.g. "select * from" access to some and "add/modify/delete" options to others. "What kind of security do you have on hdfs?" - I could not follow this question. Thanks, Shouvanik

From: Nitin Pawar [mailto:nitinpawar...@gmail.com]
Sent: Monday, September 16, 2013 6:50 PM
To: Haldar, Shouvanik
Cc: user@hive.apache.org
Subject: Re: User accounts to execute hive queries

You will need to tell us a few more things. Do you want it secured? Do you distinguish users into different categories by what a particular user can or cannot do? What kind of security do you have on hdfs? It is definitely possible for users to run queries under their own usernames, but then you have to take a few measures as well: which user can do what action, which user can access what location on hdfs, and so on. For user management on the hive side you can read https://cwiki.apache.org/Hive/languagemanual-authorization.html
If you do not want to go the secure way, then add all the users to one group and grant permissions to that group on your warehouse directory (see the sketch below). Alternatively, if the table data is not shared, create an individual directory for each user on hdfs and give only that user access to that directory.
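A concrete sketch of the unsecured, group-based setup Nitin describes. The group name, warehouse path, table, and user names below are illustrative, not from this thread:

hadoop fs -chgrp -R hiveusers /user/hive/warehouse   # put the warehouse under a shared group (default path; yours may differ)
hadoop fs -chmod -R 770 /user/hive/warehouse         # group members get read/write; everyone else is locked out

On the Hive side, per-user grants follow the authorization wiki page linked above, for example:

hive> GRANT SELECT ON TABLE sales TO USER analyst1;  -- "sales" and "analyst1" are hypothetical names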
Operators && and || do not work
Hello, Though the documentation https://cwiki.apache.org/Hive/languagemanual-udf.html says they are the same as AND and OR, && and || do not even get parsed. Users get parse errors when they are used. Was that intentional or is it a regression?

hive> select key from src where key='a' || key='b';
FAILED: Parse Error: line 1:33 cannot recognize input near '|' 'key' '=' in expression specification

hive> select key from src where key='a' && key='b';
FAILED: Parse Error: line 1:33 cannot recognize input near '&' 'key' '=' in expression specification

Thanks Amareshwari
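Until this is resolved, the documented keyword forms parse fine and are a drop-in substitute. A sketch against the same src table:

hive> select key from src where key='a' OR key='b';
hive> select key from src where key='a' AND key='b';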
Hive 0.11.0 | Issue with ORC Tables
Hi All, We have set up apache hive 0.11.0 services on a Hadoop cluster (apache version 0.20.203.0). Hive shows expected results when tables are stored as TextFile. Whereas Hive 0.11.0's new feature ORC (Optimized Row Columnar) throws an exception when we run select queries on tables stored as ORC. Stacktrace of the exception:

2013-09-19 20:33:38,095 ERROR CliDriver (SessionState.java:printError(386)) - Failed with exception java.io.IOException:com.google.protobuf.InvalidProtocolBufferException: While parsing a protocol message, the input ended unexpectedly in the middle of a field. This could mean either than the input has been truncated or that an embedded message misreported its own length.
java.io.IOException: com.google.protobuf.InvalidProtocolBufferException: While parsing a protocol message, the input ended unexpectedly in the middle of a field. This could mean either than the input has been truncated or that an embedded message misreported its own length.
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:544)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:488)
        at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136)
        at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1412)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:271)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: com.google.protobuf.InvalidProtocolBufferException: While parsing a protocol message, the input ended unexpectedly in the middle of a field. This could mean either than the input has been truncated or that an embedded message misreported its own length.
        at com.google.protobuf.InvalidProtocolBufferException.truncatedMessage(InvalidProtocolBufferException.java:49)
        at com.google.protobuf.CodedInputStream.readRawBytes(CodedInputStream.java:754)
        at com.google.protobuf.CodedInputStream.readBytes(CodedInputStream.java:294)
        at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:484)
        at com.google.protobuf.GeneratedMessage$Builder.parseUnknownField(GeneratedMessage.java:438)
        at org.apache.hadoop.hive.ql.io.orc.OrcProto$PostScript$Builder.mergeFrom(OrcProto.java:10129)
        at org.apache.hadoop.hive.ql.io.orc.OrcProto$PostScript$Builder.mergeFrom(OrcProto.java:9993)
        at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:300)
        at org.apache.hadoop.hive.ql.io.orc.OrcProto$PostScript.parseFrom(OrcProto.java:9970)
        at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.init(ReaderImpl.java:193)
        at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:56)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:168)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:432)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:508)

We did the following steps that lead to the above exception:
* SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
* CREATE TABLE person(id INT, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' STORED AS ORC tblproperties ("orc.compress"="Snappy");
* LOAD DATA LOCAL INPATH 'test.txt' INTO TABLE person;
* Executing: SELECT * FROM person;
Results: Failed with exception java.io.IOException:com.google.protobuf.InvalidProtocolBufferException: While parsing a protocol message, the input ended unexpectedly in the middle of a field. This could mean either than the input has been truncated or that an embedded message misreported its own length.

Also, we included the codec property in core-site.xml in our hadoop cluster with the other configuration settings:

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>

Following are the new jars with their placements:
1. Placed a new jar at $HIVE_HOME/lib/config-1.0.0.jar
2. Placed a new jar for the metastore connection at $HIVE_HOME/lib/mysql-connector-java-5.1.17-bin.jar
3. Moved jackson-core-asl-1.8.8.jar from $HIVE_HOME/lib to $HADOOP_HOME/lib
4.
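As a first sanity check on setups like this (a sketch; this only confirms how the table is declared, not that the files inside it are valid ORC):

hive> DESCRIBE FORMATTED person;
-- under Storage Information, an ORC table should show:
--   InputFormat:  org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
--   OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat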
Re: Hive 0.11.0 | Issue with ORC Tables
How did you create test.txt as an ORC file?

On Thu, Sep 19, 2013 at 5:34 PM, Savant, Keshav keshav.c.sav...@fisglobal.com wrote:
Re: Operators && and || do not work
I have not tested it on historical versions, so I don't know on which versions it used to work (if ever), but possibly the antlr upgrade [1] may have impacted this.

[1] https://issues.apache.org/jira/browse/HIVE-2439

Ashutosh

On Thu, Sep 19, 2013 at 4:52 AM, amareshwari sriramdasu amareshw...@gmail.com wrote:
Re: Hive 0.11.0 | Issue with ORC Tables
On Thu, Sep 19, 2013 at 5:04 AM, Savant, Keshav keshav.c.sav...@fisglobal.com wrote:
> CREATE TABLE person(id INT, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' STORED AS ORC tblproperties ("orc.compress"="Snappy");
> LOAD DATA LOCAL INPATH 'test.txt' INTO TABLE person;

The problem is that LOAD DATA doesn't convert the file into ORC format. You need to use the following commands:

CREATE TABLE person_staging (id INT, name STRING);
LOAD DATA LOCAL INPATH 'test.txt' INTO TABLE person_staging;
SELECT * FROM person_staging;
INSERT OVERWRITE TABLE person SELECT * FROM person_staging;
SELECT * FROM person;

Sorry for the bad error message. I improved the ORC reader to explicitly check that the file is actually an ORC file in https://issues.apache.org/jira/browse/HIVE-4724.
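A one-step alternative to the staging INSERT, in case it helps (a sketch, not from the thread; "person_orc" is a hypothetical table name, and the staging table is assumed to exist as above):

hive> CREATE TABLE person_orc STORED AS ORC AS SELECT * FROM person_staging;  -- CTAS runs the rows through the ORC writer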
Re: Export/Import Table in Hive NPE
Hi All - I have opened a ticket for this issue: https://issues.apache.org/jira/browse/HIVE-5318
Can anyone reproduce it, to confirm it's a bug in Hive and not a configuration problem in my instance? Thanks, Brad

On Tue, Sep 17, 2013 at 2:22 PM, Brad Ruderman bruder...@radiumone.com wrote:
Hi All - I am trying to export a table in Hive 0.9, then import it into Hive 0.10 staging, essentially moving data from a production instance to staging. I used the EXPORT TABLE command; however, when I try to import the table back into staging I receive the following (pulled from the hive.log file). Could anyone help point me at what could cause the problem? This is a Hive-managed table in the source instance, where the data was originally loaded into Hive by Sqoop. Thanks, Brad

2013-09-17 14:10:27,482 INFO parse.ParseDriver (ParseDriver.java:parse(433)) - Parsing command: IMPORT FROM 'hdfs://user/hdfs/test_table'
2013-09-17 14:10:27,482 INFO parse.ParseDriver (ParseDriver.java:parse(450)) - Parse Completed
2013-09-17 14:10:27,486 ERROR ql.Driver (SessionState.java:printError(427)) - FAILED: SemanticException Exception while processing
org.apache.hadoop.hive.ql.parse.SemanticException: Exception while processing
        at org.apache.hadoop.hive.ql.parse.ImportSemanticAnalyzer.analyzeInternal(ImportSemanticAnalyzer.java:277)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:459)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:349)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:938)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: user
        at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:414)
        at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:164)
        at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:129)
        at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:448)
        at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:410)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:128)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2308)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:87)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2342)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2324)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:351)
        at org.apache.hadoop.hive.ql.parse.ImportSemanticAnalyzer.analyzeInternal(ImportSemanticAnalyzer.java:87)
        ... 15 more
Caused by: java.net.UnknownHostException: user
        ... 27 more
2013-09-17 14:10:27,487 INFO ql.Driver (PerfLogger.java:PerfLogEnd(115)) - </PERFLOG method=compile start=1379452227481 end=1379452227487 duration=6>
2013-09-17 14:10:27,487 INFO ql.Driver (PerfLogger.java:PerfLogBegin(88)) - <PERFLOG method=releaseLocks>
2013-09-17 14:10:27,487 INFO ql.Driver (PerfLogger.java:PerfLogEnd(115)) - </PERFLOG method=releaseLocks start=1379452227487 end=1379452227487 duration=0>
2013-09-17 14:10:27,487 INFO ql.Driver (PerfLogger.java:PerfLogBegin(88)) - <PERFLOG method=releaseLocks>
2013-09-17 14:10:27,487 INFO ql.Driver (PerfLogger.java:PerfLogEnd(115)) - </PERFLOG method=releaseLocks start=1379452227487 end=1379452227487 duration=0>
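One detail worth checking before treating this purely as a Hive bug: the logged command is IMPORT FROM 'hdfs://user/hdfs/test_table', and in an hdfs:// URI the first component after the scheme is the NameNode host, which matches the "UnknownHostException: user" above. A sketch of path forms that avoid the ambiguity (host and paths are illustrative, not from the thread):

hive> EXPORT TABLE test_table TO '/user/hdfs/test_table';            -- scheme-less, resolved against the default filesystem
hive> IMPORT FROM '/user/hdfs/test_table';
hive> IMPORT FROM 'hdfs://namenode-host:8020/user/hdfs/test_table';  -- or fully qualified, authority included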
Re: Duplicate rows when using group by in subquery
Maybe you were still using a cli that was pointing at the hive 0.11 libs. After you build trunk (https://github.com/apache/hive.git), you need to use trunk-dir/build/dist as your hive home and use trunk-dir/build/dist/bin/hive to launch the hive cli. You can find the hive 0.13 libs in trunk-dir/build/dist/lib. Btw, trunk seems to have an issue today; you can try the hive 0.12 branch instead.

On Thu, Sep 19, 2013 at 4:26 AM, Mikael Öhman mikael_u...@yahoo.se wrote:
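For completeness, the sequence Yin describes, as a sketch (assuming the ant build that trunk used at the time; the paths match his description):

git clone https://github.com/apache/hive.git
cd hive
ant clean package                   # build output lands in build/dist
export HIVE_HOME=$PWD/build/dist
$HIVE_HOME/bin/hive                 # launches the cli against the freshly built libs in build/dist/lib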
Re: Operators && and || do not work
Hi Amareshwari/Ashutosh, Ashutosh is probably right; I doubt this ever worked. I couldn't find a clientpositive test case which uses && or ||. I also modified a unit test case in Hive 0.9 to use && instead of AND, and that failed with the same error Amareshwari saw. Hive 0.9 does not have HIVE-2439. -Thiruvel

On 9/19/13 7:21 AM, Ashutosh Chauhan hashut...@apache.org wrote:
Re: Operators && and || do not work
Yes, it should not be because of HIVE-2439. Even in hive-0.7 it does not work, so I am not sure it worked in any version. Will create a jira to track this. Thanks, Amareshwari

On Fri, Sep 20, 2013 at 6:03 AM, Thiruvel Thirumoolan thiru...@yahoo-inc.com wrote:
Re: De-serializing Thrift Optional fields
Hi, We are creating a table by de-serializing a thrift file. We end up with an extra hive column named "optionals", of type struct. This breaks SELECT * queries! How can we prevent it?