[jira] Updated: (HIVE-1430) serializing/deserializing the query plan is useless and expensive
[ https://issues.apache.org/jira/browse/HIVE-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1430:
-----------------------------
       Status: Resolved  (was: Patch Available)
 Hadoop Flags: [Reviewed]
Fix Version/s: 0.7.0
   Resolution: Fixed

Fixed. Thanks Ning

> serializing/deserializing the query plan is useless and expensive
> -----------------------------------------------------------------
>
>         Key: HIVE-1430
>         URL: https://issues.apache.org/jira/browse/HIVE-1430
>     Project: Hadoop Hive
>  Issue Type: Bug
>    Reporter: Namit Jain
>    Assignee: Ning Zhang
>     Fix For: 0.7.0
>
> Attachments: HIVE-1430.patch
>
> We should turn it off by default

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-187) ODBC driver
[ https://issues.apache.org/jira/browse/HIVE-187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-187:
----------------------------
    Attachment: thrift_64.r790732.tgz

Uploading thrift_64.r790732.tgz, the complete 64-bit Thrift libs (including libfb303.a) and binaries. These libraries were compiled under CentOS 5.2 (kernel 2.6.20, GCC 4.1.2).

> ODBC driver
> -----------
>
>         Key: HIVE-187
>         URL: https://issues.apache.org/jira/browse/HIVE-187
>     Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Clients
> Affects Versions: 0.6.0
>    Reporter: Raghotham Murthy
>    Assignee: Eric Hwang
>     Fix For: 0.4.0
>
> Attachments: HIVE-187.1.patch, HIVE-187.2.patch, HIVE-187.3.patch,
> hive-187.4.patch, thrift_64.r790732.tgz, thrift_home_linux_32.tgz,
> thrift_home_linux_64.tgz, unixODBC-2.2.14-1.tgz, unixODBC-2.2.14-2.tgz,
> unixODBC-2.2.14-3.tgz, unixODBC-2.2.14-hive-patched.tar.gz,
> unixODBC-2.2.14.tgz, unixodbc.patch
>
> We need to provide a small number of functions to get basic query
> execution and retrieval of results. This is based on the tutorial provided
> here: http://www.easysoft.com/developer/languages/c/odbc_tutorial.html
>
> The minimum set of ODBC functions required are:
> SQLAllocHandle - for environment, connection, statement
> SQLSetEnvAttr
> SQLDriverConnect
> SQLExecDirect
> SQLNumResultCols
> SQLFetch
> SQLGetData
> SQLDisconnect
> SQLFreeHandle
>
> If required, the plan would be to do the following:
> 1. generate C++ client stubs for the Thrift server
> 2. implement the required functions in C++ by calling the C++ client
> 3. make the C++ functions in (2) extern "C" and then use those in the ODBC
> SQL* functions
> 4. provide a .so (on Linux) which can be used by ODBC clients.
[jira] Updated: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it
[ https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-1176:
-----------------------------
       Status: Resolved  (was: Patch Available)
 Hadoop Flags: [Reviewed]
Fix Version/s: 0.7.0
   Resolution: Fixed

Committed. Thanks Arvind!

> 'create if not exists' fails for a table name with 'select' in it
> -----------------------------------------------------------------
>
>         Key: HIVE-1176
>         URL: https://issues.apache.org/jira/browse/HIVE-1176
>     Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore, Query Processor
>    Reporter: Prasad Chakka
>    Assignee: Arvind Prabhakar
>     Fix For: 0.6.0, 0.7.0
>
> Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, HIVE-1176-3.patch,
> HIVE-1176-4.patch, HIVE-1176-5.patch, HIVE-1176-6.patch,
> HIVE-1176.lib-files.tar.gz, HIVE-1176.patch
>
> hive> create table if not exists tmp_select(s string, c string, n int);
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got
> exception: javax.jdo.JDOUserException JDOQL Single-String query should always
> start with SELECT)
>         at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
>         at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
>         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
>         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
>         at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
>         at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException
> JDOQL Single-String query should always start with SELECT)
>         at org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
>         at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
>         ... 15 more
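The exception above suggests a keyword check tripping over an identifier that merely contains the word "select". A minimal illustration of the failure mode (hypothetical helper names, not Hive's or DataNucleus's actual code): substring matching flags `tmp_select`, while checking only the first token of the query string does not.

```java
// Hypothetical sketch, not the actual fix: why substring keyword matching
// misbehaves on identifiers like "tmp_select".
class KeywordCheck {
    // Naive check: fires on any string containing "select", including
    // table names such as "tmp_select".
    static boolean naiveContainsSelect(String s) {
        return s.toLowerCase().contains("select");
    }

    // Safer check: only the first whitespace-delimited token counts as
    // the leading keyword.
    static boolean startsWithSelectKeyword(String s) {
        String first = s.trim().split("\\s+")[0];
        return first.equalsIgnoreCase("select");
    }
}
```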
[jira] Commented: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch
[ https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882048#action_12882048 ]

Arvind Prabhakar commented on HIVE-1271:
----------------------------------------

@Ashish: I created HIVE-1432 to track the test case creation. I will be submitting a patch for that soon. Thanks for pointing this out.

> Case sensitiveness of type information specified when using custom reducer
> causes type mismatch
> ---------------------------------------------------------------------------
>
>         Key: HIVE-1271
>         URL: https://issues.apache.org/jira/browse/HIVE-1271
>     Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
> Affects Versions: 0.5.0
>    Reporter: Dilip Joseph
>    Assignee: Arvind Prabhakar
>     Fix For: 0.6.0
>
> Attachments: HIVE-1271-1.patch, HIVE-1271.patch
>
> Type information specified while using a custom reduce script is converted
> to lower case, and causes type mismatch during query semantic analysis. The
> following REDUCE query where field name = "userId" failed.
> hive> CREATE TABLE SS (
>     >   a INT,
>     >   b INT,
>     >   vals ARRAY>
>     > );
> OK
> hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
>     > INSERT OVERWRITE TABLE SS
>     > REDUCE *
>     > USING 'myreduce.py'
>     > AS
>     > (a INT,
>     >  b INT,
>     >  vals ARRAY>
>     > )
>     > ;
> FAILED: Error in semantic analysis: line 2:27 Cannot insert into
> target table because column number/types are different SS: Cannot
> convert column 2 from array> to
> array>.
> The same query worked fine after changing "userId" to "userid".
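Since HiveQL identifiers are case-insensitive, two type signatures that differ only in identifier casing should be considered compatible. A minimal sketch of that idea (hypothetical helper, not Hive's actual type-checking code):

```java
// Hypothetical sketch: treat type signatures that differ only in
// identifier case as equivalent, since HiveQL identifiers are
// case-insensitive.
class TypeSignatures {
    static boolean compatible(String a, String b) {
        // A plain case-insensitive comparison suffices for signatures
        // that differ only in the casing of type or field names.
        return a.equalsIgnoreCase(b);
    }
}
```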
[jira] Created: (HIVE-1432) Create a test case for case sensitive comparison done during field comparison
Create a test case for case sensitive comparison done during field comparison
-----------------------------------------------------------------------------

         Key: HIVE-1432
         URL: https://issues.apache.org/jira/browse/HIVE-1432
     Project: Hadoop Hive
  Issue Type: Task
  Components: Query Processor
    Reporter: Arvind Prabhakar
    Assignee: Arvind Prabhakar
     Fix For: 0.6.0

See HIVE-1271. This jira tracks the creation of a test case to test this fix specifically.
[jira] Commented: (HIVE-1431) Hive CLI can't handle query files that begin with comments
[ https://issues.apache.org/jira/browse/HIVE-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882028#action_12882028 ]

John Sichi commented on HIVE-1431:
----------------------------------

sqlline (see my notes in HIVE-987) deals with comments correctly in a fairly simple fashion.

> Hive CLI can't handle query files that begin with comments
> ----------------------------------------------------------
>
>         Key: HIVE-1431
>         URL: https://issues.apache.org/jira/browse/HIVE-1431
>     Project: Hadoop Hive
>  Issue Type: Bug
>  Components: CLI
>    Reporter: Carl Steinbach
>     Fix For: 0.6.0, 0.7.0
>
> {code}
> % cat test.q
> -- This is a comment, followed by a command
> set -v;
> --
> -- Another comment
> --
> show tables;
> -- Last comment
> (master) [ ~/Projects/hive ]
> % hive < test.q
> Hive history file=/tmp/carl/hive_job_log_carl_201006231606_1140875653.txt
> hive> -- This is a comment, followed by a command
>     > set -v;
> FAILED: Parse Error: line 2:0 cannot recognize input 'set'
> hive> --
>     > -- Another comment
>     > --
>     > show tables;
> OK
> rawchunks
> Time taken: 5.334 seconds
> hive> -- Last comment
>     >
> (master) [ ~/Projects/hive ]
> %
> {code}
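One simple way a CLI can tolerate query files that begin with comments is to drop full-line `--` comments before the text reaches the parser. A minimal sketch of that approach (hypothetical helper, not the actual CliDriver or sqlline code):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: strip full-line "--" comments from a query script
// before handing it to the parser.
class CommentStripper {
    static String strip(String script) {
        List<String> kept = new ArrayList<>();
        for (String line : script.split("\n")) {
            // A line whose first non-blank characters are "--" is a
            // comment line; skip it entirely.
            if (line.trim().startsWith("--")) {
                continue;
            }
            kept.add(line);
        }
        return String.join("\n", kept);
    }
}
```

Note this only handles whole-line comments; trailing `-- ...` comments on a statement line would need tokenizer-level support, which is why moving to a real parser (as discussed below in this thread) is the more robust fix.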
Re: 6.0 and trunk look broken to me
On Wed, Jun 23, 2010 at 10:48 PM, John Sichi wrote:
> Did you get past this? It looks like some kind of bad build.
>
> JVS
>
> On Jun 23, 2010, at 2:38 PM, Ashish Thusoo wrote:
>
>> Not sure if this is just my env but on 0.6.0 when I run the unit tests I get
>> a bunch of errors of the following form:
>>
>> [junit] Begin query: alter3.q
>> [junit] java.lang.NoSuchFieldError: HIVESESSIONSILENT
>> [junit]   at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:1052)
>> [junit]   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> [junit]   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> [junit]   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> [junit]   at java.lang.reflect.Method.invoke(Method.java:597)
>> [junit]   at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
>> [junit]   at org.apache.hadoop.mapred.JobShell.run(JobShell.java:194)
>> [junit]   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> [junit]   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>> [junit]   at org.apache.hadoop.mapred.JobShell.main(JobShell.java:220)
>> [junit]
>>
>> -----Original Message-----
>> From: John Sichi [mailto:jsi...@facebook.com]
>> Sent: Wednesday, June 23, 2010 2:15 PM
>> To:
>> Subject: Re: 6.0 and trunk look broken to me
>>
>> (You mean 0.6, right?)
>>
>> I'm not able to reproduce this (just tested with latest trunk on Linux and
>> Mac). Is anyone else seeing it?
>>
>> JVS
>>
>> On Jun 23, 2010, at 1:51 PM, Edward Capriolo wrote:
>>
>>> Trunk and 6.0 both show this in hadoop local mode and hadoop distributed
>>> mode.
>>>
>>> [edw...@ec dist]$ export HADOOP_HOME=/home/edward/hadoop/hadoop-0.20.2_local
>>> [edw...@ec dist]$ bin/hive
>>> Hive history file=/tmp/edward/hive_job_log_edward_201006231647_1723542005.txt
>>> hive> show tables;
>>> FAILED: Parse Error: line 0:-1 cannot recognize input '<EOF>'
>>>
>>> [edw...@ec dist]$ more /tmp/edward/hive.log
>>> 2010-06-23 16:41:00,749 ERROR ql.Driver
>>> (SessionState.java:printError(277)) - FAILED: Parse Error: line 0:-1
>>> cannot recognize input '<EOF>'
>>>
>>> org.apache.hadoop.hive.ql.parse.ParseException: line 0:-1 cannot
>>> recognize input '<EOF>'
>>>
>>> at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:401)
>>> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:299)
>>> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:379)
>>> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
>>> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
>>> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:302)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>> at java.lang.reflect.Method.invoke(Method.java:597)
>>> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

I do not know what is up. I cleaned up my .ivy2, checked out, and built again. I guess if no one else is seeing it, it must be something on my system.
Total time: 3 minutes 7 seconds
[edw...@ec hive_6_pre]$ cd build/dist/
[edw...@ec dist]$ cd ../hive-trunk/^C
[edw...@ec dist]$ ls
bin  conf  examples  lib  README.txt
[edw...@ec dist]$ bin/hive
Hive history file=/tmp/edward/hive_job_log_edward_201006232341_41029014.txt
hive> show tables;
FAILED: Parse Error: line 0:-1 cannot recognize input '<EOF>'
hive> exit;
[edw...@ec dist]$ ant -v
Apache Ant version 1.8.0 compiled on February 1 2010
Trying the default build file: build.xml
Buildfile: build.xml does not exist!
Build failed
[edw...@ec dist]$ java -v
Unrecognized option: -v
Could not create the Java virtual machine.
[edw...@ec dist]$ java -version
java version "1.6.0_18"
Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)
[jira] Commented: (HIVE-1431) Hive CLI can't handle query files that begin with comments
[ https://issues.apache.org/jira/browse/HIVE-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882024#action_12882024 ]

Edward Capriolo commented on HIVE-1431:
---------------------------------------

We have a few tickets open; we really need to move all this stuff to a real parser so we can properly deal with things like ';', comments like this, or whatever. It is painfully hard to work around all these types of things, and we never get to the root of the problem.
[jira] Updated: (HIVE-1342) Predicate push down get error result when sub-queries have the same alias name
[ https://issues.apache.org/jira/browse/HIVE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Xu updated HIVE-1342:
-------------------------
           Status: Patch Available  (was: Open)
Affects Version/s: 0.6.0
                       (was: 0.5.0)
                       (was: 0.4.2)
    Fix Version/s: 0.6.0

> Predicate push down get error result when sub-queries have the same alias
> name
> --------------------------------------------------------------------------
>
>         Key: HIVE-1342
>         URL: https://issues.apache.org/jira/browse/HIVE-1342
>     Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
> Affects Versions: 0.6.0
>    Reporter: Ted Xu
>    Priority: Critical
>     Fix For: 0.6.0
>
> Attachments: cmd.hql, explain, ppd_same_alias_1.patch
>
> The query is over-optimized by PPD when sub-queries have the same alias name; see
> the query:
> ---
> create table if not exists dm_fact_buyer_prd_info_d (
>   category_id string,
>   gmv_trade_num int,
>   user_id int
> )
> PARTITIONED BY (ds int);
> set hive.optimize.ppd=true;
> set hive.map.aggr=true;
> explain select category_id1, category_id2, assoc_idx
> from (
>   select
>     category_id1,
>     category_id2,
>     count(distinct user_id) as assoc_idx
>   from (
>     select
>       t1.category_id as category_id1,
>       t2.category_id as category_id2,
>       t1.user_id
>     from (
>       select category_id, user_id
>       from dm_fact_buyer_prd_info_d
>       group by category_id, user_id ) t1
>     join (
>       select category_id, user_id
>       from dm_fact_buyer_prd_info_d
>       group by category_id, user_id ) t2 on t1.user_id = t2.user_id
>   ) t1
>   group by category_id1, category_id2 ) t_o
> where category_id1 <> category_id2
> and assoc_idx > 2;
> ---
> The query above fails when executed, throwing the exception: "can not cast
> UDFOpNotEqual(Text, IntWritable) to UDFOpNotEqual(Text, Text)".
> I explained the query and the execution plan looks really weird (only Stage-1;
> see the highlighted predicate):
> ---
> Stage: Stage-1
>   Map Reduce
>     Alias -> Map Operator Tree:
>       t_o:t1:t1:dm_fact_buyer_prd_info_d
>         TableScan
>           alias: dm_fact_buyer_prd_info_d
>           Filter Operator
>             predicate:
>                 expr: *(category_id <> user_id)*
>                 type: boolean
>             Select Operator
>               expressions:
>                     expr: category_id
>                     type: string
>                     expr: user_id
>                     type: bigint
>               outputColumnNames: category_id, user_id
>               Group By Operator
>                 keys:
>                       expr: category_id
>                       type: string
>                       expr: user_id
>                       type: bigint
>                 mode: hash
>                 outputColumnNames: _col0, _col1
>                 Reduce Output Operator
>                   key expressions:
>                         expr: _col0
>                         type: string
>                         expr: _col1
>                         type: bigint
>                   sort order: ++
>                   Map-reduce partition columns:
>                         expr: _col0
>                         type: string
>                         expr: _col1
>                         type: bigint
>                   tag: -1
>     Reduce Operator Tree:
>       Group By Operator
>         keys:
>               expr: KEY._col0
>               type: string
>               expr: KEY._col1
>               type: bigint
>         mode: mergepartial
>         outputColumnNames: _col0, _col1
>         Select Operator
>           expressions:
>                 expr: _col0
>                 type: string
>                 expr: _col1
>                 type: bigint
>           outputColumnNames: _col0, _col1
>           File Output Operator
>             compressed: true
>             GlobalTableId: 0
>             table:
>                 input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                 output format: org.apache.hadoop.hive.ql.io.HiveSequ
Review Request: Hive Variables
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/229/
-----------------------------------------------------------

Review request for Hive Developers.


Summary
-------

Hive Variables


Diffs
-----

  trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 955109
  trunk/conf/hive-default.xml 955109
  trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java 955109
  trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java 955109
  trunk/ql/src/test/queries/clientpositive/set_processor_namespaces.q PRE-CREATION
  trunk/ql/src/test/results/clientpositive/set_processor_namespaces.q.out PRE-CREATION

Diff: http://review.hbase.org/r/229/diff


Testing
-------


Thanks,

Edward
[jira] Updated: (HIVE-1096) Hive Variables
[ https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo updated HIVE-1096:
----------------------------------
    Attachment: hive-1096-11-patch.txt

Was not interpolating system:vars. Fixed with a better test case.

> Hive Variables
> --------------
>
>         Key: HIVE-1096
>         URL: https://issues.apache.org/jira/browse/HIVE-1096
>     Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>    Reporter: Edward Capriolo
>    Assignee: Edward Capriolo
>     Fix For: 0.6.0, 0.7.0
>
> Attachments: 1096-9.diff, hive-1096-10-patch.txt,
> hive-1096-11-patch.txt, hive-1096-2.diff, hive-1096-7.diff, hive-1096-8.diff,
> hive-1096.diff
>
> From the mailing list:
> --Amazon Elastic MapReduce's version of Hive seems to have a nice feature
> called "Variables." Basically you can define a variable via the command line
> while invoking hive with -d DT=2009-12-09 and then refer to the variable via
> ${DT} within the hive queries. This could be extremely useful. I can't seem
> to find this feature even on trunk. Is this feature currently anywhere in the
> roadmap?--
> This could be implemented in many places.
> A simple place to put this is in Driver.compile or Driver.run: we can do
> string substitutions at that level, and further downstream need not be
> affected.
> There could be some benefits to doing this further downstream (parser, plan),
> but based on the simple needs we may not need to overthink this.
> I will get started on implementing in compile unless someone wants to discuss
> this more.
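The pre-compile string substitution described above can be sketched in a few lines. This is a hypothetical illustration (names and behavior are assumptions, not the patch itself): replace `${name}` references from a variable map before the query text is compiled, leaving unknown references untouched.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of ${name} variable substitution applied to a query
// string before compilation.
class VariableSubstitution {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    static String substitute(String query, Map<String, String> vars) {
        Matcher m = VAR.matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Unknown variables are left as-is rather than failing.
            String value = vars.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Because the substitution happens on the raw query text, nothing downstream (parser, planner) has to know about variables, which matches the rationale in the issue description.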
Re: 6.0 and trunk look broken to me
Did you get past this? It looks like some kind of bad build.

JVS

On Jun 23, 2010, at 2:38 PM, Ashish Thusoo wrote:

> Not sure if this is just my env but on 0.6.0 when I run the unit tests I get
> a bunch of errors of the following form:
>
> [junit] Begin query: alter3.q
> [junit] java.lang.NoSuchFieldError: HIVESESSIONSILENT
> [junit]   at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:1052)
> [junit]   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [junit]   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> [junit]   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> [junit]   at java.lang.reflect.Method.invoke(Method.java:597)
> [junit]   at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> [junit]   at org.apache.hadoop.mapred.JobShell.run(JobShell.java:194)
> [junit]   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> [junit]   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> [junit]   at org.apache.hadoop.mapred.JobShell.main(JobShell.java:220)
> [junit]
>
> -----Original Message-----
> From: John Sichi [mailto:jsi...@facebook.com]
> Sent: Wednesday, June 23, 2010 2:15 PM
> To:
> Subject: Re: 6.0 and trunk look broken to me
>
> (You mean 0.6, right?)
>
> I'm not able to reproduce this (just tested with latest trunk on Linux and
> Mac). Is anyone else seeing it?
>
> JVS
>
> On Jun 23, 2010, at 1:51 PM, Edward Capriolo wrote:
>
>> Trunk and 6.0 both show this in hadoop local mode and hadoop distributed
>> mode.
>>
>> [edw...@ec dist]$ export HADOOP_HOME=/home/edward/hadoop/hadoop-0.20.2_local
>> [edw...@ec dist]$ bin/hive
>> Hive history file=/tmp/edward/hive_job_log_edward_201006231647_1723542005.txt
>> hive> show tables;
>> FAILED: Parse Error: line 0:-1 cannot recognize input '<EOF>'
>>
>> [edw...@ec dist]$ more /tmp/edward/hive.log
>> 2010-06-23 16:41:00,749 ERROR ql.Driver
>> (SessionState.java:printError(277)) - FAILED: Parse Error: line 0:-1
>> cannot recognize input '<EOF>'
>>
>> org.apache.hadoop.hive.ql.parse.ParseException: line 0:-1 cannot
>> recognize input '<EOF>'
>>
>> at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:401)
>> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:299)
>> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:379)
>> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
>> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
>> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:302)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> at java.lang.reflect.Method.invoke(Method.java:597)
>> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
[jira] Commented: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch
[ https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881998#action_12881998 ]

Ashish Thusoo commented on HIVE-1271:
-------------------------------------

I have committed this to trunk and will commit to 0.6.0 soon. One thing I did overlook, though: we should add a test case for this. Can you do that as part of another JIRA, as this one is already partially committed?

Thanks,
Ashish
[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it
[ https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881967#action_12881967 ]

John Sichi commented on HIVE-1176:
----------------------------------

+1. Will commit when tests pass.
[jira] Updated: (HIVE-1307) More generic and efficient merge method
[ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1307:
-----------------------------
    Attachment: HIVE-1307.0.patch

Uploading a preliminary patch. This is not ready for review yet.

> More generic and efficient merge method
> ---------------------------------------
>
>         Key: HIVE-1307
>         URL: https://issues.apache.org/jira/browse/HIVE-1307
>     Project: Hadoop Hive
>  Issue Type: New Feature
> Affects Versions: 0.6.0
>    Reporter: Ning Zhang
>    Assignee: Ning Zhang
>     Fix For: 0.6.0
>
> Attachments: HIVE-1307.0.patch
>
> Currently, if hive.merge.mapfiles/mapredfiles=true, a new MapReduce job is
> created to read the input files and output to one reducer for merging. This
> MR job is created at compile time, one MR job per partition. In the dynamic
> partition case, multiple partitions can be created at execution time, so
> generating the merge MR job at compile time is impossible.
> We should generalize the merge framework to allow multiple partitions; most
> of the time a map-only job should be sufficient if we use
> CombineHiveInputFormat.
[jira] Updated: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it
[ https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arvind Prabhakar updated HIVE-1176:
-----------------------------------
    Attachment: HIVE-1176-6.patch
[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it
[ https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881956#action_12881956 ] Arvind Prabhakar commented on HIVE-1176: yes - that's what my intention was. Thanks for catching it. > 'create if not exists' fails for a table name with 'select' in it > - > > Key: HIVE-1176 > URL: https://issues.apache.org/jira/browse/HIVE-1176 > Project: Hadoop Hive > Issue Type: Bug > Components: Metastore, Query Processor >Reporter: Prasad Chakka >Assignee: Arvind Prabhakar > Fix For: 0.6.0 > > Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, HIVE-1176-3.patch, > HIVE-1176-4.patch, HIVE-1176-5.patch, HIVE-1176-6.patch, > HIVE-1176.lib-files.tar.gz, HIVE-1176.patch > > > hive> create table if not exists tmp_select(s string, c string, n int); > org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got > exception: javax.jdo.JDOUserException JDOQL Single-String query should always > start with SELECT) > at > org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441) > at > org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275) > at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:156) > Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException > JDOQL Single-String query should always start with SELECT) > at > org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450) > at > org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439) > ... 15 more -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1096) Hive Variables
[ https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1096: - Fix Version/s: (was: 0.5.1) Affects Version/s: (was: 0.5.0) > Hive Variables > -- > > Key: HIVE-1096 > URL: https://issues.apache.org/jira/browse/HIVE-1096 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Edward Capriolo >Assignee: Edward Capriolo > Fix For: 0.6.0, 0.7.0 > > Attachments: 1096-9.diff, hive-1096-10-patch.txt, hive-1096-2.diff, > hive-1096-7.diff, hive-1096-8.diff, hive-1096.diff > > > From mailing list: > --Amazon Elastic MapReduce version of Hive seems to have a nice feature > called "Variables." Basically you can define a variable via command-line > while invoking hive with -d DT=2009-12-09 and then refer to the variable via > ${DT} within the hive queries. This could be extremely useful. I can't seem > to find this feature even on trunk. Is this feature currently anywhere in the > roadmap?-- > This could be implemented in many places. > A simple place to put this is > in Driver.compile or Driver.run: we can do string substitutions at that level, > and further downstream need not be affected. > There could be some benefits to doing this further downstream (parser, plan), > but based on the simple needs we may not need to overthink this. > I will get started on implementing in compile unless someone wants to discuss > this more. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
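The driver-level substitution the issue proposes (rewrite ${VAR} references before compilation so everything downstream stays untouched) can be sketched as follows. This is an illustrative sketch only, not Hive's actual implementation; the function name and the choice to leave undefined variables as-is are assumptions:

```python
import re

def substitute_variables(query, variables):
    # Replace ${NAME} with its value from the variable map. Unknown
    # variables are left untouched so a later stage can report them.
    def _replace(match):
        return str(variables.get(match.group(1), match.group(0)))
    return re.sub(r"\$\{(\w+)\}", _replace, query)

# Variables as they might arrive from a command line like: -d DT=2009-12-09
print(substitute_variables("SELECT * FROM logs WHERE dt = '${DT}'",
                           {"DT": "2009-12-09"}))
# SELECT * FROM logs WHERE dt = '2009-12-09'
```

Because the rewrite happens on the raw query string, the parser and plan never see the variable syntax, which is what makes the Driver.compile placement attractive.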
[jira] Updated: (HIVE-1387) Make PERCENTILE work with double data type
[ https://issues.apache.org/jira/browse/HIVE-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Lahiri updated HIVE-1387: Attachment: HIVE-1387.2.patch, median_approx_quality.png

I've attached HIVE-1387.2.patch, which does the following:

(1) Creates a percentile_approx() UDAF which uses the histogram_numeric() UDAF to estimate quantiles from a histogram. The syntax matches the existing percentile() UDAF, and extends it with a third parameter that specifies the number of histogram bins to use (and thus the accuracy of quantile estimation):

SELECT percentile_approx(val, 0.5) FROM random; // estimates the median
SELECT percentile_approx(val, array(0.5, 0.95, 0.98)) FROM random; // estimates 3 quantiles
SELECT percentile_approx(val, 0.5, 1000) FROM random; // estimates the median using 1,000 histogram bins instead of the default of 10,000

(2) I've left the existing percentile() UDAF as it is, for the following reasons: when the number of unique values in a column is relatively small, percentile_approx() will return an exact result. When the number of unique values in a column is very large (as one might expect with double), percentile() will run out of memory and crash, so there's really no need to modify the existing percentile() to support doubles.

(3) The accuracy of quantile estimation seems to be pretty good. I've attached a graph showing approximation quality for the median using different histogram sizes, on random datasets of 100,000 numbers. The default number of histogram bins is 10,000, which appears to work quite well.

(4) This patch also refactors the histogram_numeric() class to put all the generic histogram functionality into a reusable inner class.
> Make PERCENTILE work with double data type > -- > > Key: HIVE-1387 > URL: https://issues.apache.org/jira/browse/HIVE-1387 > Project: Hadoop Hive > Issue Type: Improvement >Affects Versions: 0.6.0 >Reporter: Vaibhav Aggarwal >Assignee: Mayank Lahiri > Fix For: 0.6.0 > > Attachments: HIVE-1387.2.patch, median_approx_quality.png, > patch-1387-1.patch > > > The PERCENTILE UDAF does not work with double datatype. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
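The idea percentile_approx() builds on, estimating a quantile from a histogram rather than from the raw values, can be sketched as below. This is a simplified illustration (plain linear interpolation over (bin center, count) pairs with positive counts), not the UDAF's actual algorithm, and the function name is hypothetical:

```python
def estimate_quantile(bins, q):
    # bins: iterable of (center, count) pairs from a histogram;
    # q: desired quantile in (0, 1). Walk the cumulative counts and
    # linearly interpolate between neighboring bin centers.
    bins = sorted(bins)
    total = sum(count for _, count in bins)
    target = q * total
    prev_center, prev_cum = bins[0][0], 0.0
    cumulative = 0.0
    for center, count in bins:
        cumulative += count
        if cumulative >= target:
            frac = (target - prev_cum) / (cumulative - prev_cum)
            return prev_center + frac * (center - prev_center)
        prev_center, prev_cum = center, cumulative
    return bins[-1][0]
```

More bins mean finer-grained cumulative counts and therefore a tighter estimate, which is the trade-off the third parameter of percentile_approx() exposes.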
[jira] Updated: (HIVE-1096) Hive Variables
[ https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1096: - Fix Version/s: 0.5.1 0.7.0 > Hive Variables > -- > > Key: HIVE-1096 > URL: https://issues.apache.org/jira/browse/HIVE-1096 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Affects Versions: 0.5.0 >Reporter: Edward Capriolo >Assignee: Edward Capriolo > Fix For: 0.5.1, 0.6.0, 0.7.0 > > Attachments: 1096-9.diff, hive-1096-10-patch.txt, hive-1096-2.diff, > hive-1096-7.diff, hive-1096-8.diff, hive-1096.diff > > > From mailing list: > --Amazon Elastic MapReduce version of Hive seems to have a nice feature > called "Variables." Basically you can define a variable via command-line > while invoking hive with -d DT=2009-12-09 and then refer to the variable via > ${DT} within the hive queries. This could be extremely useful. I can't seem > to find this feature even on trunk. Is this feature currently anywhere in the > roadmap?-- > This could be implemented in many places. > A simple place to put this is > in Driver.compile or Driver.run: we can do string substitutions at that level, > and further downstream need not be affected. > There could be some benefits to doing this further downstream (parser, plan), > but based on the simple needs we may not need to overthink this. > I will get started on implementing in compile unless someone wants to discuss > this more. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1387) Make PERCENTILE work with double data type
[ https://issues.apache.org/jira/browse/HIVE-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Lahiri updated HIVE-1387: Status: Patch Available (was: Open) Affects Version/s: 0.6.0 Fix Version/s: 0.6.0 > Make PERCENTILE work with double data type > -- > > Key: HIVE-1387 > URL: https://issues.apache.org/jira/browse/HIVE-1387 > Project: Hadoop Hive > Issue Type: Improvement >Affects Versions: 0.6.0 >Reporter: Vaibhav Aggarwal >Assignee: Mayank Lahiri > Fix For: 0.6.0 > > Attachments: HIVE-1387.2.patch, median_approx_quality.png, > patch-1387-1.patch > > > The PERCENTILE UDAF does not work with double datatype. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1096) Hive Variables
[ https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881955#action_12881955 ] Carl Steinbach commented on HIVE-1096: -- Hi Ed, can you please post this patch to review.hbase.org? Thanks! > Hive Variables > -- > > Key: HIVE-1096 > URL: https://issues.apache.org/jira/browse/HIVE-1096 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Affects Versions: 0.5.0 >Reporter: Edward Capriolo >Assignee: Edward Capriolo > Fix For: 0.5.1, 0.6.0, 0.7.0 > > Attachments: 1096-9.diff, hive-1096-10-patch.txt, hive-1096-2.diff, > hive-1096-7.diff, hive-1096-8.diff, hive-1096.diff > > > From mailing list: > --Amazon Elastic MapReduce version of Hive seems to have a nice feature > called "Variables." Basically you can define a variable via command-line > while invoking hive with -d DT=2009-12-09 and then refer to the variable via > ${DT} within the hive queries. This could be extremely useful. I can't seem > to find this feature even on trunk. Is this feature currently anywhere in the > roadmap?-- > This could be implemented in many places. > A simple place to put this is > in Driver.compile or Driver.run: we can do string substitutions at that level, > and further downstream need not be affected. > There could be some benefits to doing this further downstream (parser, plan), > but based on the simple needs we may not need to overthink this. > I will get started on implementing in compile unless someone wants to discuss > this more. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it
[ https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881950#action_12881950 ] John Sichi commented on HIVE-1176: -- Thanks for the doc Arvind. But for the patch: we need the ORDER BY on the SELECT that produces results in the output log (not the INSERT). > 'create if not exists' fails for a table name with 'select' in it > - > > Key: HIVE-1176 > URL: https://issues.apache.org/jira/browse/HIVE-1176 > Project: Hadoop Hive > Issue Type: Bug > Components: Metastore, Query Processor >Reporter: Prasad Chakka >Assignee: Arvind Prabhakar > Fix For: 0.6.0 > > Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, HIVE-1176-3.patch, > HIVE-1176-4.patch, HIVE-1176-5.patch, HIVE-1176.lib-files.tar.gz, > HIVE-1176.patch > > > hive> create table if not exists tmp_select(s string, c string, n int); > org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got > exception: javax.jdo.JDOUserException JDOQL Single-String query should always > start with SELECT) > at > org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441) > at > org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275) > at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:156) > Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException > JDOQL Single-String query should always start with SELECT) > at > org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450) > at > org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439) > ... 15 more -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it
[ https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881946#action_12881946 ] Arvind Prabhakar commented on HIVE-1176: @John: done. Please see the new patch attachment - HIVE-1176-5.patch Since a lot of good points came out of the discussion on this jira, I took the liberty of adding them to the Hive wiki for posterity. You can find it [here|http://wiki.apache.org/hadoop/Hive/TipsForAddingNewTests]. Please add to it any other points that you feel contributors should take into consideration while adding new tests. > 'create if not exists' fails for a table name with 'select' in it > - > > Key: HIVE-1176 > URL: https://issues.apache.org/jira/browse/HIVE-1176 > Project: Hadoop Hive > Issue Type: Bug > Components: Metastore, Query Processor >Reporter: Prasad Chakka >Assignee: Arvind Prabhakar > Fix For: 0.6.0 > > Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, HIVE-1176-3.patch, > HIVE-1176-4.patch, HIVE-1176-5.patch, HIVE-1176.lib-files.tar.gz, > HIVE-1176.patch > > > hive> create table if not exists tmp_select(s string, c string, n int); > org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got > exception: javax.jdo.JDOUserException JDOQL Single-String query should always > start with SELECT) > at > org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441) > at > org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275) > at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312) > at 
org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:156) > Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException > JDOQL Single-String query should always start with SELECT) > at > org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450) > at > org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439) > ... 15 more -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it
[ https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arvind Prabhakar updated HIVE-1176: --- Attachment: HIVE-1176-5.patch > 'create if not exists' fails for a table name with 'select' in it > - > > Key: HIVE-1176 > URL: https://issues.apache.org/jira/browse/HIVE-1176 > Project: Hadoop Hive > Issue Type: Bug > Components: Metastore, Query Processor >Reporter: Prasad Chakka >Assignee: Arvind Prabhakar > Fix For: 0.6.0 > > Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, HIVE-1176-3.patch, > HIVE-1176-4.patch, HIVE-1176-5.patch, HIVE-1176.lib-files.tar.gz, > HIVE-1176.patch > > > hive> create table if not exists tmp_select(s string, c string, n int); > org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got > exception: javax.jdo.JDOUserException JDOQL Single-String query should always > start with SELECT) > at > org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441) > at > org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275) > at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:156) > Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException > JDOQL Single-String query should always start with SELECT) > at > org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450) > at > org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439) > ... 15 more -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1431) Hive CLI can't handle query files that begin with comments
Hive CLI can't handle query files that begin with comments
--
Key: HIVE-1431
URL: https://issues.apache.org/jira/browse/HIVE-1431
Project: Hadoop Hive
Issue Type: Bug
Components: CLI
Reporter: Carl Steinbach
Fix For: 0.6.0, 0.7.0

{code}
% cat test.q
-- This is a comment, followed by a command
set -v;
--
-- Another comment
--
show tables;
-- Last comment
(master) [ ~/Projects/hive ] % hive < test.q
Hive history file=/tmp/carl/hive_job_log_carl_201006231606_1140875653.txt
hive> -- This is a comment, followed by a command
    > set -v;
FAILED: Parse Error: line 2:0 cannot recognize input 'set'
hive> --
    > -- Another comment
    > --
    > show tables;
OK
rawchunks
Time taken: 5.334 seconds
hive> -- Last comment
    >
(master) [ ~/Projects/hive ] %
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
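The behavior the CLI needs, treating full-line "--" comments as no-ops instead of feeding them to the parser, can be sketched as a preprocessing step. This is illustrative only; the function name is hypothetical, and a real fix belongs in the CLI driver and would also need to handle "--" inside string literals:

```python
def strip_comment_lines(script):
    # Drop lines that consist entirely of a '--' comment before the
    # script is split into commands. Other lines pass through unchanged.
    kept = []
    for line in script.splitlines():
        if line.lstrip().startswith("--"):
            continue
        kept.append(line)
    return "\n".join(kept)

script = "-- This is a comment, followed by a command\nset -v;\n--\nshow tables;"
print(strip_comment_lines(script))
# set -v;
# show tables;
```

With this applied, the failing `set -v;` from the bug report would reach the command processor without the preceding comment line confusing it.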
[jira] Commented: (HIVE-1430) serializing/deserializing the query plan is useless and expensive
[ https://issues.apache.org/jira/browse/HIVE-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881942#action_12881942 ] Namit Jain commented on HIVE-1430: -- +1 > serializing/deserializing the query plan is useless and expensive > - > > Key: HIVE-1430 > URL: https://issues.apache.org/jira/browse/HIVE-1430 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Namit Jain >Assignee: Ning Zhang > Attachments: HIVE-1430.patch > > > We should turn it off by default -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it
[ https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881936#action_12881936 ] John Sichi commented on HIVE-1176: -- Just one more change needed...please add an ORDER BY to the select in the testcase. This is required to avoid spurious diffs later since without ORDER BY, the query result order is non-deterministic. After that I'll run through tests and commit. > 'create if not exists' fails for a table name with 'select' in it > - > > Key: HIVE-1176 > URL: https://issues.apache.org/jira/browse/HIVE-1176 > Project: Hadoop Hive > Issue Type: Bug > Components: Metastore, Query Processor >Reporter: Prasad Chakka >Assignee: Arvind Prabhakar > Fix For: 0.6.0 > > Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, HIVE-1176-3.patch, > HIVE-1176-4.patch, HIVE-1176.lib-files.tar.gz, HIVE-1176.patch > > > hive> create table if not exists tmp_select(s string, c string, n int); > org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got > exception: javax.jdo.JDOUserException JDOQL Single-String query should always > start with SELECT) > at > org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441) > at > org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275) > at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181) > at 
org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:156) > Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException > JDOQL Single-String query should always start with SELECT) > at > org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450) > at > org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439) > ... 15 more -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1430) serializing/deserializing the query plan is useless and expensive
[ https://issues.apache.org/jira/browse/HIVE-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1430: - Status: Patch Available (was: Open) > serializing/deserializing the query plan is useless and expensive > - > > Key: HIVE-1430 > URL: https://issues.apache.org/jira/browse/HIVE-1430 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Namit Jain >Assignee: Ning Zhang > Attachments: HIVE-1430.patch > > > We should turn it off by default -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1430) serializing/deserializing the query plan is useless and expensive
[ https://issues.apache.org/jira/browse/HIVE-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1430: - Attachment: HIVE-1430.patch > serializing/deserializing the query plan is useless and expensive > - > > Key: HIVE-1430 > URL: https://issues.apache.org/jira/browse/HIVE-1430 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Namit Jain >Assignee: Ning Zhang > Attachments: HIVE-1430.patch > > > We should turn it off by default -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1430) serializing/deserializing the query plan is useless and expensive
serializing/deserializing the query plan is useless and expensive - Key: HIVE-1430 URL: https://issues.apache.org/jira/browse/HIVE-1430 Project: Hadoop Hive Issue Type: Bug Reporter: Namit Jain Assignee: Ning Zhang We should turn it off by default -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1416) Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode
[ https://issues.apache.org/jira/browse/HIVE-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881921#action_12881921 ] John Sichi commented on HIVE-1416: -- Attached junit-noframes.html with the failures (but not the diffs). Example diff snippet from union6.q:

@@ -233,7 +233,6 @@
 406	val_406
 66	val_66
 98	val_98
-tst1	500

> Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode > -- > > Key: HIVE-1416 > URL: https://issues.apache.org/jira/browse/HIVE-1416 > Project: Hadoop Hive > Issue Type: Bug >Affects Versions: 0.6.0, 0.7.0 >Reporter: Ning Zhang >Assignee: Ning Zhang > Fix For: 0.6.0, 0.7.0 > > Attachments: HIVE-1416.2.patch, HIVE-1416.patch, junit-noframes.html > > > Hive parses the file name generated by tasks to figure out the task ID in > order to generate files for empty buckets. Different hadoop versions and > execution mode have different ways of naming output files by > mappers/reducers. We need to move the parsing code to shims. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1416) Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode
[ https://issues.apache.org/jira/browse/HIVE-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-1416: - Attachment: junit-noframes.html > Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode > -- > > Key: HIVE-1416 > URL: https://issues.apache.org/jira/browse/HIVE-1416 > Project: Hadoop Hive > Issue Type: Bug >Affects Versions: 0.6.0, 0.7.0 >Reporter: Ning Zhang >Assignee: Ning Zhang > Fix For: 0.6.0, 0.7.0 > > Attachments: HIVE-1416.2.patch, HIVE-1416.patch, junit-noframes.html > > > Hive parses the file name generated by tasks to figure out the task ID in > order to generate files for empty buckets. Different hadoop versions and > execution mode have different ways of naming output files by > mappers/reducers. We need to move the parsing code to shims. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1416) Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode
[ https://issues.apache.org/jira/browse/HIVE-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-1416: - Status: Open (was: Patch Available) Ning, this ran through cleanly with Hadoop 0.17 (where I verified that it fixes the problem), but on Hadoop 0.20, it results in a lot of test failures. These aren't just diffs due to missing ORDER BY; values are actually missing from the results. > Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode > -- > > Key: HIVE-1416 > URL: https://issues.apache.org/jira/browse/HIVE-1416 > Project: Hadoop Hive > Issue Type: Bug >Affects Versions: 0.6.0, 0.7.0 >Reporter: Ning Zhang >Assignee: Ning Zhang > Fix For: 0.6.0, 0.7.0 > > Attachments: HIVE-1416.2.patch, HIVE-1416.patch > > > Hive parses the file name generated by tasks to figure out the task ID in > order to generate files for empty buckets. Different hadoop versions and > execution mode have different ways of naming output files by > mappers/reducers. We need to move the parsing code to shims. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
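The issue describes parsing a task ID out of the output file name, where the name format depends on the Hadoop version and execution mode. A rough sketch of that kind of version-sensitive parsing is below; the patterns are illustrative assumptions, not the actual formats Hive's shim layer handles:

```python
import re

# Illustrative file-name patterns only. The real formats vary by Hadoop
# version and local vs. distributed mode, which is why the parsing code
# belongs behind a version-specific shim interface.
_PATTERNS = [
    re.compile(r"^attempt_\d+_\d+_[mr]_(\d+)_\d+"),  # e.g. attempt_200906231600_0001_m_000005_0
    re.compile(r"^part-(\d+)"),                      # e.g. part-00005
]

def parse_task_id(filename):
    # Try each known naming scheme; return the numeric task ID, or
    # None if the name matches no known scheme.
    for pattern in _PATTERNS:
        match = pattern.match(filename)
        if match:
            return int(match.group(1))
    return None
```

Centralizing the pattern list per Hadoop version is the shim approach: callers ask for a task ID and never see which naming scheme produced the file.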
RE: 6.0 and trunk look broken to me
Not sure if this is just my env but on 0.6.0 when I run the unit tests I get a bunch of errors of the following form: [junit] Begin query: alter3.q [junit] java.lang.NoSuchFieldError: HIVESESSIONSILENT [junit] at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:1052) [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) [junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [junit] at java.lang.reflect.Method.invoke(Method.java:597) [junit] at org.apache.hadoop.util.RunJar.main(RunJar.java:155) [junit] at org.apache.hadoop.mapred.JobShell.run(JobShell.java:194) [junit] at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) [junit] at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) [junit] at org.apache.hadoop.mapred.JobShell.main(JobShell.java:220) [junit] -Original Message- From: John Sichi [mailto:jsi...@facebook.com] Sent: Wednesday, June 23, 2010 2:15 PM To: Subject: Re: 6.0 and trunk look broken to me (You mean 0.6, right?) I'm not able to reproduce this (just tested with latest trunk on Linux and Mac). Is anyone else seeing it? JVS On Jun 23, 2010, at 1:51 PM, Edward Capriolo wrote: > Trunk and 6.0 both show this in hadoop local mode and hadoop distributed mode. 
> > export HADOOP_HOME=/home/edward/hadoop/hadoop-0.20.2_loca > edw...@ec dist]$ export > HADOOP_HOME=/home/edward/hadoop/hadoop-0.20.2_local[edw...@ec dist]$ > bin/hive Hive history > file=/tmp/edward/hive_job_log_edward_201006231647_1723542005.txt > hive> show tables; > FAILED: Parse Error: line 0:-1 cannot recognize input '' > > [edw...@ec dist]$ more /tmp/edward/hive.log > 2010-06-23 16:41:00,749 ERROR ql.Driver > (SessionState.java:printError(277)) - FAILED: Parse Error: line 0:-1 > cannot recognize input '' > > org.apache.hadoop.hive.ql.parse.ParseException: line 0:-1 cannot > recognize input '' > > at > org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:401) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:299) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:379) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:302) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
[jira] Updated: (HIVE-1229) replace dependencies on HBase deprecated API
[ https://issues.apache.org/jira/browse/HIVE-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Basab Maulik updated HIVE-1229: --- Attachment: HIVE-1229.2.patch fixed checkstyle violations and rebased against trunk. Tests run successfully: ant test -Dtestcase=TestLazyHBaseObject ant test -Dtestcase=TestHBaseSerDe ant test -Dtestcase=TestHBaseCliDriver -Dqfile=hbase_queries.q thanks. > replace dependencies on HBase deprecated API > > > Key: HIVE-1229 > URL: https://issues.apache.org/jira/browse/HIVE-1229 > Project: Hadoop Hive > Issue Type: Improvement > Components: HBase Handler >Affects Versions: 0.6.0 >Reporter: John Sichi >Assignee: Basab Maulik > Attachments: HIVE-1229.2.patch > > > Some of these dependencies are on the old Hadoop mapred packages; others are > HBase-specific. The former have to wait until the rest of Hive moves over to > the new Hadoop mapreduce package, but the HBase-specific ones don't have to > wait. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: 6.0 and trunk look broken to me
(You mean 0.6, right?) I'm not able to reproduce this (just tested with latest trunk on Linux and Mac). Is anyone else seeing it? JVS On Jun 23, 2010, at 1:51 PM, Edward Capriolo wrote: > Trunk and 6.0 both show this in hadoop local mode and hadoop distributed mode. > > export HADOOP_HOME=/home/edward/hadoop/hadoop-0.20.2_loca > edw...@ec dist]$ export > HADOOP_HOME=/home/edward/hadoop/hadoop-0.20.2_local[edw...@ec dist]$ > bin/hive > Hive history file=/tmp/edward/hive_job_log_edward_201006231647_1723542005.txt > hive> show tables; > FAILED: Parse Error: line 0:-1 cannot recognize input '' > > [edw...@ec dist]$ more /tmp/edward/hive.log > 2010-06-23 16:41:00,749 ERROR ql.Driver > (SessionState.java:printError(277)) - FAILED: Parse Error: line 0:-1 > cannot recognize input '' > > org.apache.hadoop.hive.ql.parse.ParseException: line 0:-1 cannot > recognize input '' > > at > org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:401) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:299) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:379) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:302) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
[jira] Resolved: (HIVE-56) The reducer output is not created if the mapper input is empty
[ https://issues.apache.org/jira/browse/HIVE-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain resolved HIVE-56. Fix Version/s: 0.6.0 Resolution: Fixed This was fixed a long time back. In Hive, an empty input is created to start a dummy mapper. > The reducer output is not created if the mapper input is empty > -- > > Key: HIVE-56 > URL: https://issues.apache.org/jira/browse/HIVE-56 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Namit Jain >Assignee: Namit Jain > Fix For: 0.6.0 > > > For some Hive stuff, I ran into the following scenario: > For a given map-reduce job, the input was empty. Because of that, no mappers > and reducers were created. It would have been helpful if an empty output for > the reducer had been created. > After browsing through the code, it seems that in initTasks() in > JobInProgress, no mappers and reducers are initialized if the input is empty. > I was thinking of putting a fix there: if the input is empty, before > returning, create the output directory (as specified by the reducer) if > needed. Any comments/suggestions? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1229) replace dependencies on HBase deprecated API
[ https://issues.apache.org/jira/browse/HIVE-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Basab Maulik updated HIVE-1229: --- Attachment: (was: HIVE-1129.1.patch) > replace dependencies on HBase deprecated API > > > Key: HIVE-1229 > URL: https://issues.apache.org/jira/browse/HIVE-1229 > Project: Hadoop Hive > Issue Type: Improvement > Components: HBase Handler >Affects Versions: 0.6.0 >Reporter: John Sichi >Assignee: Basab Maulik > > Some of these dependencies are on the old Hadoop mapred packages; others are > HBase-specific. The former have to wait until the rest of Hive moves over to > the new Hadoop mapreduce package, but the HBase-specific ones don't have to > wait. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
6.0 and trunk look broken to me
Trunk and 6.0 both show this in hadoop local mode and hadoop distributed mode. export HADOOP_HOME=/home/edward/hadoop/hadoop-0.20.2_loca edw...@ec dist]$ export HADOOP_HOME=/home/edward/hadoop/hadoop-0.20.2_local[edw...@ec dist]$ bin/hive Hive history file=/tmp/edward/hive_job_log_edward_201006231647_1723542005.txt hive> show tables; FAILED: Parse Error: line 0:-1 cannot recognize input '' [edw...@ec dist]$ more /tmp/edward/hive.log 2010-06-23 16:41:00,749 ERROR ql.Driver (SessionState.java:printError(277)) - FAILED: Parse Error: line 0:-1 cannot recognize input '' org.apache.hadoop.hive.ql.parse.ParseException: line 0:-1 cannot recognize input '' at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:401) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:299) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:379) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:302) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
[jira] Updated: (HIVE-1359) Unit test should be shim-aware
[ https://issues.apache.org/jira/browse/HIVE-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-1359: - Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed Committed. Thanks Ning! > Unit test should be shim-aware > -- > > Key: HIVE-1359 > URL: https://issues.apache.org/jira/browse/HIVE-1359 > Project: Hadoop Hive > Issue Type: New Feature >Affects Versions: 0.6.0, 0.7.0 >Reporter: Ning Zhang >Assignee: Ning Zhang > Fix For: 0.6.0, 0.7.0 > > Attachments: HIVE-1359.2.patch, HIVE-1359.patch, unit_tests.txt > > > Some features in Hive only work for certain Hadoop versions through shims. > However, the unit test structure is not shim-aware in that there is only one > set of queries and expected outputs for all Hadoop versions. This may not be > sufficient when we have different output for different Hadoop versions. > One example is CombineHiveInputFormat, which is only available from Hadoop > 0.20. The plans using CombineHiveInputFormat and HiveInputFormat may be > different. Another example is archival partitions (HAR), which are also only > available from 0.20. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: problem with hive to integrate with hbase
Hi Muhammad, Just build from the top level of hive trunk (not from the hbase-handler component) and everything, including the hbase-handler, will be built for you. Follow the normal Hive build instructions in http://wiki.apache.org/hadoop/Hive/HowToContribute Note that we currently build against the 0.20.3 version of HBase; if you run into trouble due to mismatches with your 0.20.2 version, you'll need to downgrade the jars in hbase-handler/lib and then rebuild Hive to produce a compatible storage handler. JVS On Jun 23, 2010, at 7:37 AM, Muhammad Mudassar wrote: > Hi all > I want to integrate hive with hbase. I am running single node Hbase > 0.20.2 with hadoop 0.20.2 configured in > single node cluster mode. when I tried to run *ant jar* from > Hbase-Handler to get hive_hbase_handler.jar it gives me errors like: > setup: > > compile: > [echo] Compiling: hbase-handler >[javac] Compiling 9 source files to > /home/hadoop/dfs/hive/build/hbase-handler/classes >[javac] > /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:34: > package org.apache.hadoop.hive.serde does not exist >[javac] import org.apache.hadoop.hive.serde.Constants; >[javac]^ >[javac] > /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:35: > package org.apache.hadoop.hive.serde2 does not exist >[javac] import org.apache.hadoop.hive.serde2.ByteStream; >[javac] ^ >[javac] > /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:36: > package org.apache.hadoop.hive.serde2 does not exist >[javac] import org.apache.hadoop.hive.serde2.SerDe; >[javac] ^ >[javac] > /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:37: > package org.apache.hadoop.hive.serde2 does not exist >[javac] import org.apache.hadoop.hive.serde2.SerDeException; >[javac] ^ >[javac] > /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:38: > package 
org.apache.hadoop.hive.serde2 does not exist >[javac] import org.apache.hadoop.hive.serde2.SerDeUtils; >[javac] ^ >[javac] > /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:39: > package org.apache.hadoop.hive.serde2.lazy does not exist >[javac] import org.apache.hadoop.hive.serde2.lazy.LazyFactory; >[javac] ^ >[javac] > /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:40: > package org.apache.hadoop.hive.serde2.lazy does not exist >[javac] import org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe; >[javac] ^ >[javac] > /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:41: > package org.apache.hadoop.hive.serde2.lazy does not exist >[javac] import org.apache.hadoop.hive.serde2.lazy.LazyUtils; >[javac] ^ >[javac] > /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:42: > package org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe does not > exist >[javac] import > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.SerDeParameters; >[javac] ^ >[javac] > /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:43: > package org.apache.hadoop.hive.serde2.lazy.objectinspector does not > exist >[javac] import > org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector; >[javac] ^ >[javac] > /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:44: > package org.apache.hadoop.hive.serde2.objectinspector does not exist >[javac] import > org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector; >[javac] ^ >[javac] > /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:45: > package org.apache.hadoop.hive.serde2.objectinspector does not exist >[javac] import > org.apache.hadoop.hive.serde2.objectinspector.MapObjectInspector; >[javac] ^ >[javac] > 
/home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:46: > package org.apache.hadoop.hive.serde2.objectinspector does not exist >[javac] import > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; >[javac] ^ >[javac] > /
[jira] Commented: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch
[ https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881767#action_12881767 ] Ashish Thusoo commented on HIVE-1271: - sounds good to me. Thanks for the explanations. +1. Will commit after running the tests. Ashish > Case sensitiveness of type information specified when using custom reducer > causes type mismatch > --- > > Key: HIVE-1271 > URL: https://issues.apache.org/jira/browse/HIVE-1271 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.5.0 >Reporter: Dilip Joseph >Assignee: Arvind Prabhakar > Fix For: 0.6.0 > > Attachments: HIVE-1271-1.patch, HIVE-1271.patch > > > Type information specified while using a custom reduce script is converted > to lower case, and causes type mismatch during query semantic analysis . The > following REDUCE query where field name = "userId" failed. > hive> CREATE TABLE SS ( >> a INT, >> b INT, >> vals ARRAY> >> ); > OK > hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s >> INSERT OVERWRITE TABLE SS >> REDUCE * >> USING 'myreduce.py' >> AS >> (a INT, >> b INT, >> vals ARRAY> >> ) >> ; > FAILED: Error in semantic analysis: line 2:27 Cannot insert into > target table because column number/types are different SS: Cannot > convert column 2 from array> to > array>. > The same query worked fine after changing "userId" to "userid". -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1018) pushing down group-by before joins
[ https://issues.apache.org/jira/browse/HIVE-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881758#action_12881758 ] Ning Zhang commented on HIVE-1018: -- Good points Joy. It will be interesting to see what typical use cases you have combining join and GroupBy. Previously, what I had in mind was to optimize away the very bad case of skewness in the join (many rows with the same join key). Since GroupBy eliminates the skewness, these rewrite rules push down GroupBy before JOIN for these special cases. What you have mentioned is definitely what we should optimize for these cases. They are helpful for the general (non-skewed join) cases as well. > pushing down group-by before joins > -- > > Key: HIVE-1018 > URL: https://issues.apache.org/jira/browse/HIVE-1018 > Project: Hadoop Hive > Issue Type: Improvement >Reporter: Ning Zhang > > Queries with both Group-by and Joins are very common and they are expensive > operations. By default Hive evaluates Joins first, then group-by. Sometimes it > is possible to rewrite queries to apply group-by (or a map-side partial group > by) first, before the join. This will remove a lot of duplicated keys in joins and > alleviate skewness in join keys for this case. This rewrite should be > cost-based. Before we have the stats and the CB framework, we can give users > hints to do the rewrite. > A particular case is where the join keys are the same as the grouping keys, > or the grouping keys are a superset of the join keys (so that grouping won't > affect the result of joins). > Examples: > -- Q1 > select A.key, B.key > from A join B on (A.key=B.key) > group by A.key, B.key; > --Q2 > select distinct A.key, B.key > from A join B on (A.key=B.key); > --Q3, aggregation function is sum, count, min, max (avg and median cannot be > handled). > select A.key, sum(A.value), count(1), min(value), max(value) > from A left semi join B on (A.key=B.key) > group by A.key; > -- Q4. 
grouping keys are a superset of join keys > select distinct A.key, A.value > from A join B on (A.key=B.key) > In the case where the join keys are not a subset of the grouping keys, we can introduce > a map-side partial grouping operator whose keys are the UNION of the join > and grouping keys, to remove unnecessary duplication. This should be > cost-based though. > Any thoughts and suggestions? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
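A minimal sketch of the rewrite the issue proposes, using Q1 from the description (table names A and B as in the examples; the rewritten form is illustrative, not the committed optimizer output). De-duplicating each side on the join key before joining eliminates the skew the comment above describes:

```sql
-- Q1 as written: join first, then group-by (hot join keys multiply rows).
SELECT A.key, B.key
FROM A JOIN B ON (A.key = B.key)
GROUP BY A.key, B.key;

-- Group-by pushed below the join: each side is reduced to at most one
-- row per key, so the join never sees skewed key multiplicities.
SELECT a1.key, b1.key
FROM (SELECT key FROM A GROUP BY key) a1
JOIN (SELECT key FROM B GROUP BY key) b1 ON (a1.key = b1.key);
```

The two forms are equivalent here because the grouping keys equal the join keys, which is exactly the special case the description calls out.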
[jira] Updated: (HIVE-1096) Hive Variables
[ https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-1096: -- Status: Patch Available (was: Open) > Hive Variables > -- > > Key: HIVE-1096 > URL: https://issues.apache.org/jira/browse/HIVE-1096 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Affects Versions: 0.5.0 >Reporter: Edward Capriolo >Assignee: Edward Capriolo > Fix For: 0.6.0 > > Attachments: 1096-9.diff, hive-1096-10-patch.txt, hive-1096-2.diff, > hive-1096-7.diff, hive-1096-8.diff, hive-1096.diff > > > From mailing list: > --Amazon Elastic MapReduce version of Hive seems to have a nice feature > called "Variables." Basically you can define a variable via command-line > while invoking hive with -d DT=2009-12-09 and then refer to the variable via > ${DT} within the hive queries. This could be extremely useful. I can't seem > to find this feature even on trunk. Is this feature currently anywhere in the > roadmap?-- > This could be implemented in many places. > A simple place to put this is > in Driver.compile or Driver.run: we can do string substitutions at that level, > and further downstream need not be affected. > There could be some benefits to doing this further downstream (parser, plan), > but based on the simple needs we may not need to overthink this. > I will get started on implementing in compile unless someone wants to discuss > this more. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1096) Hive Variables
[ https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-1096: -- Attachment: hive-1096-10-patch.txt Patch adds variable interpretation. > Hive Variables > -- > > Key: HIVE-1096 > URL: https://issues.apache.org/jira/browse/HIVE-1096 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Affects Versions: 0.5.0 >Reporter: Edward Capriolo >Assignee: Edward Capriolo > Fix For: 0.6.0 > > Attachments: 1096-9.diff, hive-1096-10-patch.txt, hive-1096-2.diff, > hive-1096-7.diff, hive-1096-8.diff, hive-1096.diff > > > From mailing list: > --Amazon Elastic MapReduce version of Hive seems to have a nice feature > called "Variables." Basically you can define a variable via command-line > while invoking hive with -d DT=2009-12-09 and then refer to the variable via > ${DT} within the hive queries. This could be extremely useful. I can't seem > to find this feature even on trunk. Is this feature currently anywhere in the > roadmap?-- > This could be implemented in many places. > A simple place to put this is > in Driver.compile or Driver.run: we can do string substitutions at that level, > and further downstream need not be affected. > There could be some benefits to doing this further downstream (parser, plan), > but based on the simple needs we may not need to overthink this. > I will get started on implementing in compile unless someone wants to discuss > this more. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
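The workflow the issue description asks for would look roughly like this (based on the Amazon EMR syntax quoted above; the script and table names are hypothetical, and the substitution is assumed to happen in Driver.compile before parsing):

```sql
-- invoked as:  hive -d DT=2009-12-09 -f daily_report.q
-- inside daily_report.q, ${DT} is textually substituted before compilation:
SELECT page, COUNT(1)
FROM page_views          -- hypothetical table
WHERE ds = '${DT}'       -- becomes ds = '2009-12-09'
GROUP BY page;
```

Because the substitution is a plain string replacement at compile time, the parser and plan never see the variable, which is why the comment above argues nothing downstream need be affected.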
problem with hive to integrate with hbase
Hi all I want to integrate hive with hbase. I am running single node Hbase 0.20.2 with hadoop 0.20.2 configured in single node cluster mode. when I tried to run *ant jar* from Hbase-Handler to get hive_hbase_handler.jar it gives me errors like: setup: compile: [echo] Compiling: hbase-handler [javac] Compiling 9 source files to /home/hadoop/dfs/hive/build/hbase-handler/classes [javac] /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:34: package org.apache.hadoop.hive.serde does not exist [javac] import org.apache.hadoop.hive.serde.Constants; [javac]^ [javac] /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:35: package org.apache.hadoop.hive.serde2 does not exist [javac] import org.apache.hadoop.hive.serde2.ByteStream; [javac] ^ [javac] /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:36: package org.apache.hadoop.hive.serde2 does not exist [javac] import org.apache.hadoop.hive.serde2.SerDe; [javac] ^ [javac] /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:37: package org.apache.hadoop.hive.serde2 does not exist [javac] import org.apache.hadoop.hive.serde2.SerDeException; [javac] ^ [javac] /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:38: package org.apache.hadoop.hive.serde2 does not exist [javac] import org.apache.hadoop.hive.serde2.SerDeUtils; [javac] ^ [javac] /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:39: package org.apache.hadoop.hive.serde2.lazy does not exist [javac] import org.apache.hadoop.hive.serde2.lazy.LazyFactory; [javac] ^ [javac] /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:40: package org.apache.hadoop.hive.serde2.lazy does not exist [javac] import org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe; [javac] ^ [javac] 
/home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:41: package org.apache.hadoop.hive.serde2.lazy does not exist [javac] import org.apache.hadoop.hive.serde2.lazy.LazyUtils; [javac] ^ [javac] /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:42: package org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe does not exist [javac] import org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.SerDeParameters; [javac] ^ [javac] /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:43: package org.apache.hadoop.hive.serde2.lazy.objectinspector does not exist [javac] import org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector; [javac] ^ [javac] /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:44: package org.apache.hadoop.hive.serde2.objectinspector does not exist [javac] import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector; [javac] ^ [javac] /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:45: package org.apache.hadoop.hive.serde2.objectinspector does not exist [javac] import org.apache.hadoop.hive.serde2.objectinspector.MapObjectInspector; [javac] ^ [javac] /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:46: package org.apache.hadoop.hive.serde2.objectinspector does not exist [javac] import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; [javac] ^ [javac] /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:47: package org.apache.hadoop.hive.serde2.objectinspector does not exist [javac] import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector; [javac] ^ [javac] /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:48: package org.apache.hadoop.hive.serde2.objectinspector does not exist 
[javac] import org.apache.hadoop.hive.serde2.objectinspector.StructField; [javac] ^ [javac] /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/h
Re: real time query option
On Wed, Jun 23, 2010 at 2:12 AM, Amr Awadallah wrote: > For low-latency queries you should either use HBase instead, or consider > Hive over HBase, see: > > http://www.cloudera.com/blog/2010/06/integrating-hive-and-hbase/ > > -- amr > > On 6/22/2010 11:05 PM, jaydeep vishwakarma wrote: >> >> Hi, >> >> I want to avoid delta time to execute the queries. Every time, even when >> we fetch a single row from hive tables, it goes through the typical map and reduce >> process. Is there any platform built on top of HDFS or hive tables >> which would help me get real-time query data? I want to avoid filling data >> into a DB. >> >> Regards, >> Jaydeep >> >> The information contained in this communication is intended solely for the >> use of the individual or entity to whom it is addressed and others >> authorized to receive it. It may contain confidential or legally privileged >> information. If you are not the intended recipient you are hereby notified >> that any disclosure, copying, distribution or taking any action in reliance >> on the contents of this information is strictly prohibited and may be >> unlawful. If you have received this communication in error, please notify us >> immediately by responding to this email and then delete it from your system. >> The firm is neither liable for the proper and complete transmission of the >> information contained in this communication nor for any delay in its >> receipt. > Hive by its nature is not real time, but there are some "real time" options in Hive that you might be able to take advantage of. If your dataset is small: set mapred.job.tracker=local; This will give you a local 1-mapper, 1-reducer job. There is no jobtracker start-up overhead; everything happens in-process. Option: precompute the result sets you want in real time. select * from tablea where part=x is NOT a map-reduce job, so if you have precomputed tablea, selecting from it will be as fast as hadoop can stream it to your client.
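Edward's two options above, sketched concretely (the table and partition names are illustrative, not from a real schema):

```sql
-- Option 1: run the job in local mode to skip jobtracker start-up
-- latency. Only sensible when the input is small enough for one machine.
set mapred.job.tracker=local;
SELECT COUNT(1) FROM small_table;

-- Option 2: precompute results into a partitioned table ahead of time.
-- A SELECT * with only a partition predicate is a plain fetch, not a
-- map-reduce job, so the rows stream back immediately.
SELECT * FROM tablea WHERE part = 'x';
```

The second option trades freshness for latency: the "real time" read is fast precisely because the expensive map-reduce work happened earlier.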
[jira] Commented: (HIVE-1018) pushing down group-by before joins
[ https://issues.apache.org/jira/browse/HIVE-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881597#action_12881597 ] Joydeep Sen Sarma commented on HIVE-1018: - interesting idea. in most of the queries i have written (over the course of the last few months - this has involved a *lot* of joins and group-bys) - either the aggregate expressions or the group by clause would have a combination of columns from all tables being joined. these would be fairly hard to optimize based on the ideas outlined here. in most of the join+group-by cases i see - people are joining fact with dimension and then using at least some non-join columns of the dimension for grouping (typically along with some columns from fact). the join/grouping columns being equal/superset seems interesting - but i am not sure about practical applicability. even in the cases mentioned - some alternate trivial but effective optimizations are available: 1. join key = grouping key - the grouping operator should realize that data is already sorted/clustered by the grouping key (because it was joined on the same key). in this case we don't need partial aggregates - but can generate full aggregates off the output of the join. no hash maps required. 2. join key = subset of grouping keys - in this case (for sort merge join) - we can sort on the grouping keys (doesn't hurt much) for doing the join and then apply strategy #1. > pushing down group-by before joins > -- > > Key: HIVE-1018 > URL: https://issues.apache.org/jira/browse/HIVE-1018 > Project: Hadoop Hive > Issue Type: Improvement >Reporter: Ning Zhang > > Queries with both Group-by and Joins are very common and they are expensive > operations. By default Hive evaluates Joins first, then group-by. Sometimes it > is possible to rewrite queries to apply group-by (or a map-side partial group > by) first, before the join. This will remove a lot of duplicated keys in joins and > alleviate skewness in join keys for this case. 
This rewrite should be > cost-based. Before we have the stats and the CB framework, we can give users > hints to do the rewrite. > A particular case is where the join keys are the same as the grouping keys, > or the grouping keys are a superset of the join keys (so that grouping won't > affect the result of joins). > Examples: > -- Q1 > select A.key, B.key > from A join B on (A.key=B.key) > group by A.key, B.key; > --Q2 > select distinct A.key, B.key > from A join B on (A.key=B.key); > --Q3, aggregation function is sum, count, min, max (avg and median cannot be > handled). > select A.key, sum(A.value), count(1), min(value), max(value) > from A left semi join B on (A.key=B.key) > group by A.key; > -- Q4. grouping keys are a superset of join keys > select distinct A.key, A.value > from A join B on (A.key=B.key) > In the case where the join keys are not a subset of the grouping keys, we can introduce > a map-side partial grouping operator whose keys are the UNION of the join > and grouping keys, to remove unnecessary duplication. This should be > cost-based though. > Any thoughts and suggestions? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1304) add row_sequence UDF
[ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1304: - Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed Committed. Thanks John > add row_sequence UDF > > > Key: HIVE-1304 > URL: https://issues.apache.org/jira/browse/HIVE-1304 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Affects Versions: 0.6.0 >Reporter: John Sichi >Assignee: John Sichi > Fix For: 0.7.0 > > Attachments: HIVE-1304.1.patch, HIVE-1304.2.patch, HIVE-1304.3.patch > > > This is a poor man's answer to the standard analytic function row_number(); > it assigns a sequence of numbers to rows, starting from 1. > I'm calling it row_sequence() to distinguish it from the real analytic > function, so that once we add support for those, there won't be any conflict > with the existing UDF. > The problem with this UDF approach is that there are no guarantees about > ordering in SQL processing internals, so use with caution. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
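A usage sketch for the committed UDF (the jar path and the contrib class name are assumptions here; check the committed patch for the actual package and registration details):

```sql
-- Register the contrib UDF; the jar path is illustrative and the class
-- name is assumed from the hive-contrib package layout.
ADD JAR hive-contrib.jar;
CREATE TEMPORARY FUNCTION row_sequence
  AS 'org.apache.hadoop.hive.contrib.udf.UDFRowSequence';

-- Assigns 1, 2, 3, ... to rows as they are processed. Per the caveat in
-- the issue, row ordering is not guaranteed, so use with caution.
SELECT row_sequence(), key FROM src;
```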