[jira] Updated: (HIVE-1430) serializing/deserializing the query plan is useless and expensive

2010-06-23 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1430:
-

   Status: Resolved  (was: Patch Available)
 Hadoop Flags: [Reviewed]
Fix Version/s: 0.7.0
   Resolution: Fixed

Fixed. Thanks Ning

> serializing/deserializing the query plan is useless and expensive
> -
>
> Key: HIVE-1430
> URL: https://issues.apache.org/jira/browse/HIVE-1430
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Namit Jain
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1430.patch
>
>
> We should turn it off by default

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-187) ODBC driver

2010-06-23 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-187:


Attachment: thrift_64.r790732.tgz

Uploading thrift_64.r790732.tgz, the complete 64-bit Thrift libs (including 
libfb303.a) and binaries. These libraries were compiled under CentOS 5.2 (kernel 
2.6.20, GCC 4.1.2).

> ODBC driver
> ---
>
> Key: HIVE-187
> URL: https://issues.apache.org/jira/browse/HIVE-187
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Clients
>Affects Versions: 0.6.0
>Reporter: Raghotham Murthy
>Assignee: Eric Hwang
> Fix For: 0.4.0
>
> Attachments: HIVE-187.1.patch, HIVE-187.2.patch, HIVE-187.3.patch, 
> hive-187.4.patch, thrift_64.r790732.tgz, thrift_home_linux_32.tgz, 
> thrift_home_linux_64.tgz, unixODBC-2.2.14-1.tgz, unixODBC-2.2.14-2.tgz, 
> unixODBC-2.2.14-3.tgz, unixODBC-2.2.14-hive-patched.tar.gz, 
> unixODBC-2.2.14.tgz, unixodbc.patch
>
>
> We need to provide a small number of functions to get basic query
> execution and retrieval of results. This is based on the tutorial provided
> here: http://www.easysoft.com/developer/languages/c/odbc_tutorial.html
>  
> The minimum set of ODBC functions required are:
> SQLAllocHandle - for environment, connection, statement
> SQLSetEnvAttr
> SQLDriverConnect
> SQLExecDirect
> SQLNumResultCols
> SQLFetch
> SQLGetData
> SQLDisconnect
> SQLFreeHandle
>  
> If required the plan would be to do the following:
> 1. generate c++ client stubs for thrift server
> 2. implement the required functions in c++ by calling the c++ client
> 3. make the c++ functions in (2) extern C and then use those in the odbc
> SQL* functions
> 4. provide a .so (in linux) which can be used by the ODBC clients.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-06-23 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1176:
-

   Status: Resolved  (was: Patch Available)
 Hadoop Flags: [Reviewed]
Fix Version/s: 0.7.0
   Resolution: Fixed

Committed.  Thanks Arvind!


> 'create if not exists' fails for a table name with 'select' in it
> -
>
> Key: HIVE-1176
> URL: https://issues.apache.org/jira/browse/HIVE-1176
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0, 0.7.0
>
> Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, HIVE-1176-3.patch, 
> HIVE-1176-4.patch, HIVE-1176-5.patch, HIVE-1176-6.patch, 
> HIVE-1176.lib-files.tar.gz, HIVE-1176.patch
>
>
> hive> create table if not exists tmp_select(s string, c string, n int);
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got 
> exception: javax.jdo.JDOUserException JDOQL Single-String query should always 
> start with SELECT)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
> at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException 
> JDOQL Single-String query should always start with SELECT)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
> ... 15 more

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch

2010-06-23 Thread Arvind Prabhakar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882048#action_12882048
 ] 

Arvind Prabhakar commented on HIVE-1271:


@Ashish: I created HIVE-1432 to track the test case creation. I will be 
submitting a patch for that soon. Thanks for pointing this out.

> Case sensitiveness of type information specified when using custom reducer 
> causes type mismatch
> ---
>
> Key: HIVE-1271
> URL: https://issues.apache.org/jira/browse/HIVE-1271
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Dilip Joseph
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1271-1.patch, HIVE-1271.patch
>
>
> Type information specified while using a custom reduce script is converted 
> to lower case, causing a type mismatch during query semantic analysis. The 
> following REDUCE query, where the field name is "userId", failed.
> hive> CREATE TABLE SS (
>> a INT,
>> b INT,
>> vals ARRAY>
>> );
> OK
> hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
>> INSERT OVERWRITE TABLE SS
>> REDUCE *
>> USING 'myreduce.py'
>> AS
>> (a INT,
>> b INT,
>> vals ARRAY>
>> )
>> ;
> FAILED: Error in semantic analysis: line 2:27 Cannot insert into
> target table because column number/types are different SS: Cannot
> convert column 2 from array> to
> array>.
> The same query worked fine after changing "userId" to "userid".

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1432) Create a test case for case sensitive comparison done during field comparison

2010-06-23 Thread Arvind Prabhakar (JIRA)
Create a test case for case sensitive comparison done during field comparison
-

 Key: HIVE-1432
 URL: https://issues.apache.org/jira/browse/HIVE-1432
 Project: Hadoop Hive
  Issue Type: Task
  Components: Query Processor
Reporter: Arvind Prabhakar
Assignee: Arvind Prabhakar
 Fix For: 0.6.0


See HIVE-1271. This jira tracks the creation of a test case to test this fix 
specifically.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1431) Hive CLI can't handle query files that begin with comments

2010-06-23 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882028#action_12882028
 ] 

John Sichi commented on HIVE-1431:
--

sqlline (see my notes in HIVE-987) deals with comments correctly in a fairly 
simple fashion.
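
For illustration, a minimal sketch of that simple approach: drop whole-line "--" 
comments before the text ever reaches the parser. This is not the actual sqlline or 
CliDriver code; the class and method names below are hypothetical.

{code}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical helper: read a query file and skip whole-line "--" comments
// so that only real statements are handed to the parser.
public class CommentSkippingReader {
  public static String readQueries(String path) throws IOException {
    StringBuilder sb = new StringBuilder();
    BufferedReader in = new BufferedReader(new FileReader(path));
    try {
      String line;
      while ((line = in.readLine()) != null) {
        if (line.trim().startsWith("--")) {
          continue;  // whole-line comment: ignore it
        }
        sb.append(line).append('\n');
      }
    } finally {
      in.close();
    }
    return sb.toString();
  }

  public static void main(String[] args) throws IOException {
    // e.g. java CommentSkippingReader test.q  (the file from the bug report)
    System.out.print(readQueries(args[0]));
  }
}
{code}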

> Hive CLI can't handle query files that begin with comments
> --
>
> Key: HIVE-1431
> URL: https://issues.apache.org/jira/browse/HIVE-1431
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: CLI
>Reporter: Carl Steinbach
> Fix For: 0.6.0, 0.7.0
>
>
> {code}
> % cat test.q
> -- This is a comment, followed by a command
> set -v;
> -- 
> -- Another comment
> --
> show tables;
> -- Last comment
> (master) [ ~/Projects/hive ]
> % hive < test.q
> Hive history file=/tmp/carl/hive_job_log_carl_201006231606_1140875653.txt
> hive> -- This is a comment, followed by a command
> > set -v;
> FAILED: Parse Error: line 2:0 cannot recognize input 'set'
> hive> -- 
> > -- Another comment
> > --
> > show tables;
> OK
> rawchunks
> Time taken: 5.334 seconds
> hive> -- Last comment
> > (master) [ ~/Projects/hive ]
> % 
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: 6.0 and trunk look broken to me

2010-06-23 Thread Edward Capriolo
On Wed, Jun 23, 2010 at 10:48 PM, John Sichi  wrote:
> Did you get past this?  It looks like some kind of bad build.
>
> JVS
>
> On Jun 23, 2010, at 2:38 PM, Ashish Thusoo wrote:
>
>> Not sure if this is just my env but on 0.6.0 when I run the unit tests I get 
>> a bunch of errors of the following form:
>>
>>    [junit] Begin query: alter3.q
>>    [junit] java.lang.NoSuchFieldError: HIVESESSIONSILENT
>>    [junit]     at 
>> org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:1052)
>>    [junit]     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>    [junit]     at 
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>    [junit]     at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>    [junit]     at java.lang.reflect.Method.invoke(Method.java:597)
>>    [junit]     at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
>>    [junit]     at org.apache.hadoop.mapred.JobShell.run(JobShell.java:194)
>>    [junit]     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>    [junit]     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>    [junit]     at org.apache.hadoop.mapred.JobShell.main(JobShell.java:220)
>>    [junit]
>>
>> -Original Message-
>> From: John Sichi [mailto:jsi...@facebook.com]
>> Sent: Wednesday, June 23, 2010 2:15 PM
>> To: 
>> Subject: Re: 6.0 and trunk look broken to me
>>
>> (You mean 0.6, right?)
>>
>> I'm not able to reproduce this (just tested with latest trunk on Linux and 
>> Mac).  Is anyone else seeing it?
>>
>> JVS
>>
>> On Jun 23, 2010, at 1:51 PM, Edward Capriolo wrote:
>>
>>> Trunk and 6.0 both show this in hadoop local mode and hadoop distributed 
>>> mode.
>>>
>>> export HADOOP_HOME=/home/edward/hadoop/hadoop-0.20.2_loca
>>> edw...@ec dist]$ export
>>> HADOOP_HOME=/home/edward/hadoop/hadoop-0.20.2_local[edw...@ec dist]$
>>> bin/hive Hive history
>>> file=/tmp/edward/hive_job_log_edward_201006231647_1723542005.txt
>>> hive> show tables;
>>> FAILED: Parse Error: line 0:-1 cannot recognize input ''
>>>
>>> [edw...@ec dist]$ more /tmp/edward/hive.log
>>> 2010-06-23 16:41:00,749 ERROR ql.Driver
>>> (SessionState.java:printError(277)) - FAILED: Parse Error: line 0:-1
>>> cannot recognize input ''
>>>
>>> org.apache.hadoop.hive.ql.parse.ParseException: line 0:-1 cannot
>>> recognize input ''
>>>
>>>      at 
>>> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:401)
>>>      at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:299)
>>>      at org.apache.hadoop.hive.ql.Driver.run(Driver.java:379)
>>>      at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
>>>      at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
>>>      at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:302)
>>>      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>      at 
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>      at 
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>      at java.lang.reflect.Method.invoke(Method.java:597)
>>>      at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>
>
>

I do not know what is up. I cleaned up my .ivy2, checked out, and ran the
build again. I guess if no one else is seeing it, it must be something
on my system.

Total time: 3 minutes 7 seconds
[edw...@ec hive_6_pre]$ cd build/dist/
[edw...@ec dist]$ cd ../hive-trunk/^C
[edw...@ec dist]$ ls
bin  conf  examples  lib  README.txt
[edw...@ec dist]$ bin/hive
Hive history file=/tmp/edward/hive_job_log_edward_201006232341_41029014.txt
hive> show tables;
FAILED: Parse Error: line 0:-1 cannot recognize input ''

hive> exit;
[edw...@ec dist]$ ant -v
Apache Ant version 1.8.0 compiled on February 1 2010
Trying the default build file: build.xml
Buildfile: build.xml does not exist!
Build failed
[edw...@ec dist]$ java -v
Unrecognized option: -v
Could not create the Java virtual machine.
[edw...@ec dist]$ java -version
java version "1.6.0_18"
Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)


[jira] Commented: (HIVE-1431) Hive CLI can't handle query files that begin with comments

2010-06-23 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882024#action_12882024
 ] 

Edward Capriolo commented on HIVE-1431:
---

We have a few tickets open; we really need to move all this stuff to a real 
parser so we can properly deal with things like ';', comments like this, and so 
on. It is painfully hard to work around all these kinds of things, and we never 
get to the root of the problem.

> Hive CLI can't handle query files that begin with comments
> --
>
> Key: HIVE-1431
> URL: https://issues.apache.org/jira/browse/HIVE-1431
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: CLI
>Reporter: Carl Steinbach
> Fix For: 0.6.0, 0.7.0
>
>
> {code}
> % cat test.q
> -- This is a comment, followed by a command
> set -v;
> -- 
> -- Another comment
> --
> show tables;
> -- Last comment
> (master) [ ~/Projects/hive ]
> % hive < test.q
> Hive history file=/tmp/carl/hive_job_log_carl_201006231606_1140875653.txt
> hive> -- This is a comment, followed by a command
> > set -v;
> FAILED: Parse Error: line 2:0 cannot recognize input 'set'
> hive> -- 
> > -- Another comment
> > --
> > show tables;
> OK
> rawchunks
> Time taken: 5.334 seconds
> hive> -- Last comment
> > (master) [ ~/Projects/hive ]
> % 
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1342) Predicate push down get error result when sub-queries have the same alias name

2010-06-23 Thread Ted Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Xu updated HIVE-1342:
-

   Status: Patch Available  (was: Open)
Affects Version/s: 0.6.0
   (was: 0.5.0)
   (was: 0.4.2)
Fix Version/s: 0.6.0

> Predicate push down get error result when sub-queries have the same alias 
> name 
> ---
>
> Key: HIVE-1342
> URL: https://issues.apache.org/jira/browse/HIVE-1342
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Ted Xu
>Priority: Critical
> Fix For: 0.6.0
>
> Attachments: cmd.hql, explain, ppd_same_alias_1.patch
>
>
> The query is over-optimized by PPD when sub-queries have the same alias name; see 
> the query:
> ---
> create table if not exists dm_fact_buyer_prd_info_d (
>   category_id string
>   ,gmv_trade_num  int
>   ,user_id int
>   )
> PARTITIONED BY (ds int);
> set hive.optimize.ppd=true;
> set hive.map.aggr=true;
> explain select category_id1,category_id2,assoc_idx
> from (
>   select 
>   category_id1
>   , category_id2
>   , count(distinct user_id) as assoc_idx
>   from (
>   select 
>   t1.category_id as category_id1
>   , t2.category_id as category_id2
>   , t1.user_id
>   from (
>   select category_id, user_id
>   from dm_fact_buyer_prd_info_d
>   group by category_id, user_id ) t1
>   join (
>   select category_id, user_id
>   from dm_fact_buyer_prd_info_d
>   group by category_id, user_id ) t2 on 
> t1.user_id=t2.user_id 
>   ) t1
>   group by category_id1, category_id2 ) t_o
>   where category_id1 <> category_id2
>   and assoc_idx > 2;
> -
> The query above fails when executed, throwing the exception: "can not cast 
> UDFOpNotEqual(Text, IntWritable) to UDFOpNotEqual(Text, Text)". 
> I ran EXPLAIN on the query and the execution plan looks really weird (only Stage-1, 
> see the highlighted predicate):
> ---
> Stage: Stage-1
> Map Reduce
>   Alias -> Map Operator Tree:
> t_o:t1:t1:dm_fact_buyer_prd_info_d 
>   TableScan
> alias: dm_fact_buyer_prd_info_d
> Filter Operator
>   predicate:
>   expr: *(category_id <> user_id)*
>   type: boolean
>   Select Operator
> expressions:
>   expr: category_id
>   type: string
>   expr: user_id
>   type: bigint
> outputColumnNames: category_id, user_id
> Group By Operator
>   keys:
> expr: category_id
> type: string
> expr: user_id
> type: bigint
>   mode: hash
>   outputColumnNames: _col0, _col1
>   Reduce Output Operator
> key expressions:
>   expr: _col0
>   type: string
>   expr: _col1
>   type: bigint
> sort order: ++
> Map-reduce partition columns:
>   expr: _col0
>   type: string
>   expr: _col1
>   type: bigint
> tag: -1
>   Reduce Operator Tree:
> Group By Operator
>   keys:
> expr: KEY._col0
> type: string
> expr: KEY._col1
> type: bigint
>   mode: mergepartial
>   outputColumnNames: _col0, _col1
>   Select Operator
> expressions:
>   expr: _col0
>   type: string
>   expr: _col1
>   type: bigint
> outputColumnNames: _col0, _col1
> File Output Operator
>   compressed: true
>   GlobalTableId: 0
>   table:
>   input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveSequ

Review Request: Hive Variables

2010-06-23 Thread Edward Capriolo

---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/229/
---

Review request for Hive Developers.


Summary
---

Hive Variables


Diffs
-

  trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 955109 
  trunk/conf/hive-default.xml 955109 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java 955109 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java 
955109 
  trunk/ql/src/test/queries/clientpositive/set_processor_namespaces.q 
PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/set_processor_namespaces.q.out 
PRE-CREATION 

Diff: http://review.hbase.org/r/229/diff


Testing
---


Thanks,

Edward



[jira] Updated: (HIVE-1096) Hive Variables

2010-06-23 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1096:
--

Attachment: hive-1096-11-patch.txt

It was not interpolating system: variables. Fixed with a better test case.

> Hive Variables
> --
>
> Key: HIVE-1096
> URL: https://issues.apache.org/jira/browse/HIVE-1096
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0, 0.7.0
>
> Attachments: 1096-9.diff, hive-1096-10-patch.txt, 
> hive-1096-11-patch.txt, hive-1096-2.diff, hive-1096-7.diff, hive-1096-8.diff, 
> hive-1096.diff
>
>
> From mailing list:
> --Amazon Elastic MapReduce version of Hive seems to have a nice feature 
> called "Variables." Basically you can define a variable via command-line 
> while invoking hive with -d DT=2009-12-09 and then refer to the variable via 
> ${DT} within the hive queries. This could be extremely useful. I can't seem 
> to find this feature even on trunk. Is this feature currently anywhere in the 
> roadmap?--
> This could be implemented in many places.
> A simple place to put this is in Driver.compile or Driver.run: we can do string 
> substitutions at that level, and further downstream need not be affected. 
> There could be some benefits to doing this further downstream (parser, plan), 
> but based on the simple needs we may not need to overthink this.
> I will get started on implementing in compile unless someone wants to discuss 
> this more.
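
For illustration, a minimal sketch of what string substitution at that level could 
look like. This is not the code from the attached patches (those go through 
SetProcessor and namespaced variables); the class name and the variable map below 
are hypothetical.

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: expand ${name} references in a command string against a
// map of variables before the command is compiled. Unknown variables are left
// untouched rather than failing the query.
public class VariableSubstitution {
  private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

  public static String substitute(String cmd, Map<String, String> vars) {
    Matcher m = VAR.matcher(cmd);
    StringBuffer out = new StringBuffer();
    while (m.find()) {
      String val = vars.get(m.group(1));
      m.appendReplacement(out,
          Matcher.quoteReplacement(val != null ? val : m.group(0)));
    }
    m.appendTail(out);
    return out.toString();
  }

  public static void main(String[] args) {
    Map<String, String> vars = new HashMap<String, String>();
    vars.put("DT", "2009-12-09");   // e.g. supplied via "hive -d DT=2009-12-09"
    System.out.println(substitute("SELECT * FROM logs WHERE ds = '${DT}'", vars));
  }
}
{code}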

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: 6.0 and trunk look broken to me

2010-06-23 Thread John Sichi
Did you get past this?  It looks like some kind of bad build.

JVS

On Jun 23, 2010, at 2:38 PM, Ashish Thusoo wrote:

> Not sure if this is just my env but on 0.6.0 when I run the unit tests I get 
> a bunch of errors of the following form:
> 
>[junit] Begin query: alter3.q
>[junit] java.lang.NoSuchFieldError: HIVESESSIONSILENT
>[junit] at 
> org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:1052)
>[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>[junit] at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>[junit] at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>[junit] at java.lang.reflect.Method.invoke(Method.java:597)
>[junit] at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
>[junit] at org.apache.hadoop.mapred.JobShell.run(JobShell.java:194)
>[junit] at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>[junit] at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>[junit] at org.apache.hadoop.mapred.JobShell.main(JobShell.java:220)
>[junit] 
> 
> -Original Message-
> From: John Sichi [mailto:jsi...@facebook.com] 
> Sent: Wednesday, June 23, 2010 2:15 PM
> To: 
> Subject: Re: 6.0 and trunk look broken to me
> 
> (You mean 0.6, right?)
> 
> I'm not able to reproduce this (just tested with latest trunk on Linux and 
> Mac).  Is anyone else seeing it?
> 
> JVS
> 
> On Jun 23, 2010, at 1:51 PM, Edward Capriolo wrote:
> 
>> Trunk and 6.0 both show this in hadoop local mode and hadoop distributed 
>> mode.
>> 
>> export HADOOP_HOME=/home/edward/hadoop/hadoop-0.20.2_loca
>> edw...@ec dist]$ export
>> HADOOP_HOME=/home/edward/hadoop/hadoop-0.20.2_local[edw...@ec dist]$ 
>> bin/hive Hive history 
>> file=/tmp/edward/hive_job_log_edward_201006231647_1723542005.txt
>> hive> show tables;
>> FAILED: Parse Error: line 0:-1 cannot recognize input ''
>> 
>> [edw...@ec dist]$ more /tmp/edward/hive.log
>> 2010-06-23 16:41:00,749 ERROR ql.Driver
>> (SessionState.java:printError(277)) - FAILED: Parse Error: line 0:-1 
>> cannot recognize input ''
>> 
>> org.apache.hadoop.hive.ql.parse.ParseException: line 0:-1 cannot 
>> recognize input ''
>> 
>>  at 
>> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:401)
>>  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:299)
>>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:379)
>>  at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
>>  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
>>  at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:302)
>>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>  at 
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>  at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>  at java.lang.reflect.Method.invoke(Method.java:597)
>>  at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> 



[jira] Commented: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch

2010-06-23 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881998#action_12881998
 ] 

Ashish Thusoo commented on HIVE-1271:
-

I have committed this to trunk and will commit to 0.6.0 soon. One thing I did 
overlook, though: we should add a test case for this. Can you do that as part of 
another JIRA, since this one is already partially committed?

Thanks,
Ashish

> Case sensitiveness of type information specified when using custom reducer 
> causes type mismatch
> ---
>
> Key: HIVE-1271
> URL: https://issues.apache.org/jira/browse/HIVE-1271
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Dilip Joseph
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1271-1.patch, HIVE-1271.patch
>
>
> Type information specified while using a custom reduce script is converted 
> to lower case, causing a type mismatch during query semantic analysis. The 
> following REDUCE query, where the field name is "userId", failed.
> hive> CREATE TABLE SS (
>> a INT,
>> b INT,
>> vals ARRAY>
>> );
> OK
> hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
>> INSERT OVERWRITE TABLE SS
>> REDUCE *
>> USING 'myreduce.py'
>> AS
>> (a INT,
>> b INT,
>> vals ARRAY>
>> )
>> ;
> FAILED: Error in semantic analysis: line 2:27 Cannot insert into
> target table because column number/types are different SS: Cannot
> convert column 2 from array> to
> array>.
> The same query worked fine after changing "userId" to "userid".

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-06-23 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881967#action_12881967
 ] 

John Sichi commented on HIVE-1176:
--

+1.  Will commit when tests pass.


> 'create if not exists' fails for a table name with 'select' in it
> -
>
> Key: HIVE-1176
> URL: https://issues.apache.org/jira/browse/HIVE-1176
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, HIVE-1176-3.patch, 
> HIVE-1176-4.patch, HIVE-1176-5.patch, HIVE-1176-6.patch, 
> HIVE-1176.lib-files.tar.gz, HIVE-1176.patch
>
>
> hive> create table if not exists tmp_select(s string, c string, n int);
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got 
> exception: javax.jdo.JDOUserException JDOQL Single-String query should always 
> start with SELECT)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
> at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException 
> JDOQL Single-String query should always start with SELECT)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
> ... 15 more

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1307) More generic and efficient merge method

2010-06-23 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1307:
-

Attachment: HIVE-1307.0.patch

Uploading a preliminary patch. This is not ready for review yet. 

> More generic and efficient merge method
> ---
>
> Key: HIVE-1307
> URL: https://issues.apache.org/jira/browse/HIVE-1307
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.6.0
>
> Attachments: HIVE-1307.0.patch
>
>
> Currently if hive.merge.mapfiles/mapredfiles=true, a new mapreduce job is 
> created to read the input files and output to one reducer for merging. This MR 
> job is created at compile time, one MR job per partition. In the dynamic 
> partition case, multiple partitions could be created at execution time, so 
> generating the merging MR job at compile time is impossible. 
> We should generalize the merge framework to allow multiple partitions, and 
> most of the time a map-only job should be sufficient if we use 
> CombineHiveInputFormat. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-06-23 Thread Arvind Prabhakar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-1176:
---

Attachment: HIVE-1176-6.patch

> 'create if not exists' fails for a table name with 'select' in it
> -
>
> Key: HIVE-1176
> URL: https://issues.apache.org/jira/browse/HIVE-1176
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, HIVE-1176-3.patch, 
> HIVE-1176-4.patch, HIVE-1176-5.patch, HIVE-1176-6.patch, 
> HIVE-1176.lib-files.tar.gz, HIVE-1176.patch
>
>
> hive> create table if not exists tmp_select(s string, c string, n int);
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got 
> exception: javax.jdo.JDOUserException JDOQL Single-String query should always 
> start with SELECT)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
> at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException 
> JDOQL Single-String query should always start with SELECT)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
> ... 15 more

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-06-23 Thread Arvind Prabhakar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881956#action_12881956
 ] 

Arvind Prabhakar commented on HIVE-1176:


Yes, that's what my intention was. Thanks for catching it.


> 'create if not exists' fails for a table name with 'select' in it
> -
>
> Key: HIVE-1176
> URL: https://issues.apache.org/jira/browse/HIVE-1176
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, HIVE-1176-3.patch, 
> HIVE-1176-4.patch, HIVE-1176-5.patch, HIVE-1176-6.patch, 
> HIVE-1176.lib-files.tar.gz, HIVE-1176.patch
>
>
> hive> create table if not exists tmp_select(s string, c string, n int);
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got 
> exception: javax.jdo.JDOUserException JDOQL Single-String query should always 
> start with SELECT)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
> at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException 
> JDOQL Single-String query should always start with SELECT)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
> ... 15 more

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1096) Hive Variables

2010-06-23 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1096:
-

Fix Version/s: (was: 0.5.1)
Affects Version/s: (was: 0.5.0)

> Hive Variables
> --
>
> Key: HIVE-1096
> URL: https://issues.apache.org/jira/browse/HIVE-1096
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0, 0.7.0
>
> Attachments: 1096-9.diff, hive-1096-10-patch.txt, hive-1096-2.diff, 
> hive-1096-7.diff, hive-1096-8.diff, hive-1096.diff
>
>
> From mailing list:
> --Amazon Elastic MapReduce version of Hive seems to have a nice feature 
> called "Variables." Basically you can define a variable via command-line 
> while invoking hive with -d DT=2009-12-09 and then refer to the variable via 
> ${DT} within the hive queries. This could be extremely useful. I can't seem 
> to find this feature even on trunk. Is this feature currently anywhere in the 
> roadmap?--
> This could be implemented in many places.
> A simple place to put this is in Driver.compile or Driver.run: we can do string 
> substitutions at that level, and further downstream need not be affected. 
> There could be some benefits to doing this further downstream (parser, plan), 
> but based on the simple needs we may not need to overthink this.
> I will get started on implementing in compile unless someone wants to discuss 
> this more.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1387) Make PERCENTILE work with double data type

2010-06-23 Thread Mayank Lahiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Lahiri updated HIVE-1387:


Attachment: HIVE-1387.2.patch
median_approx_quality.png

I've attached HIVE-1387.2.patch, which does the following:

(1) Creates a percentile_approx() UDAF which uses the histogram_numeric() UDAF 
to estimate quantiles from a histogram. The syntax matches the existing 
percentile() UDAF, and extends it with a third parameter that specifies the 
number of histogram bins to use (and thus, the accuracy of quantile estimation):

SELECT percentile_approx(val, 0.5) FROM random;                     -- estimates the median
SELECT percentile_approx(val, array(0.5, 0.95, 0.98)) FROM random;  -- estimates 3 quantiles
SELECT percentile_approx(val, 0.5, 1000) FROM random;               -- estimates the median using
                                                                    -- 1,000 histogram bins instead of the default of 10,000.

(2) I've left the existing percentile() UDAF as it is for the following 
reasons: when the number of unique values in a column is relatively small, 
percentile_approx() will return an exact result. When the number of unique 
values in a column is very large (as one might expect with double), then 
percentile() will run out of memory and crash, so there's really no need to 
modify the existing percentile() to support doubles.

(3) The accuracy of quantile estimation seems to be pretty good. Attached a 
graph showing approximation quality for the median using different histogram 
sizes for random datasets of 100,000 numbers. The default number of histogram 
bins is 10,000, which appears to work quite well.

(4) This patch also refactors the histogram_numeric() class to put all the 
generic histogram functionality into a re-usable inner class. 
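
To make (1) above concrete: the quantile estimate itself only needs the histogram's 
(bin center, count) pairs. A rough sketch of the idea, independent of the actual 
UDAF code and with hypothetical names:

{code}
// Sketch of histogram-based quantile estimation (not the Hive UDAF code):
// given bins as (center, count) pairs sorted by center, walk the cumulative
// counts until the q-quantile falls inside a bin, then interpolate linearly
// between the neighboring bin boundaries.
public class HistogramQuantile {
  public static double estimate(double[] centers, double[] counts, double q) {
    double total = 0.0;
    for (double c : counts) {
      total += c;
    }
    double target = q * total;   // mass that must lie below the quantile
    double cum = 0.0;
    for (int i = 0; i < centers.length; i++) {
      if (cum + counts[i] >= target) {
        double frac = (target - cum) / counts[i];   // position within this bin
        double lo = (i == 0) ? centers[i] : (centers[i - 1] + centers[i]) / 2.0;
        double hi = (i == centers.length - 1) ? centers[i] : (centers[i] + centers[i + 1]) / 2.0;
        return lo + frac * (hi - lo);
      }
      cum += counts[i];
    }
    return centers[centers.length - 1];
  }

  public static void main(String[] args) {
    double[] centers = {1.0, 2.0, 3.0, 4.0};
    double[] counts = {10, 20, 40, 30};
    System.out.println(estimate(centers, counts, 0.5));   // rough median estimate
  }
}
{code}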

> Make PERCENTILE work with double data type
> --
>
> Key: HIVE-1387
> URL: https://issues.apache.org/jira/browse/HIVE-1387
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.6.0
>Reporter: Vaibhav Aggarwal
>Assignee: Mayank Lahiri
> Fix For: 0.6.0
>
> Attachments: HIVE-1387.2.patch, median_approx_quality.png, 
> patch-1387-1.patch
>
>
> The PERCENTILE UDAF does not work with double datatype.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1096) Hive Variables

2010-06-23 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1096:
-

Fix Version/s: 0.5.1
   0.7.0

> Hive Variables
> --
>
> Key: HIVE-1096
> URL: https://issues.apache.org/jira/browse/HIVE-1096
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.5.1, 0.6.0, 0.7.0
>
> Attachments: 1096-9.diff, hive-1096-10-patch.txt, hive-1096-2.diff, 
> hive-1096-7.diff, hive-1096-8.diff, hive-1096.diff
>
>
> From mailing list:
> --Amazon Elastic MapReduce version of Hive seems to have a nice feature 
> called "Variables." Basically you can define a variable via command-line 
> while invoking hive with -d DT=2009-12-09 and then refer to the variable via 
> ${DT} within the hive queries. This could be extremely useful. I can't seem 
> to find this feature even on trunk. Is this feature currently anywhere in the 
> roadmap?--
> This could be implemented in many places.
> A simple place to put this is in Driver.compile or Driver.run: we can do string 
> substitutions at that level, and further downstream need not be affected. 
> There could be some benefits to doing this further downstream (parser, plan), 
> but based on the simple needs we may not need to overthink this.
> I will get started on implementing in compile unless someone wants to discuss 
> this more.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1387) Make PERCENTILE work with double data type

2010-06-23 Thread Mayank Lahiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Lahiri updated HIVE-1387:


   Status: Patch Available  (was: Open)
Affects Version/s: 0.6.0
Fix Version/s: 0.6.0

> Make PERCENTILE work with double data type
> --
>
> Key: HIVE-1387
> URL: https://issues.apache.org/jira/browse/HIVE-1387
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.6.0
>Reporter: Vaibhav Aggarwal
>Assignee: Mayank Lahiri
> Fix For: 0.6.0
>
> Attachments: HIVE-1387.2.patch, median_approx_quality.png, 
> patch-1387-1.patch
>
>
> The PERCENTILE UDAF does not work with double datatype.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1096) Hive Variables

2010-06-23 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881955#action_12881955
 ] 

Carl Steinbach commented on HIVE-1096:
--

Hi Ed, can you please post this patch to review.hbase.org? Thanks!

> Hive Variables
> --
>
> Key: HIVE-1096
> URL: https://issues.apache.org/jira/browse/HIVE-1096
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.5.1, 0.6.0, 0.7.0
>
> Attachments: 1096-9.diff, hive-1096-10-patch.txt, hive-1096-2.diff, 
> hive-1096-7.diff, hive-1096-8.diff, hive-1096.diff
>
>
> From mailing list:
> --Amazon Elastic MapReduce version of Hive seems to have a nice feature 
> called "Variables." Basically you can define a variable via command-line 
> while invoking hive with -d DT=2009-12-09 and then refer to the variable via 
> ${DT} within the hive queries. This could be extremely useful. I can't seem 
> to find this feature even on trunk. Is this feature currently anywhere in the 
> roadmap?--
> This could be implemented in many places.
> A simple place to put this is in Driver.compile or Driver.run: we can do string 
> substitutions at that level, and further downstream need not be affected. 
> There could be some benefits to doing this further downstream (parser, plan), 
> but based on the simple needs we may not need to overthink this.
> I will get started on implementing in compile unless someone wants to discuss 
> this more.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-06-23 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881950#action_12881950
 ] 

John Sichi commented on HIVE-1176:
--

Thanks for the doc, Arvind.

But for the patch: we need the ORDER BY on the SELECT that produces results in 
the output log (not the INSERT).

> 'create if not exists' fails for a table name with 'select' in it
> -
>
> Key: HIVE-1176
> URL: https://issues.apache.org/jira/browse/HIVE-1176
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, HIVE-1176-3.patch, 
> HIVE-1176-4.patch, HIVE-1176-5.patch, HIVE-1176.lib-files.tar.gz, 
> HIVE-1176.patch
>
>
> hive> create table if not exists tmp_select(s string, c string, n int);
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got 
> exception: javax.jdo.JDOUserException JDOQL Single-String query should always 
> start with SELECT)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
> at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException 
> JDOQL Single-String query should always start with SELECT)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
> ... 15 more

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-06-23 Thread Arvind Prabhakar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881946#action_12881946
 ] 

Arvind Prabhakar commented on HIVE-1176:


@John: done. Please see the new patch attachment -  HIVE-1176-5.patch

Since a lot of good points came out of the discussion on this jira, I took the 
liberty of adding them to the Hive wiki for posterity. You can find it 
[here|http://wiki.apache.org/hadoop/Hive/TipsForAddingNewTests]. Please add to 
it any other points that you feel contributors should take into consideration 
while adding new tests.

> 'create if not exists' fails for a table name with 'select' in it
> -
>
> Key: HIVE-1176
> URL: https://issues.apache.org/jira/browse/HIVE-1176
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, HIVE-1176-3.patch, 
> HIVE-1176-4.patch, HIVE-1176-5.patch, HIVE-1176.lib-files.tar.gz, 
> HIVE-1176.patch
>
>
> hive> create table if not exists tmp_select(s string, c string, n int);
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got 
> exception: javax.jdo.JDOUserException JDOQL Single-String query should always 
> start with SELECT)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
> at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException 
> JDOQL Single-String query should always start with SELECT)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
> ... 15 more

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-06-23 Thread Arvind Prabhakar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-1176:
---

Attachment: HIVE-1176-5.patch

> 'create if not exists' fails for a table name with 'select' in it
> -
>
> Key: HIVE-1176
> URL: https://issues.apache.org/jira/browse/HIVE-1176
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, HIVE-1176-3.patch, 
> HIVE-1176-4.patch, HIVE-1176-5.patch, HIVE-1176.lib-files.tar.gz, 
> HIVE-1176.patch
>
>
> hive> create table if not exists tmp_select(s string, c string, n int);
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got 
> exception: javax.jdo.JDOUserException JDOQL Single-String query should always 
> start with SELECT)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
> at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException 
> JDOQL Single-String query should always start with SELECT)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
> ... 15 more

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1431) Hive CLI can't handle query files that begin with comments

2010-06-23 Thread Carl Steinbach (JIRA)
Hive CLI can't handle query files that begin with comments
--

 Key: HIVE-1431
 URL: https://issues.apache.org/jira/browse/HIVE-1431
 Project: Hadoop Hive
  Issue Type: Bug
  Components: CLI
Reporter: Carl Steinbach
 Fix For: 0.6.0, 0.7.0


{code}
% cat test.q
-- This is a comment, followed by a command
set -v;
-- 
-- Another comment
--
show tables;
-- Last comment
(master) [ ~/Projects/hive ]
% hive < test.q
Hive history file=/tmp/carl/hive_job_log_carl_201006231606_1140875653.txt
hive> -- This is a comment, followed by a command
> set -v;
FAILED: Parse Error: line 2:0 cannot recognize input 'set'

hive> -- 
> -- Another comment
> --
> show tables;
OK
rawchunks
Time taken: 5.334 seconds
hive> -- Last comment
> (master) [ ~/Projects/hive ]
% 
{code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1430) serializing/deserializing the query plan is useless and expensive

2010-06-23 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881942#action_12881942
 ] 

Namit Jain commented on HIVE-1430:
--

+1

> serializing/deserializing the query plan is useless and expensive
> -
>
> Key: HIVE-1430
> URL: https://issues.apache.org/jira/browse/HIVE-1430
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Namit Jain
>Assignee: Ning Zhang
> Attachments: HIVE-1430.patch
>
>
> We should turn it off by default

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-06-23 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881936#action_12881936
 ] 

John Sichi commented on HIVE-1176:
--

Just one more change needed: please add an ORDER BY to the SELECT in the 
test case. This is required to avoid spurious diffs later, since without ORDER 
BY the query result order is non-deterministic.

After that I'll run through tests and commit.


> 'create if not exists' fails for a table name with 'select' in it
> -
>
> Key: HIVE-1176
> URL: https://issues.apache.org/jira/browse/HIVE-1176
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, HIVE-1176-3.patch, 
> HIVE-1176-4.patch, HIVE-1176.lib-files.tar.gz, HIVE-1176.patch
>
>
> hive> create table if not exists tmp_select(s string, c string, n int);
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got 
> exception: javax.jdo.JDOUserException JDOQL Single-String query should always 
> start with SELECT)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
> at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException 
> JDOQL Single-String query should always start with SELECT)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
> ... 15 more

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1430) serializing/deserializing the query plan is useless and expensive

2010-06-23 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1430:
-

Status: Patch Available  (was: Open)

> serializing/deserializing the query plan is useless and expensive
> -
>
> Key: HIVE-1430
> URL: https://issues.apache.org/jira/browse/HIVE-1430
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Namit Jain
>Assignee: Ning Zhang
> Attachments: HIVE-1430.patch
>
>
> We should turn it off by default

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1430) serializing/deserializing the query plan is useless and expensive

2010-06-23 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1430:
-

Attachment: HIVE-1430.patch

> serializing/deserializing the query plan is useless and expensive
> -
>
> Key: HIVE-1430
> URL: https://issues.apache.org/jira/browse/HIVE-1430
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Namit Jain
>Assignee: Ning Zhang
> Attachments: HIVE-1430.patch
>
>
> We should turn it off by default

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1430) serializing/deserializing the query plan is useless and expensive

2010-06-23 Thread Namit Jain (JIRA)
serializing/deserializing the query plan is useless and expensive
-

 Key: HIVE-1430
 URL: https://issues.apache.org/jira/browse/HIVE-1430
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Ning Zhang


We should turn it off by default

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1416) Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode

2010-06-23 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881921#action_12881921
 ] 

John Sichi commented on HIVE-1416:
--

Attached junit-noframes.html with the failures (but not the diffs).

Example diff snippet from union6.q:

@@ -233,7 +233,6 @@
 406	val_406
 66 val_66
 98 val_98
-tst1   500


> Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode
> --
>
> Key: HIVE-1416
> URL: https://issues.apache.org/jira/browse/HIVE-1416
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.6.0, 0.7.0
>
> Attachments: HIVE-1416.2.patch, HIVE-1416.patch, junit-noframes.html
>
>
> Hive parses the file name generated by tasks to figure out the task ID in 
> order to generate files for empty buckets. Different hadoop versions and 
> execution mode have different ways of naming  output files by 
> mappers/reducers. We need to move the parsing code to shims. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1416) Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode

2010-06-23 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1416:
-

Attachment: junit-noframes.html

> Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode
> --
>
> Key: HIVE-1416
> URL: https://issues.apache.org/jira/browse/HIVE-1416
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.6.0, 0.7.0
>
> Attachments: HIVE-1416.2.patch, HIVE-1416.patch, junit-noframes.html
>
>
> Hive parses the file name generated by tasks to figure out the task ID in 
> order to generate files for empty buckets. Different hadoop versions and 
> execution mode have different ways of naming  output files by 
> mappers/reducers. We need to move the parsing code to shims. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1416) Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode

2010-06-23 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1416:
-

Status: Open  (was: Patch Available)

Ning, this ran through cleanly with Hadoop 0.17 (where I verified that it fixes 
the problem), but on Hadoop 0.20, it results in a lot of test failures.  These 
aren't just diffs due to missing ORDER BY; values are actually missing from the 
results.


> Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode
> --
>
> Key: HIVE-1416
> URL: https://issues.apache.org/jira/browse/HIVE-1416
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.6.0, 0.7.0
>
> Attachments: HIVE-1416.2.patch, HIVE-1416.patch
>
>
> Hive parses the file name generated by tasks to figure out the task ID in 
> order to generate files for empty buckets. Different hadoop versions and 
> execution mode have different ways of naming  output files by 
> mappers/reducers. We need to move the parsing code to shims. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: 6.0 and trunk look broken to me

2010-06-23 Thread Ashish Thusoo
Not sure if this is just my env but on 0.6.0 when I run the unit tests I get a 
bunch of errors of the following form:

[junit] Begin query: alter3.q
[junit] java.lang.NoSuchFieldError: HIVESESSIONSILENT
[junit] at 
org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:1052)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit] at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit] at java.lang.reflect.Method.invoke(Method.java:597)
[junit] at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
[junit] at org.apache.hadoop.mapred.JobShell.run(JobShell.java:194)
[junit] at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
[junit] at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
[junit] at org.apache.hadoop.mapred.JobShell.main(JobShell.java:220)
[junit] 

-Original Message-
From: John Sichi [mailto:jsi...@facebook.com] 
Sent: Wednesday, June 23, 2010 2:15 PM
To: 
Subject: Re: 6.0 and trunk look broken to me

(You mean 0.6, right?)

I'm not able to reproduce this (just tested with latest trunk on Linux and 
Mac).  Is anyone else seeing it?

JVS

On Jun 23, 2010, at 1:51 PM, Edward Capriolo wrote:

> Trunk and 6.0 both show this in hadoop local mode and hadoop distributed mode.
> 
> [edw...@ec dist]$ export HADOOP_HOME=/home/edward/hadoop/hadoop-0.20.2_local
> [edw...@ec dist]$ bin/hive
> Hive history file=/tmp/edward/hive_job_log_edward_201006231647_1723542005.txt
> hive> show tables;
> FAILED: Parse Error: line 0:-1 cannot recognize input ''
> 
> [edw...@ec dist]$ more /tmp/edward/hive.log
> 2010-06-23 16:41:00,749 ERROR ql.Driver
> (SessionState.java:printError(277)) - FAILED: Parse Error: line 0:-1 
> cannot recognize input ''
> 
> org.apache.hadoop.hive.ql.parse.ParseException: line 0:-1 cannot 
> recognize input ''
> 
>   at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:401)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:299)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:379)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:302)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)



[jira] Updated: (HIVE-1229) replace dependencies on HBase deprecated API

2010-06-23 Thread Basab Maulik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Basab Maulik updated HIVE-1229:
---

Attachment: HIVE-1229.2.patch

fixed checkstyle violations and rebased against trunk.

Tests run successfully:

ant test -Dtestcase=TestLazyHBaseObject
ant test -Dtestcase=TestHBaseSerDe
ant test -Dtestcase=TestHBaseCliDriver -Dqfile=hbase_queries.q

thanks.

> replace dependencies on HBase deprecated API
> 
>
> Key: HIVE-1229
> URL: https://issues.apache.org/jira/browse/HIVE-1229
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: HBase Handler
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: Basab Maulik
> Attachments: HIVE-1229.2.patch
>
>
> Some of these dependencies are on the old Hadoop mapred packages; others are 
> HBase-specific.  The former have to wait until the rest of Hive moves over to 
> the new Hadoop mapreduce package, but the HBase-specific ones don't have to 
> wait.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: 6.0 and trunk look broken to me

2010-06-23 Thread John Sichi
(You mean 0.6, right?)

I'm not able to reproduce this (just tested with latest trunk on Linux and 
Mac).  Is anyone else seeing it?

JVS

On Jun 23, 2010, at 1:51 PM, Edward Capriolo wrote:

> Trunk and 6.0 both show this in hadoop local mode and hadoop distributed mode.
> 
> [edw...@ec dist]$ export HADOOP_HOME=/home/edward/hadoop/hadoop-0.20.2_local
> [edw...@ec dist]$ bin/hive
> Hive history file=/tmp/edward/hive_job_log_edward_201006231647_1723542005.txt
> hive> show tables;
> FAILED: Parse Error: line 0:-1 cannot recognize input ''
> 
> [edw...@ec dist]$ more /tmp/edward/hive.log
> 2010-06-23 16:41:00,749 ERROR ql.Driver
> (SessionState.java:printError(277)) - FAILED: Parse Error: line 0:-1
> cannot recognize input ''
> 
> org.apache.hadoop.hive.ql.parse.ParseException: line 0:-1 cannot
> recognize input ''
> 
>   at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:401)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:299)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:379)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:302)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)



[jira] Resolved: (HIVE-56) The reducer output is not created if the mapper input is empty

2010-06-23 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain resolved HIVE-56.


Fix Version/s: 0.6.0
   Resolution: Fixed

This was fixed a long time back. In Hive, an empty input is created to start a 
dummy mapper.

> The reducer output is not created if the mapper input is empty
> --
>
> Key: HIVE-56
> URL: https://issues.apache.org/jira/browse/HIVE-56
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.6.0
>
>
> For some Hive stuff, I ran into the following scenario:
> For a given map-reduce job, the input was empty. Because of that no mappers 
> and reducers were created. It would have been helpful if an empty output for 
> the reducer would have been created.
> After browsing through the code, it seems that in initTasks() in 
> JobInProgress, no mappers and reducers are initialized if input is empty.
> I was thinking of putting a fix there. If the input is empty, before 
> returning, create the output directory (as specified by the reducer) if 
> needed. Any comments/suggestions?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1229) replace dependencies on HBase deprecated API

2010-06-23 Thread Basab Maulik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Basab Maulik updated HIVE-1229:
---

Attachment: (was: HIVE-1129.1.patch)

> replace dependencies on HBase deprecated API
> 
>
> Key: HIVE-1229
> URL: https://issues.apache.org/jira/browse/HIVE-1229
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: HBase Handler
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: Basab Maulik
>
> Some of these dependencies are on the old Hadoop mapred packages; others are 
> HBase-specific.  The former have to wait until the rest of Hive moves over to 
> the new Hadoop mapreduce package, but the HBase-specific ones don't have to 
> wait.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



6.0 and trunk look broken to me

2010-06-23 Thread Edward Capriolo
Trunk and 6.0 both show this in hadoop local mode and hadoop distributed mode.

[edw...@ec dist]$ export HADOOP_HOME=/home/edward/hadoop/hadoop-0.20.2_local
[edw...@ec dist]$ bin/hive
Hive history file=/tmp/edward/hive_job_log_edward_201006231647_1723542005.txt
hive> show tables;
FAILED: Parse Error: line 0:-1 cannot recognize input ''

[edw...@ec dist]$ more /tmp/edward/hive.log
2010-06-23 16:41:00,749 ERROR ql.Driver
(SessionState.java:printError(277)) - FAILED: Parse Error: line 0:-1
cannot recognize input ''

org.apache.hadoop.hive.ql.parse.ParseException: line 0:-1 cannot
recognize input ''

at 
org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:401)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:299)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:379)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:302)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)


[jira] Updated: (HIVE-1359) Unit test should be shim-aware

2010-06-23 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1359:
-

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Committed.  Thanks Ning!


> Unit test should be shim-aware
> --
>
> Key: HIVE-1359
> URL: https://issues.apache.org/jira/browse/HIVE-1359
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.6.0, 0.7.0
>
> Attachments: HIVE-1359.2.patch, HIVE-1359.patch, unit_tests.txt
>
>
> Some features in Hive only work for certain Hadoop versions through shims. 
> However, the unit test structure is not shim-aware in that there is only one 
> set of queries and expected outputs for all Hadoop versions. This may not be 
> sufficient when we have different outputs for different Hadoop versions. 
> One example is CombineHiveInputFormat, which is only available from Hadoop 
> 0.20. The plan using CombineHiveInputFormat and HiveInputFormat may be 
> different. Another example is archival partitions (HAR), which are also only 
> available from 0.20. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: problem with hive to integrate with hbase

2010-06-23 Thread John Sichi
Hi Muhammad,

Just build from the top level of hive trunk (not from the hbase-handler 
component) and everything, including the hbase-handler, will be built for you.  
Follow the normal Hive build instructions in

http://wiki.apache.org/hadoop/Hive/HowToContribute

Note that we currently build against the 0.20.3 version of HBase; if you run 
into trouble due to mismatches with your 0.20.2 version, you'll need to 
downgrade the jars in hbase-handler/lib and then rebuild Hive to produce a 
compatible storage handler.

JVS

On Jun 23, 2010, at 7:37 AM, Muhammad Mudassar wrote:

> Hi all
> I want to integrate Hive with HBase. I am running single-node HBase
> 0.20.2 with Hadoop 0.20.2 configured in single-node cluster mode. When I
> tried to run *ant jar* from hbase-handler to get hive_hbase_handler.jar,
> it gives me errors like:
> setup:
>
> compile:
> [echo] Compiling: hbase-handler
>[javac] Compiling 9 source files to
> /home/hadoop/dfs/hive/build/hbase-handler/classes
>[javac] 
> /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:34:
> package org.apache.hadoop.hive.serde does not exist
>[javac] import org.apache.hadoop.hive.serde.Constants;
>[javac]^
>[javac] 
> /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:35:
> package org.apache.hadoop.hive.serde2 does not exist
>[javac] import org.apache.hadoop.hive.serde2.ByteStream;
>[javac] ^
>[javac] 
> /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:36:
> package org.apache.hadoop.hive.serde2 does not exist
>[javac] import org.apache.hadoop.hive.serde2.SerDe;
>[javac] ^
>[javac] 
> /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:37:
> package org.apache.hadoop.hive.serde2 does not exist
>[javac] import org.apache.hadoop.hive.serde2.SerDeException;
>[javac] ^
>[javac] 
> /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:38:
> package org.apache.hadoop.hive.serde2 does not exist
>[javac] import org.apache.hadoop.hive.serde2.SerDeUtils;
>[javac] ^
>[javac] 
> /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:39:
> package org.apache.hadoop.hive.serde2.lazy does not exist
>[javac] import org.apache.hadoop.hive.serde2.lazy.LazyFactory;
>[javac]  ^
>[javac] 
> /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:40:
> package org.apache.hadoop.hive.serde2.lazy does not exist
>[javac] import org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe;
>[javac]  ^
>[javac] 
> /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:41:
> package org.apache.hadoop.hive.serde2.lazy does not exist
>[javac] import org.apache.hadoop.hive.serde2.lazy.LazyUtils;
>[javac]  ^
>[javac] 
> /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:42:
> package org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe does not
> exist
>[javac] import
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.SerDeParameters;
>[javac]  ^
>[javac] 
> /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:43:
> package org.apache.hadoop.hive.serde2.lazy.objectinspector does not
> exist
>[javac] import
> org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector;
>[javac]  ^
>[javac] 
> /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:44:
> package org.apache.hadoop.hive.serde2.objectinspector does not exist
>[javac] import
> org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
>[javac] ^
>[javac] 
> /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:45:
> package org.apache.hadoop.hive.serde2.objectinspector does not exist
>[javac] import
> org.apache.hadoop.hive.serde2.objectinspector.MapObjectInspector;
>[javac] ^
>[javac] 
> /home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:46:
> package org.apache.hadoop.hive.serde2.objectinspector does not exist
>[javac] import
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
>[javac] ^
>[javac] 
> /

[jira] Commented: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch

2010-06-23 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881767#action_12881767
 ] 

Ashish Thusoo commented on HIVE-1271:
-

sounds good to me. Thanks for the explanations.

+1. Will commit after running the tests.

Ashish

> Case sensitiveness of type information specified when using custom reducer 
> causes type mismatch
> ---
>
> Key: HIVE-1271
> URL: https://issues.apache.org/jira/browse/HIVE-1271
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Dilip Joseph
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1271-1.patch, HIVE-1271.patch
>
>
> Type information specified while using a custom reduce script is converted 
> to lower case, and causes a type mismatch during query semantic analysis. The 
> following REDUCE query, where a field name is "userId", failed.
> hive> CREATE TABLE SS (
>> a INT,
>> b INT,
>> vals ARRAY>
>> );
> OK
> hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
>> INSERT OVERWRITE TABLE SS
>> REDUCE *
>> USING 'myreduce.py'
>> AS
>> (a INT,
>> b INT,
>> vals ARRAY>
>> )
>> ;
> FAILED: Error in semantic analysis: line 2:27 Cannot insert into
> target table because column number/types are different SS: Cannot
> convert column 2 from array> to
> array>.
> The same query worked fine after changing "userId" to "userid".

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1018) pushing down group-by before joins

2010-06-23 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881758#action_12881758
 ] 

Ning Zhang commented on HIVE-1018:
--

Good points Joy. It will be interesting to see what typical use cases you have 
that combine join and GroupBy. Previously, what I had in mind here was to 
optimize away the very bad case of skewness in the join (many rows with the 
same join key). Since GroupBy eliminates the skewness, these rewrite rules 
push GroupBy down before the JOIN for those special cases. The optimizations 
you mention are definitely what we should aim for in these cases; they are 
helpful for the general (non-skewed join) case as well.
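
For illustration, a rough hand-written sketch of the Q1 rewrite from the 
description (the optimizer would do this automatically; the rewrite is only 
valid here because the grouping keys cover the join keys):

{code}
-- Q1 as written: join first, then group by
SELECT A.key, B.key
FROM A JOIN B ON (A.key = B.key)
GROUP BY A.key, B.key;

-- equivalent form with the group-by pushed below the join
SELECT a1.key, b1.key
FROM (SELECT key FROM A GROUP BY key) a1
JOIN (SELECT key FROM B GROUP BY key) b1 ON (a1.key = b1.key);
{code}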

> pushing down group-by before joins
> --
>
> Key: HIVE-1018
> URL: https://issues.apache.org/jira/browse/HIVE-1018
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>
> Queries with both Group-by and Joins are very common and they are expensive 
> operations. By default Hive evaluates Joins first, then group-by. Sometimes it 
> is possible to rewrite queries to apply group-by (or map-side partial group 
> by) first before join. This will remove a lot of duplicated keys in joins and 
> alleviate skewness in join keys for this case. This rewrite should be 
> cost-based. Before we have the stats and the CB framework, we can give users 
> hints to do the rewrite. 
> A particular case is where the join keys are the same as the grouping keys. 
> Or the group keys is a superset of the join keys (so that grouping won't 
> affect the result of joins). 
> Examples:
> -- Q1
> select A.key, B.key
> from A join B on (A.key=B.key)
> group by A.key, B.key;
> --Q2
> select distinct A.key, B.key
> from A join B on (A.key=B.key);
> --Q3, aggregation function is sum, count, min, max, (avg and median cannot be 
> handled).
> select A.key, sum(A.value), count(1), min(value), max(value)
> from A left semi join B on (A.key=B.key)
> group by A.key;
> -- Q4. grouping keys is a superset of join keys
> select distinct A.key, A.value
> from A join B on (A.key=B.key)
> In the case where the join keys are not a subset of the grouping keys, we can introduce 
> a map-side partial grouping operator with the keys of the UNION of the join 
> and grouping keys, to remove unnecessary duplications. This should be 
> cost-based though. 
> Any thoughts and suggestions?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1096) Hive Variables

2010-06-23 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1096:
--

Status: Patch Available  (was: Open)

> Hive Variables
> --
>
> Key: HIVE-1096
> URL: https://issues.apache.org/jira/browse/HIVE-1096
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: 1096-9.diff, hive-1096-10-patch.txt, hive-1096-2.diff, 
> hive-1096-7.diff, hive-1096-8.diff, hive-1096.diff
>
>
> From mailing list:
> --Amazon Elastic MapReduce version of Hive seems to have a nice feature 
> called "Variables." Basically you can define a variable via command-line 
> while invoking hive with -d DT=2009-12-09 and then refer to the variable via 
> ${DT} within the hive queries. This could be extremely useful. I can't seem 
> to find this feature even on trunk. Is this feature currently anywhere in the 
> roadmap?--
> This could be implemented in many places.
> A simple place to put this is in Driver.compile or Driver.run, where we can 
> do string substitutions at that level, and further downstream need not be 
> affected. There could be some benefits to doing this further downstream 
> (parser, plan), but based on the simple needs we may not need to overthink this.
> I will get started on implementing in compile unless someone wants to discuss 
> this more.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1096) Hive Variables

2010-06-23 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1096:
--

Attachment: hive-1096-10-patch.txt

Patch adds variable interpretation. 
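
As a usage sketch of the feature (the table name and file name below are made 
up; the -d syntax is the one proposed in the description):

{code}
-- report.q  (page_views is a hypothetical table)
-- invoked as:  hive -d DT=2009-12-09 -f report.q
SELECT * FROM page_views WHERE dt = '${DT}';
{code}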

> Hive Variables
> --
>
> Key: HIVE-1096
> URL: https://issues.apache.org/jira/browse/HIVE-1096
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: 1096-9.diff, hive-1096-10-patch.txt, hive-1096-2.diff, 
> hive-1096-7.diff, hive-1096-8.diff, hive-1096.diff
>
>
> From mailing list:
> --Amazon Elastic MapReduce version of Hive seems to have a nice feature 
> called "Variables." Basically you can define a variable via command-line 
> while invoking hive with -d DT=2009-12-09 and then refer to the variable via 
> ${DT} within the hive queries. This could be extremely useful. I can't seem 
> to find this feature even on trunk. Is this feature currently anywhere in the 
> roadmap?--
> This could be implemented in many places.
> A simple place to put this is in Driver.compile or Driver.run, where we can 
> do string substitutions at that level, and further downstream need not be 
> affected. There could be some benefits to doing this further downstream 
> (parser, plan), but based on the simple needs we may not need to overthink this.
> I will get started on implementing in compile unless someone wants to discuss 
> this more.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



problem with hive to integrate with hbase

2010-06-23 Thread Muhammad Mudassar
Hi all
I want to integrate Hive with HBase. I am running single-node HBase
0.20.2 with Hadoop 0.20.2 configured in single-node cluster mode. When I
tried to run *ant jar* from hbase-handler to get hive_hbase_handler.jar,
it gives me errors like:
setup:

compile:
 [echo] Compiling: hbase-handler
[javac] Compiling 9 source files to
/home/hadoop/dfs/hive/build/hbase-handler/classes
[javac] 
/home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:34:
package org.apache.hadoop.hive.serde does not exist
[javac] import org.apache.hadoop.hive.serde.Constants;
[javac]^
[javac] 
/home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:35:
package org.apache.hadoop.hive.serde2 does not exist
[javac] import org.apache.hadoop.hive.serde2.ByteStream;
[javac] ^
[javac] 
/home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:36:
package org.apache.hadoop.hive.serde2 does not exist
[javac] import org.apache.hadoop.hive.serde2.SerDe;
[javac] ^
[javac] 
/home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:37:
package org.apache.hadoop.hive.serde2 does not exist
[javac] import org.apache.hadoop.hive.serde2.SerDeException;
[javac] ^
[javac] 
/home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:38:
package org.apache.hadoop.hive.serde2 does not exist
[javac] import org.apache.hadoop.hive.serde2.SerDeUtils;
[javac] ^
[javac] 
/home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:39:
package org.apache.hadoop.hive.serde2.lazy does not exist
[javac] import org.apache.hadoop.hive.serde2.lazy.LazyFactory;
[javac]  ^
[javac] 
/home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:40:
package org.apache.hadoop.hive.serde2.lazy does not exist
[javac] import org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe;
[javac]  ^
[javac] 
/home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:41:
package org.apache.hadoop.hive.serde2.lazy does not exist
[javac] import org.apache.hadoop.hive.serde2.lazy.LazyUtils;
[javac]  ^
[javac] 
/home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:42:
package org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe does not
exist
[javac] import
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.SerDeParameters;
[javac]  ^
[javac] 
/home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:43:
package org.apache.hadoop.hive.serde2.lazy.objectinspector does not
exist
[javac] import
org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector;
[javac]  ^
[javac] 
/home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:44:
package org.apache.hadoop.hive.serde2.objectinspector does not exist
[javac] import
org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
[javac] ^
[javac] 
/home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:45:
package org.apache.hadoop.hive.serde2.objectinspector does not exist
[javac] import
org.apache.hadoop.hive.serde2.objectinspector.MapObjectInspector;
[javac] ^
[javac] 
/home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:46:
package org.apache.hadoop.hive.serde2.objectinspector does not exist
[javac] import
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
[javac] ^
[javac] 
/home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:47:
package org.apache.hadoop.hive.serde2.objectinspector does not exist
[javac] import
org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
[javac] ^
[javac] 
/home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java:48:
package org.apache.hadoop.hive.serde2.objectinspector does not exist
[javac] import org.apache.hadoop.hive.serde2.objectinspector.StructField;
[javac] ^
[javac] 
/home/hadoop/dfs/hive/hbase-handler/src/java/org/apache/h

Re: real time query option

2010-06-23 Thread Edward Capriolo
On Wed, Jun 23, 2010 at 2:12 AM, Amr Awadallah  wrote:
> For low-latency queries you should either use HBase instead, or consider
> Hive over HBase, see:
>
> http://www.cloudera.com/blog/2010/06/integrating-hive-and-hbase/
>
> -- amr
>
> On 6/22/2010 11:05 PM, jaydeep vishwakarma wrote:
>>
>> Hi,
>>
>> I want to avoid delta time to execute the queries. Every time even when
>> we fetch single row from hive tables it goes to typical map and reduce
>> process. Is there any platform which built on top of HDFS or hive table
>> which help me to get real time query data, I want to avoid filling data
>> to DB.
>>
>> Regards,
>> Jaydeep
>>
>> The information contained in this communication is intended solely for the
>> use of the individual or entity to whom it is addressed and others
>> authorized to receive it. It may contain confidential or legally privileged
>> information. If you are not the intended recipient you are hereby notified
>> that any disclosure, copying, distribution or taking any action in reliance
>> on the contents of this information is strictly prohibited and may be
>> unlawful. If you have received this communication in error, please notify us
>> immediately by responding to this email and then delete it from your system.
>> The firm is neither liable for the proper and complete transmission of the
>> information contained in this communication nor for any delay in its
>> receipt.
>

Hive by its nature is not real time, but there are some "REAL TIME" options in
Hive that you might be able to take advantage of.

If your dataset is small:

set mapred.job.tracker=local;

This gives you a local job with 1 mapper and 1 reducer. There is no jobtracker
start-up overhead; everything happens in a local thread.

Option: precompute the result sets you want in real time.

select * from tablea where part=x

is NOT a map-reduce job. So if you have precomputed tablea, selecting from it
will be as fast as Hadoop can stream it to your client.
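
For example, a minimal sketch of that precompute-then-serve pattern (all table
and column names below are made up for illustration):

-- maintained by a periodic batch job; this INSERT is a normal map-reduce job
CREATE TABLE daily_summary (key STRING, cnt BIGINT) PARTITIONED BY (dt STRING);

INSERT OVERWRITE TABLE daily_summary PARTITION (dt='2010-06-23')
SELECT key, count(1) FROM raw_events WHERE dt='2010-06-23' GROUP BY key;

-- serving the precomputed partition is just a scan of its files, not a map-reduce job
SELECT * FROM daily_summary WHERE dt='2010-06-23';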


[jira] Commented: (HIVE-1018) pushing down group-by before joins

2010-06-23 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881597#action_12881597
 ] 

Joydeep Sen Sarma commented on HIVE-1018:
-

interesting idea. in most of the queries i have written (over the course of 
last few months - this has involved a *lot* of joins and group-bys) - either 
the aggregate expressions or the group by clause would have a combination of 
columns from all tables being joined. these would be fairly hard to optimize 
based on the ideas outlined here.

in most of the join+group-by cases i see - people are joining fact with 
dimension and then using the at least some non-join columns of the dimension 
for grouping (typically along with some columns from fact). the join/grouping 
columns being equal/superset seems interesting - but i am not sure about 
practical applicability.

even in the cases mentioned - some alternate trivial but effective 
optimizations are available:
1. join key=grouping key - grouping operator should realize that data is 
already sorted/clustered by grouping key (because it was joined on the same 
key). in this case we don't need partial aggregates - but can generate full 
aggregates off the output of the join. no hash maps required.
2. join key = subset of grouping keys - in this case (for sort merge join) - we 
can sort on the grouping keys (doesn't hurt much) for doing the join and then 
apply strategy #1.

> pushing down group-by before joins
> --
>
> Key: HIVE-1018
> URL: https://issues.apache.org/jira/browse/HIVE-1018
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>
> Queries with both Group-by and Joins are very common and they are expensive 
> operations. By default Hive evaluates Joins first, then group-by. Sometimes it 
> is possible to rewrite queries to apply group-by (or map-side partial group 
> by) first before join. This will remove a lot of duplicated keys in joins and 
> alleviate skewness in join keys for this case. This rewrite should be 
> cost-based. Before we have the stats and the CB framework, we can give users 
> hints to do the rewrite. 
> A particular case is where the join keys are the same as the grouping keys. 
> Or the group keys is a superset of the join keys (so that grouping won't 
> affect the result of joins). 
> Examples:
> -- Q1
> select A.key, B.key
> from A join B on (A.key=B.key)
> group by A.key, B.key;
> --Q2
> select distinct A.key, B.key
> from A join B on (A.key=B.key);
> --Q3, aggregation function is sum, count, min, max, (avg and median cannot be 
> handled).
> select A.key, sum(A.value), count(1), min(value), max(value)
> from A left semi join B on (A.key=B.key)
> group by A.key;
> -- Q4. grouping keys is a superset of join keys
> select distinct A.key, A.value
> from A join B on (A.key=B.key)
> In the case where the join keys are not a subset of the grouping keys, we can introduce 
> a map-side partial grouping operator with the keys of the UNION of the join 
> and grouping keys, to remove unnecessary duplications. This should be 
> cost-based though. 
> Any thoughts and suggestions?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1304) add row_sequence UDF

2010-06-23 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1304:
-

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Committed. Thanks John

> add row_sequence UDF
> 
>
> Key: HIVE-1304
> URL: https://issues.apache.org/jira/browse/HIVE-1304
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: John Sichi
> Fix For: 0.7.0
>
> Attachments: HIVE-1304.1.patch, HIVE-1304.2.patch, HIVE-1304.3.patch
>
>
> This is a poor man's answer to the standard analytic function row_number(); 
> it assigns a sequence of numbers to rows, starting from 1.
> I'm calling it row_sequence() to distinguish it from the real analytic 
> function, so that once we add support for those, there won't be any conflict 
> with the existing UDF.
> The problem with this UDF approach is that there are no guarantees about 
> ordering in SQL processing internals, so use with caution.
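
A quick usage sketch (the jar path and the source table name are placeholders,
and the class name assumes the UDF landed in contrib as UDFRowSequence):

{code}
-- load the contrib jar and register the UDF (path is a placeholder)
ADD JAR /path/to/hive-contrib.jar;
CREATE TEMPORARY FUNCTION row_sequence
  AS 'org.apache.hadoop.hive.contrib.udf.UDFRowSequence';

-- 'src' stands in for any existing table; numbering starts from 1
SELECT row_sequence(), key FROM src LIMIT 10;
{code}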

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.