Hive order by not working properly

2015-01-07 Thread vengatesh.babu
Hi,


Hive ORDER BY does not work properly with columns involved in functions.

Column without function:

SELECT ACCOUNTID from table order by ACCOUNTID;

(The above query works fine in both Hive & MySQL.)

Column with function:

SELECT concat('test_',ACCOUNTID) from table order by ACCOUNTID;

In MySQL, the above query works fine.
In Hive, the exception below is thrown:

FAILED: SemanticException [Error 10004]: Line 1:59 Invalid table alias or
column reference 'ACCOUNTID': (possible column names are: _c0)

Temporary solution:

SELECT concat('test_',ACCOUNTID) as tempColumn from table order by tempColumn;

Is there any solution that does not require aliasing the column as tempColumn?
Thanks 
Vengatesh Babu K M







Re: Hive order by not working properly

2015-01-07 Thread Furcy Pin
Hi Vengatesh,

AFAIK, in Hive, the columns used in the ORDER BY clause need to be present
in the SELECT clause.
I guess this is because they have to be passed to the final reducer.

This is why in your case the first query works and the second does not.
The following query, however, should also work:

SELECT ACCOUNTID, concat('test_',ACCOUNTID) from table order by ACCOUNTID;

As you already found, the correct way to write this query is to alias your
column and order by the alias.
Note that this works too:

SELECT concat('test_',ACCOUNTID) as ACCOUNTID from table order by ACCOUNTID;
I am still using 0.12 though, maybe later versions changed some things.

Hope this helps,

Furcy Pin


PS:

On the other hand, Hive has the opposite behavior in GROUP BY clauses; e.g.,
this will not work:

SELECT concat('test_',ACCOUNTID) as tempColumn, COUNT(1) from table
GROUP BY tempColumn order by tempColumn;

while this will work:

SELECT concat('test_',ACCOUNTID) as tempColumn, COUNT(1) from table
GROUP BY concat('test_',ACCOUNTID) order by tempColumn;

and so will this:

SELECT concat('test_',ACCOUNTID) as tempColumn, COUNT(1) from table
GROUP BY concat('test_',ACCOUNTID) order by concat('test_',ACCOUNTID);

From my understanding, the reason is that the compiler resolves the GROUP BY
clause before the SELECT (since the SELECT may depend on it), so at that
point it does not yet know that concat('test_',ACCOUNTID) will be named
tempColumn.
The ORDER BY clause, on the other hand, is compiled after the SELECT, so it
only recognises the alias tempColumn, not the expression
concat('test_',ACCOUNTID).
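A side note, not from the original thread: depending on the Hive version,
positional references may sidestep alias resolution in both clauses. A
sketch, assuming a Hive build that supports the
hive.groupby.orderby.position.alias setting (the table name is a
placeholder):

```sql
-- Hedged sketch: requires a Hive version with positional alias support
-- (hive.groupby.orderby.position.alias); 'accounts' is a placeholder.
SET hive.groupby.orderby.position.alias=true;

SELECT concat('test_', ACCOUNTID) AS tempColumn, COUNT(1)
FROM accounts
GROUP BY 1   -- position 1 = the first SELECT expression
ORDER BY 1;
```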







Hive JDBC:: HiveConnection.setReadonly implementation

2015-01-07 Thread Saravana Kumar Vithyananthan
Hi

When instantiating the org.apache.hadoop.hive.jdbc.HiveConnection object
from the Java HiveDriver library and calling setReadOnly(true), a "Method
not supported" exception is thrown, and isReadOnly() returns false by
default. Why is the setReadOnly() method not implemented? We want to access
Hive data for reporting purposes only, and hence want to open the connection
in read-only mode.
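Until that changes, one option is to treat the read-only flag as a
best-effort hint; a sketch, where the connection URL and credentials are
placeholders rather than anything from the original message:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class ReadOnlyHint {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; point this at your own HiveServer instance.
        Connection conn = DriverManager.getConnection(
                "jdbc:hive://localhost:10000/default", "", "");
        try {
            // HiveConnection currently throws "Method not supported" here.
            conn.setReadOnly(true);
        } catch (SQLException e) {
            // Degrade gracefully: proceed with a normal connection and rely
            // on issuing only SELECT statements for the reporting workload.
            System.err.println("Read-only hint rejected: " + e.getMessage());
        }
        conn.close();
    }
}
```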

-- 
Thanks,
Saravana Kumar.


Hive script auto completion and semantic analyzer?

2015-01-07 Thread Xiaoyong Zhu
Hi, hive experts

We are building a script auto-completion framework for Hive to make Hive
easier to use. We have read the Hive compiler design document
(https://cwiki.apache.org/confluence/display/Hive/Design#Design-Compiler)
and have several questions:

1.   We have already used Antlr to generate a Hive parser, but we don't
know how to re-use Hive's semantic analyzer. Is there someone specialized in
this area who could share some related documents with us? Is it possible to
re-use it independently of Hive?

2.   Do you know whether anyone has done this kind of work
(auto-completion/script validation) before? If so, could someone give us
some pointers to it?
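For the parsing half of question 1, Hive's own entry point can be driven
directly; a sketch, assuming the hive-exec jar is on the classpath (the
query string is just an example, and this is not runnable standalone):

```java
// Requires the hive-exec jar on the classpath.
import org.apache.hadoop.hive.ql.parse.ASTNode;
import org.apache.hadoop.hive.ql.parse.ParseDriver;

public class AstSketch {
    public static void main(String[] args) throws Exception {
        ParseDriver pd = new ParseDriver();
        // Parse a statement into Hive's AST without semantic analysis.
        ASTNode ast = pd.parse("SELECT accountid FROM accounts");
        System.out.println(ast.toStringTree());
    }
}
```

The semantic analyzers (BaseSemanticAnalyzer and friends) are harder to
reuse in isolation, since they expect a metastore and a session context.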

Xiaoyong


Hive 0.13 vs LZO index vs hive.hadoop.supports.splittable.combineinputformat issue

2015-01-07 Thread Nathalie Blais
Hello Hive support team,

Happy new year to you!

Quick question in regards to combining small LZO files in Hive.  As some of our 
HDFS files are indexed (not all, but there are always a few .lzo.index files in 
the directory structure), we are experiencing the problematic behavior 
described in JIRA MAPREDUCE-5537 
(https://issues.apache.org/jira/browse/MAPREDUCE-5537 ); the case is 100% 
reproducible.

We have a separate aggregation process that runs on the cluster to take care of 
the “small files issue”.  However, in between runs, in order to reduce the 
number of mappers (and busy containers), we would have loved to set 
hive.hadoop.supports.splittable.combineinputformat to true and allow Hive to 
combine small files by itself.

We are using Cloudera distro CDH 5.2.0 and ideally we would avoid building 
hadoop-core manually.  Do you know if the patch on JIRA MAPREDUCE-5537 has ever 
been included in any official release?
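For reference, the setting in question is a session-level toggle; a sketch
(Hive CLI), only safe to enable once the underlying Hadoop build has the
MAPREDUCE-5537 fix:

```sql
-- Session-level toggle; combines small input files into fewer mapper splits.
SET hive.hadoop.supports.splittable.combineinputformat=true;
```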

I will wait for news from you.

Thank you very much,

Nathalie Blais
Ubisoft Montreal


Nathalie Blais
BI Developer - DNA (http://technologygroup/dna)
Technology Group Online – Ubisoft Montreal








Adding new columns to parquet based Hive table

2015-01-07 Thread Kumar V
Hi,

I have a Parquet-format Hive table with a few columns. I have already loaded
a lot of data into this table and it works. I now have to add a few new
columns to it. If I add the new columns, queries no longer work, because I
have not reloaded the old data. Is there a way to add new fields to the
table, without reloading the old Parquet files, and still make queries work?

I tried this in Hive 0.10 and also in Hive 0.13, and I get an error in both
versions.
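For reference, the metadata-only way to add the columns is ALTER TABLE;
whether the old files then read back correctly depends on the Parquet SerDe
version in use (the table and column names below are placeholders):

```sql
-- Metadata-only change; existing Parquet data files are not rewritten.
ALTER TABLE events ADD COLUMNS (new_col STRING);
```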
Please let me know how to handle this.
Regards,
Kumar.