Re: Get 'TBLPROPERTIES' of a Hive table in Java

2014-07-22 Thread Tuong Tr.
Hi,

You should be able to find what you need to do in the getSchema() method inside 
 
hive/ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordReader.java.

The table properties does not get copied into the jobconf in M/R job's passedin 
jobconf.  so there is some complex logic to track it down.

Good luck,
Tuong 


On Tuesday, July 22, 2014 12:52 AM, Something Something 
 wrote:
  


I am writing a custom InputFormat class in which I need to 
get the 'TBLPROPETIES' of a Hive Table.  This InputFormat will extend 
the HiveInputFormat. 
I thought the table properties will be available to me in the 'JobConf', but 
they are not there. 
What's the best way to get TBLPROPERTIES of a Hive table in a Java class such 
as InputFormat? 
Would greatly appreciate any help.  Thanks.

RE: Hive Statistics

2014-07-22 Thread Navdeep Agrawal
Thank you Nitin for reply. I am using mysql database ,and also I can see new 
row created for the partition ,but all values are zero . I think explicitly 
giving mysql data base wont make a difference .

From: Nitin Pawar [mailto:nitinpawar...@gmail.com]
Sent: Tuesday, July 22, 2014 11:05 PM
To: user@hive.apache.org
Subject: Re: Hive Statistics

by default hive stores the statistics in derby database.

If you want a persistent look at column statistics, you may want to create 
mysql based database for column statistics.

Your queries look fine

On Tue, Jul 22, 2014 at 10:50 PM, Navdeep Agrawal 
mailto:navdeep_agra...@symantec.com>> wrote:

Hi ,



i am trying to compute statistics on ORC File but i am unable see any changes 
in PART_COL_STATS as well on using

set hive.compute.query.using.stats=true;
set hive.stats.reliable=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.cbo.enable=true;

to get max value of a column it is running full Map reduce on column ..
what i want to use is max value stored in meta store ,but i am unable to catch 
these statistics .

my table desc is
load_inst_id int
src_filename string
server_date date

my analyze query is
analyze table mytable partition(server_date=’2013-11-30′) compute statistics 
for columns load_inst_id;

i am always getting 0 as loadinstant id ,i have to turn off my 
hive.compute.query.using.stats to get correct result(through map reduce 
max(load_inst_id))




--
Nitin Pawar


Hive UDF gives duplicate result regardless of parameters, when nested in a subquery

2014-07-22 Thread 丁桂涛(桂花)
Recently I developed a Hive Generic UDF *getad*. It accepts a map type and
a string type parameter and outputs a string value. But I found the UDF
output really confusing in different conditions.

Condition A:

select
  getad(map_col, 'tp') as tp,
  getad(map_col, 'p') as p,
  getad(map_col, 'sp') as sp
from
  table_name
where
  id = ;

The output is right: 'tp', 'p', 'sp'.

Condition B:

select
  array(tp, p, sp) as ps
from
  (
  select
getad(map_col, 'tp') as tp,
getad(map_col, 'p') as p,
getad(map_col, 'sp') as sp
  from
table_name
  where
id = 
  ) t;

The output is wrong: 'tp', 'tp', 'tp'. And the following query outputs the
same result:

select
  array(
getad(map_col, 'tp'),
getad(map_col, 'p'),
getad(map_col, 'sp')
  ) as ps
from
  table_name
where
  id = ;

Could you please provide me some hints on this? Thanks!

-- 
丁桂涛


Re: Query difference between 2 tables

2014-07-22 Thread Brenden Cobb
Perfect, thanks very much.

-BC

From: Gunther Hagleitner 
mailto:ghagleit...@hortonworks.com>>
Reply-To: "user@hive.apache.org" 
mailto:user@hive.apache.org>>
Date: Tuesday, July 22, 2014 3:03 PM
To: "user@hive.apache.org" 
mailto:user@hive.apache.org>>
Subject: Re: Query difference between 2 tables

select x from t1 left outer join t2 on t1.x = t2.x where t2.x is null

should work.

Thanks,
Gunther.


On Tue, Jul 22, 2014 at 11:01 AM, Brenden Cobb 
mailto:brenden.c...@humedica.com>> wrote:
Hi-

I'm stuck on Hive .10 right now and I'm trying to figure out how to accomplish 
the equivalent of a not exists or minus statement:

Select x from t1 where x not in ( select x from t2)

I know this kind of subquery is available in .13 but would like to find a 
workaround.

Appreciate any suggestions.


CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader of 
this message is not the intended recipient, you are hereby notified that any 
printing, copying, dissemination, distribution, disclosure or forwarding of 
this communication is strictly prohibited. If you have received this 
communication in error, please contact the sender immediately and delete it 
from your system. Thank You.


Re: select list for dynamic partition insert

2014-07-22 Thread Kristof Vanbecelaere
I see. The last column in u_data is unixtime while I wanted to partition on
rating. I just assumed Hive would use the same-named column as the one
mentioned in the partition spec. Thanks for clarifying this, I missed that
bit in the documentation.


On Tue, Jul 22, 2014 at 10:13 PM, Prasanth Jayachandran <
pjayachand...@hortonworks.com> wrote:

> From the error msg it looks like there are too many distinct values in
> partition column. Try increasing the count
> hive.exec.max.dynamic.partitions.pernode to a number >100. If you already
> know the number of distinct values in partition column, try a value greater
> than or equal to that number.
> Hive uses the last column of select * query as the partition column.
> Projecting the list of columns in select query is not always mandatory.
>
> Thanks
> Prasanth Jayachandran
>
> On Jul 22, 2014, at 1:07 PM, Kristof Vanbecelaere <
> kristof.vanbecela...@gmail.com> wrote:
>
> While playing with the movielens data set to learn about dynamic
> partitions I ran
>
> from u_data insert overwrite table u_data_p partition (rating) select *
>
> This failed with
>
> [Error 20004]: Fatal error occurred when node tried to create too many
> dynamic partitions. The maximum number of dynamic partitions is controlled
> by hive.exec.max.dynamic.partitions and
> hive.exec.max.dynamic.partitions.pernode. Maximum was set to: 100
>
> Apparently select * cannot be used and one should explicitly list the
> fields in the select list. I guess a beter error message is required here.
>
>
>
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity
> to which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader
> of this message is not the intended recipient, you are hereby notified that
> any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
> received this communication in error, please contact the sender immediately
> and delete it from your system. Thank You.


Re: select list for dynamic partition insert

2014-07-22 Thread Prasanth Jayachandran
>From the error msg it looks like there are too many distinct values in 
>partition column. Try increasing the count 
>hive.exec.max.dynamic.partitions.pernode to a number >100. If you already know 
>the number of distinct values in partition column, try a value greater than or 
>equal to that number. 
Hive uses the last column of select * query as the partition column. Projecting 
the list of columns in select query is not always mandatory.

Thanks
Prasanth Jayachandran

On Jul 22, 2014, at 1:07 PM, Kristof Vanbecelaere 
 wrote:

> While playing with the movielens data set to learn about dynamic partitions I 
> ran
> 
> from u_data insert overwrite table u_data_p partition (rating) select *
> 
> This failed with
> 
> [Error 20004]: Fatal error occurred when node tried to create too many 
> dynamic partitions. The maximum number of dynamic partitions is controlled by 
> hive.exec.max.dynamic.partitions and 
> hive.exec.max.dynamic.partitions.pernode. Maximum was set to: 100
> 
> Apparently select * cannot be used and one should explicitly list the fields 
> in the select list. I guess a beter error message is required here.


-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.


select list for dynamic partition insert

2014-07-22 Thread Kristof Vanbecelaere
While playing with the movielens data set to learn about dynamic partitions
I ran

from u_data insert overwrite table u_data_p partition (rating) select *


This failed with


[Error 20004]: Fatal error occurred when node tried to create too many
dynamic partitions. The maximum number of dynamic partitions is controlled
by hive.exec.max.dynamic.partitions and
hive.exec.max.dynamic.partitions.pernode. Maximum was set to: 100

Apparently select * cannot be used and one should explicitly list the
fields in the select list. I guess a beter error message is required here.


Re: Hive Stats

2014-07-22 Thread Bala Krishna Gangisetty
What is the value of "COLUMN_STATS_ACCURATE" field in "DESCRIBE FORMATTED
" output under "Table Parameters" section. My suspect is, it's
false even after executing your analyze query. If so, Hive launches the
mapred jobs.

Try using below analyze query:

analyze table mytable partition(server_date=’2013-11-30′) compute
statistics;

--Bala G.


On Tue, Jul 22, 2014 at 10:14 AM, Navdeep Agrawal <
navdeep_agra...@symantec.com> wrote:

> i am trying to compute statistics on ORC File but i am unable see any
> changes in PART_COL_STATS as well on using
>
> set hive.compute.query.using.stats=true;
> set hive.stats.reliable=true;
> set hive.stats.fetch.column.stats=true;
> set hive.stats.fetch.partition.stats=true;
> set hive.cbo.enable=true;
>
> to get max value of a column it is running full Map reduce on column ..
> what i want to use is max value stored in meta store ,but i am unable to
> catch these statistics .
>
> my table desc is
> load_inst_id int
> src_filename string
> server_date date
>
> my analyze query is
> analyze table mytable partition(server_date=’2013-11-30′) compute
> statistics for columns load_inst_id;
>
> i am always getting 0 as loadinstant id ,i have to turn off my
> hive.compute.query.using.stats to get correct result(through map reduce
> max(load_inst_id))
>
>
>


Re: Query difference between 2 tables

2014-07-22 Thread Gunther Hagleitner
select x from t1 left outer join t2 on t1.x = t2.x where t2.x is null

should work.

Thanks,
Gunther.


On Tue, Jul 22, 2014 at 11:01 AM, Brenden Cobb 
wrote:

>  Hi-
>
>  I'm stuck on Hive .10 right now and I'm trying to figure out how to
> accomplish the equivalent of a not exists or minus statement:
>
>  Select x from t1 where x not in ( select x from t2)
>
>  I know this kind of subquery is available in .13 but would like to find
> a workaround.
>
>  Appreciate any suggestions.
>

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.


RE: hive 13: dynamic partition inserts

2014-07-22 Thread Gajendran, Vishnu
Hi Prasanth,

 Thanks a lot for your quick response.

From: Gajendran, Vishnu
Sent: Tuesday, July 22, 2014 11:47 AM
To: user@hive.apache.org
Cc: d...@hive.apache.org
Subject: RE: hive 13: dynamic partition inserts

Hi Prasanth,

 Thanks a lot for your quick response.

From: Prasanth Jayachandran [pjayachand...@hortonworks.com]
Sent: Tuesday, July 22, 2014 11:28 AM
To: user@hive.apache.org
Cc: d...@hive.apache.org
Subject: Re: hive 13: dynamic partition inserts

Hi Vishnu

Yes. There is change in the way dynamic partitions are inserted in hive 13. The 
new dynamic partitioning is highly scalable and uses very less memory. Here is 
the related JIRA  https://issues.apache.org/jira/browse/HIVE-6455.

Setting "hive.optimize.sort.dynamic.partition" to false will fallback to old 
way of insertion. If your destination table uses columnar formats like ORC, 
Parquet etc. then it makes sense leave the optimization ON, as columnar formats 
needs some buffer space for each column before flushing to disk. Buffer space 
(runtime memory) will quickly shoot up when there are lots of partition column 
values and columns. HIVE-6455 addresses this issue.

Thanks
Prasanth Jayachandran

On Jul 22, 2014, at 10:51 AM, Gajendran, Vishnu 
mailto:vis...@amazon.com>> wrote:

adding user@hive.apache.org for wider audience

From: Gajendran, Vishnu
Sent: Tuesday, July 22, 2014 10:42 AM
To: d...@hive.apache.org
Subject: hive 13: dynamic partition inserts

Hello,

I am seeing a difference between hive 11 and hive 13 when inserting to a table 
with dynamic partitions.

In Hive 11, when I set hive.merge.mapfiles=false before doing a dynamic 
partition insert, I see number of files (generated my each mapper) in the 
specified hdfs location as expected. But, in Hive 13, when I set 
hive.merge.mapfiles=false, I just see one file in specified hdfs location for 
the same query. I think hive is not honoring the hive.merge.mapfiles parameter 
and it merged all the mapper outputs to a single file.

In Hive 11, 19 mappers were executed for the dynamic partition insert task. But 
in Hive 13, 19 mappers and 2 reducers were executed.

When I checked the query plan for hive 11, there is only a map operator task 
for dynamic partition insert. But, in hive 13, I see both map operator and 
reduce operator task.

Is there any changes in hive 13 regarding dymamic partition inserts? Any 
comments on this issue is greatly appreciated.

Thanks,
vishnu


CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader of 
this message is not the intended recipient, you are hereby notified that any 
printing, copying, dissemination, distribution, disclosure or forwarding of 
this communication is strictly prohibited. If you have received this 
communication in error, please contact the sender immediately and delete it 
from your system. Thank You.


RE: hive 13: dynamic partition inserts

2014-07-22 Thread Gajendran, Vishnu
Hi Prasanth,

 Thanks a lot for your quick response.

From: Prasanth Jayachandran [pjayachand...@hortonworks.com]
Sent: Tuesday, July 22, 2014 11:28 AM
To: user@hive.apache.org
Cc: d...@hive.apache.org
Subject: Re: hive 13: dynamic partition inserts

Hi Vishnu

Yes. There is change in the way dynamic partitions are inserted in hive 13. The 
new dynamic partitioning is highly scalable and uses very less memory. Here is 
the related JIRA  https://issues.apache.org/jira/browse/HIVE-6455.

Setting "hive.optimize.sort.dynamic.partition" to false will fallback to old 
way of insertion. If your destination table uses columnar formats like ORC, 
Parquet etc. then it makes sense leave the optimization ON, as columnar formats 
needs some buffer space for each column before flushing to disk. Buffer space 
(runtime memory) will quickly shoot up when there are lots of partition column 
values and columns. HIVE-6455 addresses this issue.

Thanks
Prasanth Jayachandran

On Jul 22, 2014, at 10:51 AM, Gajendran, Vishnu 
mailto:vis...@amazon.com>> wrote:

adding user@hive.apache.org for wider audience

From: Gajendran, Vishnu
Sent: Tuesday, July 22, 2014 10:42 AM
To: d...@hive.apache.org
Subject: hive 13: dynamic partition inserts

Hello,

I am seeing a difference between hive 11 and hive 13 when inserting to a table 
with dynamic partitions.

In Hive 11, when I set hive.merge.mapfiles=false before doing a dynamic 
partition insert, I see number of files (generated my each mapper) in the 
specified hdfs location as expected. But, in Hive 13, when I set 
hive.merge.mapfiles=false, I just see one file in specified hdfs location for 
the same query. I think hive is not honoring the hive.merge.mapfiles parameter 
and it merged all the mapper outputs to a single file.

In Hive 11, 19 mappers were executed for the dynamic partition insert task. But 
in Hive 13, 19 mappers and 2 reducers were executed.

When I checked the query plan for hive 11, there is only a map operator task 
for dynamic partition insert. But, in hive 13, I see both map operator and 
reduce operator task.

Is there any changes in hive 13 regarding dymamic partition inserts? Any 
comments on this issue is greatly appreciated.

Thanks,
vishnu


CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader of 
this message is not the intended recipient, you are hereby notified that any 
printing, copying, dissemination, distribution, disclosure or forwarding of 
this communication is strictly prohibited. If you have received this 
communication in error, please contact the sender immediately and delete it 
from your system. Thank You.


Re: hive 13: dynamic partition inserts

2014-07-22 Thread Prasanth Jayachandran
Hi Vishnu

Yes. There is change in the way dynamic partitions are inserted in hive 13. The 
new dynamic partitioning is highly scalable and uses very less memory. Here is 
the related JIRA  https://issues.apache.org/jira/browse/HIVE-6455. 

Setting "hive.optimize.sort.dynamic.partition" to false will fallback to old 
way of insertion. If your destination table uses columnar formats like ORC, 
Parquet etc. then it makes sense leave the optimization ON, as columnar formats 
needs some buffer space for each column before flushing to disk. Buffer space 
(runtime memory) will quickly shoot up when there are lots of partition column 
values and columns. HIVE-6455 addresses this issue.

Thanks
Prasanth Jayachandran

On Jul 22, 2014, at 10:51 AM, Gajendran, Vishnu  wrote:

> adding user@hive.apache.org for wider audience
> From: Gajendran, Vishnu
> Sent: Tuesday, July 22, 2014 10:42 AM
> To: d...@hive.apache.org
> Subject: hive 13: dynamic partition inserts
> 
> Hello,
> 
> I am seeing a difference between hive 11 and hive 13 when inserting to a 
> table with dynamic partitions.
> 
> In Hive 11, when I set hive.merge.mapfiles=false before doing a dynamic 
> partition insert, I see number of files (generated my each mapper) in the 
> specified hdfs location as expected. But, in Hive 13, when I set 
> hive.merge.mapfiles=false, I just see one file in specified hdfs location for 
> the same query. I think hive is not honoring the hive.merge.mapfiles 
> parameter and it merged all the mapper outputs to a single file.
> 
> In Hive 11, 19 mappers were executed for the dynamic partition insert task. 
> But in Hive 13, 19 mappers and 2 reducers were executed.
> 
> When I checked the query plan for hive 11, there is only a map operator task 
> for dynamic partition insert. But, in hive 13, I see both map operator and 
> reduce operator task.
> 
> Is there any changes in hive 13 regarding dymamic partition inserts? Any 
> comments on this issue is greatly appreciated.
> 
> Thanks,
> vishnu


-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.


Query difference between 2 tables

2014-07-22 Thread Brenden Cobb
Hi-

I'm stuck on Hive .10 right now and I'm trying to figure out how to accomplish 
the equivalent of a not exists or minus statement:

Select x from t1 where x not in ( select x from t2)

I know this kind of subquery is available in .13 but would like to find a 
workaround.

Appreciate any suggestions.


RE: hive 13: dynamic partition inserts

2014-07-22 Thread Gajendran, Vishnu
adding user@hive.apache.org for wider audience

From: Gajendran, Vishnu
Sent: Tuesday, July 22, 2014 10:42 AM
To: d...@hive.apache.org
Subject: hive 13: dynamic partition inserts

Hello,

I am seeing a difference between hive 11 and hive 13 when inserting to a table 
with dynamic partitions.

In Hive 11, when I set hive.merge.mapfiles=false before doing a dynamic 
partition insert, I see number of files (generated my each mapper) in the 
specified hdfs location as expected. But, in Hive 13, when I set 
hive.merge.mapfiles=false, I just see one file in specified hdfs location for 
the same query. I think hive is not honoring the hive.merge.mapfiles parameter 
and it merged all the mapper outputs to a single file.

In Hive 11, 19 mappers were executed for the dynamic partition insert task. But 
in Hive 13, 19 mappers and 2 reducers were executed.

When I checked the query plan for hive 11, there is only a map operator task 
for dynamic partition insert. But, in hive 13, I see both map operator and 
reduce operator task.

Is there any changes in hive 13 regarding dymamic partition inserts? Any 
comments on this issue is greatly appreciated.

Thanks,
vishnu


Re: Hive Statistics

2014-07-22 Thread Nitin Pawar
by default hive stores the statistics in derby database.

If you want a persistent look at column statistics, you may want to create
mysql based database for column statistics.

Your queries look fine


On Tue, Jul 22, 2014 at 10:50 PM, Navdeep Agrawal <
navdeep_agra...@symantec.com> wrote:

> Hi ,
>
>
>
> i am trying to compute statistics on ORC File but i am unable see any
> changes in PART_COL_STATS as well on using
>
> set hive.compute.query.using.stats=true;
> set hive.stats.reliable=true;
> set hive.stats.fetch.column.stats=true;
> set hive.stats.fetch.partition.stats=true;
> set hive.cbo.enable=true;
>
> to get max value of a column it is running full Map reduce on column ..
> what i want to use is max value stored in meta store ,but i am unable to
> catch these statistics .
>
> my table desc is
> load_inst_id int
> src_filename string
> server_date date
>
> my analyze query is
> analyze table mytable partition(server_date=’2013-11-30′) compute
> statistics for columns load_inst_id;
>
> i am always getting 0 as loadinstant id ,i have to turn off my
> hive.compute.query.using.stats to get correct result(through map reduce
> max(load_inst_id))
>
>
>



-- 
Nitin Pawar


dynamic partition inserts in hive 13

2014-07-22 Thread Gajendran, Vishnu
Hello,

I am seeing a difference between hive 11 and hive 13 when inserting to a table 
with dynamic partition.

In Hive 11, when I set hive.merge.mapfiles=false before doing a dynamic 
partition insert, I see number of files (generated my each mapper) in the 
specified hdfs location as expected. But, in Hive 13, when I set 
hive.merge.mapfiles=false, I just see one file in specified hdfs location for 
the same query. I think hive is not honoring the hive.merge.mapfiles parameter 
and it merged all the mapper outputs to a single file.

In Hive 11, 19 mappers were executed for the dynamic partition insert task. But 
in Hive 13, 19 mappers and 2 reducers were executed.

When I checked the query plan for hive 11, there is only a map operator task 
for dynamic partition insert. But, in hive 13, I see both map operator and 
reduce operator task.

Is there any changes in hive 13 regarding dymamic partition inserts? Any 
comments on this issue is greatly appreciated.

Thanks,
vishnu


Hive Statistics

2014-07-22 Thread Navdeep Agrawal
Hi ,



i am trying to compute statistics on ORC File but i am unable see any changes 
in PART_COL_STATS as well on using

set hive.compute.query.using.stats=true;
set hive.stats.reliable=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.cbo.enable=true;

to get max value of a column it is running full Map reduce on column ..
what i want to use is max value stored in meta store ,but i am unable to catch 
these statistics .

my table desc is
load_inst_id int
src_filename string
server_date date

my analyze query is
analyze table mytable partition(server_date=’2013-11-30′) compute statistics 
for columns load_inst_id;

i am always getting 0 as loadinstant id ,i have to turn off my 
hive.compute.query.using.stats to get correct result(through map reduce 
max(load_inst_id))



Hive Stats

2014-07-22 Thread Navdeep Agrawal
i am trying to compute statistics on ORC File but i am unable see any changes 
in PART_COL_STATS as well on using


set hive.compute.query.using.stats=true;
set hive.stats.reliable=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.cbo.enable=true;

to get max value of a column it is running full Map reduce on column ..
what i want to use is max value stored in meta store ,but i am unable to catch 
these statistics .

my table desc is
load_inst_id int
src_filename string
server_date date

my analyze query is
analyze table mytable partition(server_date=’2013-11-30′) compute statistics 
for columns load_inst_id;

i am always getting 0 as loadinstant id ,i have to turn off my 
hive.compute.query.using.stats to get correct result(through map reduce 
max(load_inst_id))



Re: Drop Partition by ID

2014-07-22 Thread Devopam Mittra
Please try using escape character around the '%' if not already done so.

regards
Dev


On Mon, Jul 21, 2014 at 7:32 PM, fab wol  wrote:

> Hi everyone,
>
> I have the following problem: I have a partitoned managed table (Partition
> table is a string which represents a date, eg. log-date="2014-07-15").
> Unfortunately there is one partition in there like this:
> log_date=2014-07-15-23%3A45%3A38 (copied from show partitions stmt). This
> partitions most likeley got created to a wrong script 8which is fixed).
>
> Now i want to delete this partition, but it doesn't work:
>
>- alter table ... drop partitition
>(log_date='2014-07-15-23%3A45%3A38') gives no error, but the partition is
>still existing afterwards
>- I tried escaping the %-signs with backslashes but no luck with that
>- I delete the directory in the HDFS and run msck repair table
>afterwards. It recognizes that the folder is missing but is not deleting
>the metadata
>
> So what can I do to get rid of the metadata? My next guess would be to go
> directly to the metastore DB and delete the metadata there. But what
> exactly has to be deleted? I guess there are several dependencies.
>
> Other idea: is there a possibility in Hive to delete a partition by a
> unique ID or something like that?
>
> Or what is needed to delete the table with the normal "alter table drop
> partition" command?
>
> Cheers
> Wolli
>



-- 
Devopam Mittra
Life and Relations are not binary


Querying arrays of structs with regular expressions or like/rlike functions

2014-07-22 Thread Roberto Coluccio
Hello folks,

I am performing some tests with Hive 0.12.0 (cdh5.0.3). I have a quite
complex data model, in particular I modeled a filed in my table as an array
of structs, like:

people array<
  struct<
   name:string,
   surname:string,
   address:string,
   role:string,
   dateofbirth: int,
   id: string>
  >

I am able to query such field by using the general default UDF function
"array_contains(Array , value)", by hitting something like: select *
from table1 where array_contains(people.name, "Roberto");

What I have experienced is that such function performs a 1:1
match/comparison, and this is fine for some problems. But, (how) can I use
a regular expression applied the sub-field name of my struct (inside my
array people) in order to retrieve e.g. all the people whose name starts
with "Ro"? I know that Hive gives us the "like" and "rlike" functions, but
how can I apply them to a field inside a struct that is one of the elements
of an array?

Please, do not just tell me to change my data model: I'm already
considering this, but the problem is that my table is way more complex (it
is made of several more fields, arrays and arrays of structs).

Thank you.
Roberto


Get 'TBLPROPERTIES' of a Hive table in Java

2014-07-22 Thread Something Something
I am writing a custom InputFormat class in which I need to get the
'TBLPROPETIES' of a Hive Table. This InputFormat will extend the
HiveInputFormat.

I thought the table properties will be available to me in the 'JobConf',
but they are not there.

What's the best way to get TBLPROPERTIES of a Hive table in a Java class
such as InputFormat?

Would greatly appreciate any help.  Thanks.