query using stats

2014-10-13 Thread Navdeep Agrawal
Hi ,
I am trying to run query using stats with following flags as set,but it is 
always running map reduce job instead of getting direct result from metastore 
(Hive 0.13.0.2.1.2.1-471).can some please suggest me to run optimized query or 
a workaround it .
Thanks in advance

Set hive.stats.reliable=true
Set hive.fetch.column.stats=true
Set hive.stats.fetch.partition.stats=true
Set hive.compute.query.using.stats=true
query
explain select max(load_inst_id) from ndnslog_orc1 where query_date > 
'2012-01-01';//I have also tried explain select max(load_inst_id) from 
ndnslog_orc1 where query_date  = '2013-01-01' but same result as I have only 
one partition .
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Map Operator Tree:
  TableScan
alias: ndnslog_orc1
filterExpr: (query_date > '2012-01-01') (type: boolean)
Statistics: Num rows: 331575 Data size: 2652601 Basic stats: 
COMPLETE Column stats: NONE
Select Operator
  expressions: load_inst_id (type: bigint)
  outputColumnNames: load_inst_id
  Statistics: Num rows: 331575 Data size: 2652601 Basic stats: 
COMPLETE Column stats: NONE
  Group By Operator
aggregations: max(load_inst_id)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
Reduce Output Operator
  sort order:
  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
  value expressions: _col0 (type: bigint)
  Reduce Operator Tree:
Group By Operator
  aggregations: max(VALUE._col0)
  mode: mergepartial
  outputColumnNames: _col0
  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column 
stats: NONE
  Select Operator
expressions: _col0 (type: bigint)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column 
stats: NONE
File Output Operator
  compressed: false
  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column 
stats: NONE
  table:
  input format: org.apache.hadoop.mapred.TextInputFormat
  output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
Fetch Operator
  limit: -1


where I had already run analyze statement as I can see statistics in hive meta 
store.

My insert statement is

set hive.stats.autogather=false;
set hive.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.compute.query.using.stats=true;
set hive.cbo.enable=true;
--set mapred.reduce.tasks=20;
--set hive.exec.reducers.max=100;
--set mapreduce.job.reduce.slowstart.completedmaps=0.95;
--set mapreduce.task.io.sort.mb=1600;
set mapred.job.queue.name=queue1;

use ${database};
--use ssa;

insert into table ${managed_table} partition (query_date)
select
${hivevar:LOADID} as load_inst_id,
src_filename,
from_unixtime(query_time) as query_time,
client_ip,
query_type,
query_domain,
response_domain,
ttl,
response_type,
response_info,
to_date(from_unixtime(query_time)) as query_date
from  ${external_table} ;

/Navdeep


RE: HIveserver jdbc set configurations

2014-10-12 Thread Navdeep Agrawal
Thank you for reply .but I tried below solution with mapred.job.queuename which 
is was unable to set ,but I set that in connection string after ‘? ‘,which was 
working for me .

From: Prasad Mujumdar [mailto:pras...@cloudera.com]
Sent: Friday, October 10, 2014 10:28 PM
To: user@hive.apache.org
Subject: Re: HIveserver jdbc set configurations


  You can execute a 'set property=value' statements using Statement.execute(). 
Such change will only be visible in the current session.

thanks
Prasad


On Fri, Oct 10, 2014 at 4:11 AM, Navdeep Agrawal 
mailto:navdeep_agra...@symantec.com>> wrote:
Hi,
I am connecting to hive server2 through jdbc and want to set some properties 
like mapreduce.queuename;hive.compute.query.using.stats; how can I achieve that 
through java .


Thanks,
Navdeep



RE: mapreduce.job.queuename doesn't work with hiveserver2

2014-10-10 Thread Navdeep Agrawal
You can change the mapreduce.job.queuename by appending it to connection string 
after ‘?’ like jdbc:hive2//url:1/db?mapreduce.job.queuename=q1

From: Chen Song [mailto:chen.song...@gmail.com]
Sent: Friday, October 10, 2014 2:38 AM
To: user
Subject: mapreduce.job.queuename doesn't work with hiveserver2

By setting mapreduce.job.queuename in hive-site.xml, all queries submitted via 
hiveserver or hive cli all go to the specified queue.

However, hiveserver2 doesn't appear to respect this value at all. The queue is 
chosen to be the user who submitted query. I tried with beeline and jdbc.

Is there a way I can configure a specific queue for queries going through 
hiveserver2?

--
Chen Song



HIveserver jdbc set configurations

2014-10-10 Thread Navdeep Agrawal
Hi,
I am connecting to hive server2 through jdbc and want to set some properties 
like mapreduce.queuename;hive.compute.query.using.stats; how can I achieve that 
through java .


Thanks,
Navdeep


RE: Reg:Column Statistics with Parquet

2014-07-25 Thread Navdeep Agrawal
Well not the correct way ,you can check the statistics in mysql part_col_stats 
like tables in mysql data base if you are using mysql stat database .
Or the other way is calling max,min,distinct on int columns ,largest length on 
string columns etc,if they run whole map reduce on these operation then 
statistics are not getting created .

From: Suma Shivaprasad [mailto:sumasai.shivapra...@gmail.com]
Sent: Friday, July 25, 2014 12:43 PM
To: user@hive.apache.org
Subject: Re: Reg:Column Statistics with Parquet

Hi ,

I tried the same with compute statistics for columns a, b,c as above and still 
seeing the same results in explain plan.

How do I confirm if its generating all the column stats for a given column. If 
this is confirmed, we can debug why Hive is still not using it?

Thanks
Suma

On Thu, Jul 24, 2014 at 11:49 PM, Prasanth Jayachandran 
mailto:pjayachand...@hortonworks.com>> wrote:
You have to explicit specifics column list in analyze command for gathering 
columns stats.

This command will only collect basic stats like number of rows, total file 
size, raw data size, number of files.
analyze table user_table partition(dt='2014-06-01',hour='00') compute 
statistics;

To collect column statistics add the column list like below
analyze table user_table partition(dt='2014-06-01',hour='00') compute 
statistics for columns a, b, c;

Thanks
Prasanth Jayachandran

On Jul 24, 2014, at 5:13 AM, Sandeep Samudrala 
mailto:sandeep.samudr...@inmobi.com>> wrote:


I am trying to enable Column statistics usage with Parquet tables. This is the 
query I am executing. However on explain, I see that even though Basic stats: 
COMPLETE is seen Column stats is seen asNONE.
Can someone please explain what else I need to debug/fix this.

set hive.compute.query.using.stats=true;
set hive.stats.reliable=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.cbo.enable=true;

analyze table user_table partition(dt='2014-06-01',hour='00') compute 
statistics;

explain select min(a), max(b), min(c) from user_table;

hive> explain select min(a), max(b), min(c) from usertable;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Map Operator Tree:
  TableScan
alias: user_table
Statistics: Num rows: 55490383 Data size: 1831182639 Basic stats: 
COMPLETE Column stats: NONE
Select Operator
  expressions: a (type: double), b (type: double), c (type: int)
  outputColumnNames: a, b, c
  Statistics: Num rows: 55490383 Data size: 1831182639 Basic stats: 
COMPLETE Column stats: NONE
  Group By Operator
aggregations: min(a), max(b), min(c)
mode: hash
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE 
Column stats: NONE
Reduce Output Operator
  sort order:
  Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE 
Column stats: NONE
  value expressions: _col0 (type: double), _col1 (type: 
double), _col2 (type: int)
  Reduce Operator Tree:
Group By Operator
  aggregations: min(VALUE._col0), max(VALUE._col1), min(VALUE._col2)
  mode: mergepartial
  outputColumnNames: _col0, _col1, _col2
  Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column 
stats: NONE
  Select Operator
expressions: _col0 (type: double), _col1 (type: double), _col2 
(type: int)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column 
stats: NONE
File Output Operator
  compressed: false
  Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE 
Column stats: NONE
  table:
  input format: org.apache.hadoop.mapred.TextInputFormat
  output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
Fetch Operator
  limit: -1


Thanks,
-sandeep

_
The information contained in this communication is intended solely for the use 
of the individual or entity to whom it is addressed and others authorized to 
receive it. It may contain confidential or legally privileged information. If 
you are not the intended recipient you are hereby notified that any disclosure, 
copying, distribution or taking any action in reliance on the contents of this 
information is strictly prohibited and may be unlawful. If you have received 
this communication in error, please notify us immediately by responding to this 
email and then delete it from your system. The firm is neither liable for the 
proper and complete transmiss

RE: [HELP]Hive Statistics

2014-07-24 Thread Navdeep Agrawal
Well the problem exactly didn’t get solved but I observed this kind of behavior 
is persistent when I partition my table by date type otherwise its working . 
may its worth a issue .

Thank you

From: Navdeep Agrawal [mailto:navdeep_agra...@symantec.com]
Sent: Thursday, July 24, 2014 1:22 PM
To: user@hive.apache.org
Subject: [HELP]Hive Statistics


Stuck .need help
I created a small table with multiple partition desc (id int ,term int) 
partitioned by id ,whenever I run analyze on any id I am getting perfectly good 
answers . I am unable to figure out the difference each file is making .

New table
Table Parameters:
transient_lastDdlTime   1406016417

# Storage Information
SerDe Library:  org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat:   org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed: No
Num Buckets:-1
Bucket Columns: []
Sort Columns:   []
Storage Desc Params:
serialization.format1
inserting
insert into table statdevp partition(id) select id,term from statdev where id 
is not null and term is not null
analyze
analyze table statdevp partition(id=11) compute statistics for columns id;

I am able to see all values in part_col_stat for the partitions I am running 
analyze

and the orginal table :
desc

Table Parameters:
last_modified_byX
last_modified_time  1406047797
transient_lastDdlTime   1406047797

# Storage Information
SerDe Library:  org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat:   org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed: No
Num Buckets:-1
Bucket Columns: []
Sort Columns:   []
Storage Desc Params:
serialization.format1

load_inst_idint 
 src_filenamestring 
  server_date date  

# Partition Information 
 # col_name  data_type  
 comment
   server_date  
   date

insertion
insert into devStat partition(server_date) select 
load_inst_id,src_filename,server_ts,server_date from rrslog_ext where 
server_date='2013-11-01' and load_inst_id is not null  limit 100
analyze
analyze table devstat partition(server_date='2013-11-30') compute statistics 
for columns load_inst_id;

any help will be highly appreciated .stuck here long time …
thanks in advance .

From: Navdeep Agrawal [mailto:navdeep_agra...@symantec.com]
Sent: Wednesday, July 23, 2014 3:17 PM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: RE: Hive Statistics

I think that is already configured since I am able to get statistics in other 
tables .
Because if I have not set these parameters why every time a new row is getting 
created  in mysql db in part_col_stats table .

From: Andre Araujo [mailto:ara...@pythian.com]
Sent: Wednesday, July 23, 2014 1:22 PM
To: user
Subject: Re: Hive Statistics

Hi, Navdeep,

Please note that the configuration for the stats database is separate from the 
configuration for the metastore db.
Can you confirm you have both to use a mysql db?

The properties for the stats db are:
hive.stats.dbclass=
hive.stats.dbconnectionstring=


On 23 July 2014 16:07, Navdeep Agrawal 
mailto:navdeep_agra...@symantec.com>> wrote:
Thank you Nitin for reply. I am using mysql database ,and also I can see new 
row created for the partition ,but all values are zero . I think explicitly 
giving mysql data base wont make a difference .

From: Nitin Pawar 
[mailto:nitinpawar...@gmail.com<mailto:nitinpawar...@gmail.com>]
Sent: Tuesday, July 22, 2014 11:05 PM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: Re: Hive Statistics

by default hive stores the statistics in derby database.

If you want a persistent look at column statistics, you may want to create 
mysql based database for column statistics.

Your queries look fine

On Tue, Jul 22, 2014 at 10:50 PM, Navdeep Agrawal 
mailto:navdeep_agra...@symantec.com>> wrote:

Hi ,



i am trying to compute statistics on ORC File but i am unable see any changes 
in PART_COL_STATS as well on using

set hive.compute.query.using.stats=true;
set hive.stats.reliable=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.cbo.enable=true;

to get max value of a column it is running full Map reduce on column

[HELP]Hive Statistics

2014-07-24 Thread Navdeep Agrawal

Stuck .need help
I created a small table with multiple partition desc (id int ,term int) 
partitioned by id ,whenever I run analyze on any id I am getting perfectly good 
answers . I am unable to figure out the difference each file is making .

New table
Table Parameters:
transient_lastDdlTime   1406016417

# Storage Information
SerDe Library:  org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat:   org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed: No
Num Buckets:-1
Bucket Columns: []
Sort Columns:   []
Storage Desc Params:
serialization.format1
inserting
insert into table statdevp partition(id) select id,term from statdev where id 
is not null and term is not null
analyze
analyze table statdevp partition(id=11) compute statistics for columns id;

I am able to see all values in part_col_stat for the partitions I am running 
analyze

and the orginal table :
desc

Table Parameters:
last_modified_byX
last_modified_time  1406047797
transient_lastDdlTime   1406047797

# Storage Information
SerDe Library:  org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat:   org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed: No
Num Buckets:-1
Bucket Columns: []
Sort Columns:   []
Storage Desc Params:
serialization.format1

load_inst_idint 
 src_filenamestring 
  server_date date  

# Partition Information 
 # col_name  data_type  
 comment
   server_date  
   date

insertion
insert into devStat partition(server_date) select 
load_inst_id,src_filename,server_ts,server_date from rrslog_ext where 
server_date='2013-11-01' and load_inst_id is not null  limit 100
analyze
analyze table devstat partition(server_date='2013-11-30') compute statistics 
for columns load_inst_id;

any help will be highly appreciated .stuck here long time …
thanks in advance .

From: Navdeep Agrawal [mailto:navdeep_agra...@symantec.com]
Sent: Wednesday, July 23, 2014 3:17 PM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: RE: Hive Statistics

I think that is already configured since I am able to get statistics in other 
tables .
Because if I have not set these parameters why every time a new row is getting 
created  in mysql db in part_col_stats table .

From: Andre Araujo [mailto:ara...@pythian.com]
Sent: Wednesday, July 23, 2014 1:22 PM
To: user
Subject: Re: Hive Statistics

Hi, Navdeep,

Please note that the configuration for the stats database is separate from the 
configuration for the metastore db.
Can you confirm you have both to use a mysql db?

The properties for the stats db are:
hive.stats.dbclass=
hive.stats.dbconnectionstring=


On 23 July 2014 16:07, Navdeep Agrawal 
mailto:navdeep_agra...@symantec.com>> wrote:
Thank you Nitin for reply. I am using mysql database ,and also I can see new 
row created for the partition ,but all values are zero . I think explicitly 
giving mysql data base wont make a difference .

From: Nitin Pawar 
[mailto:nitinpawar...@gmail.com<mailto:nitinpawar...@gmail.com>]
Sent: Tuesday, July 22, 2014 11:05 PM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: Re: Hive Statistics

by default hive stores the statistics in derby database.

If you want a persistent look at column statistics, you may want to create 
mysql based database for column statistics.

Your queries look fine

On Tue, Jul 22, 2014 at 10:50 PM, Navdeep Agrawal 
mailto:navdeep_agra...@symantec.com>> wrote:

Hi ,



i am trying to compute statistics on ORC File but i am unable see any changes 
in PART_COL_STATS as well on using

set hive.compute.query.using.stats=true;
set hive.stats.reliable=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.cbo.enable=true;

to get max value of a column it is running full Map reduce on column ..
what i want to use is max value stored in meta store ,but i am unable to catch 
these statistics .

my table desc is
load_inst_id int
src_filename string
server_date date

my analyze query is
analyze table mytable partition(server_date=’2013-11-30′) compute statistics 
for columns load_inst_id;

i am always getting 0 as loadinstant id ,i have to 

RE: Hive Statistics

2014-07-23 Thread Navdeep Agrawal
Stuck .need help
I created a small table with multiple partition desc (id int ,term int) 
partitioned by id ,whenever I run analyze on any id I am getting perfectly good 
answers . I am unable to figure out the difference each file is making .

New table
Table Parameters:
transient_lastDdlTime   1406016417

# Storage Information
SerDe Library:  org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat:   org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed: No
Num Buckets:-1
Bucket Columns: []
Sort Columns:   []
Storage Desc Params:
serialization.format1
inserting
insert into table statdevp partition(id) select id,term from statdev where id 
is not null and term is not null
analyze
analyze table statdevp partition(id=11) compute statistics for columns id;

I am able to see all values in part_col_stat for the partitions I am running 
analyze

and the orginal table :
desc

Table Parameters:
last_modified_byX
last_modified_time  1406047797
transient_lastDdlTime   1406047797

# Storage Information
SerDe Library:  org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat:   org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed: No
Num Buckets:-1
Bucket Columns: []
Sort Columns:   []
Storage Desc Params:
serialization.format1

load_inst_idint 
 src_filenamestring 
  server_date date  

# Partition Information 
 # col_name  data_type  
 comment
   server_date  
   date

insertion
insert into devStat partition(server_date) select 
load_inst_id,src_filename,server_ts,server_date from rrslog_ext where 
server_date='2013-11-01' and load_inst_id is not null  limit 100
analyze
analyze table devstat partition(server_date='2013-11-30') compute statistics 
for columns load_inst_id;

any help will be highly appreciated .stuck here long time …
thanks in advance .

From: Navdeep Agrawal [mailto:navdeep_agra...@symantec.com]
Sent: Wednesday, July 23, 2014 3:17 PM
To: user@hive.apache.org
Subject: RE: Hive Statistics

No I have not set these to mysql db . when I set them to the one I am using for 
hive I am getting stat publisher not getting initialized .but if I have not set 
these parameters why every time a new row is getting created  in mysql db in 
part_col_stats table .

From: Andre Araujo [mailto:ara...@pythian.com]
Sent: Wednesday, July 23, 2014 1:22 PM
To: user
Subject: Re: Hive Statistics

Hi, Navdeep,

Please note that the configuration for the stats database is separate from the 
configuration for the metastore db.
Can you confirm you have both to use a mysql db?

The properties for the stats db are:
hive.stats.dbclass=
hive.stats.dbconnectionstring=


On 23 July 2014 16:07, Navdeep Agrawal 
mailto:navdeep_agra...@symantec.com>> wrote:
Thank you Nitin for reply. I am using mysql database ,and also I can see new 
row created for the partition ,but all values are zero . I think explicitly 
giving mysql data base wont make a difference .

From: Nitin Pawar 
[mailto:nitinpawar...@gmail.com<mailto:nitinpawar...@gmail.com>]
Sent: Tuesday, July 22, 2014 11:05 PM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: Re: Hive Statistics

by default hive stores the statistics in derby database.

If you want a persistent look at column statistics, you may want to create 
mysql based database for column statistics.

Your queries look fine

On Tue, Jul 22, 2014 at 10:50 PM, Navdeep Agrawal 
mailto:navdeep_agra...@symantec.com>> wrote:

Hi ,



i am trying to compute statistics on ORC File but i am unable see any changes 
in PART_COL_STATS as well on using

set hive.compute.query.using.stats=true;
set hive.stats.reliable=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.cbo.enable=true;

to get max value of a column it is running full Map reduce on column ..
what i want to use is max value stored in meta store ,but i am unable to catch 
these statistics .

my table desc is
load_inst_id int
src_filename string
server_date date

my analyze query is
analyze table mytable partition(server_date=’2013-11-30′) compute statistics 
for columns load_inst_id;

i am always gettin

RE: Hive Statistics

2014-07-23 Thread Navdeep Agrawal
No I have not set these to mysql db . when I set them to the one I am using for 
hive I am getting stat publisher not getting initialized .but if I have not set 
these parameters why every time a new row is getting created  in mysql db in 
part_col_stats table .

From: Andre Araujo [mailto:ara...@pythian.com]
Sent: Wednesday, July 23, 2014 1:22 PM
To: user
Subject: Re: Hive Statistics

Hi, Navdeep,

Please note that the configuration for the stats database is separate from the 
configuration for the metastore db.
Can you confirm you have both to use a mysql db?

The properties for the stats db are:
hive.stats.dbclass=
hive.stats.dbconnectionstring=


On 23 July 2014 16:07, Navdeep Agrawal 
mailto:navdeep_agra...@symantec.com>> wrote:
Thank you Nitin for reply. I am using mysql database ,and also I can see new 
row created for the partition ,but all values are zero . I think explicitly 
giving mysql data base wont make a difference .

From: Nitin Pawar 
[mailto:nitinpawar...@gmail.com<mailto:nitinpawar...@gmail.com>]
Sent: Tuesday, July 22, 2014 11:05 PM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: Re: Hive Statistics

by default hive stores the statistics in derby database.

If you want a persistent look at column statistics, you may want to create 
mysql based database for column statistics.

Your queries look fine

On Tue, Jul 22, 2014 at 10:50 PM, Navdeep Agrawal 
mailto:navdeep_agra...@symantec.com>> wrote:

Hi ,



i am trying to compute statistics on ORC File but i am unable see any changes 
in PART_COL_STATS as well on using

set hive.compute.query.using.stats=true;
set hive.stats.reliable=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.cbo.enable=true;

to get max value of a column it is running full Map reduce on column ..
what i want to use is max value stored in meta store ,but i am unable to catch 
these statistics .

my table desc is
load_inst_id int
src_filename string
server_date date

my analyze query is
analyze table mytable partition(server_date=’2013-11-30′) compute statistics 
for columns load_inst_id;

i am always getting 0 as loadinstant id ,i have to turn off my 
hive.compute.query.using.stats to get correct result(through map reduce 
max(load_inst_id))




--
Nitin Pawar



--
André Araújo
Big Data Consultant/Solutions Architect
The Pythian Group - Australia - www.pythian.com<http://www.pythian.com>
Office (calls from within Australia): 1300 366 021 x1270
Office (international): +61 2 8016 7000  x270 OR +1 613 565 8696   x1270
Mobile: +61 410 323 559
Fax: +61 2 9805 0544
IM: pythianaraujo @ AIM/MSN/Y! or ara...@pythian.com<mailto:ara...@pythian.com> 
@ GTalk

“Success is not about standing at the top, it's the steps you leave behind.” — 
Iker Pou (rock climber)


--




RE: Hive Stats

2014-07-23 Thread Navdeep Agrawal
i digged into partitions params and checked out column_stats_accurate in 
metastore but found it to be true .

From: Bala Krishna Gangisetty [mailto:b...@altiscale.com]
Sent: Wednesday, July 23, 2014 12:46 AM
To: user@hive.apache.org
Subject: Re: Hive Stats

What is the value of "COLUMN_STATS_ACCURATE" field in "DESCRIBE FORMATTED 
" output under "Table Parameters" section. My suspect is, it's 
false even after executing your analyze query. If so, Hive launches the mapred 
jobs.

Try using below analyze query:

analyze table mytable partition(server_date=’2013-11-30′) compute statistics;

--Bala G.

On Tue, Jul 22, 2014 at 10:14 AM, Navdeep Agrawal 
mailto:navdeep_agra...@symantec.com>> wrote:

i am trying to compute statistics on ORC File but i am unable see any changes 
in PART_COL_STATS as well on using

set hive.compute.query.using.stats=true;
set hive.stats.reliable=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.cbo.enable=true;

to get max value of a column it is running full Map reduce on column ..
what i want to use is max value stored in meta store ,but i am unable to catch 
these statistics .

my table desc is
load_inst_id int
src_filename string
server_date date

my analyze query is
analyze table mytable partition(server_date=’2013-11-30′) compute statistics 
for columns load_inst_id;

i am always getting 0 as loadinstant id ,i have to turn off my 
hive.compute.query.using.stats to get correct result(through map reduce 
max(load_inst_id))




RE: Hive Stats

2014-07-23 Thread Navdeep Agrawal
Hi ,
I am unable to see COLUMN_STATS_ACCURATE in desc formatted/extended ,but I 
think u r pointing to stats.reliable I have made that to false  .

From: Bala Krishna Gangisetty [mailto:b...@altiscale.com]
Sent: Wednesday, July 23, 2014 12:46 AM
To: user@hive.apache.org
Subject: Re: Hive Stats

What is the value of "COLUMN_STATS_ACCURATE" field in "DESCRIBE FORMATTED 
" output under "Table Parameters" section. My suspect is, it's 
false even after executing your analyze query. If so, Hive launches the mapred 
jobs.

Try using below analyze query:

analyze table mytable partition(server_date=’2013-11-30′) compute statistics;

--Bala G.

On Tue, Jul 22, 2014 at 10:14 AM, Navdeep Agrawal 
mailto:navdeep_agra...@symantec.com>> wrote:

i am trying to compute statistics on ORC File but i am unable see any changes 
in PART_COL_STATS as well on using

set hive.compute.query.using.stats=true;
set hive.stats.reliable=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.cbo.enable=true;

to get max value of a column it is running full Map reduce on column ..
what i want to use is max value stored in meta store ,but i am unable to catch 
these statistics .

my table desc is
load_inst_id int
src_filename string
server_date date

my analyze query is
analyze table mytable partition(server_date=’2013-11-30′) compute statistics 
for columns load_inst_id;

i am always getting 0 as loadinstant id ,i have to turn off my 
hive.compute.query.using.stats to get correct result(through map reduce 
max(load_inst_id))




RE: Hive Statistics

2014-07-22 Thread Navdeep Agrawal
Thank you Nitin for reply. I am using mysql database ,and also I can see new 
row created for the partition ,but all values are zero . I think explicitly 
giving mysql data base wont make a difference .

From: Nitin Pawar [mailto:nitinpawar...@gmail.com]
Sent: Tuesday, July 22, 2014 11:05 PM
To: user@hive.apache.org
Subject: Re: Hive Statistics

by default hive stores the statistics in derby database.

If you want a persistent look at column statistics, you may want to create 
mysql based database for column statistics.

Your queries look fine

On Tue, Jul 22, 2014 at 10:50 PM, Navdeep Agrawal 
mailto:navdeep_agra...@symantec.com>> wrote:

Hi ,



i am trying to compute statistics on ORC File but i am unable see any changes 
in PART_COL_STATS as well on using

set hive.compute.query.using.stats=true;
set hive.stats.reliable=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.cbo.enable=true;

to get max value of a column it is running full Map reduce on column ..
what i want to use is max value stored in meta store ,but i am unable to catch 
these statistics .

my table desc is
load_inst_id int
src_filename string
server_date date

my analyze query is
analyze table mytable partition(server_date=’2013-11-30′) compute statistics 
for columns load_inst_id;

i am always getting 0 as loadinstant id ,i have to turn off my 
hive.compute.query.using.stats to get correct result(through map reduce 
max(load_inst_id))




--
Nitin Pawar


Hive Statistics

2014-07-22 Thread Navdeep Agrawal
Hi ,



i am trying to compute statistics on ORC File but i am unable see any changes 
in PART_COL_STATS as well on using

set hive.compute.query.using.stats=true;
set hive.stats.reliable=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.cbo.enable=true;

to get max value of a column it is running full Map reduce on column ..
what i want to use is max value stored in meta store ,but i am unable to catch 
these statistics .

my table desc is
load_inst_id int
src_filename string
server_date date

my analyze query is
analyze table mytable partition(server_date=’2013-11-30′) compute statistics 
for columns load_inst_id;

i am always getting 0 as loadinstant id ,i have to turn off my 
hive.compute.query.using.stats to get correct result(through map reduce 
max(load_inst_id))



Hive Stats

2014-07-22 Thread Navdeep Agrawal
i am trying to compute statistics on ORC File but i am unable see any changes 
in PART_COL_STATS as well on using


set hive.compute.query.using.stats=true;
set hive.stats.reliable=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.cbo.enable=true;

to get max value of a column it is running full Map reduce on column ..
what i want to use is max value stored in meta store ,but i am unable to catch 
these statistics .

my table desc is
load_inst_id int
src_filename string
server_date date

my analyze query is
analyze table mytable partition(server_date=’2013-11-30′) compute statistics 
for columns load_inst_id;

i am always getting 0 as loadinstant id ,i have to turn off my 
hive.compute.query.using.stats to get correct result(through map reduce 
max(load_inst_id))