query using stats
Hi , I am trying to run query using stats with following flags as set,but it is always running map reduce job instead of getting direct result from metastore (Hive 0.13.0.2.1.2.1-471).can some please suggest me to run optimized query or a workaround it . Thanks in advance Set hive.stats.reliable=true Set hive.fetch.column.stats=true Set hive.stats.fetch.partition.stats=true Set hive.compute.query.using.stats=true query explain select max(load_inst_id) from ndnslog_orc1 where query_date > '2012-01-01';//I have also tried explain select max(load_inst_id) from ndnslog_orc1 where query_date = '2013-01-01' but same result as I have only one partition . OK STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan alias: ndnslog_orc1 filterExpr: (query_date > '2012-01-01') (type: boolean) Statistics: Num rows: 331575 Data size: 2652601 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: load_inst_id (type: bigint) outputColumnNames: load_inst_id Statistics: Num rows: 331575 Data size: 2652601 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: max(load_inst_id) mode: hash outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: bigint) Reduce Operator Tree: Group By Operator aggregations: max(VALUE._col0) mode: mergepartial outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col0 (type: bigint) outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0 Fetch Operator limit: -1 where I had already run analyze statement as I can see statistics in hive meta store. My insert statement is set hive.stats.autogather=false; set hive.fetch.column.stats=true; set hive.stats.fetch.partition.stats=true; set hive.compute.query.using.stats=true; set hive.cbo.enable=true; --set mapred.reduce.tasks=20; --set hive.exec.reducers.max=100; --set mapreduce.job.reduce.slowstart.completedmaps=0.95; --set mapreduce.task.io.sort.mb=1600; set mapred.job.queue.name=queue1; use ${database}; --use ssa; insert into table ${managed_table} partition (query_date) select ${hivevar:LOADID} as load_inst_id, src_filename, from_unixtime(query_time) as query_time, client_ip, query_type, query_domain, response_domain, ttl, response_type, response_info, to_date(from_unixtime(query_time)) as query_date from ${external_table} ; /Navdeep
RE: HIveserver jdbc set configurations
Thank you for reply .but I tried below solution with mapred.job.queuename which is was unable to set ,but I set that in connection string after ‘? ‘,which was working for me . From: Prasad Mujumdar [mailto:pras...@cloudera.com] Sent: Friday, October 10, 2014 10:28 PM To: user@hive.apache.org Subject: Re: HIveserver jdbc set configurations You can execute a 'set property=value' statements using Statement.execute(). Such change will only be visible in the current session. thanks Prasad On Fri, Oct 10, 2014 at 4:11 AM, Navdeep Agrawal mailto:navdeep_agra...@symantec.com>> wrote: Hi, I am connecting to hive server2 through jdbc and want to set some properties like mapreduce.queuename;hive.compute.query.using.stats; how can I achieve that through java . Thanks, Navdeep
RE: mapreduce.job.queuename doesn't work with hiveserver2
You can change the mapreduce.job.queuename by appending it to connection string after ‘?’ like jdbc:hive2//url:1/db?mapreduce.job.queuename=q1 From: Chen Song [mailto:chen.song...@gmail.com] Sent: Friday, October 10, 2014 2:38 AM To: user Subject: mapreduce.job.queuename doesn't work with hiveserver2 By setting mapreduce.job.queuename in hive-site.xml, all queries submitted via hiveserver or hive cli all go to the specified queue. However, hiveserver2 doesn't appear to respect this value at all. The queue is chosen to be the user who submitted query. I tried with beeline and jdbc. Is there a way I can configure a specific queue for queries going through hiveserver2? -- Chen Song
HIveserver jdbc set configurations
Hi, I am connecting to hive server2 through jdbc and want to set some properties like mapreduce.queuename;hive.compute.query.using.stats; how can I achieve that through java . Thanks, Navdeep
RE: Reg:Column Statistics with Parquet
Well not the correct way ,you can check the statistics in mysql part_col_stats like tables in mysql data base if you are using mysql stat database . Or the other way is calling max,min,distinct on int columns ,largest length on string columns etc,if they run whole map reduce on these operation then statistics are not getting created . From: Suma Shivaprasad [mailto:sumasai.shivapra...@gmail.com] Sent: Friday, July 25, 2014 12:43 PM To: user@hive.apache.org Subject: Re: Reg:Column Statistics with Parquet Hi , I tried the same with compute statistics for columns a, b,c as above and still seeing the same results in explain plan. How do I confirm if its generating all the column stats for a given column. If this is confirmed, we can debug why Hive is still not using it? Thanks Suma On Thu, Jul 24, 2014 at 11:49 PM, Prasanth Jayachandran mailto:pjayachand...@hortonworks.com>> wrote: You have to explicit specifics column list in analyze command for gathering columns stats. This command will only collect basic stats like number of rows, total file size, raw data size, number of files. analyze table user_table partition(dt='2014-06-01',hour='00') compute statistics; To collect column statistics add the column list like below analyze table user_table partition(dt='2014-06-01',hour='00') compute statistics for columns a, b, c; Thanks Prasanth Jayachandran On Jul 24, 2014, at 5:13 AM, Sandeep Samudrala mailto:sandeep.samudr...@inmobi.com>> wrote: I am trying to enable Column statistics usage with Parquet tables. This is the query I am executing. However on explain, I see that even though Basic stats: COMPLETE is seen Column stats is seen asNONE. Can someone please explain what else I need to debug/fix this. set hive.compute.query.using.stats=true; set hive.stats.reliable=true; set hive.stats.fetch.column.stats=true; set hive.stats.fetch.partition.stats=true; set hive.cbo.enable=true; analyze table user_table partition(dt='2014-06-01',hour='00') compute statistics; explain select min(a), max(b), min(c) from user_table; hive> explain select min(a), max(b), min(c) from usertable; OK STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan alias: user_table Statistics: Num rows: 55490383 Data size: 1831182639 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: a (type: double), b (type: double), c (type: int) outputColumnNames: a, b, c Statistics: Num rows: 55490383 Data size: 1831182639 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: min(a), max(b), min(c) mode: hash outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: double), _col1 (type: double), _col2 (type: int) Reduce Operator Tree: Group By Operator aggregations: min(VALUE._col0), max(VALUE._col1), min(VALUE._col2) mode: mergepartial outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col0 (type: double), _col1 (type: double), _col2 (type: int) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0 Fetch Operator limit: -1 Thanks, -sandeep _ The information contained in this communication is intended solely for the use of the individual or entity to whom it is addressed and others authorized to receive it. It may contain confidential or legally privileged information. If you are not the intended recipient you are hereby notified that any disclosure, copying, distribution or taking any action in reliance on the contents of this information is strictly prohibited and may be unlawful. If you have received this communication in error, please notify us immediately by responding to this email and then delete it from your system. The firm is neither liable for the proper and complete transmiss
RE: [HELP]Hive Statistics
Well the problem exactly didn’t get solved but I observed this kind of behavior is persistent when I partition my table by date type otherwise its working . may its worth a issue . Thank you From: Navdeep Agrawal [mailto:navdeep_agra...@symantec.com] Sent: Thursday, July 24, 2014 1:22 PM To: user@hive.apache.org Subject: [HELP]Hive Statistics Stuck .need help I created a small table with multiple partition desc (id int ,term int) partitioned by id ,whenever I run analyze on any id I am getting perfectly good answers . I am unable to figure out the difference each file is making . New table Table Parameters: transient_lastDdlTime 1406016417 # Storage Information SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde InputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat Compressed: No Num Buckets:-1 Bucket Columns: [] Sort Columns: [] Storage Desc Params: serialization.format1 inserting insert into table statdevp partition(id) select id,term from statdev where id is not null and term is not null analyze analyze table statdevp partition(id=11) compute statistics for columns id; I am able to see all values in part_col_stat for the partitions I am running analyze and the orginal table : desc Table Parameters: last_modified_byX last_modified_time 1406047797 transient_lastDdlTime 1406047797 # Storage Information SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde InputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat Compressed: No Num Buckets:-1 Bucket Columns: [] Sort Columns: [] Storage Desc Params: serialization.format1 load_inst_idint src_filenamestring server_date date # Partition Information # col_name data_type comment server_date date insertion insert into devStat partition(server_date) select load_inst_id,src_filename,server_ts,server_date from rrslog_ext where server_date='2013-11-01' and load_inst_id is not null limit 100 analyze analyze table devstat partition(server_date='2013-11-30') compute statistics for columns load_inst_id; any help will be highly appreciated .stuck here long time … thanks in advance . From: Navdeep Agrawal [mailto:navdeep_agra...@symantec.com] Sent: Wednesday, July 23, 2014 3:17 PM To: user@hive.apache.org<mailto:user@hive.apache.org> Subject: RE: Hive Statistics I think that is already configured since I am able to get statistics in other tables . Because if I have not set these parameters why every time a new row is getting created in mysql db in part_col_stats table . From: Andre Araujo [mailto:ara...@pythian.com] Sent: Wednesday, July 23, 2014 1:22 PM To: user Subject: Re: Hive Statistics Hi, Navdeep, Please note that the configuration for the stats database is separate from the configuration for the metastore db. Can you confirm you have both to use a mysql db? The properties for the stats db are: hive.stats.dbclass= hive.stats.dbconnectionstring= On 23 July 2014 16:07, Navdeep Agrawal mailto:navdeep_agra...@symantec.com>> wrote: Thank you Nitin for reply. I am using mysql database ,and also I can see new row created for the partition ,but all values are zero . I think explicitly giving mysql data base wont make a difference . From: Nitin Pawar [mailto:nitinpawar...@gmail.com<mailto:nitinpawar...@gmail.com>] Sent: Tuesday, July 22, 2014 11:05 PM To: user@hive.apache.org<mailto:user@hive.apache.org> Subject: Re: Hive Statistics by default hive stores the statistics in derby database. If you want a persistent look at column statistics, you may want to create mysql based database for column statistics. Your queries look fine On Tue, Jul 22, 2014 at 10:50 PM, Navdeep Agrawal mailto:navdeep_agra...@symantec.com>> wrote: Hi , i am trying to compute statistics on ORC File but i am unable see any changes in PART_COL_STATS as well on using set hive.compute.query.using.stats=true; set hive.stats.reliable=true; set hive.stats.fetch.column.stats=true; set hive.stats.fetch.partition.stats=true; set hive.cbo.enable=true; to get max value of a column it is running full Map reduce on column
[HELP]Hive Statistics
Stuck .need help I created a small table with multiple partition desc (id int ,term int) partitioned by id ,whenever I run analyze on any id I am getting perfectly good answers . I am unable to figure out the difference each file is making . New table Table Parameters: transient_lastDdlTime 1406016417 # Storage Information SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde InputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat Compressed: No Num Buckets:-1 Bucket Columns: [] Sort Columns: [] Storage Desc Params: serialization.format1 inserting insert into table statdevp partition(id) select id,term from statdev where id is not null and term is not null analyze analyze table statdevp partition(id=11) compute statistics for columns id; I am able to see all values in part_col_stat for the partitions I am running analyze and the orginal table : desc Table Parameters: last_modified_byX last_modified_time 1406047797 transient_lastDdlTime 1406047797 # Storage Information SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde InputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat Compressed: No Num Buckets:-1 Bucket Columns: [] Sort Columns: [] Storage Desc Params: serialization.format1 load_inst_idint src_filenamestring server_date date # Partition Information # col_name data_type comment server_date date insertion insert into devStat partition(server_date) select load_inst_id,src_filename,server_ts,server_date from rrslog_ext where server_date='2013-11-01' and load_inst_id is not null limit 100 analyze analyze table devstat partition(server_date='2013-11-30') compute statistics for columns load_inst_id; any help will be highly appreciated .stuck here long time … thanks in advance . From: Navdeep Agrawal [mailto:navdeep_agra...@symantec.com] Sent: Wednesday, July 23, 2014 3:17 PM To: user@hive.apache.org<mailto:user@hive.apache.org> Subject: RE: Hive Statistics I think that is already configured since I am able to get statistics in other tables . Because if I have not set these parameters why every time a new row is getting created in mysql db in part_col_stats table . From: Andre Araujo [mailto:ara...@pythian.com] Sent: Wednesday, July 23, 2014 1:22 PM To: user Subject: Re: Hive Statistics Hi, Navdeep, Please note that the configuration for the stats database is separate from the configuration for the metastore db. Can you confirm you have both to use a mysql db? The properties for the stats db are: hive.stats.dbclass= hive.stats.dbconnectionstring= On 23 July 2014 16:07, Navdeep Agrawal mailto:navdeep_agra...@symantec.com>> wrote: Thank you Nitin for reply. I am using mysql database ,and also I can see new row created for the partition ,but all values are zero . I think explicitly giving mysql data base wont make a difference . From: Nitin Pawar [mailto:nitinpawar...@gmail.com<mailto:nitinpawar...@gmail.com>] Sent: Tuesday, July 22, 2014 11:05 PM To: user@hive.apache.org<mailto:user@hive.apache.org> Subject: Re: Hive Statistics by default hive stores the statistics in derby database. If you want a persistent look at column statistics, you may want to create mysql based database for column statistics. Your queries look fine On Tue, Jul 22, 2014 at 10:50 PM, Navdeep Agrawal mailto:navdeep_agra...@symantec.com>> wrote: Hi , i am trying to compute statistics on ORC File but i am unable see any changes in PART_COL_STATS as well on using set hive.compute.query.using.stats=true; set hive.stats.reliable=true; set hive.stats.fetch.column.stats=true; set hive.stats.fetch.partition.stats=true; set hive.cbo.enable=true; to get max value of a column it is running full Map reduce on column .. what i want to use is max value stored in meta store ,but i am unable to catch these statistics . my table desc is load_inst_id int src_filename string server_date date my analyze query is analyze table mytable partition(server_date=’2013-11-30′) compute statistics for columns load_inst_id; i am always getting 0 as loadinstant id ,i have to
RE: Hive Statistics
Stuck .need help I created a small table with multiple partition desc (id int ,term int) partitioned by id ,whenever I run analyze on any id I am getting perfectly good answers . I am unable to figure out the difference each file is making . New table Table Parameters: transient_lastDdlTime 1406016417 # Storage Information SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde InputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat Compressed: No Num Buckets:-1 Bucket Columns: [] Sort Columns: [] Storage Desc Params: serialization.format1 inserting insert into table statdevp partition(id) select id,term from statdev where id is not null and term is not null analyze analyze table statdevp partition(id=11) compute statistics for columns id; I am able to see all values in part_col_stat for the partitions I am running analyze and the orginal table : desc Table Parameters: last_modified_byX last_modified_time 1406047797 transient_lastDdlTime 1406047797 # Storage Information SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde InputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat Compressed: No Num Buckets:-1 Bucket Columns: [] Sort Columns: [] Storage Desc Params: serialization.format1 load_inst_idint src_filenamestring server_date date # Partition Information # col_name data_type comment server_date date insertion insert into devStat partition(server_date) select load_inst_id,src_filename,server_ts,server_date from rrslog_ext where server_date='2013-11-01' and load_inst_id is not null limit 100 analyze analyze table devstat partition(server_date='2013-11-30') compute statistics for columns load_inst_id; any help will be highly appreciated .stuck here long time … thanks in advance . From: Navdeep Agrawal [mailto:navdeep_agra...@symantec.com] Sent: Wednesday, July 23, 2014 3:17 PM To: user@hive.apache.org Subject: RE: Hive Statistics No I have not set these to mysql db . when I set them to the one I am using for hive I am getting stat publisher not getting initialized .but if I have not set these parameters why every time a new row is getting created in mysql db in part_col_stats table . From: Andre Araujo [mailto:ara...@pythian.com] Sent: Wednesday, July 23, 2014 1:22 PM To: user Subject: Re: Hive Statistics Hi, Navdeep, Please note that the configuration for the stats database is separate from the configuration for the metastore db. Can you confirm you have both to use a mysql db? The properties for the stats db are: hive.stats.dbclass= hive.stats.dbconnectionstring= On 23 July 2014 16:07, Navdeep Agrawal mailto:navdeep_agra...@symantec.com>> wrote: Thank you Nitin for reply. I am using mysql database ,and also I can see new row created for the partition ,but all values are zero . I think explicitly giving mysql data base wont make a difference . From: Nitin Pawar [mailto:nitinpawar...@gmail.com<mailto:nitinpawar...@gmail.com>] Sent: Tuesday, July 22, 2014 11:05 PM To: user@hive.apache.org<mailto:user@hive.apache.org> Subject: Re: Hive Statistics by default hive stores the statistics in derby database. If you want a persistent look at column statistics, you may want to create mysql based database for column statistics. Your queries look fine On Tue, Jul 22, 2014 at 10:50 PM, Navdeep Agrawal mailto:navdeep_agra...@symantec.com>> wrote: Hi , i am trying to compute statistics on ORC File but i am unable see any changes in PART_COL_STATS as well on using set hive.compute.query.using.stats=true; set hive.stats.reliable=true; set hive.stats.fetch.column.stats=true; set hive.stats.fetch.partition.stats=true; set hive.cbo.enable=true; to get max value of a column it is running full Map reduce on column .. what i want to use is max value stored in meta store ,but i am unable to catch these statistics . my table desc is load_inst_id int src_filename string server_date date my analyze query is analyze table mytable partition(server_date=’2013-11-30′) compute statistics for columns load_inst_id; i am always gettin
RE: Hive Statistics
No I have not set these to mysql db . when I set them to the one I am using for hive I am getting stat publisher not getting initialized .but if I have not set these parameters why every time a new row is getting created in mysql db in part_col_stats table . From: Andre Araujo [mailto:ara...@pythian.com] Sent: Wednesday, July 23, 2014 1:22 PM To: user Subject: Re: Hive Statistics Hi, Navdeep, Please note that the configuration for the stats database is separate from the configuration for the metastore db. Can you confirm you have both to use a mysql db? The properties for the stats db are: hive.stats.dbclass= hive.stats.dbconnectionstring= On 23 July 2014 16:07, Navdeep Agrawal mailto:navdeep_agra...@symantec.com>> wrote: Thank you Nitin for reply. I am using mysql database ,and also I can see new row created for the partition ,but all values are zero . I think explicitly giving mysql data base wont make a difference . From: Nitin Pawar [mailto:nitinpawar...@gmail.com<mailto:nitinpawar...@gmail.com>] Sent: Tuesday, July 22, 2014 11:05 PM To: user@hive.apache.org<mailto:user@hive.apache.org> Subject: Re: Hive Statistics by default hive stores the statistics in derby database. If you want a persistent look at column statistics, you may want to create mysql based database for column statistics. Your queries look fine On Tue, Jul 22, 2014 at 10:50 PM, Navdeep Agrawal mailto:navdeep_agra...@symantec.com>> wrote: Hi , i am trying to compute statistics on ORC File but i am unable see any changes in PART_COL_STATS as well on using set hive.compute.query.using.stats=true; set hive.stats.reliable=true; set hive.stats.fetch.column.stats=true; set hive.stats.fetch.partition.stats=true; set hive.cbo.enable=true; to get max value of a column it is running full Map reduce on column .. what i want to use is max value stored in meta store ,but i am unable to catch these statistics . my table desc is load_inst_id int src_filename string server_date date my analyze query is analyze table mytable partition(server_date=’2013-11-30′) compute statistics for columns load_inst_id; i am always getting 0 as loadinstant id ,i have to turn off my hive.compute.query.using.stats to get correct result(through map reduce max(load_inst_id)) -- Nitin Pawar -- André Araújo Big Data Consultant/Solutions Architect The Pythian Group - Australia - www.pythian.com<http://www.pythian.com> Office (calls from within Australia): 1300 366 021 x1270 Office (international): +61 2 8016 7000 x270 OR +1 613 565 8696 x1270 Mobile: +61 410 323 559 Fax: +61 2 9805 0544 IM: pythianaraujo @ AIM/MSN/Y! or ara...@pythian.com<mailto:ara...@pythian.com> @ GTalk “Success is not about standing at the top, it's the steps you leave behind.” — Iker Pou (rock climber) --
RE: Hive Stats
i digged into partitions params and checked out column_stats_accurate in metastore but found it to be true . From: Bala Krishna Gangisetty [mailto:b...@altiscale.com] Sent: Wednesday, July 23, 2014 12:46 AM To: user@hive.apache.org Subject: Re: Hive Stats What is the value of "COLUMN_STATS_ACCURATE" field in "DESCRIBE FORMATTED " output under "Table Parameters" section. My suspect is, it's false even after executing your analyze query. If so, Hive launches the mapred jobs. Try using below analyze query: analyze table mytable partition(server_date=’2013-11-30′) compute statistics; --Bala G. On Tue, Jul 22, 2014 at 10:14 AM, Navdeep Agrawal mailto:navdeep_agra...@symantec.com>> wrote: i am trying to compute statistics on ORC File but i am unable see any changes in PART_COL_STATS as well on using set hive.compute.query.using.stats=true; set hive.stats.reliable=true; set hive.stats.fetch.column.stats=true; set hive.stats.fetch.partition.stats=true; set hive.cbo.enable=true; to get max value of a column it is running full Map reduce on column .. what i want to use is max value stored in meta store ,but i am unable to catch these statistics . my table desc is load_inst_id int src_filename string server_date date my analyze query is analyze table mytable partition(server_date=’2013-11-30′) compute statistics for columns load_inst_id; i am always getting 0 as loadinstant id ,i have to turn off my hive.compute.query.using.stats to get correct result(through map reduce max(load_inst_id))
RE: Hive Stats
Hi , I am unable to see COLUMN_STATS_ACCURATE in desc formatted/extended ,but I think u r pointing to stats.reliable I have made that to false . From: Bala Krishna Gangisetty [mailto:b...@altiscale.com] Sent: Wednesday, July 23, 2014 12:46 AM To: user@hive.apache.org Subject: Re: Hive Stats What is the value of "COLUMN_STATS_ACCURATE" field in "DESCRIBE FORMATTED " output under "Table Parameters" section. My suspect is, it's false even after executing your analyze query. If so, Hive launches the mapred jobs. Try using below analyze query: analyze table mytable partition(server_date=’2013-11-30′) compute statistics; --Bala G. On Tue, Jul 22, 2014 at 10:14 AM, Navdeep Agrawal mailto:navdeep_agra...@symantec.com>> wrote: i am trying to compute statistics on ORC File but i am unable see any changes in PART_COL_STATS as well on using set hive.compute.query.using.stats=true; set hive.stats.reliable=true; set hive.stats.fetch.column.stats=true; set hive.stats.fetch.partition.stats=true; set hive.cbo.enable=true; to get max value of a column it is running full Map reduce on column .. what i want to use is max value stored in meta store ,but i am unable to catch these statistics . my table desc is load_inst_id int src_filename string server_date date my analyze query is analyze table mytable partition(server_date=’2013-11-30′) compute statistics for columns load_inst_id; i am always getting 0 as loadinstant id ,i have to turn off my hive.compute.query.using.stats to get correct result(through map reduce max(load_inst_id))
RE: Hive Statistics
Thank you Nitin for reply. I am using mysql database ,and also I can see new row created for the partition ,but all values are zero . I think explicitly giving mysql data base wont make a difference . From: Nitin Pawar [mailto:nitinpawar...@gmail.com] Sent: Tuesday, July 22, 2014 11:05 PM To: user@hive.apache.org Subject: Re: Hive Statistics by default hive stores the statistics in derby database. If you want a persistent look at column statistics, you may want to create mysql based database for column statistics. Your queries look fine On Tue, Jul 22, 2014 at 10:50 PM, Navdeep Agrawal mailto:navdeep_agra...@symantec.com>> wrote: Hi , i am trying to compute statistics on ORC File but i am unable see any changes in PART_COL_STATS as well on using set hive.compute.query.using.stats=true; set hive.stats.reliable=true; set hive.stats.fetch.column.stats=true; set hive.stats.fetch.partition.stats=true; set hive.cbo.enable=true; to get max value of a column it is running full Map reduce on column .. what i want to use is max value stored in meta store ,but i am unable to catch these statistics . my table desc is load_inst_id int src_filename string server_date date my analyze query is analyze table mytable partition(server_date=’2013-11-30′) compute statistics for columns load_inst_id; i am always getting 0 as loadinstant id ,i have to turn off my hive.compute.query.using.stats to get correct result(through map reduce max(load_inst_id)) -- Nitin Pawar
Hive Statistics
Hi , i am trying to compute statistics on ORC File but i am unable see any changes in PART_COL_STATS as well on using set hive.compute.query.using.stats=true; set hive.stats.reliable=true; set hive.stats.fetch.column.stats=true; set hive.stats.fetch.partition.stats=true; set hive.cbo.enable=true; to get max value of a column it is running full Map reduce on column .. what i want to use is max value stored in meta store ,but i am unable to catch these statistics . my table desc is load_inst_id int src_filename string server_date date my analyze query is analyze table mytable partition(server_date=’2013-11-30′) compute statistics for columns load_inst_id; i am always getting 0 as loadinstant id ,i have to turn off my hive.compute.query.using.stats to get correct result(through map reduce max(load_inst_id))
Hive Stats
i am trying to compute statistics on ORC File but i am unable see any changes in PART_COL_STATS as well on using set hive.compute.query.using.stats=true; set hive.stats.reliable=true; set hive.stats.fetch.column.stats=true; set hive.stats.fetch.partition.stats=true; set hive.cbo.enable=true; to get max value of a column it is running full Map reduce on column .. what i want to use is max value stored in meta store ,but i am unable to catch these statistics . my table desc is load_inst_id int src_filename string server_date date my analyze query is analyze table mytable partition(server_date=’2013-11-30′) compute statistics for columns load_inst_id; i am always getting 0 as loadinstant id ,i have to turn off my hive.compute.query.using.stats to get correct result(through map reduce max(load_inst_id))