num_rows is always 0 in statistics
Hi,

I have run the ANALYZE TABLE command several times to collect statistics, but I always get num_rows=0, as shown below (raw_data_size is also 0).

    hive> analyze table lineitem compute statistics;
    Total MapReduce jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_201208291425_0011, Tracking URL = http://hadoop-node1:50030/jobdetails.jsp?jobid=job_201208291425_0011
    Kill Command = /usr/lib/hadoop-0.20/bin/hadoop job -Dmapred.job.tracker=hadoop-node1:8021 -kill job_201208291425_0011
    Hadoop job information for Stage-0: number of mappers: 3; number of reducers: 0
    2012-08-29 15:16:16,133 Stage-0 map = 0%, reduce = 0%
    2012-08-29 15:16:20,154 Stage-0 map = 100%, reduce = 0%
    2012-08-29 15:16:22,168 Stage-0 map = 100%, reduce = 100%
    Ended Job = job_201208291425_0011
    Table sf1.lineitem stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 759863287, raw_data_size: 0]

I tried versions 0.7.1, 0.8.1, and 0.9.0 and got the same result. Is there anything else I have to do to make it work?

Also, do statistics only work for managed tables? I tried it on external tables and it does not seem to work (all the values are 0).

Thanks,
Hiroyuki
Re: num_rows is always 0 in statistics
Hi,

Thank you for the reply. I tried the following setting, but I got the same result (num_rows=0):

    hive.stats.dbconnectionstring=jdbc:derby:;databaseName=/tmp/TempStatsStore;create=true

Do you have any clue?

On Wed, Aug 29, 2012 at 4:09 PM, rohithsharma rohithsharm...@huawei.com wrote:
I resolved the issue in the following way. Configure hive.stats.dbconnectionstring=jdbc:derby:;databaseName=/home/TempStore. This works only in a single-node cluster. Please check HIVE-3324.

-Original Message-
From: Hiroyuki Yamada [mailto:mogwa...@gmail.com]
Sent: Wednesday, August 29, 2012 11:57 AM
To: user@hive.apache.org
Subject: num_rows is always 0 in statistics
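When a changed setting appears to have no effect, one way to narrow it down is to print the values the current Hive session is actually using. The sketch below assumes the Hive CLI, where `SET` followed by a property name and no value prints the current value. Whether these particular properties explain the num_rows=0 result in this thread is an assumption, not something the thread confirms.

```sql
-- Print the effective value of each statistics-related property
-- in the current session (SET with no "=value" only reads the property).
SET hive.stats.autogather;
SET hive.stats.dbclass;
SET hive.stats.jdbcdriver;
SET hive.stats.dbconnectionstring;
```

If the printed connection string differs from the one that was configured, the setting is probably not reaching the session (for example, because it was placed in a configuration file the client does not read).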
Re: num_rows is always 0 in statistics
Hi,

Sorry, it works now. Thank you.

However, the value is not correct: it is about half of the real number of rows. Is this a sampled value? As far as I can tell from TableScanOperator.java, it seems to count every row.

Thanks,
Hiroyuki

On Wed, Aug 29, 2012 at 5:39 PM, Hiroyuki Yamada mogwa...@gmail.com wrote:
Hi, Thank you for the reply. I tried the following setting, but I got the same result (num_rows=0). hive.stats.dbconnectionstring=jdbc:derby:;databaseName=/tmp/TempStatsStore;create=true Do you have any clue?
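The mismatch described in this message can be checked by comparing the stored statistic with an exact count. This is a minimal sketch, assuming the `lineitem` table from the thread and that the stored statistics appear among the table parameters printed by `DESCRIBE FORMATTED` (the parameter names may differ between Hive versions). It only shows how to compare the two numbers and does not explain why they differ.

```sql
-- Recompute the table-level statistics.
ANALYZE TABLE lineitem COMPUTE STATISTICS;

-- Show the table metadata, including the stored statistics
-- in the table parameters section.
DESCRIBE FORMATTED lineitem;

-- Exact row count to compare against the stored row-count statistic.
SELECT COUNT(*) FROM lineitem;
```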
How to see the intermediate results between AST and optimized logical query plan.
Hello,

I have been trying to learn the Hive query compiler, and I am wondering if there is a way to see the result of semantic analysis (the query block tree) and the non-optimized logical query plan. I know we can get the AST and the optimized logical query plan with EXPLAIN, but I would like to see the intermediate results between them.

Also, is there any detailed documentation on the Hive query compiler?

I would appreciate it very much if anyone could answer my questions.

Thanks,
Hiroyuki
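For reference, the starting point the message refers to looks roughly like the sketch below. The `lineitem` table is borrowed from the earlier thread, and the column `l_returnflag` is a hypothetical name used only for illustration. This shows only the output that EXPLAIN already provides, not the query block tree or the non-optimized plan being asked about.

```sql
-- EXPLAIN EXTENDED prints the plan for a query without running it,
-- with more detail than plain EXPLAIN.
EXPLAIN EXTENDED
SELECT l_returnflag, COUNT(*)
FROM lineitem
GROUP BY l_returnflag;
```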