Re: num_rows is always 0 in statistics
Hi,

Sorry, it works now. Thank you.
But the value is not correct (about half the real number of rows).
Is this a sampled value? It seems to count every row, as far as I
checked TableScanOperator.java.

Thanks,
Hiroyuki

On Wed, Aug 29, 2012 at 5:39 PM, Hiroyuki Yamada wrote:
> Hi,
>
> Thank you for the reply.
> I tried with the following setting, but I got the same result (with
> num_rows=0):
>
> hive.stats.dbconnectionstring=jdbc:derby:;databaseName=/tmp/TempStatsStore;create=true
>
> Is there any clue?
>
> On Wed, Aug 29, 2012 at 4:09 PM, rohithsharma wrote:
>> I resolved the issue in the following way.
>>
>> Configure
>> "hive.stats.dbconnectionstring=jdbc:derby:;databaseName=/home/TempStore".
>> This works only on a single-node cluster.
>>
>> Please check HIVE-3324.
>>
>> -----Original Message-----
>> From: Hiroyuki Yamada [mailto:mogwa...@gmail.com]
>> Sent: Wednesday, August 29, 2012 11:57 AM
>> To: user@hive.apache.org
>> Subject: num_rows is always 0 in statistics
>>
>> Hi,
>>
>> I have run the "analyze table" command several times to get statistics,
>> but I always get num_rows=0, like below.
>> (Also, raw_data_size is 0.)
>>
>> -
>> hive> analyze table lineitem compute statistics;
>> Total MapReduce jobs = 1
>> Launching Job 1 out of 1
>> Number of reduce tasks is set to 0 since there's no reduce operator
>> Starting Job = job_201208291425_0011, Tracking URL =
>> http://hadoop-node1:50030/jobdetails.jsp?jobid=job_201208291425_0011
>> Kill Command = /usr/lib/hadoop-0.20/bin/hadoop job
>> -Dmapred.job.tracker=hadoop-node1:8021 -kill job_201208291425_0011
>> Hadoop job information for Stage-0: number of mappers: 3; number of
>> reducers: 0
>> 2012-08-29 15:16:16,133 Stage-0 map = 0%, reduce = 0%
>> 2012-08-29 15:16:20,154 Stage-0 map = 100%, reduce = 0%
>> 2012-08-29 15:16:22,168 Stage-0 map = 100%, reduce = 100%
>> Ended Job = job_201208291425_0011
>> Table sf1.lineitem stats: [num_partitions: 0, num_files: 1, num_rows:
>> 0, total_size: 759863287, raw_data_size: 0]
>> -
>>
>> I tried versions 0.7.1, 0.8.1, and 0.9.0 with the same result.
>> Is there anything else I have to do to make it work?
>>
>> Also, do statistics only work for managed tables?
>> I tried it for external tables and it doesn't seem to work (all the
>> values are 0).
>>
>> Thanks,
>>
>> Hiroyuki
Re: num_rows is always 0 in statistics
Hi,

Thank you for the reply.
I tried with the following setting, but I got the same result (with
num_rows=0):

hive.stats.dbconnectionstring=jdbc:derby:;databaseName=/tmp/TempStatsStore;create=true

Is there any clue?

On Wed, Aug 29, 2012 at 4:09 PM, rohithsharma wrote:
> I resolved the issue in the following way.
>
> Configure
> "hive.stats.dbconnectionstring=jdbc:derby:;databaseName=/home/TempStore".
> This works only on a single-node cluster.
>
> Please check HIVE-3324.
>
> -----Original Message-----
> From: Hiroyuki Yamada [mailto:mogwa...@gmail.com]
> Sent: Wednesday, August 29, 2012 11:57 AM
> To: user@hive.apache.org
> Subject: num_rows is always 0 in statistics
>
> Hi,
>
> I have run the "analyze table" command several times to get statistics,
> but I always get num_rows=0, like below.
> (Also, raw_data_size is 0.)
>
> -
> hive> analyze table lineitem compute statistics;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_201208291425_0011, Tracking URL =
> http://hadoop-node1:50030/jobdetails.jsp?jobid=job_201208291425_0011
> Kill Command = /usr/lib/hadoop-0.20/bin/hadoop job
> -Dmapred.job.tracker=hadoop-node1:8021 -kill job_201208291425_0011
> Hadoop job information for Stage-0: number of mappers: 3; number of
> reducers: 0
> 2012-08-29 15:16:16,133 Stage-0 map = 0%, reduce = 0%
> 2012-08-29 15:16:20,154 Stage-0 map = 100%, reduce = 0%
> 2012-08-29 15:16:22,168 Stage-0 map = 100%, reduce = 100%
> Ended Job = job_201208291425_0011
> Table sf1.lineitem stats: [num_partitions: 0, num_files: 1, num_rows:
> 0, total_size: 759863287, raw_data_size: 0]
> -
>
> I tried versions 0.7.1, 0.8.1, and 0.9.0 with the same result.
> Is there anything else I have to do to make it work?
>
> Also, do statistics only work for managed tables?
> I tried it for external tables and it doesn't seem to work (all the
> values are 0).
>
> Thanks,
>
> Hiroyuki
num_rows is always 0 in statistics
Hi,

I have run the "analyze table" command several times to get statistics,
but I always get num_rows=0, like below.
(Also, raw_data_size is 0.)

-
hive> analyze table lineitem compute statistics;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201208291425_0011, Tracking URL =
http://hadoop-node1:50030/jobdetails.jsp?jobid=job_201208291425_0011
Kill Command = /usr/lib/hadoop-0.20/bin/hadoop job
-Dmapred.job.tracker=hadoop-node1:8021 -kill job_201208291425_0011
Hadoop job information for Stage-0: number of mappers: 3; number of
reducers: 0
2012-08-29 15:16:16,133 Stage-0 map = 0%, reduce = 0%
2012-08-29 15:16:20,154 Stage-0 map = 100%, reduce = 0%
2012-08-29 15:16:22,168 Stage-0 map = 100%, reduce = 100%
Ended Job = job_201208291425_0011
Table sf1.lineitem stats: [num_partitions: 0, num_files: 1, num_rows:
0, total_size: 759863287, raw_data_size: 0]
-

I tried versions 0.7.1, 0.8.1, and 0.9.0 with the same result.
Is there anything else I have to do to make it work?

Also, do statistics only work for managed tables?
I tried it for external tables and it doesn't seem to work (all the
values are 0).

Thanks,

Hiroyuki
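[Editor's note, not part of the original thread: for readers hitting the same num_rows=0 symptom, a hedged sketch of the settings that typically govern stats publication in Hive 0.7-0.9 follows. In those versions the row count is written by a JDBC stats publisher at job completion, so the temp stats database must be creatable by every task; the Derby path below is a placeholder, and exact behavior varies by version.]

```sql
-- Sketch only: settings to check before running ANALYZE (paths are placeholders).
SET hive.stats.autogather=true;
SET hive.stats.dbclass=jdbc:derby;
-- create=true lets the temporary stats database be created on first use:
SET hive.stats.dbconnectionstring=jdbc:derby:;databaseName=/tmp/TempStatsStore;create=true;

ANALYZE TABLE lineitem COMPUTE STATISTICS;

-- Then inspect what was actually written to the metastore:
DESCRIBE EXTENDED lineitem;
```

Note that an embedded Derby database is single-writer, which is why the quoted reply says this works only on a single-node cluster; with multiple mappers publishing stats, a shared store is needed instead.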
How to replace an input table name inside Hive (How does Hive pass input file names to Hadoop?)
Hello,

I am trying to understand how Hive compiles and optimizes HiveQL
queries, for future development.
I would like to know how to replace an input table name during the
compilation process.

For example, when the following HiveQL is queried:

SELECT l_orderkey FROM lineitem WHERE l_shipdate < '1993-01-01';

I want to read the "lineitem_copy" file instead of lineitem.
(lineitem_copy is also created beforehand.)

I looked into much of the code, but I couldn't do it.
I modified some Table objects and aliases as in the following code, but
it didn't work. (The objects are actually changed, but I think they are
not referenced from the InputFormat, so lineitem is still read in.)

In parse/SemanticAnalyzer.java:
-
public void analyzeInternal(ASTNode ast) throws SemanticException {
  // compilations and optimizations
  for (String alias : qb.getMetaData().getAliasToTable().keySet()) {
    Table table = qb.getMetaData().getTableForAlias(alias);
    table.setTableName(table.getTableName() + "_copy");
    qb.setTabAlias(alias, qb.getTabNameForAlias(alias) + "_copy");
  }
  genMapRedTasks(qb);
  LOG.info("Completed plan generation");
  return;
}
-

How does Hive pass input file names to Hadoop?
And is there any way I can achieve this, or any hints?
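[Editor's note, not part of the original thread: one hedged alternative to patching SemanticAnalyzer is to rewrite the table name in the AST before semantic analysis runs, so every later phase, including input-path resolution, sees the substitute table. Hive exposes a semantic-analyzer hook for this. The sketch below targets Hive 0.9-era internal APIs; the class name TableRenameHook and the exact AST token handling are assumptions and are not guaranteed to compile against other versions.]

```java
// Sketch only: assumes Hive 0.9-era internals.
// Register with: hive.semantic.analyzer.hook=com.example.TableRenameHook
package com.example;

import org.apache.hadoop.hive.ql.parse.ASTNode;
import org.apache.hadoop.hive.ql.parse.AbstractSemanticAnalyzerHook;
import org.apache.hadoop.hive.ql.parse.HiveParser;
import org.apache.hadoop.hive.ql.parse.HiveSemanticAnalyzerHookContext;
import org.apache.hadoop.hive.ql.parse.SemanticException;

public class TableRenameHook extends AbstractSemanticAnalyzerHook {
  @Override
  public ASTNode preAnalyze(HiveSemanticAnalyzerHookContext context, ASTNode ast)
      throws SemanticException {
    rewrite(ast);          // mutate the AST before compilation sees it
    return ast;
  }

  // Recursively find TOK_TABNAME nodes and rename "lineitem" to "lineitem_copy".
  private void rewrite(ASTNode node) {
    if (node.getToken() != null
        && node.getToken().getType() == HiveParser.TOK_TABNAME
        && node.getChildCount() > 0) {
      ASTNode name = (ASTNode) node.getChild(node.getChildCount() - 1);
      if ("lineitem".equalsIgnoreCase(name.getText())) {
        name.getToken().setText("lineitem_copy");
      }
    }
    for (int i = 0; i < node.getChildCount(); i++) {
      rewrite((ASTNode) node.getChild(i));
    }
  }
}
```

Because the rename happens before the metastore lookup, the input paths handed to Hadoop are resolved from lineitem_copy's storage descriptor, which is what the in-place Table mutation above fails to achieve.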
How to see the intermediate results between AST and optimized logical query plan.
Hello,

I have been trying to learn the Hive query compiler, and I am wondering
if there is a way to see the result of semantic analysis (the query
block tree) and the non-optimized logical query plan.
I know we can get the AST and the optimized logical query plan with
"explain", but I want to know the intermediate results between them.

Also, is there any detailed documentation about the Hive query compiler?

I would really appreciate it if anyone could answer my questions.

Thanks,
Hiroyuki
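[Editor's note, not part of the original thread: a hedged pointer for the same question. The phases between the AST and the optimized plan are easiest to observe through Hive's own logging, since the compiler logs as it moves through analysis and optimization; exactly which intermediate structures get printed depends on the Hive version, so treat the commands below as a starting point rather than a guarantee.]

```shell
# Start the CLI with DEBUG logging on the console to watch the
# compilation phases (semantic analysis, optimizer passes) as they run:
hive -hiveconf hive.root.logger=DEBUG,console

# Inside the CLI, EXPLAIN EXTENDED prints the final operator tree with
# full detail, which can be diffed against the logged intermediate output:
#   hive> EXPLAIN EXTENDED SELECT l_orderkey FROM lineitem;
```

For deeper inspection, adding temporary LOG statements inside SemanticAnalyzer (after genPlan but before the Optimizer runs) is a common way to dump the non-optimized operator tree.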