Re: num_rows is always 0 in statistics

2012-08-29 Thread Hiroyuki Yamada
Hi,

Sorry, it works now. Thank you.
But the value is not correct (about half of the real number of rows).
Is this a sampled value?
It seems to count every row, as far as I checked TableScanOperator.java.
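[Archive note: one hypothesis for the half-count is that some map tasks failed to publish their partial counts to the Derby stats store. Later Hive releases (0.10, if I recall correctly) added hive.stats.reliable, which makes the query fail instead of silently recording partial stats; a hedged sketch of how that would be used:]

```
hive> SET hive.stats.reliable=true;   -- fail the query if any stats publisher fails
hive> ANALYZE TABLE lineitem COMPUTE STATISTICS;
```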

Thanks,

Hiroyuki

On Wed, Aug 29, 2012 at 5:39 PM, Hiroyuki Yamada  wrote:
> Hi,
>
> Thank you for the reply.
> I tried with the following setting, but I got the same result (with
> num_rows=0).
>
> hive.stats.dbconnectionstring=jdbc:derby:;databaseName=/tmp/TempStatsStore;create=true
>
> Is there any clue?
>
> On Wed, Aug 29, 2012 at 4:09 PM, rohithsharma  
> wrote:
>> I resolved the issue in the following way.
>>
>> Configure
>> "hive.stats.dbconnectionstring=jdbc:derby:;databaseName=/home/TempStore".
>> This works only in a single-node cluster.
>>
>>
>> Please check HIVE-3324.
>>
>>
>> -Original Message-
>> From: Hiroyuki Yamada [mailto:mogwa...@gmail.com]
>> Sent: Wednesday, August 29, 2012 11:57 AM
>> To: user@hive.apache.org
>> Subject: num_rows is always 0 in statistics
>>
>> Hi,
>>
>> I have run the "analyze table" command several times to get statistics,
>> but I always get num_rows=0, like below
>> (raw_data_size is also 0).
>>
>> -
>> hive> analyze table lineitem compute statistics;
>> Total MapReduce jobs = 1
>> Launching Job 1 out of 1
>> Number of reduce tasks is set to 0 since there's no reduce operator
>> Starting Job = job_201208291425_0011, Tracking URL =
>> http://hadoop-node1:50030/jobdetails.jsp?jobid=job_201208291425_0011
>> Kill Command = /usr/lib/hadoop-0.20/bin/hadoop job
>> -Dmapred.job.tracker=hadoop-node1:8021 -kill job_201208291425_0011
>> Hadoop job information for Stage-0: number of mappers: 3; number of
>> reducers: 0
>> 2012-08-29 15:16:16,133 Stage-0 map = 0%,  reduce = 0%
>> 2012-08-29 15:16:20,154 Stage-0 map = 100%,  reduce = 0%
>> 2012-08-29 15:16:22,168 Stage-0 map = 100%,  reduce = 100%
>> Ended Job = job_201208291425_0011
>> Table sf1.lineitem stats: [num_partitions: 0, num_files: 1, num_rows:
>> 0, total_size: 759863287, raw_data_size: 0]
>> -
>>
>> I tried versions 0.7.1, 0.8.1, and 0.9.0, with the same result.
>> Is there anything else I have to do to make it work?
>>
>> Also, do statistics only work for managed tables?
>> I tried it for external tables and it doesn't seem to work (all the
>> values are 0).
>>
>> Thanks,
>>
>> Hiroyuki
>>
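[Archive note: pulling together the settings mentioned in this thread into one place — the values are examples from the messages above, not a verified recipe, and property defaults are as I remember them for Hive 0.7–0.9:]

```
hive> SET hive.stats.autogather=true;
hive> SET hive.stats.dbclass=jdbc:derby;
hive> SET hive.stats.dbconnectionstring=jdbc:derby:;databaseName=/tmp/TempStatsStore;create=true;
hive> ANALYZE TABLE lineitem COMPUTE STATISTICS;
```

As HIVE-3324 notes, the embedded-Derby stats store only works reliably when all tasks run on a single node, since each task must be able to open the same database path.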



num_rows is always 0 in statistics

2012-08-28 Thread Hiroyuki Yamada
Hi,

I have run the "analyze table" command several times to get statistics,
but I always get num_rows=0, like below
(raw_data_size is also 0).

-
hive> analyze table lineitem compute statistics;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201208291425_0011, Tracking URL =
http://hadoop-node1:50030/jobdetails.jsp?jobid=job_201208291425_0011
Kill Command = /usr/lib/hadoop-0.20/bin/hadoop job
-Dmapred.job.tracker=hadoop-node1:8021 -kill job_201208291425_0011
Hadoop job information for Stage-0: number of mappers: 3; number of reducers: 0
2012-08-29 15:16:16,133 Stage-0 map = 0%,  reduce = 0%
2012-08-29 15:16:20,154 Stage-0 map = 100%,  reduce = 0%
2012-08-29 15:16:22,168 Stage-0 map = 100%,  reduce = 100%
Ended Job = job_201208291425_0011
Table sf1.lineitem stats: [num_partitions: 0, num_files: 1, num_rows:
0, total_size: 759863287, raw_data_size: 0]
-

I tried versions 0.7.1, 0.8.1, and 0.9.0, with the same result.
Is there anything else I have to do to make it work?

Also, do statistics only work for managed tables?
I tried it for external tables and it doesn't seem to work (all the
values are 0).

Thanks,

Hiroyuki


How to replace an input table name inside Hive (how does Hive pass input file names to Hadoop?)

2011-10-27 Thread Hiroyuki Yamada
Hello,

I am trying to understand how Hive compiles and optimizes HiveQL queries,
for future development.
I would like to know how to replace an input table name in the
compilation process.
For example, when the following HiveQL is queried:

SELECT l_orderkey FROM lineitem WHERE l_shipdate < '1993-01-01';

I want it to read the "lineitem_copy" file instead of lineitem
(lineitem_copy is also created beforehand).

I looked into much of the code, but I couldn't accomplish that.
I modified some Table objects and aliases, as in the following code,
but it didn't work.
(The objects are actually changed, but I think they are not referenced
from the InputFormat, so lineitem is still read in.)

in parse/SemanticAnalyzer.java
-
public void analyzeInternal(ASTNode ast) throws SemanticException {

  // ... compilation and optimization ...

  // Point every resolved table at its "_copy" variant before plan generation.
  for (String alias : qb.getMetaData().getAliasToTable().keySet()) {
    Table table = qb.getMetaData().getTableForAlias(alias);
    table.setTableName(table.getTableName() + "_copy");
    qb.setTabAlias(alias, qb.getTabNameForAlias(alias) + "_copy");
  }

  genMapRedTasks(qb);

  LOG.info("Completed plan generation");

  return;
}
-


How does Hive pass input file names to Hadoop?
And is there any way I can achieve this, or any hints?
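[Archive note: in the Hive versions of that era (0.7/0.8), the answer — simplified, and from memory, so treat the class names as approximate — is that the compiler resolves each table alias to its warehouse location and records a path-to-alias map in the MapredWork; the execution driver then hands those paths to Hadoop via FileInputFormat.setInputPaths. Renaming the Table object after metadata resolution therefore changes nothing the InputFormat sees; the path map itself would have to be rewritten. A self-contained toy (not Hive code) illustrating that rewrite:]

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: mimics the path -> alias map Hive builds for a query plan.
public class PathRewriteSketch {

    // Return a copy of the map in which any path whose last component equals
    // `from` is redirected to `to`, keeping the alias unchanged.
    public static Map<String, String> rewrite(Map<String, String> pathToAlias,
                                              String from, String to) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : pathToAlias.entrySet()) {
            String path = e.getKey();
            if (path.endsWith("/" + from)) {
                path = path.substring(0, path.length() - from.length()) + to;
            }
            out.put(path, e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("/user/hive/warehouse/lineitem", "lineitem");
        Map<String, String> r = rewrite(m, "lineitem", "lineitem_copy");
        // prints /user/hive/warehouse/lineitem_copy
        System.out.println(r.keySet().iterator().next());
    }
}
```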


How to see the intermediate results between AST and optimized logical query plan.

2011-10-19 Thread Hiroyuki Yamada
Hello,

I have been trying to learn the Hive query compiler, and
I am wondering if there is a way to see the result of semantic
analysis (the query block tree)
and the non-optimized logical query plan.
I know we can get the AST and the optimized logical query plan with
"explain", but I want to see the intermediate results between them.

Also, is there any detailed documentation about the Hive query compiler?

I would very much appreciate it if anyone could answer my questions.

Thanks,
Hiroyuki
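[Archive note: one way to peek between those stages on the 0.7/0.8 line — hedged, since logger behavior varies by release — is to run the CLI with console DEBUG logging, as the compiler and optimizer components log intermediate state as they transform the plan; the other common trick is adding a one-line LOG.info dump between transformations in Optimizer.optimize() in a local build.]

```
$ hive -hiveconf hive.root.logger=DEBUG,console -e \
  "explain extended select l_orderkey from lineitem where l_shipdate < '1993-01-01';"
```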