Hi,
This is a good start. How do you get the number of rows per table?

I think the biggest missing piece is histogram information, so you can
approximate cardinalities. We plan to track this through a stats-collection
process run during major compaction. And, of course, the query engine needs
to combine the cardinalities based on the ANDs/ORs used in the query.
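As a sketch of what this could look like (illustrative Python, not Phoenix code; the equi-depth histogram class and the independence assumption for combining predicates are mine):

```python
from bisect import bisect_left, bisect_right

class EquiDepthHistogram:
    """Equi-depth histogram: each bucket covers roughly the same number of
    rows. Built offline (e.g., during major compaction) from sorted values."""

    def __init__(self, sorted_values, num_buckets):
        n = len(sorted_values)
        self.total = n
        self.depth = n / num_buckets          # ~rows per bucket
        step = max(1, n // num_buckets)
        self.bounds = sorted_values[::step]   # one boundary per bucket

    def selectivity(self, lo, hi):
        """Approximate fraction of rows with lo <= value <= hi by counting
        bucket boundaries that fall in the range (coarse but cheap)."""
        if self.total == 0:
            return 0.0
        first = bisect_left(self.bounds, lo)
        last = bisect_right(self.bounds, hi)
        frac = (last - first) * self.depth / self.total
        return min(1.0, max(0.0, frac))

def and_selectivity(s1, s2):
    # Independence assumption: P(A and B) = P(A) * P(B)
    return s1 * s2

def or_selectivity(s1, s2):
    # Inclusion-exclusion under the same independence assumption.
    return s1 + s2 - s1 * s2
```

Real optimizers refine the independence assumption with correlation stats, but even this crude combination beats fixed default selectivities.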

Thanks,
James


On Fri, Jan 31, 2014 at 6:15 PM, abhishek <[email protected]> wrote:

> Hi Taylor,
>
> I am currently working on cost modeling for join and scan queries.
>
> Currently, my feature set includes the following:
> 1) number of region servers
> 2) number of threads per region server
> 3) number of client-side threads
> 4) number of rows per table
> 5) record size
> 6) rowkey length
> 7) HDFS block size
> 8) HDFS replication factor
> 9) HBase cache size
> and a few more
>
> Would you be able to point out more features that could affect scan and
> join query performance?
>
> Thanks
> Abhishek
>
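For illustration, the quoted features could feed a first-cut scan-cost estimate along these lines (a rough sketch; every function name, constant, and coefficient here is an assumption of mine, not a measured or validated model):

```python
import math

def scan_cost_estimate(num_rows, record_size_bytes,
                       num_region_servers, threads_per_rs,
                       hdfs_block_size=64 * 1024 * 1024,
                       cache_hit_ratio=0.0,
                       disk_mb_per_s=100.0, seek_ms=10.0):
    """Rough full-scan cost in seconds: sequential read time for the bytes
    not served from the HBase block cache, plus one seek per HDFS block,
    spread evenly across region servers and their handler threads."""
    total_bytes = num_rows * record_size_bytes
    disk_bytes = total_bytes * (1.0 - cache_hit_ratio)
    read_s = (disk_bytes / (1024 * 1024)) / disk_mb_per_s
    seek_s = math.ceil(total_bytes / hdfs_block_size) * (seek_ms / 1000.0)
    parallelism = max(1, num_region_servers * threads_per_rs)
    return (read_s + seek_s) / parallelism
```

The point of such a formula is mainly to rank plans, not to predict wall-clock time; the coefficients would need calibration against real cluster measurements.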
