Hi,

This is a good start. How do you get the number of rows per table? I think the biggest missing piece is histogram information, so that you can approximate cardinalities. We were thinking of tracking this through a stats-collection process run during major compaction. And, of course, the query engine needs to combine the cardinalities based on the ANDs/ORs used in the query.
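To make the histogram/cardinality idea concrete, here is a minimal sketch (hypothetical, not Phoenix code): it estimates a range predicate's selectivity from an equi-width histogram, assuming values are uniform within each bucket, and then combines per-predicate selectivities for AND/OR under an independence assumption.

```python
def histogram_selectivity(buckets, lo, hi, total_rows):
    """buckets: list of (bucket_lo, bucket_hi, row_count) tuples.
    Estimate the fraction of rows with value in [lo, hi], assuming
    values are uniformly distributed within each bucket."""
    matched = 0.0
    for b_lo, b_hi, count in buckets:
        overlap = max(0.0, min(hi, b_hi) - max(lo, b_lo))
        width = b_hi - b_lo
        if width > 0:
            matched += count * (overlap / width)
    return matched / total_rows

def and_selectivity(s1, s2):
    # P(A and B) = P(A) * P(B), assuming the predicates are independent
    return s1 * s2

def or_selectivity(s1, s2):
    # P(A or B) = P(A) + P(B) - P(A and B), inclusion-exclusion
    return s1 + s2 - s1 * s2
```

The independence assumption is the usual starting point; correlated columns will need multi-column stats or a fudge factor on top of this.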
Thanks,
James

On Fri, Jan 31, 2014 at 6:15 PM, abhishek <[email protected]> wrote:

> Hi Taylor
>
> I am currently working on cost modeling for join and scan queries.
>
> Currently, my feature set includes the following:
> 1) number of region servers
> 2) number of threads per region server
> 3) number of client-side threads
> 4) number of rows per table
> 5) record size
> 6) rowkey length
> 7) HDFS block size
> 8) HDFS replication factor
> 9) HBase cache size
> and a few more
>
> Would you be able to point out more features that could affect scan and
> join query performance?
>
> Thanks
> Abhishek
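As a rough illustration of how the quoted features might feed a cost model, here is a toy scan-cost proxy (entirely hypothetical; the weights and the formula are placeholders, not a proposed Phoenix model): a full scan reads every block, the cache reduces disk I/O, only matching rows are shipped to the client, and region servers and their threads contribute parallelism.

```python
def estimated_scan_cost(num_rows, record_size, selectivity,
                        hdfs_block_size, num_region_servers,
                        threads_per_rs, cache_hit_ratio,
                        io_weight=1.0, net_weight=0.1):
    """Toy scan-time proxy built from the features in the thread.
    io_weight/net_weight are placeholder calibration constants."""
    # A full table scan reads every HDFS block of the table.
    blocks_read = (num_rows * record_size) / hdfs_block_size
    # Cached blocks are assumed free; only cache misses hit disk.
    io_cost = blocks_read * (1.0 - cache_hit_ratio) * io_weight
    # Only rows passing the filter are returned over the network.
    net_cost = num_rows * selectivity * record_size * net_weight
    # Work is spread across region servers and their handler threads.
    parallelism = num_region_servers * threads_per_rs
    return (io_cost + net_cost) / parallelism
```

A real model would also need per-feature calibration runs; the point here is only that the listed features compose naturally into I/O, network, and parallelism terms.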
