Also, I would be interested to know whether Phoenix gathers any stats today.
Thanks
Abhishek
On 01/31/2014 10:46 PM, abhishek wrote:
Hi James
I am building a schema recommendation system based on cost modeling.
Currently, I gather all the necessary features offline using Phoenix
queries or the HBase API. However, as you mentioned in your mail, these
stats can be gathered more efficiently during major compaction or by
other background processes.
I agree with you that cardinalities will contribute to cost; thank you
for pointing that out. Are there other stats that could have a large
impact?
Thanks for replying and showing interest in this project.
Abhishek
On 01/31/2014 10:16 PM, James Taylor wrote:
Hi,
This is a good start. How do you get the number of rows per table?
I think the biggest missing piece is histogram information, so you can
approximate cardinalities. We planned to track this through a stats
collection process run during major compaction. And, of course, the
query engine needs to combine the cardinalities based on the ANDs/ORs
used in the query.
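To illustrate that last point, here is a minimal sketch (not Phoenix code; function names are my own, and it assumes statistically independent predicates) of how per-predicate selectivities estimated from histograms might be combined for ANDs and ORs:

```python
def and_selectivity(sels):
    # Under the independence assumption, AND multiplies selectivities.
    result = 1.0
    for s in sels:
        result *= s
    return result

def or_selectivity(sels):
    # Via De Morgan: sel(A OR B) = 1 - (1 - sel(A)) * (1 - sel(B)).
    result = 1.0
    for s in sels:
        result *= (1.0 - s)
    return 1.0 - result

# Example: two predicates with selectivities 0.1 and 0.2,
# as might be estimated from histogram stats.
rows = 1_000_000
est_and = rows * and_selectivity([0.1, 0.2])  # ~20,000 rows
est_or = rows * or_selectivity([0.1, 0.2])    # ~280,000 rows
```

Real optimizers refine this when predicates are correlated, but the independence assumption is the usual starting point.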
Thanks,
James
On Fri, Jan 31, 2014 at 6:15 PM, abhishek <[email protected]> wrote:
Hi Taylor
I am currently working on cost modeling for join and scan queries.
Currently, my feature set includes the following:
1) number of region servers
2) number of threads per region server
3) number of client-side threads
4) number of rows per table
5) record size
6) rowkey length
7) HDFS block size
8) HDFS replication factor
9) HBase cache size
and a few more.
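For concreteness, a minimal sketch of how such a feature set could be grouped and carried around (the field names, units, and the crude size estimate are illustrative assumptions, not from any existing system):

```python
from dataclasses import dataclass

@dataclass
class QueryCostFeatures:
    # Cluster-level features (illustrative names).
    num_region_servers: int
    threads_per_region_server: int
    client_side_threads: int
    # Table-level features.
    rows_per_table: int
    record_size_bytes: int
    rowkey_length_bytes: int
    # Storage-level features.
    hdfs_block_size_bytes: int
    hdfs_replication_factor: int
    hbase_cache_size_bytes: int

    def estimated_table_bytes(self) -> int:
        # Crude table-size estimate: rows * record size,
        # useful as an input to a scan-cost model.
        return self.rows_per_table * self.record_size_bytes
```

Grouping the features this way also makes it easy to see which ones change per query (table-level) versus per deployment (cluster- and storage-level).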
Would you be able to point out more features that could affect
scan and join query performance?
Thanks
Abhishek