Hi, hackers! Here is the text of my proposal which I've applied to GSoC. (and link http://www.google-melange.com/gsoc/proposal/public/google/gsoc2015/lubennikovaav/5657382461898752 ) Any suggestions and comments are welcome.
*Project name* Bitmap Index-only Count *Brief review* There is a problem of slow counting in PostgreSQL [1]. The reason why this is slow is related to the *MVCC* implementation in PostgreSQL. Index-only scans (implemented since PostgreSQL-9.2) providing some performance improvements where the *visibility map* of the table allows it. That’s good. But it works only for access methods which provide amgettuple method. Unfortunately GIN supports only BitmapIndexScan and has no implementation of index_getnext() interface [2]. Idea of the proposal is to implement Bitmap Index-Only Count method that allows to count elements, without reference to the heap. *Benefits to the PostgreSQL Community* Faster count(*) would be actual for GIN. Probably it would accelerate count(*) for other access methods too, but I do not sure if it would be significant difference. *Quantifiable results* Acceleration of count(*) queries with WHERE clause which use GIN. *Project details* Let’s look at count(*) query: EXPLAIN SELECT count(*) from tbl_mytbl where val @> '{"b":2}'; Now the query plan looks like this: Aggregate -> Bitmap Heap Scan on tbl_mytbl Recheck Cond: (val @> '{"b": 2}'::jsonb) -> Bitmap Index Scan on idx_mytbl Index Cond: (val @> '{"b": 2}'::jsonb) Idea of the proposal is to implement Bitmap Index-Only Count method that allows to count elements, without reference to the heap. Conditions: - all tuples are visible (the same problem as for Index-only count(*)); - result TIDBitmap is not lossy. nchunks = 0; int nchunks <http://doxygen.postgresql.org/structTIDBitmap.html#a85d5883056bad6863cb47a15446581c7>; /* number of lossy entries in pagetable */ - pages in TIDBitmap don’t require recheck * recheck is used only on exact pages --- it indicates that although * only the stated tuples need be checked, the full index qual condition * must be checked for each (ie, these are candidate matches). If all conditions are true, it’s possible to call aggregate COUNT function for each tuple from TIDBitmap returned by Bitmap Index Scan (or BitmapAnd/BitmapOr nodes). And there’s no necessity to call Bitmap Heap Scan. As a GSoC student I will create new Node “Bitmap Index-Only Scan”, which would catch tuples from Bitmap Index Scan node and pass them to Aggregate node. Thus, new query plan will be as follow: Aggregate -> Bitmap Index-only Count -> Bitmap Index Scan on idx_mytbl Index Cond: (val @> '{"b": 2}'::jsonb) *Project Schedule* until May 25 Read documentation and source code, clarify details of implementation. until June 30 Implement Bitmap Index-only Count node. 1 July – 31 July Add Bitmap Index-only Count node to Planner. 1 August -15 August Final refactoring, testing and submitting a patch. *Links* 1. https://wiki.postgresql.org/wiki/Slow_Counting 2. http://postgresql.nabble.com/Todo-item-Support-amgettuple-in-GIN-td5780810.html -- Best regards, Lubennikova Anastasia