[HACKERS] Thick indexes - a look at count(1) query

Gokulakannan Somasundaram Wed, 16 Jan 2008 06:31:28 -0800

Hi,
   I have submitted a new patch against thick indexes(indexes with snapshot)
http://archives.postgresql.org/pgsql-hackers/2007-10/msg00220.php.<http://archives.postgresql.org/pgsql-hackers/2007-10/msg00220.php>


I did look closely at improving the performance of count(1) queries. It
worked well, when we are selecting a sub-set of the result-set. But when i
did a query like "select count(1) from table", it showed a improvement in
response time, but not to the extent, i wanted it to be. Let's have a look
at the stats.

gokul=# explain analyze select count(1) from dd;
LOG:  EXECUTOR STATISTICS
DETAIL:  ! system usage stats:
        !       0.277208 elapsed 0.275457 user 0.000148 system sec
        !       [1.128422 user 0.004976 sys total]
        !       0/0 [0/0] filesystem blocks in/out
        !       0/0 [0/0] page faults/reclaims, 0 [0] swaps
        !       0 [0] signals rcvd, 0/0 [6/12] messages rcvd/sent
        !       0/39 [5/160] voluntary/involuntary context switches
        ! buffer usage stats:
        !       Shared blocks:       1024 Logical Reads,          0 Physical
Reads,          0 written, buffer hit rate = 100.00%
        !       Local  blocks:          0 read,          0 written, buffer
hit rate = 0.00%
        !       Direct blocks:          0 read,          0 written
STATEMENT:  explain analyze select count(1) from dd;
                                                            QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=7323.10..7323.11 rows=1 width=0) (actual time=
276.838..276.838 rows=1 loops=1)
   ->  Index Only Scan using idx on dd  (cost=0.00..6741.42 rows=232671
width=0) (actual time=0.042..160.753 rows=232679 loops=1)
 Total runtime: 276.928 ms
(3 rows)

gokul=# set enable_indexscan=off;
SET

gokul=# explain analyze select count(1) from dd;
LOG:  EXECUTOR STATISTICS
DETAIL:  ! system usage stats:
        !       0.331441 elapsed 0.258903 user 0.067953 system sec
        !       [1.906069 user 0.211479 sys total]
        !       0/0 [0/0] filesystem blocks in/out
        !       0/0 [0/0] page faults/reclaims, 0 [0] swaps
        !       0 [0] signals rcvd, 0/0 [10/19] messages rcvd/sent
        !       0/39 [9/312] voluntary/involuntary context switches
        ! buffer usage stats:
        !       Shared blocks:       5223 Logical Reads,       4391 Physical
Reads,          0 written, buffer hit rate = 15.93%
        !       Local  blocks:          0 read,          0 written, buffer
hit rate = 0.00%
        !       Direct blocks:          0 read,          0 written
STATEMENT:  explain analyze select count(1) from dd;
                                                   QUERY
PLAN
-----------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=8131.39..8131.40 rows=1 width=0) (actual time=
331.075..331.076 rows=1 loops=1)
   ->  Seq Scan on dd  (cost=0.00..7549.71 rows=232671 width=0) (actual
time=0.042..203.958 rows=232679 loops=1)
 Total runtime: 331.167 ms
(3 rows)

I have run the query multiple times and it shows the response time, around
what is shown here
The table is just a multiple copy of pg_class table( approx 200000 rows). As
it can be seen, the Logical reads show a ratio of 1:5, but the  response
time is not in the same ratio. I tried to profile and couldn't find anything
significant. Eventhough it shows 4391 physical reads, that's from OS cache,
since i ave already run the query multiple times.

One more disadvantage with using select count(1) using index scan is that,
it pollutes the shared memory, unlike full-table scans. But something can be
done in the regard.



Any thoughts?

Thanks,
Gokul.

[HACKERS] Thick indexes - a look at count(1) query

Reply via email to