Hello Hackers,

I've recently noticed a particularly strange behaviour of one of my simple queries:
SELECT ra,dec FROM sdssdr7.photoobjall WHERE ra BETWEEN 175 and 190 AND
   dec BETWEEN 17 And 23 ;
The table is quite large (1.1Tb, 10^9 rows, and has Btree indexes on "ra","dec" columns).
The plan is a combination of two bitmap scans:
wsdb=# explain SELECT ra,dec FROM sdssdr7.photoobjall WHERE ra BETWEEN 175 and 
190 AND dec BETWEEN 17 And 23 ;
                                                                        QUERY 
PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on photoobjall  (cost=1854918.62..13401835.50 rows=3266290 
width=16)
   Recheck Cond: ((ra >= 175::double precision) AND (ra <= 190::double precision) AND ("dec" 
>= 17::double precision) AND ("dec" <= 23::double precision))
   ->  BitmapAnd  (cost=1854918.62..1854918.62 rows=3266290 width=0)
         ->  Bitmap Index Scan on sdssdr7_photoobjall_ra_idx  
(cost=0.00..607217.21 rows=30530306 width=0)
               Index Cond: ((ra >= 175::double precision) AND (ra <= 
190::double precision))
         ->  Bitmap Index Scan on sdssdr7_photoobjall_dec_idx  
(cost=0.00..1246068.01 rows=62654186 width=0)
               Index Cond: (("dec" >= 17::double precision) AND ("dec" <= 
23::double precision))

But the funny thing I noticed is that the query after running a certain amount of time doing I/O, starts to use 100%CPU and spend 99% the time in hash_seq_search. Here is the oprofile of PG during that period:
--------
CPU: Intel Core/i7, speed 2.268e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask 
of 0x00 (No unit mask) count 100000
samples  %        symbol name
303404   99.3562  hash_seq_search
1163      0.3808  tbm_lossify
639       0.2093  hash_search_with_hash_value
....
--------

During that it very rarely tries to actually do any I/O. Every second or so it does issue an read of the main relation(not the index). Here is how strace looks like during that time:
read(455, "\0\0\0\0\0\0\0\0\1\0\0\0t\6\200\6\360\37\4 \0\0\0\0\200\206 \0\340\237 
\0\320"..., 8192) = 8192
... repeated 100 or more  times(issued every second or so) ..
lseek(455, 603635712, SEEK_SET)         = 603635712
read(455, "\0\0\0\0\0\0\0\0\1\0\0\0t\6\200\6\360\37\4 \0\0\0\0\200\206 \0\340\237 
\0\320"..., 8192) = 8192
... repeated a lot of times...
...etc...
So it seems that it actually reads the same thing all over again and again.

Another funny thing is that I have the exactly the same table, but with significantly smaller number of columns (so it occupies 180 gb instead) but has the same indexes. Also all the rows in the table are exactly in the same order (both tables were clustered using the same thing). For that table the query succeeds without problems in 20 seconds or so:
wsdb=> explain analyze SELECT ra,dec FROM sdssdr7.phototag WHERE ra BETWEEN 175 
and 190 AND dec BETWEEN 17 And 23 ;
                                                                             
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on phototag  (cost=1824731.39..10750473.21 rows=3204629 
width=16) (actual time=21146.520..23136.512 rows=2973205 loops=1)
   Recheck Cond: ((ra >= 175::double precision) AND (ra <= 190::double precision) AND ("dec" 
>= 17::double precision) AND ("dec" <= 23::double precision))
   ->  BitmapAnd  (cost=1824731.39..1824731.39 rows=3204629 width=0) (actual 
time=21093.906..21093.906 rows=0 loops=1)
         ->  Bitmap Index Scan on sdssdr7_phototag2_ra_idx  
(cost=0.00..613910.25 rows=30866810 width=0) (actual time=6502.589..6502.589 
rows=30418322 loops=1)
               Index Cond: ((ra >= 175::double precision) AND (ra <= 
190::double precision))
         ->  Bitmap Index Scan on sdssdr7_phototag2_dec_idx  
(cost=0.00..1209218.57 rows=60801242 width=0) (actual time=14201.455..14201.455 
rows=63031352 loops=1)
               Index Cond: (("dec" >= 17::double precision) AND ("dec" <= 
23::double precision))
 Total runtime: 23295.384 ms
(8 rows)

While for "wider" table it is still running the query for more than an hour with 100%CPU an almost no I/O

Additional info:
PG version 8.4.8,
OS: debian 5.0
hardware: 2xXeon E5520, 12GB RAM, RAID50 with BBU
The tables are completely static(e.g. no write activity on them), and have been vacuum analyzed.
The system is not busy at all (e.g. not much if any concurrent queries).
The tables have large number of columns of real/int/bigint/double precision time.

Some config parameters:
 effective_cache_size            | 6GB
 work_mem                        | 1GB
 effective_io_concurrency        | 0
 shared_buffers                  | 4GB

Any ideas what can be wrong? Any info I can provide ?

Thanks in advance,
        Sergey

*******************************************************************
Sergey E. Koposov, PhD
Institute for Astronomy, Cambridge UK /Sternberg Astronomical Institute
Web: http://lnfm1.sai.msu.ru/~math
E-mail: m...@sai.msu.ru

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to