[PERFORM] Idea how to get rid of Bitmap Heap Scan

2009-12-18 Thread Michael N. Mikhulya
Hello!

There are many questions on internet about whether it is possible to
optimize Bitmap Heap Scan somehow without answer, so seems like
problem is rather important.

The query I need to optimize is:

EXPLAIN SELECT date_trunc('day', d.created_at) AS day, COUNT(*) AS
download FROM downloads d WHERE d.file_id in (select id from files
where owner_id = 443) AND d.download_status != 0 AND d.created_at =
'2009-12-05' AND d.created_at  '2009-12-16' GROUP BY 1;

   QUERY PLAN
--
 HashAggregate  (cost=15809.49..17126.20 rows=87781 width=8)
   -  Hash Semi Join  (cost=5809.51..15368.11 rows=88276 width=8)
 Hash Cond: (d.file_id = files.id)
 -  Index Scan using idx_downloads_created_at on downloads d
(cost=0.00..7682.73 rows=88276 width=16)
   Index Cond: ((created_at = '2009-12-05
00:00:00'::timestamp without time zone) AND (created_at  '2009-12-16
00:00:00'::timestamp without time zone))
 -  Hash  (cost=5741.51..5741.51 rows=5440 width=8)
   -  Bitmap Heap Scan on files  (cost=106.42..5741.51
rows=5440 width=8)
 Recheck Cond: (owner_id = 443)
 -  Bitmap Index Scan on idx_files_owner
(cost=0.00..105.06 rows=5440 width=0)
   Index Cond: (owner_id = 443)

The problem here is that we are forced to fetch files in Bitmap Heap Scan.
But actually there is no need for the whole files record. The
necessary data is only files ids.

The idea is to avoid fetching data from files table, and get the ids
from index! (probably it is a little bit tricky, but it is a
performance area...)

I created an index with following command:
create index idx_files_owner_id ON files (owner_id, id);
and even tried to remove old index to enforce postgresql to use newly
created index.
But postgresql still do Bitmap Heap Scan.

(The other idea is to use raw_id as a primary key of files table to
don't extend index. But I don't know whether it is possible at all or
this idea have some drawbacks)

I think it worth to learn postgreql to do this trick especially taking
into account there are many questions about whether it is possible to
optimize such a queries.

If there is an known solution to this problem please provide a link to it.

With best regards,
Michael Mikhulya.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Idea how to get rid of Bitmap Heap Scan

2009-12-18 Thread Michael N. Mikhulya
Thank you very much. I catch the point why it is done so.

But I'm curious whether it is still possible to don't fetch data from
files table just because inappropriate ids (e.g. removed ones) will
not produce any wrong effect just because them indirectly checked on
downloads table?
Here I mean that if we get id (from index) for file which is actually
removed, then we will not find anything in downloads table.
Probably my knowledge about MVCC is too little to see whole picture,
so if it is not hard to you please point the failure scenario (when
we get wrong result) or locking issue, ...

Michael Mikhulya

 Unfortunately, the index does not contain enough information to accomplish
 this. This is due to Postgres' advanced concurrency control system. Postgres
 needs to fetch the actual rows from the files table in order to check
 whether that row is visible in the current transaction, and a Bitmap Index
 Scan is the fastest way to do this.

 You can speed this up in Postgres 8.4 by having a RAID array and setting the
 effective_concurrency configuration to the number of spindles in the RAID
 array, or by having gobs of RAM and keeping everything in cache.

 Matthew

 --
 A good programmer is one who looks both ways before crossing a one-way
 street.
 Considering the quality and quantity of one-way streets in Cambridge, it
 should be no surprise that there are so many good programmers there.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance