Re: [PERFORM] Idea how to get rid of Bitmap Heap Scan

2009-12-20 Thread Greg Stark
On Sun, Dec 20, 2009 at 2:11 AM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Dec 18, 2009 at 12:29 PM, Greg Stark gsst...@mit.edu wrote:
 A word of warning, in my experience the hardest part for changes like
 this isn't the executor changes (which in this case wouldn't be far
 from easy) but the planner changes to detect when this new plan would
 be better.

 There's also the problem of making the visibility map crash-safe.  I
 think I heard you might have some ideas on that one - has it been
 discussed on -hackers?

Not sure what ideas you mean.

In the original poster's plan that isn't an issue. We could scan the
index, perform the joins and restriction clauses, and only check the
visibility on the resulting tuples which slip through them all. That
would be possible even without crash-safe visibility bits.

-- 
greg

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Idea how to get rid of Bitmap Heap Scan

2009-12-20 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 In the original poster's plan that isn't an issue. We could scan the
 index, perform the joins and restriction clauses, and only check the
 visibility on the resulting tuples which slip through them all. That
 would be possible even without crash-safe visibility bits.

Yeah, this was floated years ago as being a potentially interesting
approach when all the join-condition fields are indexed.  You end up
never having to fetch rows that don't pass the join.

It certainly seems reasonably straightforward on the executor side.
As Greg said, the hard part is planning it sanely.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Idea how to get rid of Bitmap Heap Scan

2009-12-20 Thread Robert Haas
On Sun, Dec 20, 2009 at 11:26 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Greg Stark gsst...@mit.edu writes:
 In the original poster's plan that isn't an issue. We could scan the
 index, perform the joins and restriction clauses, and only check the
 visibility on the resulting tuples which slip through them all. That
 would be possible even without crash-safe visibility bits.

 Yeah, this was floated years ago as being a potentially interesting
 approach when all the join-condition fields are indexed.  You end up
 never having to fetch rows that don't pass the join.

 It certainly seems reasonably straightforward on the executor side.
 As Greg said, the hard part is planning it sanely.

Yeah, but that seems REALLY hard.  First, there's the difficulty of
actually generating all the paths and costing them appropriately.  A
plan to perform an index scan but defer the heap fetches until later
has a hidden cost associated with it: the heap fetches will cost
something, but we don't know how much until we get the row estimate
for the node where we choose to implement them.  Without knowing that
cost, it's hard to be confident in discarding other plans that are
apparently more expensive.  That's probably solvable by adopting a
more sophisticated method for comparing costs, but that gets you to
the second problem, which is doing all of this with reasonable
performance.  You're going to have a lot more paths than we do now,
and there will be many queries for which there are only trivial cost
differences between them (like any query where most or all of the
joins have a selectivity of exactly 1.0, which is a very common case
for me).

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Idea how to get rid of Bitmap Heap Scan

2009-12-19 Thread Robert Haas
On Fri, Dec 18, 2009 at 12:29 PM, Greg Stark gsst...@mit.edu wrote:
 A word of warning, in my experience the hardest part for changes like
 this isn't the executor changes (which in this case wouldn't be far
 from easy) but the planner changes to detect when this new plan would
 be better.

There's also the problem of making the visibility map crash-safe.  I
think I heard you might have some ideas on that one - has it been
discussed on -hackers?

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Idea how to get rid of Bitmap Heap Scan

2009-12-18 Thread Michael N. Mikhulya
Hello!

There are many questions on internet about whether it is possible to
optimize Bitmap Heap Scan somehow without answer, so seems like
problem is rather important.

The query I need to optimize is:

EXPLAIN SELECT date_trunc('day', d.created_at) AS day, COUNT(*) AS
download FROM downloads d WHERE d.file_id in (select id from files
where owner_id = 443) AND d.download_status != 0 AND d.created_at =
'2009-12-05' AND d.created_at  '2009-12-16' GROUP BY 1;

   QUERY PLAN
--
 HashAggregate  (cost=15809.49..17126.20 rows=87781 width=8)
   -  Hash Semi Join  (cost=5809.51..15368.11 rows=88276 width=8)
 Hash Cond: (d.file_id = files.id)
 -  Index Scan using idx_downloads_created_at on downloads d
(cost=0.00..7682.73 rows=88276 width=16)
   Index Cond: ((created_at = '2009-12-05
00:00:00'::timestamp without time zone) AND (created_at  '2009-12-16
00:00:00'::timestamp without time zone))
 -  Hash  (cost=5741.51..5741.51 rows=5440 width=8)
   -  Bitmap Heap Scan on files  (cost=106.42..5741.51
rows=5440 width=8)
 Recheck Cond: (owner_id = 443)
 -  Bitmap Index Scan on idx_files_owner
(cost=0.00..105.06 rows=5440 width=0)
   Index Cond: (owner_id = 443)

The problem here is that we are forced to fetch files in Bitmap Heap Scan.
But actually there is no need for the whole files record. The
necessary data is only files ids.

The idea is to avoid fetching data from files table, and get the ids
from index! (probably it is a little bit tricky, but it is a
performance area...)

I created an index with following command:
create index idx_files_owner_id ON files (owner_id, id);
and even tried to remove old index to enforce postgresql to use newly
created index.
But postgresql still do Bitmap Heap Scan.

(The other idea is to use raw_id as a primary key of files table to
don't extend index. But I don't know whether it is possible at all or
this idea have some drawbacks)

I think it worth to learn postgreql to do this trick especially taking
into account there are many questions about whether it is possible to
optimize such a queries.

If there is an known solution to this problem please provide a link to it.

With best regards,
Michael Mikhulya.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Idea how to get rid of Bitmap Heap Scan

2009-12-18 Thread Matthew Wakeling

On Fri, 18 Dec 2009, Michael N. Mikhulya wrote:

The problem here is that we are forced to fetch files in Bitmap Heap Scan.
But actually there is no need for the whole files record. The
necessary data is only files ids.

The idea is to avoid fetching data from files table, and get the ids
from index! (probably it is a little bit tricky, but it is a
performance area...)


Unfortunately, the index does not contain enough information to accomplish 
this. This is due to Postgres' advanced concurrency control system. 
Postgres needs to fetch the actual rows from the files table in order to 
check whether that row is visible in the current transaction, and a Bitmap 
Index Scan is the fastest way to do this.


You can speed this up in Postgres 8.4 by having a RAID array and setting 
the effective_concurrency configuration to the number of spindles in the 
RAID array, or by having gobs of RAM and keeping everything in cache.


Matthew

--
A good programmer is one who looks both ways before crossing a one-way street.
Considering the quality and quantity of one-way streets in Cambridge, it
should be no surprise that there are so many good programmers there.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Idea how to get rid of Bitmap Heap Scan

2009-12-18 Thread Michael N. Mikhulya
Thank you very much. I catch the point why it is done so.

But I'm curious whether it is still possible to don't fetch data from
files table just because inappropriate ids (e.g. removed ones) will
not produce any wrong effect just because them indirectly checked on
downloads table?
Here I mean that if we get id (from index) for file which is actually
removed, then we will not find anything in downloads table.
Probably my knowledge about MVCC is too little to see whole picture,
so if it is not hard to you please point the failure scenario (when
we get wrong result) or locking issue, ...

Michael Mikhulya

 Unfortunately, the index does not contain enough information to accomplish
 this. This is due to Postgres' advanced concurrency control system. Postgres
 needs to fetch the actual rows from the files table in order to check
 whether that row is visible in the current transaction, and a Bitmap Index
 Scan is the fastest way to do this.

 You can speed this up in Postgres 8.4 by having a RAID array and setting the
 effective_concurrency configuration to the number of spindles in the RAID
 array, or by having gobs of RAM and keeping everything in cache.

 Matthew

 --
 A good programmer is one who looks both ways before crossing a one-way
 street.
 Considering the quality and quantity of one-way streets in Cambridge, it
 should be no surprise that there are so many good programmers there.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Idea how to get rid of Bitmap Heap Scan

2009-12-18 Thread Greg Stark
On Fri, Dec 18, 2009 at 4:18 PM, Michael N. Mikhulya
m.mikhu...@gmail.com wrote:
 Thank you very much. I catch the point why it is done so.

 But I'm curious whether it is still possible to don't fetch data from
 files table just because inappropriate ids (e.g. removed ones) will
 not produce any wrong effect just because them indirectly checked on
 downloads table?
 Here I mean that if we get id (from index) for file which is actually
 removed, then we will not find anything in downloads table.
 Probably my knowledge about MVCC is too little to see whole picture,
 so if it is not hard to you please point the failure scenario (when
 we get wrong result) or locking issue, ...


Yup this ought to be possible and fruitful, I believe Heikki already
produced a partial patch to this end. If you're interested in working
on it you could skim back in the logs and start with that. I don't
recall any special keywords to search on but it might be in one of the
threads for the visibility map or it might be under index-only
scans.

A word of warning, in my experience the hardest part for changes like
this isn't the executor changes (which in this case wouldn't be far
from easy) but the planner changes to detect when this new plan would
be better.

-- 
greg

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance