Re: [HACKERS] Merge Join with an Index Optimization

2016-10-11 Thread Claudio Freire
On Sun, Sep 11, 2016 at 1:20 PM, Tom Lane  wrote:
> Michael Malis  writes:
>> As I understand it, a merge join will currently read all tuples from both
>> subqueries (besides early termination). I believe it should be possible to
>> take advantage of the indexes on one or both of the tables being read from
>> to skip a large number of tuples that would currently be read.
>
> IIUC, what you're proposing is to replace "read past some tuples in the
> index" with "re-descend the index btree".  Yes, that would win if it
> allows skipping over a large number of index entries, but it could lose
> big-time if not.  The difficulty is that I don't see how the merge join
> level could predict whether it would win for any particular advance step.
> You'd really need most of the smarts to be in the index AM.
>
> You might want to troll the archives for past discussions of index skip
> scan, which is more or less the same idea (could possibly be exactly the
> same idea, depending on how it's implemented).  I think we had arrived
> at the conclusion that re-descent from the root might be appropriate
> when transitioning to another index page, but not intra-page.  Anyway
> no one's produced a concrete patch yet.

Interesting it should be brought up.

I was considering the index skip optimization for vacuum at some
point, might as well consider it for regular scans too if I get to
that.

Basically, instead of a simple "get next", I was considering adding a
"get next skip until". The caller would be the one responsible for
sending the hint, and the index AM would be free to implement the skip
in a smart way if possible.

The interface for vacuum is a bit different in practice, but
conceptually the same.
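
A rough sketch of the "get next skip until" idea in plain Python (names
invented, a sorted list standing in for the index; this is not actual
PostgreSQL code):

  import bisect

  class ToyIndexScan:
      def __init__(self, keys):
          self.keys = sorted(keys)      # stands in for a btree index
          self.pos = 0

      def getnext(self):
          """Plain "get next": return the next key, or None at the end."""
          if self.pos >= len(self.keys):
              return None
          key = self.keys[self.pos]
          self.pos += 1
          return key

      def getnext_skip_until(self, bound):
          """Caller hints "I only care about keys >= bound"; the AM is free
          to honour the hint cheaply (a binary search here, something like
          a re-descent in a real btree)."""
          self.pos = bisect.bisect_left(self.keys, bound, self.pos)
          return self.getnext()

  scan = ToyIndexScan([3, 4, 5, 9, 10, 11])
  print(scan.getnext())              # 3
  print(scan.getnext_skip_until(9))  # 9, without visiting 4 and 5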


Re: [HACKERS] Merge Join with an Index Optimization

2016-10-10 Thread Michael Malis
I discovered that the kind of join I proposed is called the leapfrog
triejoin: https://arxiv.org/pdf/1210.0481v5.pdf


Re: [HACKERS] Merge Join with an Index Optimization

2016-09-11 Thread Thomas Munro
On Sun, Sep 11, 2016 at 11:07 PM, Michael Malis  wrote:
> On Sun, Sep 11, 2016 at 9:20 AM Tom Lane  wrote:
>>
>> Michael Malis  writes:
>> >> As I understand it, a merge join will currently read all tuples from both
>> >> subqueries (besides early termination). I believe it should be possible to
>> >> take advantage of the indexes on one or both of the tables being read from
>> >> to skip a large number of tuples that would currently be read.
>> >>
>> > IIUC, what you're proposing is to replace "read past some tuples in the
>> > index" with "re-descend the index btree".  Yes, that would win if it
>
>
> Roughly yes, although is it necessary to rescan the btree from the top? Is
> it not possible to "resume" the scan from the previous location?
>
>>
>> > You might want to troll the archives for past discussions of index skip
>> > scan, which is more or less the same idea (could possibly be exactly the
>> > same idea, depending on how it's implemented).
>
>
> After searching a bit, all I was able to find was this. It looks like
> someone had a rough patch applying a similar idea to DISTINCT. I can't
> tell what happened to it, but in the patch they mention that it should be
> possible to do a "local traversal ie from the current page".

That was me.  Yeah, reserving the option of traversing from somewhere
other than the root was my reason for wanting to introduce a skip
operation to the AM interface, as described in the later email in that
thread[2], rather than doing it via rescanning (though maybe it's not
strictly necessary, I don't know).  There are a whole lot of interesting
execution tricks that could be enabled by btree skipping (see Oracle,
DB2, MySQL, SQLite).  DISTINCT was the simplest thing I could think of
where literally every other RDBMS beats us because we don't have it.
But that was nowhere near a useful patch, just an experiment.  I hope
I can get back to looking at it for 10...

[2] https://www.postgresql.org/message-id/CADLWmXWALK8NPZqdnRQiPnrzAnic7NxYKynrkzO_vxYr8enWww%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com


Re: [HACKERS] Merge Join with an Index Optimization

2016-09-11 Thread Michael Malis
On Sun, Sep 11, 2016 at 9:20 AM Tom Lane  wrote:

> Michael Malis  writes:
> >> As I understand it, a merge join will currently read all tuples from both
> >> subqueries (besides early termination). I believe it should be possible to
> >> take advantage of the indexes on one or both of the tables being read from
> >> to skip a large number of tuples that would currently be read.
> >>
> > IIUC, what you're proposing is to replace "read past some tuples in the
> > index" with "re-descend the index btree".  Yes, that would win if it
>

Roughly yes, although is it necessary to rescan the btree from the top? Is
it not possible to "resume" the scan from the previous location?


> > You might want to troll the archives for past discussions of index skip
> > scan, which is more or less the same idea (could possibly be exactly the
> > same idea, depending on how it's implemented).
>

After searching a bit, all I was able to find was this. It looks like
someone had a rough patch applying a similar idea to DISTINCT. I can't
tell what happened to it, but in the patch they mention that it should be
possible to do a "local traversal ie from the current page".

A different but similar optimization would be to first check the join
condition against the data in the index, and only fetch the full row if
the index data passes the join condition. I imagine this would be easier
to implement, although less efficient than what I proposed above because
it still has to scan the entire index.
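
As a toy illustration of that (plain Python, invented structures; not how
the executor actually does it), the join qual is checked against the index
keys and the heap is only visited for matches:

  inner_index = [(3, 0), (6, 1), (9, 2)]     # (indexed key, heap pointer)
  inner_heap = {0: "row for 3", 1: "row for 6", 2: "row for 9"}

  heap_fetches = 0

  def fetch_heap(tid):
      global heap_fetches
      heap_fetches += 1                       # count the expensive part
      return inner_heap[tid]

  def join(outer_keys):
      out = []
      for outer_key in outer_keys:
          for key, tid in inner_index:        # qual checked on index data only
              if key == outer_key:
                  out.append((outer_key, fetch_heap(tid)))
      return out

  print(join([6, 7]), heap_fetches)           # only the matching row is fetched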

Thanks,
Michael


Re: [HACKERS] Merge Join with an Index Optimization

2016-09-11 Thread Tom Lane
Michael Malis  writes:
> As I understand it, a merge join will currently read all tuples from both
> subqueries (besides early termination). I believe it should be possible to
> take advantage of the indexes on one or both of the tables being read from
> to skip a large number of tuples that would currently be read.

IIUC, what you're proposing is to replace "read past some tuples in the
index" with "re-descend the index btree".  Yes, that would win if it
allows skipping over a large number of index entries, but it could lose
big-time if not.  The difficulty is that I don't see how the merge join
level could predict whether it would win for any particular advance step.
You'd really need most of the smarts to be in the index AM.

You might want to troll the archives for past discussions of index skip
scan, which is more or less the same idea (could possibly be exactly the
same idea, depending on how it's implemented).  I think we had arrived
at the conclusion that re-descent from the root might be appropriate
when transitioning to another index page, but not intra-page.  Anyway
no one's produced a concrete patch yet.
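
A toy sketch of that policy in plain Python (invented page layout, nothing
like the real nbtree code): step within the current leaf page, and only
jump when the target lies beyond the page's last key.

  import bisect

  PAGE_SIZE = 4

  def make_pages(keys):
      keys = sorted(keys)
      return [keys[i:i + PAGE_SIZE] for i in range(0, len(keys), PAGE_SIZE)]

  def advance(pages, page_no, pos, target):
      """Return (page_no, pos) of the first key >= target."""
      page = pages[page_no]
      if target <= page[-1]:
          # Target still falls on the current page: step locally.
          return page_no, bisect.bisect_left(page, target, pos)
      # Past this page's last key: a real btree would re-descend from the
      # root here; a linear search over pages stands in for that.
      for p in range(page_no + 1, len(pages)):
          if target <= pages[p][-1]:
              return p, bisect.bisect_left(pages[p], target)
      return len(pages), 0                    # ran off the end of the index

  pages = make_pages(range(0, 42, 3))         # keys 0, 3, 6, ..., 39
  print(advance(pages, 0, 0, 25))             # (2, 1): the page holding 27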

regards, tom lane


[HACKERS] Merge Join with an Index Optimization

2016-09-10 Thread Michael Malis
Hi,

As I understand it, a merge join will currently read all tuples from both
subqueries (besides early termination). I believe it should be possible to
take advantage of the indexes on one or both of the tables being read from
to skip a large number of tuples that would currently be read. As an
example, let's say we are doing an equi merge join between two tables with the
following data:

  (3, 4, 5, 9, 10, 11)
  (0, 1, 2, 6, 7, 8)

Currently, even though no tuples would be returned from the merge join, all
of the data will be read from both tables. If there are indexes on both
tables, pg should be able to execute the join as follows. Get the tuple with
value 3 from the first table, then use the index on the second table to look
up the first value greater than or equal to 3; in this case, that value is
6. Since 3 != 6, pg would then look up the lowest value greater than or
equal to 6 in the first table. This process repeats, and tuples are output
whenever the two values are equal. This can be thought of as an alternating
nested loop join in which the positions in the indexes are maintained. In
the case where only one of the tables has an index, it can be thought of as
a nested loop join where the inner relation is the table with the index, and
the position in that index is maintained while iterating over the outer
relation (the outer relation would need to be sorted for this to work).
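
To make the alternation concrete, here is a minimal sketch in plain Python
(bisect on sorted lists standing in for btree re-descent; duplicate handling
and everything else omitted), run on the example data above:

  import bisect

  def skip_merge_join(a, b):
      """Return values present in both sorted lists, skipping via binary search."""
      i, j = 0, 0
      out = []
      while i < len(a) and j < len(b):
          if a[i] == b[j]:
              out.append(a[i])
              i += 1
              j += 1
          elif a[i] < b[j]:
              # "Re-descend": jump to the first entry in a that is >= b[j].
              i = bisect.bisect_left(a, b[j], i)
          else:
              j = bisect.bisect_left(b, a[i], j)
      return out

  print(skip_merge_join([3, 4, 5, 9, 10, 11], [0, 1, 2, 6, 7, 8]))  # []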

I can't demonstrate this with benchmarks, but I imagine it would
dramatically speed up merge joins between two tables where the number of
tuples satisfying the join condition is small. I don't know the Postgres
internals well enough to know whether there is anything obvious that would
prevent this optimization.

Thanks,
Michael