Re: Joins on TID

Komяpa Sun, 23 Dec 2018 08:24:49 -0800

Hi,

Writing as someone who used TID joins and group by's in the past.


One use case is having a chance to peek into what will DELETE do.
A lot of GIS tables don't have any notion of ID, and dirty datasets tend to
have many duplicates you need to cross-reference with something else. So,
you write your query in form of

CREATE TABLE ttt as (SELECT distinct on (ctid) ctid as ct, field1, field2,
b.field3, ... from table b join othertable b on ST_Whatever(a.geom,
b.geom));

<connect to table with QGIS, poke around, maybe delete some rows you doubt
you want to remove>

DELETE FROM table a USING ttt b where a.ctid = b.ct;
DROP TABLE ttt;

Here:
 - distinct on ctid is used (hash?)
 - a.ctid = b.ct (hash join candidate?)

I know it's all better with proper IDs, but sometimes it works like that,
usually just once per dataset.


сб, 22 дек. 2018 г. в 19:31, Tom Lane <t...@sss.pgh.pa.us>:

> Simon Riggs <si...@2ndquadrant.com> writes:
> > On Sat, 22 Dec 2018 at 04:31, Tom Lane <t...@sss.pgh.pa.us> wrote:
> >> BTW, if we're to start taking joins on TID seriously, we should also
> >> add the missing hash opclass for TID, so that you can do hash joins
> >> when dealing with a lot of rows.
>
> > I don't think we are trying to do TID joins more seriously, just fix a
> > special case.
> > The case cited requires the batches of work to be small, so nested loops
> > works fine.
> > Looks to me that Edmund is trying to solve the same problem. If so, this
> is
> > the best solution.
>
> No, I think what Edmund is on about is unrelated, except that it touches
> some of the same code.  He's interested in problems like "find the last
> few tuples in this table".  You can solve that today, with e.g.
> "SELECT ... WHERE ctid >= '(n,1)'", but you get a stupidly inefficient
> plan.  If we think that's a use-case worth supporting then it'd be
> reasonable to provide less inefficient implementation(s).
>
> What I'm thinking about in this thread is joins on TID, which we have only
> very weak support for today --- you'll basically always wind up with a
> mergejoin, which requires full-table scan and sort of its inputs.  Still,
> that's better than a naive nestloop, and for years we've been figuring
> that that was good enough.  Several people in the other thread that
> I cited felt that that isn't good enough.  But if we think it's worth
> taking seriously, then IMO we need to add both parameterized scans (for
> nestloop-with-inner-fetch-by-tid) and hash join, because each of those
> can dominate depending on how many tuples you're joining.
>
>                         regards, tom lane
>
> --
Darafei Praliaskouski
Support me: http://patreon.com/komzpa

Re: Joins on TID

Reply via email to