Hi,

Writing as someone who has used TID joins and GROUP BYs in the past.
One use case is getting a chance to peek at what a DELETE will do. A lot of
GIS tables have no notion of an ID, and dirty datasets tend to have many
duplicates you need to cross-reference with something else. So you write
your query in the form:

    CREATE TABLE ttt AS (
        SELECT DISTINCT ON (a.ctid) a.ctid AS ct, a.field1, a.field2, b.field3, ...
        FROM table a JOIN othertable b ON ST_Whatever(a.geom, b.geom)
    );

    <connect to the table with QGIS, poke around, maybe delete the rows
     you have doubts about removing>

    DELETE FROM table a USING ttt b WHERE a.ctid = b.ct;
    DROP TABLE ttt;

Here:
 - DISTINCT ON (a.ctid) is used (hash?);
 - a.ctid = b.ct (hash join candidate?).

I know it's all better with proper IDs, but sometimes it works like that,
usually just once per dataset.

Sat, 22 Dec 2018 at 19:31, Tom Lane <t...@sss.pgh.pa.us>:

> Simon Riggs <si...@2ndquadrant.com> writes:
> > On Sat, 22 Dec 2018 at 04:31, Tom Lane <t...@sss.pgh.pa.us> wrote:
> >> BTW, if we're to start taking joins on TID seriously, we should also
> >> add the missing hash opclass for TID, so that you can do hash joins
> >> when dealing with a lot of rows.
>
> > I don't think we are trying to do TID joins more seriously, just fix a
> > special case.
> > The case cited requires the batches of work to be small, so nested loops
> > works fine.
> > Looks to me that Edmund is trying to solve the same problem. If so, this
> > is the best solution.
>
> No, I think what Edmund is on about is unrelated, except that it touches
> some of the same code.  He's interested in problems like "find the last
> few tuples in this table".  You can solve that today, with e.g.
> "SELECT ... WHERE ctid >= '(n,1)'", but you get a stupidly inefficient
> plan.  If we think that's a use-case worth supporting then it'd be
> reasonable to provide less inefficient implementation(s).
>
> What I'm thinking about in this thread is joins on TID, which we have only
> very weak support for today --- you'll basically always wind up with a
> mergejoin, which requires full-table scan and sort of its inputs.  Still,
> that's better than a naive nestloop, and for years we've been figuring
> that that was good enough.  Several people in the other thread that
> I cited felt that that isn't good enough.  But if we think it's worth
> taking seriously, then IMO we need to add both parameterized scans (for
> nestloop-with-inner-fetch-by-tid) and hash join, because each of those
> can dominate depending on how many tuples you're joining.
>
>                         regards, tom lane

--
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
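P.S. For concreteness, a minimal sketch of the related duplicate-removal
pattern on a table without an ID column (the `points` table and its columns
are hypothetical, assumed only for illustration):

```sql
-- Remove duplicate rows, keeping the copy with the smallest ctid.
-- The equality join in the workflow above (a.ctid = b.ct) is the part
-- a hash opclass for TID would make hash-joinable; here ctid is only
-- used to decide which of two matching copies to keep, while the
-- duplicate-defining columns drive the join.
DELETE FROM points a
USING points b
WHERE a.field1 = b.field1   -- same attribute values ...
  AND a.geom   = b.geom     -- ... and same geometry => duplicate
  AND a.ctid   > b.ctid;    -- keep the first physical copy
```

Today this self-join tends to end up as a merge join or nested loop; with
better TID support the preview-then-delete workflow above gets cheaper on
large tables.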