That produces pretty much the same results as the CROSS JOIN I was using before. Because every "my_value" in the table is different, grouping on the value alone always returns the full result set, with a bunch of essentially duplicated results.
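To make the duplication concrete, here is a minimal sketch of that kind of self-join with pg_trgm filled in. The sample rows, the data_demo table name, and the use of the % operator at its default limit are assumptions for illustration only, not necessarily the exact query that was run.

```sql
-- Illustrative only: the table name and sample rows are made up for this sketch.
-- Assumes the pg_trgm extension can be installed in the current database.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE TEMP TABLE data_demo (my_value TEXT NOT NULL);
INSERT INTO data_demo VALUES ('apple'), ('apples'), ('banana');

-- Self-join on trigram similarity. The % operator is true when
-- similarity(a, b) clears the current limit (see show_limit(); it is
-- 0.3 unless changed with set_limit()).
SELECT src.my_value AS my_value_src,
       count(*)     AS similar_count
FROM data_demo src
JOIN data_demo comparedto
  ON src.my_value % comparedto.my_value
GROUP BY src.my_value
ORDER BY src.my_value;
```

Since every value matches itself, every distinct value forms its own group. Near-duplicates such as 'apple' and 'apples' therefore each come back as a separate row with its own count, which is exactly the duplication I'm trying to get rid of.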
Any other ideas/options?

On Thu, May 14, 2015 at 12:08 PM David G. Johnston <david.g.johns...@gmail.com> wrote:

> On Thu, May 14, 2015 at 11:58 AM, Cory Tucker <cory.tuc...@gmail.com> wrote:
>
>> [pg version 9.3 or 9.4]
>>
>> Suppose I have a simple table:
>>
>> create table data (
>>   my_value TEXT NOT NULL
>> );
>> CREATE INDEX idx_my_value ON data USING gin(my_value gin_trgm_ops);
>>
>> Now I would like to essentially do group by to get a count of all the
>> values that are sufficiently similar. I can do it using something like a
>> CROSS JOIN to join the table on itself, but then I still am getting all the
>> rows with duplicate counts.
>>
>> Is there a way to do a group by query and only return a single "my_value"
>> column and a count of the number of times other values are similar while
>> also not returning the included similar values in the output, too?
>
> Concept below - not bothering to lookup the functions/operators for
> pg_trgm:
>
> SELECT my_value_src, count(*)
> FROM (SELECT my_value AS my_value_src FROM data) src
> JOIN (SELECT my_value AS my_value_compareto FROM data) comparedto
> ON ( func(my_value_src, my_value_compareto) < # )
> GROUP BY my_value_src
>
> David J.