On Wed, Feb 1, 2012 at 10:57 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > Robert Haas <robertmh...@gmail.com> writes: >> Is spgist intended to support prefix searches with LIKE? > > Too lazy to look at the code right now, but I think indxpath.c contains > hardwired assumptions that LIKE prefix optimizations are only possible > with btree indexes. Possibly it would be worth relaxing that. (The > whole "special index operator" mechanism is undesirably special-purpose, > but I currently have no ideas about how to make it more flexible.)
Is it as simple as the attached? Given the set of operators that are available for spgist, one would assume that it's intended to handle both pattern-matching and regular comparison operators. On the flip side, spg_text_inner_consistent() seems to fall back to a full index-scan for the regular comparison operators, which makes me wonder why anyone would bother having the operator at all. Indeed, in my testing, it's slightly slower than a sequential scan: select sum(1) from person where last_name > 'WAR'; - btree: 113-114ms - btree with text_pattern_ops: can't do it - spgist: 920-935ms - sequential scan: 895-910ms With the attached patch, it seems to work as well as text_pattern_ops for pattern matching: select sum(1) from person where last_name LIKE 'WAR%'; - btree: can't do it - btree with text_pattern_ops: 6-7ms - spgist: 7-9ms - sequential scan: 170-180ms It's fine for equality, too: select sum(1) from person where last_name = 'WARNER'; - btree: 1.7-1.9ms - btree with text_pattern_ops: 1.7-1.9ms - spgist: 1.7-1.9ms - sequential scan: 160-165ms It seems that there's at least the potential for spgist indexes to be extremely efficient, given adequate knowledge about the collation under which comparisons are being done. If for example we're trying to find strings where last_name > 'WAR', we certainly need to chase down the subtrees where the first character is 'W', 'X', 'Y', or 'Z'. If the collation is case-insensitive, we might also need to chase 'w', 'x', 'y', and 'z'. And there might be other characters that need to be chased as well, depending on the collation: some collations might sort numbers before letters, while others might do the reverse. But it's likely that there are few if any collations where we need to chase down the subtree for 'Q' on this query, because anything beginning with 'Q' is likely to sort before anything beginning with 'WAR' pretty much anywhere. I'm not sure there's a good way to get that information without a lot of grotty hard-coding, but it seems like the potential is there. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
spgist-pattern-match.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers