2016-06-11 13:47 GMT+03:00 Greg Navis <cont...@gregnavis.com>:

> I made some progress but I'm stuck. I'm focused on GiST for now. Please
> ignore sloppy naming for now.
>
> I made the following changes to pg_trgm--1.2.sql:
>
> CREATE TYPE pg_trgm_match AS (match TEXT, threshold REAL);
>
> CREATE OR REPLACE FUNCTION trgm_check_match(string TEXT, match
> pg_trgm_match) RETURNS bool AS $$
> BEGIN
>     RETURN match.match <-> string <= 1 - match.threshold;
> END;
> $$ LANGUAGE plpgsql;
>
> CREATE OPERATOR %%(leftarg = text, rightarg = pg_trgm_match,
> procedure=trgm_check_match);
>
> ALTER OPERATOR FAMILY gist_trgm_ops USING gist ADD
>                        OPERATOR        9       %% (text,
> pg_trgm_match);
>
You can overload the existing % operator:

ALTER OPERATOR FAMILY gist_trgm_ops USING gist ADD
                       OPERATOR        9       % (text, pg_trgm_match);

>
> It does indeed make PostgreSQL complain about undefined strategy 9. I
> added the following define to trgm.h:
>
> #define ThresholdStrategyNumber 9
>
> It seems StrategyNumber is used in gtrgm_consistent and gtrgm_distance.
>
> In gtrgm_consistent, I need change the way `nlimit` is obtained:
>
> nlimit = (strategy == SimilarityStrategyNumber) ?
>     similarity_threshold : word_similarity_threshold;
>
> I need to add a case for ThresholdStrategyNumber and extract `nlimit` from
> the argument of `pg_trgm_match`. I'm not sure what to do in
> `gtrgm_distance`.
>
> My questions:
>
> 1a. Is it possible to make `gtrgm_consistent` accept `text` or
> `pg_trgm_match` as the second argument?
>

I think you can change the definition of gtrgm_consistent() in the .sql
file, in both the CREATE FUNCTION and CREATE OPERATOR CLASS commands, to:

gtrgm_consistent(internal,anynonarray,smallint,oid,internal)

But I am not sure that anynonarray is a good choice here.

> 1b. What's the equivalent of `match.match` and `match.threshold` (where
> `match` is a `pg_trgm_match`) in C?
>

After changing the definition you can extract the values from the composite
type in gtrgm_consistent(). I think the code at the beginning of the
function may look like this:

if (strategy == SimilarityStrategyNumber ||
    strategy == WordSimilarityStrategyNumber)
{
    query = PG_GETARG_TEXT_P(1);
    nlimit = (strategy == SimilarityStrategyNumber) ?
        similarity_threshold : word_similarity_threshold;
}
else if (strategy == ThresholdStrategyNumber)
{
    /* The second argument is a pg_trgm_match composite value. */
    HeapTupleHeader query_match = PG_GETARG_HEAPTUPLEHEADER(1);
    Oid             tupType = HeapTupleHeaderGetTypeId(query_match);
    int32           tupTypmod = HeapTupleHeaderGetTypMod(query_match);
    TupleDesc       tupdesc = lookup_rowtype_tupdesc(tupType, tupTypmod);
    HeapTupleData   tuple;
    bool            isnull;

    /* Build a temporary HeapTuple pointing at the composite datum. */
    tuple.t_len = HeapTupleHeaderGetDatumLength(query_match);
    ItemPointerSetInvalid(&(tuple.t_self));
    tuple.t_tableOid = InvalidOid;
    tuple.t_data = query_match;

    /* Attribute 1 is "match", attribute 2 is "threshold". */
    query = DatumGetTextP(fastgetattr(&tuple, 1, tupdesc, &isnull));
    nlimit = DatumGetFloat4(fastgetattr(&tuple, 2, tupdesc, &isnull));

    ReleaseTupleDesc(tupdesc);
}
else
    query = PG_GETARG_TEXT_P(1);

With this code in place, you should be able to run the following query
using the index:

select t,similarity(t,'qwertyu0988') as sml from test_trgm where t %
row('qwertyu0988', 0.6)::pg_trgm_match;

I took the query from the regression tests. And of course the code needs
to be checked for bugs; for example, it does not yet look at isnull.

> 2. What to do with `gtrgm_distance`?
>

You do not need to change gtrgm_distance(). It is used only in the ORDER BY
clause to calculate distances, and calculating a distance does not require
a threshold.

>
> Thanks for help.
> --
> Greg Navis
> I help tech companies to scale Heroku-hosted Rails apps.
> Free, biweekly scalability newsletter for SaaS CEOs
> <http://www.gregnavis.com/newsletter/>
>
>

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
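
The strategy dispatch described in the reply can be modeled outside
PostgreSQL, which may help when reasoning about which strategies read a
plain text argument and which read the composite one. This is only a rough
standalone sketch, not pg_trgm code: TrgmMatch, ResolvedQuery, and
resolve_query are hypothetical names, the strategy values 1 and 7 and the
threshold defaults 0.3 and 0.6 are assumptions that should be checked
against trgm.h and the pg_trgm documentation, and only the value 9 for
ThresholdStrategyNumber comes from this thread.

```c
#include <stddef.h>

/* Assumed values; verify against trgm.h. Only 9 comes from the thread. */
#define SimilarityStrategyNumber     1
#define WordSimilarityStrategyNumber 7
#define ThresholdStrategyNumber      9

/* Assumed defaults standing in for the real GUC-backed variables. */
float similarity_threshold = 0.3f;
float word_similarity_threshold = 0.6f;

/* Hypothetical plain-C stand-in for the pg_trgm_match composite type. */
typedef struct
{
    const char *match;      /* text to compare against */
    float       threshold;  /* per-query similarity limit */
} TrgmMatch;

/* Hypothetical result type: the query text plus the limit to apply. */
typedef struct
{
    const char *query;
    float       nlimit;     /* negative when the strategy uses no limit */
} ResolvedQuery;

/*
 * Pick the query text and the limit according to the strategy number.
 * "arg" is a C string for every strategy except ThresholdStrategyNumber,
 * where it points to a TrgmMatch.
 */
ResolvedQuery
resolve_query(int strategy, const void *arg)
{
    ResolvedQuery result;

    result.nlimit = -1.0f;

    if (strategy == SimilarityStrategyNumber ||
        strategy == WordSimilarityStrategyNumber)
    {
        result.query = (const char *) arg;
        result.nlimit = (strategy == SimilarityStrategyNumber) ?
            similarity_threshold : word_similarity_threshold;
    }
    else if (strategy == ThresholdStrategyNumber)
    {
        const TrgmMatch *m = (const TrgmMatch *) arg;

        result.query = m->match;
        result.nlimit = m->threshold;
    }
    else
        result.query = (const char *) arg;

    return result;
}
```

Calling resolve_query(ThresholdStrategyNumber, &m) with a TrgmMatch m
returns m.match and m.threshold, while the two similarity strategies fall
back to the global thresholds, mirroring the three branches in the reply.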