2016-06-11 13:47 GMT+03:00 Greg Navis <cont...@gregnavis.com>:

> I made some progress but I'm stuck. I'm focused on GiST for now. Please
> ignore sloppy naming for now.
>
> I made the following changes to pg_trgm--1.2.sql:
>
> CREATE TYPE pg_trgm_match AS (match TEXT, threshold REAL);
>
> CREATE OR REPLACE FUNCTION trgm_check_match(string TEXT, match
> pg_trgm_match) RETURNS bool AS $$
> BEGIN
>     RETURN match.match <-> string <= 1 - match.threshold;
> END;
> $$ LANGUAGE plpgsql;
>
> CREATE OPERATOR %%(leftarg = text, rightarg = pg_trgm_match,
> procedure=trgm_check_match);
>
> ALTER OPERATOR FAMILY gist_trgm_ops USING gist ADD
>                OPERATOR                9               %% (text,
> pg_trgm_match);
>

You can overload the existing % operator instead of adding a new %% operator:

ALTER OPERATOR FAMILY gist_trgm_ops USING gist ADD
OPERATOR        9       % (text, pg_trgm_match);


>
> It does indeed make PostgreSQL complain about undefined strategy 9. I
> added the following define to trgm.h:
>
> #define ThresholdStrategyNumber 9
>
> It seems StrategyNumber is used in gtrgm_consistent and gtrgm_distance.
>
> In gtrgm_consistent, I need change the way `nlimit` is obtained:
>
> nlimit = (strategy == SimilarityStrategyNumber) ?
> similarity_threshold : word_similarity_threshold;
>
> I need to add a case for ThresholdStrategyNumber and extract `nlimit` from
> the argument of `pg_trgm_match`. I'm not sure what to do in
> `gtrgm_distance`.
>
> My questions:
>
> 1a. Is it possible to make `gtrgm_consistent` accept `text` or
> `pg_trgm_match` as the second argument?
>

I think you can change the definition of gtrgm_consistent() in the .sql file,
in both the CREATE FUNCTION and CREATE OPERATOR CLASS commands, to:

gtrgm_consistent(internal,anynonarray,smallint,oid,internal)

But I am not sure that anynonarray is a good choice here.


> 1b. What's the equivalent of `match.match` and `match.threshold` (where
> `match` is a `pg_trgm_match`) in C?
>

After changing the definition, you can extract the values from the composite
type in gtrgm_consistent(). I think the code at the beginning of the function
may look like this:

if (strategy == SimilarityStrategyNumber ||
    strategy == WordSimilarityStrategyNumber)
{
    /* Plain text query: the threshold comes from the GUC variables */
    query = PG_GETARG_TEXT_P(1);
    nlimit = (strategy == SimilarityStrategyNumber) ?
        similarity_threshold : word_similarity_threshold;
}
else if (strategy == ThresholdStrategyNumber)
{
    /* pg_trgm_match query: unpack (match, threshold) from the composite */
    HeapTupleHeader query_match = PG_GETARG_HEAPTUPLEHEADER(1);
    Oid         tupType = HeapTupleHeaderGetTypeId(query_match);
    int32       tupTypmod = HeapTupleHeaderGetTypMod(query_match);
    TupleDesc   tupdesc = lookup_rowtype_tupdesc(tupType, tupTypmod);
    HeapTupleData tuple;
    bool        isnull;

    /* Build a temporary HeapTupleData that points at the datum */
    tuple.t_len = HeapTupleHeaderGetDatumLength(query_match);
    ItemPointerSetInvalid(&(tuple.t_self));
    tuple.t_tableOid = InvalidOid;
    tuple.t_data = query_match;

    /* isnull is not checked here; NULL fields still need handling */
    query = DatumGetTextP(fastgetattr(&tuple, 1, tupdesc, &isnull));
    nlimit = DatumGetFloat4(fastgetattr(&tuple, 2, tupdesc, &isnull));

    ReleaseTupleDesc(tupdesc);
}
else
    query = PG_GETARG_TEXT_P(1);
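
The threshold selection in the code above can also be tried outside
PostgreSQL. The following is only a standalone sketch, not pg_trgm code. It
assumes that the strategy numbers 1 and 7 match trgm.h, that 0.3 and 0.6 are
the default values of the two threshold settings, and it uses a hypothetical
helper named pick_nlimit that does not exist in pg_trgm.

```c
#include <stdbool.h>

/* Assumed to mirror trgm.h; ThresholdStrategyNumber is the proposed one. */
#define SimilarityStrategyNumber      1
#define WordSimilarityStrategyNumber  7
#define ThresholdStrategyNumber       9

/* Stand-ins for the GUC-backed globals (assumed default values). */
double similarity_threshold = 0.3;
double word_similarity_threshold = 0.6;

/*
 * Hypothetical helper: store the limit to use for a strategy in *nlimit.
 * query_threshold is consulted only for ThresholdStrategyNumber.
 * Returns false if the strategy has no threshold or the value is invalid.
 */
bool
pick_nlimit(int strategy, double query_threshold, double *nlimit)
{
    switch (strategy)
    {
        case SimilarityStrategyNumber:
            *nlimit = similarity_threshold;
            return true;
        case WordSimilarityStrategyNumber:
            *nlimit = word_similarity_threshold;
            return true;
        case ThresholdStrategyNumber:
            /* A per-query threshold must be a valid similarity value. */
            if (query_threshold < 0.0 || query_threshold > 1.0)
                return false;
            *nlimit = query_threshold;
            return true;
        default:
            return false;
    }
}
```

With this model, a call with ThresholdStrategyNumber ignores both global
settings and uses only the value carried by the query.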

With this code in place, you should be able to execute a query that uses the
index:

select t, similarity(t, 'qwertyu0988') as sml from test_trgm
where t % row('qwertyu0988', 0.6)::pg_trgm_match;

I took the query from the regression test. And of course the code needs to be
checked for bugs.


> 2. What to do with `gtrgm_distance`?
>

You do not need to change gtrgm_distance(). It is used only for ORDER BY
clauses to calculate distances, and calculating a distance does not require
a threshold.
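
To illustrate why ordering needs no threshold, here is a small standalone
sketch (again not pg_trgm code). It assumes that the distance is one minus
the similarity, as in the trgm_check_match function quoted above. The names
trgm_dist, cmp_by_distance and order_by_distance are hypothetical.

```c
#include <stdlib.h>

/* Distance as returned by <->: assumed to be 1 - similarity. */
double
trgm_dist(double sml)
{
    return 1.0 - sml;
}

/* qsort comparator: ascending distance, so the closest match sorts first. */
int
cmp_by_distance(const void *a, const void *b)
{
    double da = trgm_dist(*(const double *) a);
    double db = trgm_dist(*(const double *) b);

    return (da > db) - (da < db);
}

/* Sort an array of similarity values by distance; no threshold involved. */
void
order_by_distance(double *sml, size_t n)
{
    qsort(sml, n, sizeof(double), cmp_by_distance);
}
```

A threshold only decides which rows qualify. The ordering of the rows that
remain depends on the distance alone.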


>
> Thanks for help.
> --
> Greg Navis
> I help tech companies to scale Heroku-hosted Rails apps.
> Free, biweekly scalability newsletter for SaaS CEOs
> <http://www.gregnavis.com/newsletter/>
>
>


-- 
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
