Here's a v5 which is v4 + pgindent



On Fri, Apr 3, 2026 at 12:56 PM Florents Tselai <[email protected]>
wrote:

>
>
>
> On Mon, Mar 2, 2026 at 5:44 AM Chao Li <[email protected]> wrote:
>
>>
>>
>> > On Feb 27, 2026, at 13:59, Florents Tselai <[email protected]>
>> wrote:
>> >
>> >
>> >
>> > On Thu, Feb 26, 2026 at 8:48 AM Chao Li <[email protected]> wrote:
>> >
>> >
>> > > On Feb 1, 2026, at 19:02, Florents Tselai <[email protected]>
>> wrote:
>> > >
>> > >
>> > >
>> > >
>> > > On Mon, Jan 26, 2026 at 7:22 PM Florents Tselai <
>> [email protected]> wrote:
>> > > Hi,
>> > >
>> > > in real-life I work a lot with json & fts search, here's a feature
>> I've always wished I had,
>> > > but never tackle it. Until yesterday that is.
>> > >
>> > > SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" &&
>> @.body tsmatch "performance")');
>> > >
>> > > This patch introduces a tsmatch boolean operator to the JSONPath
>> engine.
>> > > By integrating FTS natively into path expressions,
>> > > this operator allows for high-precision filtering of nested JSONB
>> structures—
>> > > solving issues with structural ambiguity and query complexity.
>> > >
>> > > Currently, users must choose between two suboptimal paths for FTS-ing
>> nested JSON:
>> > > - Imprecise Global Indexing
>> > > jsonb_to_tsvector aggregates text into a flat vector.
>> > > This ignores JSON boundaries, leading to false positives when the
>> same key (e.g., "body")
>> > > appears in different contexts (e.g., a "Product Description" vs. a
>> "Customer Review").
>> > >
>> > > - Complex SQL Workarounds
>> > > Achieving 100% precision requires unnesting the document via
>> jsonb_array_elements and LATERAL joins.
>> > > This leads to verbose SQL and high memory overhead from generating
>> intermediate heap tuples.
>> > >
>> > > One of the most significant advantages of tsmatch is its ability to
>> participate in multi-condition predicates
>> > > within the same JSON object - something jsonb_to_tsvector cannot do.
>> > >
>> > > SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" &&
>> @.body tsmatch "performance")');
>> > >
>> > > In a flat vector, the association between "Alice" and "performance"
>> is lost.
>> > > tsmatch preserves this link by evaluating the FTS predicate in-place
>> during path traversal.
>> > >
>> > > While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly
>> define an FTS operator,
>> > > tsmatch is architecturally modeled after the standard-defined
>> like_regex.
>> > >
>> > > The implementation follows the like_regex precedent:
>> > > it is a non-indexable predicate that relies on GIN path-matching for
>> pruning and heap re-checks for precision.
>> > > Caching is scoped to the JsonPathExecContext,
>> > > ensuring 'compile-once' efficiency per execution without violating
>> the stability requirements of prepared statements.
>> > >
>> > > This initial implementation uses plainto_tsquery.
>> > > However, the grammar is designed to support a "mode" flag (similar to
>> like_regex flags)
>> > > in future iterations to toggle between to_tsquery,
>> websearch_to_tsquery, and phraseto_tsquery.
>> > >
>> > > Here's a v2, that implements the tsqparser clause
>> > >
>> > > So this should now work too
>> > >
>> > > select jsonb_path_query_array('["fast car", "slow car", "fast and
>> furious"]', '$[*] ? (@ tsmatch "fast car" tsqparser "w")
>> <v2-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch>
>> >
>> > Hi Florents,
>> >
>> > Grant pinged me about this. I can review it in coming days. Can you
>> please rebase it? I failed to apply to current master. Also, the CF
>> reported a failure test case, please take a look.
>> >
>> >  Hi Evan,
>> > thanks for having a look. The conflict was due to the intro of
>> pg_fallthrough. Not related to this patch .
>> >
>> > I noticed the failure too, but I'm having a hard time reproducing it
>> tbh.
>> > This fails for Debian Trixie with Meson. The same with Autoconf
>> passes...
>> >
>> > https://github.com/Florents-Tselai/postgres/runs/65098077968
>> >
>> >
>> >
>> >
>> > <v3-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch>
>>
>> I have reviewed v3 and traced a few test cases. Here comes my review
>> comments:
>>
>> 1
>> ```
>> +        <replaceable>string</replaceable> <literal>tsmatch</literal>
>> <replaceable>string</replaceable>
>> +        <optional> <literal>tsconfig</literal>
>> <replaceable>string</replaceable> </optional>
>> +        <optional> <literal>tsqparser</literal>
>> <replaceable>string</replaceable> </optional>
>> ```
>>
>> For all “replaceable”, instead of “string”, would it be better to use
>> something more descriptive? For example:
>> ```
>> <replaceable>json_string</replaceable> <literal>tsmatch</literal>
>> <replaceable>query</replaceable>
>> <optional> <literal>tsconfig</literal>
>> <replaceable>config_name</replaceable> </optional>
>> <optional> <literal>tsqparser</literal>
>> <replaceable>parser_mode</replaceable> </optional>
>> ```
>>
>> 2 - jsonpath_gram.y
>> ```
>> +static bool makeItemTsMatch(JsonPathParseItem *doc,
>> +                                                         JsonPathString
>> *tsquery,
>> +                                                         JsonPathString
>> *tsconfig,
>> +                                                         JsonPathString
>> *tsquery_parser,
>> +
>>  JsonPathParseItem ** result,
>> +                                                         struct Node
>> *escontext);
>> ```
>>
>> Format Nit: Looking at the existing code, the J in the second and
>> following lines, should be placed in the same column as the J in the first
>> line.
>>
>> 3 - jsonpath_gram.y
>> ```
>> +       | expr TSMATCH_P STRING_P
>> +       {
>> +               JsonPathParseItem *jppitem;
>> +               /* Pass NULL for tsconfig (3rd) and NULL for
>> tsquery_parser (4th) */
>> +               if (! makeItemTsMatch($1, &$3, NULL, NULL, &jppitem,
>> escontext))
>> +                  YYABORT;
>> +               $$ = jppitem;
>> +       }
>> +       | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P
>> +       {
>> +               JsonPathParseItem *jppitem;
>> +               /* Pass NULL for tsquery_parser (4th) */
>> +               if (! makeItemTsMatch($1, &$3, &$5, NULL, &jppitem,
>> escontext))
>> +                  YYABORT;
>> +               $$ = jppitem;
>> +       }
>> +       | expr TSMATCH_P STRING_P TSQUERYPARSER_P STRING_P
>> +       {
>> +               JsonPathParseItem *jppitem;
>> +               /* Pass NULL for tsconfig (3rd) */
>> +               if (! makeItemTsMatch($1, &$3, NULL, &$5, &jppitem,
>> escontext))
>> +                  YYABORT;
>> +               $$ = jppitem;
>> +       }
>> +       | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P TSQUERYPARSER_P
>> STRING_P
>> +       {
>> +               JsonPathParseItem *jppitem;
>> +               if (! makeItemTsMatch($1, &$3, &$5, &$7, &jppitem,
>> escontext))
>> +                  YYABORT;
>> +               $$ = jppitem;
>> +       }
>> ```
>>
>> Feels a little redundant, repeatedly calls makeItemTsMatch. See the
>> attached diff for a simplification. But my version is a bit longer in terms
>> of number of lines. So, up to you.
>>
>> 4 - jsonpath_gram.y
>> ```
>> +static bool
>> +makeItemTsMatch(JsonPathParseItem *doc,
>> +                        JsonPathString *tsquery,
>> +                        JsonPathString *tsconfig,
>> +                        JsonPathString *tsquery_parser,
>> +                        JsonPathParseItem **result,
>> +                        struct Node *escontext)
>> ```
>>
>> makeItemTsMatch doesn’t need to return a bool. Actually, now it never
>> returns false, instead, it just ereport(ERROR).
>>
>> 5 - jsonpath.h
>> ```
>> +               struct
>> +               {
>> +                       int32           doc;
>> +                       char       *tsquery;
>> +                       uint32          tsquerylen;
>> +                       int32           tsconfig;
>> +                       char       *tsqparser;
>> +                       uint32          tsqparser_len;
>> +               }                       tsmatch;
>>
>> +               struct
>> +               {
>> +                       JsonPathParseItem *doc;
>> +                       char       *tsquery;
>> +                       uint32          tsquerylen;
>> +                       JsonPathParseItem *tsconfig;
>> +                       char       *tsqparser;
>> +                       uint32          tsqparser_len;
>> +               }                       tsmatch;
>>         }                       value;
>> ```
>>
>> tsquerylen doesn’t have _ before len, and tsqparser_len, would it be
>> better to make naming conventions consistent in the same structure?
>>
>> 6 - jsonpath_exec.c
>> ```
>> #include "tsearch/ts_utils.h"
>> #include "tsearch/ts_cache.h"
>> #include "utils/regproc.h"
>> #include "catalog/namespace.h"
>>
>> static JsonPathBool
>> executeTsMatch(JsonPathItem *jsp, JsonbValue *str, JsonbValue *rarg,
>> void *param)
>> ```
>>
>> Why don’t put these includes to the header section together with other
>> includes?
>>
>> 7 - jsonpath_exec.c
>> ```
>> +                       else
>> +                       {
>> +                               /*
>> +                                * Fallback or Error for unknown flags
>> (should be caught by
>> +                                * parser)
>> +                                */
>> +                               ereport(ERROR,
>> +
>>  (errcode(ERRCODE_SYNTAX_ERROR),
>> +                                                errmsg("unrecognized
>> tsqparser flag")));
>> +                       }
>> ```
>>
>> This “else” should never be entered as the same check has been done by
>> makeItemTsMatch. So, maybe just use an Assert here, or pg_unreachable().
>>
>> 8 -  jsonpath_exec.c
>> ```
>> +       /* Setup Context (Run ONLY once per predicate) */
>> +       if (!cxt->initialized)
>> ```
>>
>> While tracing this SQL:
>> ```
>> evantest=# SELECT '{"tags": ["running", "jogging"]}'::jsonb
>> evantest-#   @@ '$.tags[*] ? (@ tsmatch "run" tsconfig "english")';
>>  ?column?
>> ----------
>>
>> (1 row)
>> ```
>>
>> I noticed that, when process “jogging”, cxt->initialized is still false,
>> meaning that, the cxt is not reused across array items. Given the same
>> tsconfig should apply to all array items, I think cxt should be reused.
>>
>> 9 -  jsonpath_exec.c
>> ```
>> +               /* Select Parser and Compile Query */
>> +               parser_mode = jsp->content.tsmatch.tsqparser;
>> +               parser_len = jsp->content.tsmatch.tsqparser_len;
>> +
>> +               if (parser_len > 0)
>> +               {
>> +                       /* Dispatch based on flag */
>> +                       if (pg_strncasecmp(parser_mode, "pl", parser_len)
>> == 0)
>> ```
>>
>> Nit: parser_mode is only used inside if (parser_len > 0), it can be
>> defined inside the “if”.
>>
>> 10 - jsonpath_gram.y
>> ```
>> +                        ereport(ERROR,
>> +                                        (errcode(ERRCODE_SYNTAX_ERROR),
>> +                                         errmsg("invalid tsquery_parser
>> value: \"%s\"", tsquery_parser->val),
>> +                                         errhint("Valid values are
>> \"pl\", \"ph\", and \"w\".")));
>> ```
>>
>> When tested a case with an invalid parser, I got:
>> ```
>> evantest=# SELECT '{"tags": ["running", "jogging"]}'::jsonb
>>
>>             @? '$.tags[*] ? (@ tsmatch "run" tsconfig "english" tsqparser
>> "pss")';
>> ERROR:  invalid tsquery_parser value: "pss                             @"
>> LINE 2:   @? '$.tags[*] ? (@ tsmatch "run" tsconfig "english" tsqpar...
>>              ^
>> HINT:  Valid values are "pl", "ph", and "w".
>> ```
>>
>> You can see the it shows a bad looking invalid value. I think that’s
>> because tsquery_parser->val is not NULL terminated. I fixed this problem
>> with:
>> ```
>> errmsg("invalid tsquery_parser value: \"%.*s\"", (int)
>> tsquery_parser->len, tsquery_parser->val),
>> ```
>>
>> This change is also included in the attached diff file.
>>
>> 11 - jsonpath.c
>> ```
>> +                       if (printBracketes)
>> +                               appendStringInfoChar(buf, ')');
>> +                       break;
>> +
>>                         if (printBracketes)
>>                                 appendStringInfoChar(buf, ')');
>> ```
>>
>> Duplicate code. Looks like a copy-pasto.
>>
>> 12 - jsonpath.c
>> ```
>> +                               /* Write the Main Query String */
>> +                               appendBinaryStringInfo(buf,
>> +
>>   &item->value.tsmatch.tsquerylen,
>> +
>>   sizeof(item->value.tsmatch.tsquerylen));
>> +                               appendBinaryStringInfo(buf,
>> +
>>   item->value.tsmatch.tsquery,
>> +
>>   item->value.tsmatch.tsquerylen);
>> +                               appendStringInfoChar(buf, '\0');
>> ```
>>
>> I don’t think we need to manually append ‘\0’ after
>> appendBinaryStringInfo. Looking at the header comment of
>> appendBinaryStringInfo, it says that a trailing null will be added.
>> ```
>> /*
>> * appendBinaryStringInfo
>> *
>> * Append arbitrary binary data to a StringInfo, allocating more space
>> * if necessary. Ensures that a trailing null byte is present.
>> */
>> void
>> appendBinaryStringInfo(StringInfo str, const void *data, int datalen)
>> ```
>>
>
> Here's a v4 which incorporates most of Evan's comments & feedback
> - shifted tsquery compilation logic to use persistent cache within
> JsonPathExecContext
> - Fixed a binary serialization alignment issue which caused Dixie to fail
> earlier
> - I've refactor and simplified the grammar per Evan's input by adding a
> tsmatch_opts rule.
> - Also updated the docs per Evan's comments
>
>
>

Attachment: v5-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch
Description: Binary data

Reply via email to