On Mon, Mar 2, 2026 at 5:44 AM Chao Li <[email protected]> wrote:
> > > > On Feb 27, 2026, at 13:59, Florents Tselai <[email protected]> > wrote: > > > > > > > > On Thu, Feb 26, 2026 at 8:48 AM Chao Li <[email protected]> wrote: > > > > > > > On Feb 1, 2026, at 19:02, Florents Tselai <[email protected]> > wrote: > > > > > > > > > > > > > > > On Mon, Jan 26, 2026 at 7:22 PM Florents Tselai < > [email protected]> wrote: > > > Hi, > > > > > > in real-life I work a lot with json & fts search, here's a feature > I've always wished I had, > > > but never tackle it. Until yesterday that is. > > > > > > SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && > @.body tsmatch "performance")'); > > > > > > This patch introduces a tsmatch boolean operator to the JSONPath > engine. > > > By integrating FTS natively into path expressions, > > > this operator allows for high-precision filtering of nested JSONB > structures— > > > solving issues with structural ambiguity and query complexity. > > > > > > Currently, users must choose between two suboptimal paths for FTS-ing > nested JSON: > > > - Imprecise Global Indexing > > > jsonb_to_tsvector aggregates text into a flat vector. > > > This ignores JSON boundaries, leading to false positives when the same > key (e.g., "body") > > > appears in different contexts (e.g., a "Product Description" vs. a > "Customer Review"). > > > > > > - Complex SQL Workarounds > > > Achieving 100% precision requires unnesting the document via > jsonb_array_elements and LATERAL joins. > > > This leads to verbose SQL and high memory overhead from generating > intermediate heap tuples. > > > > > > One of the most significant advantages of tsmatch is its ability to > participate in multi-condition predicates > > > within the same JSON object - something jsonb_to_tsvector cannot do. > > > > > > SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && > @.body tsmatch "performance")'); > > > > > > In a flat vector, the association between "Alice" and "performance" is > lost. > > > tsmatch preserves this link by evaluating the FTS predicate in-place > during path traversal. > > > > > > While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly > define an FTS operator, > > > tsmatch is architecturally modeled after the standard-defined > like_regex. > > > > > > The implementation follows the like_regex precedent: > > > it is a non-indexable predicate that relies on GIN path-matching for > pruning and heap re-checks for precision. > > > Caching is scoped to the JsonPathExecContext, > > > ensuring 'compile-once' efficiency per execution without violating the > stability requirements of prepared statements. > > > > > > This initial implementation uses plainto_tsquery. > > > However, the grammar is designed to support a "mode" flag (similar to > like_regex flags) > > > in future iterations to toggle between to_tsquery, > websearch_to_tsquery, and phraseto_tsquery. > > > > > > Here's a v2, that implements the tsqparser clause > > > > > > So this should now work too > > > > > > select jsonb_path_query_array('["fast car", "slow car", "fast and > furious"]', '$[*] ? (@ tsmatch "fast car" tsqparser "w") > <v2-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch> > > > > Hi Florents, > > > > Grant pinged me about this. I can review it in coming days. Can you > please rebase it? I failed to apply to current master. Also, the CF > reported a failure test case, please take a look. > > > > Hi Evan, > > thanks for having a look. The conflict was due to the intro of > pg_fallthrough. Not related to this patch . > > > > I noticed the failure too, but I'm having a hard time reproducing it tbh. > > This fails for Debian Trixie with Meson. The same with Autoconf passes... > > > > https://github.com/Florents-Tselai/postgres/runs/65098077968 > > > > > > > > > > <v3-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch> > > I have reviewed v3 and traced a few test cases. Here comes my review > comments: > > 1 > ``` > + <replaceable>string</replaceable> <literal>tsmatch</literal> > <replaceable>string</replaceable> > + <optional> <literal>tsconfig</literal> > <replaceable>string</replaceable> </optional> > + <optional> <literal>tsqparser</literal> > <replaceable>string</replaceable> </optional> > ``` > > For all “replaceable”, instead of “string”, would it be better to use > something more descriptive? For example: > ``` > <replaceable>json_string</replaceable> <literal>tsmatch</literal> > <replaceable>query</replaceable> > <optional> <literal>tsconfig</literal> > <replaceable>config_name</replaceable> </optional> > <optional> <literal>tsqparser</literal> > <replaceable>parser_mode</replaceable> </optional> > ``` > > 2 - jsonpath_gram.y > ``` > +static bool makeItemTsMatch(JsonPathParseItem *doc, > + JsonPathString > *tsquery, > + JsonPathString > *tsconfig, > + JsonPathString > *tsquery_parser, > + > JsonPathParseItem ** result, > + struct Node > *escontext); > ``` > > Format Nit: Looking at the existing code, the J in the second and > following lines, should be placed in the same column as the J in the first > line. > > 3 - jsonpath_gram.y > ``` > + | expr TSMATCH_P STRING_P > + { > + JsonPathParseItem *jppitem; > + /* Pass NULL for tsconfig (3rd) and NULL for > tsquery_parser (4th) */ > + if (! makeItemTsMatch($1, &$3, NULL, NULL, &jppitem, > escontext)) > + YYABORT; > + $$ = jppitem; > + } > + | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P > + { > + JsonPathParseItem *jppitem; > + /* Pass NULL for tsquery_parser (4th) */ > + if (! makeItemTsMatch($1, &$3, &$5, NULL, &jppitem, > escontext)) > + YYABORT; > + $$ = jppitem; > + } > + | expr TSMATCH_P STRING_P TSQUERYPARSER_P STRING_P > + { > + JsonPathParseItem *jppitem; > + /* Pass NULL for tsconfig (3rd) */ > + if (! makeItemTsMatch($1, &$3, NULL, &$5, &jppitem, > escontext)) > + YYABORT; > + $$ = jppitem; > + } > + | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P TSQUERYPARSER_P > STRING_P > + { > + JsonPathParseItem *jppitem; > + if (! makeItemTsMatch($1, &$3, &$5, &$7, &jppitem, > escontext)) > + YYABORT; > + $$ = jppitem; > + } > ``` > > Feels a little redundant, repeatedly calls makeItemTsMatch. See the > attached diff for a simplification. But my version is a bit longer in terms > of number of lines. So, up to you. > > 4 - jsonpath_gram.y > ``` > +static bool > +makeItemTsMatch(JsonPathParseItem *doc, > + JsonPathString *tsquery, > + JsonPathString *tsconfig, > + JsonPathString *tsquery_parser, > + JsonPathParseItem **result, > + struct Node *escontext) > ``` > > makeItemTsMatch doesn’t need to return a bool. Actually, now it never > returns false, instead, it just ereport(ERROR). > > 5 - jsonpath.h > ``` > + struct > + { > + int32 doc; > + char *tsquery; > + uint32 tsquerylen; > + int32 tsconfig; > + char *tsqparser; > + uint32 tsqparser_len; > + } tsmatch; > > + struct > + { > + JsonPathParseItem *doc; > + char *tsquery; > + uint32 tsquerylen; > + JsonPathParseItem *tsconfig; > + char *tsqparser; > + uint32 tsqparser_len; > + } tsmatch; > } value; > ``` > > tsquerylen doesn’t have _ before len, and tsqparser_len, would it be > better to make naming conventions consistent in the same structure? > > 6 - jsonpath_exec.c > ``` > #include "tsearch/ts_utils.h" > #include "tsearch/ts_cache.h" > #include "utils/regproc.h" > #include "catalog/namespace.h" > > static JsonPathBool > executeTsMatch(JsonPathItem *jsp, JsonbValue *str, JsonbValue *rarg, > void *param) > ``` > > Why don’t put these includes to the header section together with other > includes? > > 7 - jsonpath_exec.c > ``` > + else > + { > + /* > + * Fallback or Error for unknown flags > (should be caught by > + * parser) > + */ > + ereport(ERROR, > + > (errcode(ERRCODE_SYNTAX_ERROR), > + errmsg("unrecognized > tsqparser flag"))); > + } > ``` > > This “else” should never be entered as the same check has been done by > makeItemTsMatch. So, maybe just use an Assert here, or pg_unreachable(). > > 8 - jsonpath_exec.c > ``` > + /* Setup Context (Run ONLY once per predicate) */ > + if (!cxt->initialized) > ``` > > While tracing this SQL: > ``` > evantest=# SELECT '{"tags": ["running", "jogging"]}'::jsonb > evantest-# @@ '$.tags[*] ? (@ tsmatch "run" tsconfig "english")'; > ?column? > ---------- > > (1 row) > ``` > > I noticed that, when process “jogging”, cxt->initialized is still false, > meaning that, the cxt is not reused across array items. Given the same > tsconfig should apply to all array items, I think cxt should be reused. > > 9 - jsonpath_exec.c > ``` > + /* Select Parser and Compile Query */ > + parser_mode = jsp->content.tsmatch.tsqparser; > + parser_len = jsp->content.tsmatch.tsqparser_len; > + > + if (parser_len > 0) > + { > + /* Dispatch based on flag */ > + if (pg_strncasecmp(parser_mode, "pl", parser_len) > == 0) > ``` > > Nit: parser_mode is only used inside if (parser_len > 0), it can be > defined inside the “if”. > > 10 - jsonpath_gram.y > ``` > + ereport(ERROR, > + (errcode(ERRCODE_SYNTAX_ERROR), > + errmsg("invalid tsquery_parser > value: \"%s\"", tsquery_parser->val), > + errhint("Valid values are > \"pl\", \"ph\", and \"w\"."))); > ``` > > When tested a case with an invalid parser, I got: > ``` > evantest=# SELECT '{"tags": ["running", "jogging"]}'::jsonb > > @? '$.tags[*] ? (@ tsmatch "run" tsconfig "english" tsqparser > "pss")'; > ERROR: invalid tsquery_parser value: "pss @" > LINE 2: @? '$.tags[*] ? (@ tsmatch "run" tsconfig "english" tsqpar... > ^ > HINT: Valid values are "pl", "ph", and "w". > ``` > > You can see the it shows a bad looking invalid value. I think that’s > because tsquery_parser->val is not NULL terminated. I fixed this problem > with: > ``` > errmsg("invalid tsquery_parser value: \"%.*s\"", (int) > tsquery_parser->len, tsquery_parser->val), > ``` > > This change is also included in the attached diff file. > > 11 - jsonpath.c > ``` > + if (printBracketes) > + appendStringInfoChar(buf, ')'); > + break; > + > if (printBracketes) > appendStringInfoChar(buf, ')'); > ``` > > Duplicate code. Looks like a copy-pasto. > > 12 - jsonpath.c > ``` > + /* Write the Main Query String */ > + appendBinaryStringInfo(buf, > + > &item->value.tsmatch.tsquerylen, > + > sizeof(item->value.tsmatch.tsquerylen)); > + appendBinaryStringInfo(buf, > + > item->value.tsmatch.tsquery, > + > item->value.tsmatch.tsquerylen); > + appendStringInfoChar(buf, '\0'); > ``` > > I don’t think we need to manually append ‘\0’ after > appendBinaryStringInfo. Looking at the header comment of > appendBinaryStringInfo, it says that a trailing null will be added. > ``` > /* > * appendBinaryStringInfo > * > * Append arbitrary binary data to a StringInfo, allocating more space > * if necessary. Ensures that a trailing null byte is present. > */ > void > appendBinaryStringInfo(StringInfo str, const void *data, int datalen) > ``` > Here's a v4 which incorporates most of Evan's comments & feedback - shifted tsquery compilation logic to use persistent cache within JsonPathExecContext - Fixed a binary serialization alignment issue which caused Dixie to fail earlier - I've refactor and simplified the grammar per Evan's input by adding a tsmatch_opts rule. - Also updated the docs per Evan's comments
v4-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch
Description: Binary data
