Here's a v5 which is v4 + pgindent
On Fri, Apr 3, 2026 at 12:56 PM Florents Tselai <[email protected]> wrote: > > > > On Mon, Mar 2, 2026 at 5:44 AM Chao Li <[email protected]> wrote: > >> >> >> > On Feb 27, 2026, at 13:59, Florents Tselai <[email protected]> >> wrote: >> > >> > >> > >> > On Thu, Feb 26, 2026 at 8:48 AM Chao Li <[email protected]> wrote: >> > >> > >> > > On Feb 1, 2026, at 19:02, Florents Tselai <[email protected]> >> wrote: >> > > >> > > >> > > >> > > >> > > On Mon, Jan 26, 2026 at 7:22 PM Florents Tselai < >> [email protected]> wrote: >> > > Hi, >> > > >> > > in real-life I work a lot with json & fts search, here's a feature >> I've always wished I had, >> > > but never tackle it. Until yesterday that is. >> > > >> > > SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && >> @.body tsmatch "performance")'); >> > > >> > > This patch introduces a tsmatch boolean operator to the JSONPath >> engine. >> > > By integrating FTS natively into path expressions, >> > > this operator allows for high-precision filtering of nested JSONB >> structures— >> > > solving issues with structural ambiguity and query complexity. >> > > >> > > Currently, users must choose between two suboptimal paths for FTS-ing >> nested JSON: >> > > - Imprecise Global Indexing >> > > jsonb_to_tsvector aggregates text into a flat vector. >> > > This ignores JSON boundaries, leading to false positives when the >> same key (e.g., "body") >> > > appears in different contexts (e.g., a "Product Description" vs. a >> "Customer Review"). >> > > >> > > - Complex SQL Workarounds >> > > Achieving 100% precision requires unnesting the document via >> jsonb_array_elements and LATERAL joins. >> > > This leads to verbose SQL and high memory overhead from generating >> intermediate heap tuples. >> > > >> > > One of the most significant advantages of tsmatch is its ability to >> participate in multi-condition predicates >> > > within the same JSON object - something jsonb_to_tsvector cannot do. >> > > >> > > SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && >> @.body tsmatch "performance")'); >> > > >> > > In a flat vector, the association between "Alice" and "performance" >> is lost. >> > > tsmatch preserves this link by evaluating the FTS predicate in-place >> during path traversal. >> > > >> > > While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly >> define an FTS operator, >> > > tsmatch is architecturally modeled after the standard-defined >> like_regex. >> > > >> > > The implementation follows the like_regex precedent: >> > > it is a non-indexable predicate that relies on GIN path-matching for >> pruning and heap re-checks for precision. >> > > Caching is scoped to the JsonPathExecContext, >> > > ensuring 'compile-once' efficiency per execution without violating >> the stability requirements of prepared statements. >> > > >> > > This initial implementation uses plainto_tsquery. >> > > However, the grammar is designed to support a "mode" flag (similar to >> like_regex flags) >> > > in future iterations to toggle between to_tsquery, >> websearch_to_tsquery, and phraseto_tsquery. >> > > >> > > Here's a v2, that implements the tsqparser clause >> > > >> > > So this should now work too >> > > >> > > select jsonb_path_query_array('["fast car", "slow car", "fast and >> furious"]', '$[*] ? (@ tsmatch "fast car" tsqparser "w") >> <v2-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch> >> > >> > Hi Florents, >> > >> > Grant pinged me about this. I can review it in coming days. Can you >> please rebase it? I failed to apply to current master. Also, the CF >> reported a failure test case, please take a look. >> > >> > Hi Evan, >> > thanks for having a look. The conflict was due to the intro of >> pg_fallthrough. Not related to this patch . >> > >> > I noticed the failure too, but I'm having a hard time reproducing it >> tbh. >> > This fails for Debian Trixie with Meson. The same with Autoconf >> passes... >> > >> > https://github.com/Florents-Tselai/postgres/runs/65098077968 >> > >> > >> > >> > >> > <v3-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch> >> >> I have reviewed v3 and traced a few test cases. Here comes my review >> comments: >> >> 1 >> ``` >> + <replaceable>string</replaceable> <literal>tsmatch</literal> >> <replaceable>string</replaceable> >> + <optional> <literal>tsconfig</literal> >> <replaceable>string</replaceable> </optional> >> + <optional> <literal>tsqparser</literal> >> <replaceable>string</replaceable> </optional> >> ``` >> >> For all “replaceable”, instead of “string”, would it be better to use >> something more descriptive? For example: >> ``` >> <replaceable>json_string</replaceable> <literal>tsmatch</literal> >> <replaceable>query</replaceable> >> <optional> <literal>tsconfig</literal> >> <replaceable>config_name</replaceable> </optional> >> <optional> <literal>tsqparser</literal> >> <replaceable>parser_mode</replaceable> </optional> >> ``` >> >> 2 - jsonpath_gram.y >> ``` >> +static bool makeItemTsMatch(JsonPathParseItem *doc, >> + JsonPathString >> *tsquery, >> + JsonPathString >> *tsconfig, >> + JsonPathString >> *tsquery_parser, >> + >> JsonPathParseItem ** result, >> + struct Node >> *escontext); >> ``` >> >> Format Nit: Looking at the existing code, the J in the second and >> following lines, should be placed in the same column as the J in the first >> line. >> >> 3 - jsonpath_gram.y >> ``` >> + | expr TSMATCH_P STRING_P >> + { >> + JsonPathParseItem *jppitem; >> + /* Pass NULL for tsconfig (3rd) and NULL for >> tsquery_parser (4th) */ >> + if (! makeItemTsMatch($1, &$3, NULL, NULL, &jppitem, >> escontext)) >> + YYABORT; >> + $$ = jppitem; >> + } >> + | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P >> + { >> + JsonPathParseItem *jppitem; >> + /* Pass NULL for tsquery_parser (4th) */ >> + if (! makeItemTsMatch($1, &$3, &$5, NULL, &jppitem, >> escontext)) >> + YYABORT; >> + $$ = jppitem; >> + } >> + | expr TSMATCH_P STRING_P TSQUERYPARSER_P STRING_P >> + { >> + JsonPathParseItem *jppitem; >> + /* Pass NULL for tsconfig (3rd) */ >> + if (! makeItemTsMatch($1, &$3, NULL, &$5, &jppitem, >> escontext)) >> + YYABORT; >> + $$ = jppitem; >> + } >> + | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P TSQUERYPARSER_P >> STRING_P >> + { >> + JsonPathParseItem *jppitem; >> + if (! makeItemTsMatch($1, &$3, &$5, &$7, &jppitem, >> escontext)) >> + YYABORT; >> + $$ = jppitem; >> + } >> ``` >> >> Feels a little redundant, repeatedly calls makeItemTsMatch. See the >> attached diff for a simplification. But my version is a bit longer in terms >> of number of lines. So, up to you. >> >> 4 - jsonpath_gram.y >> ``` >> +static bool >> +makeItemTsMatch(JsonPathParseItem *doc, >> + JsonPathString *tsquery, >> + JsonPathString *tsconfig, >> + JsonPathString *tsquery_parser, >> + JsonPathParseItem **result, >> + struct Node *escontext) >> ``` >> >> makeItemTsMatch doesn’t need to return a bool. Actually, now it never >> returns false, instead, it just ereport(ERROR). >> >> 5 - jsonpath.h >> ``` >> + struct >> + { >> + int32 doc; >> + char *tsquery; >> + uint32 tsquerylen; >> + int32 tsconfig; >> + char *tsqparser; >> + uint32 tsqparser_len; >> + } tsmatch; >> >> + struct >> + { >> + JsonPathParseItem *doc; >> + char *tsquery; >> + uint32 tsquerylen; >> + JsonPathParseItem *tsconfig; >> + char *tsqparser; >> + uint32 tsqparser_len; >> + } tsmatch; >> } value; >> ``` >> >> tsquerylen doesn’t have _ before len, and tsqparser_len, would it be >> better to make naming conventions consistent in the same structure? >> >> 6 - jsonpath_exec.c >> ``` >> #include "tsearch/ts_utils.h" >> #include "tsearch/ts_cache.h" >> #include "utils/regproc.h" >> #include "catalog/namespace.h" >> >> static JsonPathBool >> executeTsMatch(JsonPathItem *jsp, JsonbValue *str, JsonbValue *rarg, >> void *param) >> ``` >> >> Why don’t put these includes to the header section together with other >> includes? >> >> 7 - jsonpath_exec.c >> ``` >> + else >> + { >> + /* >> + * Fallback or Error for unknown flags >> (should be caught by >> + * parser) >> + */ >> + ereport(ERROR, >> + >> (errcode(ERRCODE_SYNTAX_ERROR), >> + errmsg("unrecognized >> tsqparser flag"))); >> + } >> ``` >> >> This “else” should never be entered as the same check has been done by >> makeItemTsMatch. So, maybe just use an Assert here, or pg_unreachable(). >> >> 8 - jsonpath_exec.c >> ``` >> + /* Setup Context (Run ONLY once per predicate) */ >> + if (!cxt->initialized) >> ``` >> >> While tracing this SQL: >> ``` >> evantest=# SELECT '{"tags": ["running", "jogging"]}'::jsonb >> evantest-# @@ '$.tags[*] ? (@ tsmatch "run" tsconfig "english")'; >> ?column? >> ---------- >> >> (1 row) >> ``` >> >> I noticed that, when process “jogging”, cxt->initialized is still false, >> meaning that, the cxt is not reused across array items. Given the same >> tsconfig should apply to all array items, I think cxt should be reused. >> >> 9 - jsonpath_exec.c >> ``` >> + /* Select Parser and Compile Query */ >> + parser_mode = jsp->content.tsmatch.tsqparser; >> + parser_len = jsp->content.tsmatch.tsqparser_len; >> + >> + if (parser_len > 0) >> + { >> + /* Dispatch based on flag */ >> + if (pg_strncasecmp(parser_mode, "pl", parser_len) >> == 0) >> ``` >> >> Nit: parser_mode is only used inside if (parser_len > 0), it can be >> defined inside the “if”. >> >> 10 - jsonpath_gram.y >> ``` >> + ereport(ERROR, >> + (errcode(ERRCODE_SYNTAX_ERROR), >> + errmsg("invalid tsquery_parser >> value: \"%s\"", tsquery_parser->val), >> + errhint("Valid values are >> \"pl\", \"ph\", and \"w\"."))); >> ``` >> >> When tested a case with an invalid parser, I got: >> ``` >> evantest=# SELECT '{"tags": ["running", "jogging"]}'::jsonb >> >> @? '$.tags[*] ? (@ tsmatch "run" tsconfig "english" tsqparser >> "pss")'; >> ERROR: invalid tsquery_parser value: "pss @" >> LINE 2: @? '$.tags[*] ? (@ tsmatch "run" tsconfig "english" tsqpar... >> ^ >> HINT: Valid values are "pl", "ph", and "w". >> ``` >> >> You can see the it shows a bad looking invalid value. I think that’s >> because tsquery_parser->val is not NULL terminated. I fixed this problem >> with: >> ``` >> errmsg("invalid tsquery_parser value: \"%.*s\"", (int) >> tsquery_parser->len, tsquery_parser->val), >> ``` >> >> This change is also included in the attached diff file. >> >> 11 - jsonpath.c >> ``` >> + if (printBracketes) >> + appendStringInfoChar(buf, ')'); >> + break; >> + >> if (printBracketes) >> appendStringInfoChar(buf, ')'); >> ``` >> >> Duplicate code. Looks like a copy-pasto. >> >> 12 - jsonpath.c >> ``` >> + /* Write the Main Query String */ >> + appendBinaryStringInfo(buf, >> + >> &item->value.tsmatch.tsquerylen, >> + >> sizeof(item->value.tsmatch.tsquerylen)); >> + appendBinaryStringInfo(buf, >> + >> item->value.tsmatch.tsquery, >> + >> item->value.tsmatch.tsquerylen); >> + appendStringInfoChar(buf, '\0'); >> ``` >> >> I don’t think we need to manually append ‘\0’ after >> appendBinaryStringInfo. Looking at the header comment of >> appendBinaryStringInfo, it says that a trailing null will be added. >> ``` >> /* >> * appendBinaryStringInfo >> * >> * Append arbitrary binary data to a StringInfo, allocating more space >> * if necessary. Ensures that a trailing null byte is present. >> */ >> void >> appendBinaryStringInfo(StringInfo str, const void *data, int datalen) >> ``` >> > > Here's a v4 which incorporates most of Evan's comments & feedback > - shifted tsquery compilation logic to use persistent cache within > JsonPathExecContext > - Fixed a binary serialization alignment issue which caused Dixie to fail > earlier > - I've refactor and simplified the grammar per Evan's input by adding a > tsmatch_opts rule. > - Also updated the docs per Evan's comments > > >
v5-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch
Description: Binary data
