> On Feb 27, 2026, at 13:59, Florents Tselai <[email protected]> wrote: > > > > On Thu, Feb 26, 2026 at 8:48 AM Chao Li <[email protected]> wrote: > > > > On Feb 1, 2026, at 19:02, Florents Tselai <[email protected]> wrote: > > > > > > > > > > On Mon, Jan 26, 2026 at 7:22 PM Florents Tselai <[email protected]> > > wrote: > > Hi, > > > > in real-life I work a lot with json & fts search, here's a feature I've > > always wished I had, > > but never tackle it. Until yesterday that is. > > > > SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body > > tsmatch "performance")'); > > > > This patch introduces a tsmatch boolean operator to the JSONPath engine. > > By integrating FTS natively into path expressions, > > this operator allows for high-precision filtering of nested JSONB > > structures— > > solving issues with structural ambiguity and query complexity. > > > > Currently, users must choose between two suboptimal paths for FTS-ing > > nested JSON: > > - Imprecise Global Indexing > > jsonb_to_tsvector aggregates text into a flat vector. > > This ignores JSON boundaries, leading to false positives when the same key > > (e.g., "body") > > appears in different contexts (e.g., a "Product Description" vs. a > > "Customer Review"). > > > > - Complex SQL Workarounds > > Achieving 100% precision requires unnesting the document via > > jsonb_array_elements and LATERAL joins. > > This leads to verbose SQL and high memory overhead from generating > > intermediate heap tuples. > > > > One of the most significant advantages of tsmatch is its ability to > > participate in multi-condition predicates > > within the same JSON object - something jsonb_to_tsvector cannot do. > > > > SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body > > tsmatch "performance")'); > > > > In a flat vector, the association between "Alice" and "performance" is lost. > > tsmatch preserves this link by evaluating the FTS predicate in-place during > > path traversal. > > > > While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly define an > > FTS operator, > > tsmatch is architecturally modeled after the standard-defined like_regex. > > > > The implementation follows the like_regex precedent: > > it is a non-indexable predicate that relies on GIN path-matching for > > pruning and heap re-checks for precision. > > Caching is scoped to the JsonPathExecContext, > > ensuring 'compile-once' efficiency per execution without violating the > > stability requirements of prepared statements. > > > > This initial implementation uses plainto_tsquery. > > However, the grammar is designed to support a "mode" flag (similar to > > like_regex flags) > > in future iterations to toggle between to_tsquery, websearch_to_tsquery, > > and phraseto_tsquery. > > > > Here's a v2, that implements the tsqparser clause > > > > So this should now work too > > > > select jsonb_path_query_array('["fast car", "slow car", "fast and > > furious"]', '$[*] ? (@ tsmatch "fast car" tsqparser "w") > > <v2-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch> > > Hi Florents, > > Grant pinged me about this. I can review it in coming days. Can you please > rebase it? I failed to apply to current master. Also, the CF reported a > failure test case, please take a look. > > Hi Evan, > thanks for having a look. The conflict was due to the intro of > pg_fallthrough. Not related to this patch . > > I noticed the failure too, but I'm having a hard time reproducing it tbh. > This fails for Debian Trixie with Meson. The same with Autoconf passes... > > https://github.com/Florents-Tselai/postgres/runs/65098077968 > > > > > <v3-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch>
I have reviewed v3 and traced a few test cases. Here comes my review comments:
1
```
+ <replaceable>string</replaceable> <literal>tsmatch</literal>
<replaceable>string</replaceable>
+ <optional> <literal>tsconfig</literal>
<replaceable>string</replaceable> </optional>
+ <optional> <literal>tsqparser</literal>
<replaceable>string</replaceable> </optional>
```
For all “replaceable”, instead of “string”, would it be better to use something
more descriptive? For example:
```
<replaceable>json_string</replaceable> <literal>tsmatch</literal>
<replaceable>query</replaceable>
<optional> <literal>tsconfig</literal> <replaceable>config_name</replaceable>
</optional>
<optional> <literal>tsqparser</literal> <replaceable>parser_mode</replaceable>
</optional>
```
2 - jsonpath_gram.y
```
+static bool makeItemTsMatch(JsonPathParseItem *doc,
+ JsonPathString
*tsquery,
+ JsonPathString
*tsconfig,
+ JsonPathString
*tsquery_parser,
+ JsonPathParseItem **
result,
+ struct Node
*escontext);
```
Format Nit: Looking at the existing code, the J in the second and following
lines, should be placed in the same column as the J in the first line.
3 - jsonpath_gram.y
```
+ | expr TSMATCH_P STRING_P
+ {
+ JsonPathParseItem *jppitem;
+ /* Pass NULL for tsconfig (3rd) and NULL for tsquery_parser
(4th) */
+ if (! makeItemTsMatch($1, &$3, NULL, NULL, &jppitem, escontext))
+ YYABORT;
+ $$ = jppitem;
+ }
+ | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P
+ {
+ JsonPathParseItem *jppitem;
+ /* Pass NULL for tsquery_parser (4th) */
+ if (! makeItemTsMatch($1, &$3, &$5, NULL, &jppitem, escontext))
+ YYABORT;
+ $$ = jppitem;
+ }
+ | expr TSMATCH_P STRING_P TSQUERYPARSER_P STRING_P
+ {
+ JsonPathParseItem *jppitem;
+ /* Pass NULL for tsconfig (3rd) */
+ if (! makeItemTsMatch($1, &$3, NULL, &$5, &jppitem, escontext))
+ YYABORT;
+ $$ = jppitem;
+ }
+ | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P TSQUERYPARSER_P STRING_P
+ {
+ JsonPathParseItem *jppitem;
+ if (! makeItemTsMatch($1, &$3, &$5, &$7, &jppitem, escontext))
+ YYABORT;
+ $$ = jppitem;
+ }
```
Feels a little redundant, repeatedly calls makeItemTsMatch. See the attached
diff for a simplification. But my version is a bit longer in terms of number of
lines. So, up to you.
4 - jsonpath_gram.y
```
+static bool
+makeItemTsMatch(JsonPathParseItem *doc,
+ JsonPathString *tsquery,
+ JsonPathString *tsconfig,
+ JsonPathString *tsquery_parser,
+ JsonPathParseItem **result,
+ struct Node *escontext)
```
makeItemTsMatch doesn’t need to return a bool. Actually, now it never returns
false, instead, it just ereport(ERROR).
5 - jsonpath.h
```
+ struct
+ {
+ int32 doc;
+ char *tsquery;
+ uint32 tsquerylen;
+ int32 tsconfig;
+ char *tsqparser;
+ uint32 tsqparser_len;
+ } tsmatch;
+ struct
+ {
+ JsonPathParseItem *doc;
+ char *tsquery;
+ uint32 tsquerylen;
+ JsonPathParseItem *tsconfig;
+ char *tsqparser;
+ uint32 tsqparser_len;
+ } tsmatch;
} value;
```
tsquerylen doesn’t have _ before len, and tsqparser_len, would it be better to
make naming conventions consistent in the same structure?
6 - jsonpath_exec.c
```
#include "tsearch/ts_utils.h"
#include "tsearch/ts_cache.h"
#include "utils/regproc.h"
#include "catalog/namespace.h"
static JsonPathBool
executeTsMatch(JsonPathItem *jsp, JsonbValue *str, JsonbValue *rarg,
void *param)
```
Why don’t put these includes to the header section together with other includes?
7 - jsonpath_exec.c
```
+ else
+ {
+ /*
+ * Fallback or Error for unknown flags (should
be caught by
+ * parser)
+ */
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized tsqparser
flag")));
+ }
```
This “else” should never be entered as the same check has been done by
makeItemTsMatch. So, maybe just use an Assert here, or pg_unreachable().
8 - jsonpath_exec.c
```
+ /* Setup Context (Run ONLY once per predicate) */
+ if (!cxt->initialized)
```
While tracing this SQL:
```
evantest=# SELECT '{"tags": ["running", "jogging"]}'::jsonb
evantest-# @@ '$.tags[*] ? (@ tsmatch "run" tsconfig "english")';
?column?
----------
(1 row)
```
I noticed that, when process “jogging”, cxt->initialized is still false,
meaning that, the cxt is not reused across array items. Given the same tsconfig
should apply to all array items, I think cxt should be reused.
9 - jsonpath_exec.c
```
+ /* Select Parser and Compile Query */
+ parser_mode = jsp->content.tsmatch.tsqparser;
+ parser_len = jsp->content.tsmatch.tsqparser_len;
+
+ if (parser_len > 0)
+ {
+ /* Dispatch based on flag */
+ if (pg_strncasecmp(parser_mode, "pl", parser_len) == 0)
```
Nit: parser_mode is only used inside if (parser_len > 0), it can be defined
inside the “if”.
10 - jsonpath_gram.y
```
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid tsquery_parser value:
\"%s\"", tsquery_parser->val),
+ errhint("Valid values are \"pl\",
\"ph\", and \"w\".")));
```
When tested a case with an invalid parser, I got:
```
evantest=# SELECT '{"tags": ["running", "jogging"]}'::jsonb
@? '$.tags[*] ? (@ tsmatch "run" tsconfig "english" tsqparser "pss")';
ERROR: invalid tsquery_parser value: "pss@"
LINE 2: @? '$.tags[*] ? (@ tsmatch "run" tsconfig "english" tsqpar...
^
HINT: Valid values are "pl", "ph", and "w".
```
You can see the it shows a bad looking invalid value. I think that’s because
tsquery_parser->val is not NULL terminated. I fixed this problem with:
```
errmsg("invalid tsquery_parser value: \"%.*s\"", (int) tsquery_parser->len,
tsquery_parser->val),
```
This change is also included in the attached diff file.
11 - jsonpath.c
```
+ if (printBracketes)
+ appendStringInfoChar(buf, ')');
+ break;
+
if (printBracketes)
appendStringInfoChar(buf, ')');
```
Duplicate code. Looks like a copy-pasto.
12 - jsonpath.c
```
+ /* Write the Main Query String */
+ appendBinaryStringInfo(buf,
+
&item->value.tsmatch.tsquerylen,
+
sizeof(item->value.tsmatch.tsquerylen));
+ appendBinaryStringInfo(buf,
+
item->value.tsmatch.tsquery,
+
item->value.tsmatch.tsquerylen);
+ appendStringInfoChar(buf, '\0');
```
I don’t think we need to manually append ‘\0’ after appendBinaryStringInfo.
Looking at the header comment of appendBinaryStringInfo, it says that a
trailing null will be added.
```
/*
* appendBinaryStringInfo
*
* Append arbitrary binary data to a StringInfo, allocating more space
* if necessary. Ensures that a trailing null byte is present.
*/
void
appendBinaryStringInfo(StringInfo str, const void *data, int datalen)
```
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
nocfbot_jsonpath_gram_y.diff
Description: Binary data
