Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part

2024-09-27 Thread Florents Tselai



> On 27 Sep 2024, at 12:45 PM, David E. Wheeler  wrote:
> 
> On Sep 26, 2024, at 13:59, Florents Tselai  wrote:
> 
>> Speaking of extensible: the jsonpath standard does mention function 
>> extensions [1] ,
>> so it looks like we're covered by the standard, and the mutability aspect is 
>> an implementation detail. No?
> 
> That’s not the standard used for Postgres jsonpath. Postgres follows the 
> SQL/JSON standard in the SQL standard, which is not publicly available, but a 
> few people on the list have copies they’ve purchased and so could provide 
> some context.
> 
> In a previous post I wondered if the SQL standard had some facility for 
> function extensions, but I suspect not. Maybe in the next iteration?
> 
>> And having said that, the whole jsonb/jsonpath parser/executor 
>> infrastructure is extremely powerful
>> and kinda under-utilized if we use it "only" for jsonpath.
>> Tbh, I can see it supporting more specific DSLs and even offering hooks for 
>> extensions.
>> And I know for certain I'm not the only one thinking about this.
>> See [2] for example where they've lifted, shifted and renamed the 
>> jsonb/jsonpath infra to build a separate language for graphs
> 
> I’m all for extensibility, though jsonpath does need to continue to comply 
> with the SQL standard. Do you have some idea of the sorts of hooks that would 
> allow extension authors to use some of that underlying capability?

Re-tracing what I had to do

1. Define a new JsonPathItemType jpiMyExtType and map it to a JsonPathKeyword
2. Add a new JsonPathKeyword and make the lexer and parser aware of that,
3. Tell the main executor executeItemOptUnwrapTarget what to do when the new 
type is matched.

I think 1, 2 are the trickiest because they require hooks to  jsonpath_scan.l 
and parser jsonpath_gram.y 

3. is the meat of a potential hook, which would be something like 
extern JsonPathExecResult executeOnMyJsonpathItem(JsonPathExecContext *cxt, 
JsonbValue *jb, JsonValueList *found);
This should be called by the main executor executeItemOptUnwrapTarget when it 
encounters case jpiMyExtType

It looks like quite an endeavor, to be honest.



Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part

2024-09-26 Thread Florents Tselai
On Thu, Sep 26, 2024 at 1:55 PM Alexander Korotkov 
wrote:

> On Thu, Sep 26, 2024 at 12:04 AM Tom Lane  wrote:
> > Florents Tselai  writes:
> > > This patch is a follow-up and generalization to [0].
> > > It adds the following jsonpath methods:  lower, upper, initcap,
> l/r/btrim,
> > > replace, split_part.
> >
> > How are you going to deal with the fact that this makes jsonpath
> > operations not guaranteed immutable?  (See commit cb599b9dd
> > for some context.)  Those are all going to have behavior that's
> > dependent on the underlying locale.
> >
> > We have the kluge of having separate "_tz" functions to support
> > non-immutable datetime operations, but that way doesn't seem like
> > it's going to scale well to multiple sources of mutability.
>
> While inventing "_tz" functions I was thinking about jsonpath methods
> and operators defined in standard then.  Now I see huge interest on
> extending that.  I wonder if we can introduce a notion of flexible
> mutability?  Imagine that jsonb_path_query() function (and others) has
> another function which analyzes arguments and reports mutability.  If
> jsonpath argument is constant and all methods inside are safe then
> jsonb_path_query() is immutable otherwise it is stable.  I was
> thinking about that back working on jsonpath, but that time problem
> seemed too limited for this kind of solution.  Now, it's possibly time
> to shake off the dust from this idea.  What do you think?
>
> --
> Regards,
> Alexander Korotkov
> Supabase
>

In case you're having a deja vu, while researching this
I did come across [0] where disussing this back in 2019.

In this patch I've conveniently left jspIsMutable and jspIsMutableWalker
untouched and under the rug,
but for the few seconds I pondered over this,the best answer I came with
was
a simple heuristic to what Alexander says above:
if all elements are safe, then the whole jsp is immutable.

If we really want to tackle this and make jsonpath richer though,
I don't think we can avoid being a little more flexible/explicit wrt
mutability.

Speaking of extensible: the jsonpath standard does mention function
extensions [1] ,
so it looks like we're covered by the standard, and the mutability aspect
is an implementation detail. No?
And having said that, the whole jsonb/jsonpath parser/executor
infrastructure is extremely powerful
and kinda under-utilized if we use it "only" for jsonpath.
Tbh, I can see it supporting more specific DSLs and even offering hooks for
extensions.
And I know for certain I'm not the only one thinking about this.
See [2] for example where they've lifted, shifted and renamed the
jsonb/jsonpath infra to build a separate language for graphs

[0]
https://www.postgresql.org/message-id/capphfdvdci4iqnf9fhrktqhe-5_8hmzelt56drh+_rv2rnr...@mail.gmail.com
[1] https://www.rfc-editor.org/rfc/rfc9535.html#name-function-extensions
[2] https://github.com/apache/age/blob/master/src/include/utils/agtype.h


Re: Docs pg_restore: Shouldn't there be a note about -n ?

2024-09-25 Thread Florents Tselai
On Sat, Sep 21, 2024 at 9:48 PM Florents Tselai 
wrote:

>
>
> > On 21 Sep 2024, at 9:22 PM, Tom Lane  wrote:
> >
> > Florents Tselai  writes:
> >> Ah,  swapped them by mistake on the previous email:
> >> They're both available in the pg_dump and note on -n missing in
> pg_restore.
> >> The question remains though:
> >> Shouldn’t there be a note about -n in pg_restore ?
> >
> > Probably.  I see that pg_dump has a third copy of the exact same
> > note for "-e".  pg_restore lacks that switch for some reason,
> > but this is surely looking mighty duplicative.  I propose getting
> > rid of the per-switch Notes and putting a para into the Notes
> > section, along the lines of
> >
> >When -e, -n, or -t is specified, pg_dump makes no attempt to dump
> >any other database objects that the selected object(s) might
> >depend upon. Therefore, there is no guarantee that the results of
> >a selective dump can be successfully restored by themselves into a
> >clean database.
>
> Agree with that, but I think there should be a pointer like “see Notes” .
> Otherwise I’m pretty sure most would expect pg doing magic.
> Can’t remember I scrolledl to the bottom of a page “notes” after finding
> the option I want.
>
> I would also add an example of what “depend upon” means,
> To underline that it’s really not that uncommon.
> Something like:
> “If you pg_dump only with -t A and A has foreign key constraints to table
> B,
> Those constraints won’t succeed If B has not been already restored”
>
>
Attached an idea.
The A/B example may be wordy / redundant,
but the -e note on the distinction between installing binaries / creating
an extension I think it's important.


v1-0001-Note-on-e-t-n.patch
Description: Binary data


PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part

2024-09-25 Thread Florents Tselai
Hello hackers,

This patch is a follow-up and generalization to [0].

It adds the following jsonpath methods:  lower, upper, initcap, l/r/btrim,
replace, split_part.

It makes jsonpath able to support expressions like these:

select jsonb_path_query('"   hElLo WorlD "',
 '$.btrim().lower().upper().lower().replace("hello","bye") starts with
"bye"');
select jsonb_path_query('"abc~@~def~@~ghi"', '$.split_part("~@~", 2)')

They, of course, forward their implementation to the internal
pg_proc-registered function.

As a first wip/poc I've picked the functions I typically need to clean up
JSON data.
I've also added a README.jsonpath with documentation on how to add a new
jsonpath method.
If I had this available when I started, it would have saved me some time.
So, I am leaving it here for the next hacker.

This patch is not particularly intrusive to existing code:
Afaict, the only struct I've touched is JsonPathParseItem , where I added {
JsonPathParseItem *arg0, *arg1; } method_args.
Up until now, most of the jsonpath methods that accept arguments rely on
left/right operands,
which works, but it could be more convenient for future more complex
methods.
I've also added the appropriate jspGetArgX(JsonPathItem *v, JsonPathItem
*a).

Open items
- What happens if the jsonpath standard adds a new method by the same name?
A.D. mentioned this in [0] with the proposal of having a prefix like pg_ or
initial-upper letter.
- Still using the default collation like the rest of the jsonpath code.
- documentation N/A yet
- I do realize that the process of adding a new method sketches an
imaginary.
CREATE JSONPATH FUNCTION. This has been on the back of my mind for some
time now,
but I can't say I have an action plan for this yet.

GitHub PR view if you prefer:
https://github.com/Florents-Tselai/postgres/pull/18

[0]
https://www.postgresql.org/message-id/flat/185BF814-9225-46DB-B1A1-6468CF2C8B63%40justatheory.com#1850a37a98198974cf543aefe225ba56

All the best,
Flo


v1-0001-This-patch-adds-the-following-string-processing-m.patch
Description: Binary data


Re: Docs pg_restore: Shouldn't there be a note about -n ?

2024-09-21 Thread Florents Tselai



> On 21 Sep 2024, at 9:22 PM, Tom Lane  wrote:
> 
> Florents Tselai  writes:
>> Ah,  swapped them by mistake on the previous email:
>> They're both available in the pg_dump and note on -n missing in pg_restore.
>> The question remains though:
>> Shouldn’t there be a note about -n in pg_restore ?
> 
> Probably.  I see that pg_dump has a third copy of the exact same
> note for "-e".  pg_restore lacks that switch for some reason,
> but this is surely looking mighty duplicative.  I propose getting
> rid of the per-switch Notes and putting a para into the Notes
> section, along the lines of
> 
>When -e, -n, or -t is specified, pg_dump makes no attempt to dump
>any other database objects that the selected object(s) might
>depend upon. Therefore, there is no guarantee that the results of
>a selective dump can be successfully restored by themselves into a
>clean database.

Agree with that, but I think there should be a pointer like “see Notes” .
Otherwise I’m pretty sure most would expect pg doing magic.
Can’t remember I scrolledl to the bottom of a page “notes” after finding the 
option I want.

I would also add an example of what “depend upon” means,
To underline that it’s really not that uncommon.
Something like: 
“If you pg_dump only with -t A and A has foreign key constraints to table B,
Those constraints won’t succeed If B has not been already restored” 



> 
> and mutatis mutandis for pg_restore.
> 
>   regards, tom lane





Re: Docs pg_restore: Shouldn't there be a note about -n ?

2024-09-21 Thread Florents Tselai
On Sat, Sep 21, 2024 at 8:34 PM Florents Tselai 
wrote:

> I’m in the process of trying to restore some PG15/16 backups in PG17.
>
> While playing with different -t and -n combinations I was browsing through
> the docs.
>
> In *pg_restore* there are two notes about both -t / -n
>
> > When -n / -t is specified, pg_dump makes no attempt to ...
>
> In pg_dump though there’s the equivalent note only for the -t option.
>
> Shouldn’t it be a note as well for -n ?
>
> Otherwise I would expect -n to cascade the restore to objects in other
> schemas;
> Which I don’t think it does.
>
> Am I missing something?
>

Ah,  swapped them by mistake on the previous email:

They're both available in the pg_dump and note on -n missing in pg_restore.

The question remains though:
Shouldn’t there be a note about -n in pg_restore ?


Docs pg_restore: Shouldn't there be a note about -n ?

2024-09-21 Thread Florents Tselai
I’m in the process of trying to restore some PG15/16 backups in PG17.

While playing with different -t and -n combinations I was browsing through the 
docs.

In pg_restore there are two notes about both -t / -n 

> When -n / -t is specified, pg_dump makes no attempt to ...

In pg_dump though there’s the equivalent note only for the -t option.

Shouldn’t it be a note as well for -n ? 

Otherwise I would expect -n to cascade the restore to objects in other schemas;
Which I don’t think it does.

Am I missing something? 

Re: [PATCH] WIP: replace method for jsonpath

2024-09-19 Thread Florents Tselai
On 18 Sep 2024, at 3:47 PM, Andrew Dunstan  wrote:


On 2024-09-18 We 4:23 AM, Peter Eisentraut wrote:

On 17.09.24 21:16, David E. Wheeler wrote:

On Sep 17, 2024, at 15:03, Florents Tselai 
wrote:

Fallback scenario: make this an extension, but in a first pass I didn’t
find any convenient hooks.
One has to create a whole new scanner, grammar etc.


Yeah, it got me thinking about the RFC-9535 JSONPath "Function Extension"
feature[1], which allows users to add functions. Would be cool to have a
way to register jsonpath functions somehow, but I would imagine it’d need
quite a bit of specification similar to RFC-9535. Wouldn’t surprise me to
see something like that appear in a future version of the spec, with an
interface something like CREATE OPERATOR.


Why can't we "just" call any suitable pg_proc-registered function from JSON
path?  The proposed patch routes the example '$.replace("hello","bye")'
internally to the internal implementation of the SQL function replace(...,
'hello', 'bye'). Why can't we do this automatically for any function call
in a JSON path expression?




That might work. The thing that bothers me about the original proposal is
this: what if we add a new non-spec jsonpath method and then a new version
of the spec adds a method with the same name, but not compatible with our
method? We'll be in a nasty place. At the very least I think we need to try
hard to avoid that. Maybe we should prefix non-spec method names with
"pg_", or maybe use an initial capital letter.


If naming is your main reservation, then I take it you’re generally
positive.

Having said that, “pg_” is probably too long for a jsonpath expression,
Most importantly though, “pg_” in my mind is a prefix for things like
catalog lookup and system monitoring.
Not a function that the average user would use.
Thus, I lean towards initial-capital.

The more general case would look like:
A new jsonpath item of the format $.Func(arg1, …, argn) can be applied
(recursively or not) to a json object.

As a first iteration/version only pg_proc-registered functions of the
format func(text, ...,) -> text are available.
We can focus on oid(arg0) = TEXTOID and rettype = TEXTOID fist.
The first arg0 has to be TEXTOID (the current json string) while subsequent
args are provided by the user
in the jsonpath expression.

The functions we want to support will be part of jsonpath grammar
and during execution we'll have enough info to find the appropriate
PGFunction to call.

What I'm missing yet is how we could handle vars jsonb,
in case the user doesn't want to just hardcode the actual function
arguments.
Then resolving argtypes1...n is a bit more complex:

The signature for jsonb_apply(doc jsonb, func text[, variadic "any"
args1_n]); [0]
Here, variadic "any" works beautifully, but that's a brand-new function.

In existing jsonb{path} facilities, vars are jsonb objects which could work
as well I think.
Unless I'm missing something.

[0] https://github.com/Florents-Tselai/jsonb_apply




cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: Get TupleDesc for extension-defined types

2024-09-18 Thread Florents Tselai
On Wed, Sep 18, 2024 at 1:09 PM Pavel Stehule 
wrote:

> Hi
>
> st 18. 9. 2024 v 9:04 odesílatel Florents Tselai <
> florents.tse...@gmail.com> napsal:
>
>> Correct me if I'm wrong,
>> but for an extension that defines composite types,
>> there's currently no easy way to get a TupleDesc, even for its own types.
>>
>> Something like
>> TupleDesc get_extension_type_tupledesc(const char *extname, const char
>> *typname)
>>
>> Here's a routine I've stolen borrowed from pramsey's code and have been
>> using ever since.
>>
>> Could this be exposed in extension.h ? (probably without the version
>> check)
>>
>
> I don't think this functionality is generally useful.  Wrapping
> TypeGetTupleDesc(typoid, NIL) is very specific, and probably this code
> should be inside the extension.
>
> Different question is API for searching in system catalog and
> dependencies. I can imagine some functions like
>

That's a better phrasing

>
> Oid extid = get_extension_id(extname);
> Oid objid = get_extension_object_id(extid, schema_can_be_null, name,
> TYPEOID); // can be used for routine, table, ...
>
> tupdesc = TypeGetTupleDesc(objid, NIL);
>

These are valid.
For context:
The "problem" (inconvenience really) I'm trying to solve is this:
Most extensions define some convenient PG_GETARG_MYTYPE(n) macros.
When these types are varlena, things are easy.

When they're composite types though things get more verbose.
i.e. the lines of code the author needs to get from a Datum argument to
struct MyType are too many
and multiple extensions copy-paste the same logic.

My hope is we could come up with a few routines that ease and standardize
this a bit.

You're right that extname isn't unique, so Oid should be the argument for
extension, rather than char *extname,
but in my mind the  "default" is "the current extension" , but no arguing
about that.

>
> Regards
>
> Pavel
>
>
>


Re: [PATCH] WIP: replace method for jsonpath

2024-09-18 Thread Florents Tselai


> On 18 Sep 2024, at 11:23 AM, Peter Eisentraut  wrote:
> 
> On 17.09.24 21:16, David E. Wheeler wrote:
>> On Sep 17, 2024, at 15:03, Florents Tselai  wrote:
>>> Fallback scenario: make this an extension, but in a first pass I didn’t 
>>> find any convenient hooks.
>>> One has to create a whole new scanner, grammar etc.
>> Yeah, it got me thinking about the RFC-9535 JSONPath "Function Extension" 
>> feature[1], which allows users to add functions. Would be cool to have a way 
>> to register jsonpath functions somehow, but I would imagine it’d need quite 
>> a bit of specification similar to RFC-9535. Wouldn’t surprise me to see 
>> something like that appear in a future version of the spec, with an 
>> interface something like CREATE OPERATOR.
> 
> Why can't we "just" call any suitable pg_proc-registered function from JSON 
> path?  The proposed patch routes the example '$.replace("hello","bye")' 
> internally to the internal implementation of the SQL function replace(..., 
> 'hello', 'bye').  Why can't we do this automatically for any function call in 
> a JSON path expression?
> 


Well, we can.
A couple of weeks ago, I discovered transform_jsonb_string_values, which is 
already available in  jsonfuncs.h
and that gave me the idea for this extension 
https://github.com/Florents-Tselai/jsonb_apply 

It does exactly what you’re saying: searches for a suitable pg_proc in the 
catalog, and directly applies it.

select jsonb_apply('{
  "id": 1,
  "name": "John",
  "messages": [
"hello"
  ]
}', 'replace', 'hello', 'bye’);

select jsonb_filter_apply('{
  "id": 1,
  "name": "John",
  "messages": [
"hello"
  ]
}', '{messages}', 'md5’);

But, I don't know… jsonb_apply? That seemed “too fancy”/LISPy for standard 
Postgres.

Now that you mention it, though, there’s an alternative of tweaking the grammar 
and calling the suitable text proc.

Get TupleDesc for extension-defined types

2024-09-18 Thread Florents Tselai
Correct me if I'm wrong,
but for an extension that defines composite types,
there's currently no easy way to get a TupleDesc, even for its own types.

Something like
TupleDesc get_extension_type_tupledesc(const char *extname, const char
*typname)

Here's a routine I've stolen borrowed from pramsey's code and have been
using ever since.

Could this be exposed in extension.h ? (probably without the version check)


v1-0001-First-implementation-for-get_extension_type_tuple.patch
Description: Binary data


Re: jsonb_strip_nulls with arrays?

2024-09-17 Thread Florents Tselai
On Tue, Sep 17, 2024 at 5:11 PM Andrew Dunstan  wrote:

>
> On 2024-09-17 Tu 5:26 AM, Florents Tselai wrote:
>
> Currently:
>
>
> jsonb_strip_nulls ( jsonb ) → jsonb
>
> Deletes all object fields that have null values from the given JSON value,
> recursively. Null values that are not object fields are untouched.
>
>
> > Null values that are not object fields are untouched.
>
>
> Can we revisit this and make it work with arrays, too?
>
> Tbh, at first sight that looked like the expected behavior for me.
>
> That is strip nulls from arrays as well.
>
>
> This has been available since 9.5 and iiuc predates lots of the jsonb
> array work.
>
>
> I don't think that's a great idea. Removing an object field which has a
> null value shouldn't have any effect on the surrounding data, nor really
> any on other operations (If you try to get the value of the missing field
> it should give you back null). But removing a null array member isn't like
> that at all - unless it's the trailing member of the array it will renumber
> all the succeeding array members.
>
> And I don't think we should be changing the behaviour of a function, that
> people might have been relying on for the better part of a decade.
>
>
>
> In practice, though, whenever jsonb_build_array is used (especially with
> jsonpath),
>
> a few nulls do appear in the resulting array most of the times,
>
> Currently, there’s no expressive way to remove this.
>
>
> We could also have jsonb_array_strip_nulls(jsonb) as well
>
>
> We could, if we're going to do anything at all in this area. Another
> possibility would be to provide a second optional parameter for
> json{b}_strip_nulls. That's probably a better way to go.
>
Here's a patch that adds that argument (only for jsonb; no json
implementation yet)

That's how I imagined & implemented it,
but there may be non-obvious pitfalls in the semantics.

as-is version

select jsonb_strip_nulls('[1,2,null,3,4]');
 jsonb_strip_nulls

 [1, 2, null, 3, 4]
(1 row)

select
jsonb_strip_nulls('{"a":1,"b":null,"c":[2,null,3],"d":{"e":4,"f":null}}');
 jsonb_strip_nulls

 {"a": 1, "c": [2, null, 3], "d": {"e": 4}}
(1 row)

with the additional boolean flag added

select jsonb_strip_nulls('[1,2,null,3,4]', *true*);
 jsonb_strip_nulls
---
 [1, 2, 3, 4]
(1 row)

select
jsonb_strip_nulls('{"a":1,"b":null,"c":[2,null,3],"d":{"e":4,"f":null}}',
*true*);
  jsonb_strip_nulls
--
 {"a": 1, "c": [2, 3], "d": {"e": 4}}
(1 row)


GH PR view: https://github.com/Florents-Tselai/postgres/pull/6/files

> cheers
>
>
> andrew
>
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com
>
>


v1-0002-Add-docs-for-strip_in_arrays-argument.patch
Description: Binary data


v1-0001-jsonb_strip_nulls-jsonb-bool-wip.patch
Description: Binary data


Re: [PATCH] WIP: replace method for jsonpath

2024-09-17 Thread Florents Tselai



> On 17 Sep 2024, at 9:40 PM, David E. Wheeler  wrote:
> 
> On Sep 16, 2024, at 18:39, Florents Tselai  wrote:
> 
>> Here’s an updated version of this patch.
> 
> Oh, nice function.
> 
> But a broader question for hackers: Is replace() specified in the SQL/JSON 
> spec? If not, what’s the process for evaluating whether or not to add 
> features not specified by the spec?

That’s the first argument I was expecting, and it’s a valid one.

From a user’s perspective the answer is:
Why not?
The more text-processing facilities I have in jsonb,
The less back-and-forth-parentheses-fu I do,
The easier my life is.

From a PG gatekeeping it’s a more complicated issue. 

It’s not part of the spec, 
But I think the jsonb infrastructure in PG is really powerful already and we 
can built on it,
And can evolve into a superset DSL of jsonpath.

For example, apache/age have lift-and-shifted this infra and built another DSL 
(cypher) on top of it.

> 
>> As a future note:
>> It’s worth noting that both this newly added jspItem and other ones like 
>> (jpiDecimal, jpiString)
>> use jspGetRightArg and jspGetLeftArg.
>> left and right can be confusing if more complex methods are added in the 
>> future.
>> i.e. jsonpath methods with nargs>=3 .
>> I was wondering if we’d like something like JSP_GETARG(n)
> 
> So far I think we have only functions defined by the spec, which tend to be 
> unary or binary, so left and right haven’t been an issue.

If the answer to the Q above is: “we stick to the spec” then there’s no 
thinking about this.

But tbh, I’ve already started experimenting  with other text methods in text 
$.strip() / trim() / upper() / lower() etc.

Fallback scenario: make this an extension, but in a first pass I didn’t find 
any convenient hooks.
One has to create a whole new scanner, grammar etc.

> 
> Best,
> 
> David
> 





jsonb_strip_nulls with arrays?

2024-09-17 Thread Florents Tselai
Currently: 

jsonb_strip_nulls ( jsonb ) → jsonb
Deletes all object fields that have null values from the given JSON value, 
recursively. Null values that are not object fields are untouched.

> Null values that are not object fields are untouched. 

Can we revisit this and make it work with arrays, too?
Tbh, at first sight that looked like the expected behavior for me.
That is strip nulls from arrays as well.

This has been available since 9.5 and iiuc predates lots of the jsonb array 
work.

In practice, though, whenever jsonb_build_array is used (especially with 
jsonpath),
a few nulls do appear in the resulting array most of the times,
Currently, there’s no expressive way to remove this.  

We could also have jsonb_array_strip_nulls(jsonb) as well



Re: [PATCH] WIP: replace method for jsonpath

2024-09-16 Thread Florents Tselai
Here’s an updated version of this patch.The previous version failed several CI steps; this passes them all.Unless someone disagrees,I’ll proceed with the documentation and add this for the next CF.As a future note:It’s worth noting that both this newly added jspItem and other ones like (jpiDecimal, jpiString)use jspGetRightArg and jspGetLeftArg.left and right can be confusing if more complex methods are added in the future.i.e. jsonpath methods with nargs>=3 .I was wondering if we’d like something like JSP_GETARG(n)GitHub PR in case you prefer it’s UI  https://github.com/Florents-Tselai/postgres/pull/3 

v2-0001-Add-first-working-version-for-jsonpath-.replace-s.patch
Description: Binary data


v2-0002-replace-should-work-with-strings-inside-arrays.patch
Description: Binary data


v2-0003-Fix-logic-for-jpiReplaceFunc.patch
Description: Binary data
On 15 Sep 2024, at 4:15 AM, Florents Tselai  wrote:Hi,When working with jsonb/jsonpath, I’ve always wanted more fluent string operations for data cleaning.Ideally, chaining text processing methods together.Here’s a draft patch that adds a string replace method.It works like thisselect jsonb_path_query('"hello world"', '$.replace("hello","bye")'); jsonb_path_query -- "bye world"(1 row)And looks like plays nicely with existing facilities  select jsonb_path_query('"hello world"', '$.replace("hello","bye") starts with "bye"'); jsonb_path_query -- true(1 row)I’d like some initial feedback on whether this is of interested before I proceed with the following:- I’ve tried respecting the surrounding code, but a more experienced eye can spot some inconsistencies. I’ll revisit those- Add more test cases (I’ve added the obvious ones, but ideas on more cases are welcome).- pg upgrades currently fail on CI (see)- better error handling/reporting: I’ve kept the wording of error messages, but we’ll need something akin to ERRCODE_INVALID_ARGUMENT_FOR_SQL_JSON_DATETIME_FUNCTION.- documentation.

[PATCH] WIP: replace method for jsonpath

2024-09-14 Thread Florents Tselai
Hi,When working with jsonb/jsonpath, I’ve always wanted more fluent string operations for data cleaning.Ideally, chaining text processing methods together.Here’s a draft patch that adds a string replace method.It works like thisselect jsonb_path_query('"hello world"', '$.replace("hello","bye")'); jsonb_path_query -- "bye world"(1 row)And looks like plays nicely with existing facilities  select jsonb_path_query('"hello world"', '$.replace("hello","bye") starts with "bye"'); jsonb_path_query -- true(1 row)I’d like some initial feedback on whether this is of interested before I proceed with the following:- I’ve tried respecting the surrounding code, but a more experienced eye can spot some inconsistencies. I’ll revisit those- Add more test cases (I’ve added the obvious ones, but ideas on more cases are welcome).- pg upgrades currently fail on CI (see)- better error handling/reporting: I’ve kept the wording of error messages, but we’ll need something akin to ERRCODE_INVALID_ARGUMENT_FOR_SQL_JSON_DATETIME_FUNCTION.- documentation.

v1-0001-jsonpath-replace-method.patch
Description: Binary data


Re: Exporting float_to_shortest_decimal_buf(n) with Postgres 17 on Windows

2024-09-13 Thread Florents Tselai



> On 13 Sep 2024, at 11:07 PM, Andrew Kane  wrote:
> 
> Hi,
> 
> With Postgres 17 RC1 on Windows, `float_to_shortest_decimal_buf` and 
> `float_to_shortest_decimal_bufn` are not longer exported. This causes 
> `unresolved external symbol` linking errors for extensions that rely on these 
> functions (like pgvector). Can these functions be exported like previous 
> versions of Postgres?

Probably a Windows thing?
Just tried on Darwin with 17_RC1, and pgvector 0.7.4 build installcheck’s OK. 


> 
> Thanks,
> Andrew





[PATCH] Add some documentation on how to call internal functions

2024-09-12 Thread Florents Tselai
Hi, The documentation on extending using C functions, leaves a blank on how to call other internal Postgres functions.This can leave first-timers hanging.I think it’d be helpful to spend some paragraphs on discussing the DirectFunctionCall API.Here’s an attempt (also a PR[0]).Here’s the relevant HTML snippet for convenience:To call another version-1 function, you can use DirectFunctionCalln(func, arg1,...,argn). This is particularly useful when you want to call functions defined in the standard internal library, by using an interface similar to their SQL signature.Different flavors of similar macros can be found in fmgr.h. The main point though is that they expect a C function name to call as their first argument (or its Oid in some cases), and actual arguments should be supplied as Datums. They always return Datum.For example, to call the starts_with(text, text) from C, you can search through the catalog and find out that its C implementation is based on the Datum text_starts_with(PG_FUNCTION_ARGS) function.In fmgr.h there are also available macros the facilitate conversions between C types and Datum. For example to turn text* into Datum, you can use DatumGetTextPP(X). If your extension defines additional types, it is usually convenient to define similar macros for these types too.I’ve also added the below example function:PG_FUNCTION_INFO_V1(t_starts_with);

Datum
t_starts_with(PG_FUNCTION_ARGS)
{
Datum t1 = PG_GETARG_DATUM(0);
Datum t2 = PG_GETARG_DATUM(1);
bool  bool_res;

Datum datum_res = DirectFunctionCall2(text_starts_with, t1, t2);
bool_res = DatumGetBool(datum_res);

PG_RETURN_BOOL(bool_res);
}PS1: I was not sure if src/tutorial is still relevant with this part of the documentation.If so, it needs updating too.[0] https://github.com/Florents-Tselai/postgres/pull/1/commits/1651b7bb68e0f9c2b61e1462367295d846d253ec

0001-Add-some-documentation-on-how-to-call-internal-funct.patch
Description: Binary data


Re: Update platform notes to build Postgres on macos

2024-07-04 Thread Florents Tselai
./configure —help 

It will show that you can build —without-icu , 
you can also specify a path to pkg-config via PKG_CONFIG=/path/to/pkg-config 

side note: I’ve had better experience building with brew on macos, rather than 
macports.

> On 4 Jul 2024, at 9:02 PM, Said Assemlal  wrote:
> 
> Hi,
> 
> 
> 
> I just built postgresql on macos sonoma (v14) and I had to install the 
> following packages:
> 
> * icu - https://ports.macports.org/port/icu/
> * pkg - https://ports.macports.org/port/pkgconfig/
> I don't see anything related to this on 
> https://www.postgresql.org/docs/devel/installation-platform-notes.html
> 
> Did I miss something ? Should we add a note?
> 
> best,
> Saïd
> 
> 
> 



Re: SQL Property Graph Queries (SQL/PGQ)

2024-07-04 Thread Florents Tselai
In the ddl.sgml, I’d swap the first two paragraphs.
I find the first one a bit confusing as-is. As far as I can tell, it’s an 
implementation detail.
The first paragraph should answer, “I have some data modeled as a graph G=(V, 
E). Can Postgres help me?”.

Then, introducing property graphs makes more sense. 

I'd also use the examples and fake data in `graph_table.sql` in 
ddl/queries.sgml).
I was bummed that that copy-pasting didn't work as is.
I’d keep explaining how a graph query translates to a relational one later in 
the page.

As for the implementation, I can’t have an opinion yet,
but for those not familiar, Apache Age uses a slightly different approach
that mimics jsonpath (parses a sublanguage expression into an internal 
execution engine etc.).
However, the standard requires mapping this to the relational model, which 
makes sense for core Postgres.


> On 27 Jun 2024, at 3:31 PM, Peter Eisentraut  wrote:
> 
> Here is a new version of this patch.  I have been working together with 
> Ashutosh on this.  While the version 0 was more of a fragile demo, this 
> version 1 has a fairly complete minimal feature set and should be useful for 
> playing around with.  We do have a long list of various internal bits that 
> still need to be fixed or revised or looked at again, so there is by no means 
> a claim that everything is completed.
> 
> Documentation to get started is included (ddl.sgml and queries.sgml). (Of 
> course, feedback on the getting-started documentation would be most welcome.)
> 





Re: Do we want a hashset type?

2023-08-14 Thread Florents Tselai
Has anyone put this in a git repo / extension package or similar ? 

I’d like to try it out outside the core pg tree. 

> On 1 Jul 2023, at 12:04 PM, Joel Jacobson  wrote:
> 
> On Fri, Jun 30, 2023, at 06:50, jian he wrote:
>> more like a C questions
>> in this context does
>> #define HASHSET_GET_VALUES(set) ((int32 *) ((set)->data +
>> CEIL_DIV((set)->capacity, 8)))
>> define first, then define struct int4hashset_t. Is this normally ok?
> 
> Yes, it's fine. Macros are just text substitutions done pre-compilation.
> 
>> Also does
>> #define HASHSET_GET_VALUES(set) ((int32 *) ((set)->data +
>> CEIL_DIV((set)->capacity, 8)))
>> 
>> remove (int32 *) will make it generic? then when you use it, you can
>> cast whatever type you like?
> 
> Maybe, but might be less error-prone more descriptive with different
> macros for each type, e.g. INT4HASHSET_GET_VALUES,
> similar to the existing PG_GETARG_INT4HASHSET
> 
> Curious to hear what everybody thinks about the interface, documentation,
> semantics and implementation?
> 
> Is there anything missing or something that you think should be 
> changed/improved?
> 
> /Joel
> 
> 





Re: Improving FTS for Greek

2023-06-06 Thread Florents Tselai


> On 7 Jun 2023, at 12:13 AM, Peter Eisentraut  wrote:
> 
> On 03.06.23 19:47, Florents Tselai wrote:
>> There’s another previous relevant patch [0] but was never merged. I’ve 
>> included these stop words and added some more (info in README.md).
>> For my personal projects looks like it yields much better results.
>> I’d like some feedback on the extension ; particularly on the installation 
>> infra (I’m not sure I’ve handled properly the permissions in the .sql files)
>> I’ll then try to make a .patch for this.
> 
> The open question at the previous attempt was that it wasn't clear what the 
> upstream source or long-term maintenance of the stop words list would be.  If 
> it's just a personally composed list, then it's okay if you use it yourself, 
> but for including it into PostgreSQL it ought to come from a reputable 
> non-individual source like snowball.

I’ve used the NLTK list [0] as my base of stopwords; Wouldn’t this be 
considered reputable enough ? 

0 
https://github.com/nltk/nltk_data/blob/gh-pages/packages/corpora/stopwords.zip 
(see greek.stop file in the archive)

> 



Improving FTS for Greek

2023-06-03 Thread Florents Tselai
I posted earlier in pgsql-general, that I realised there’s no greek.stop under 
$(pg_config —sharedir)/tsearch_data

And indeed looks like stop words are maintained with to_tsvector(‘greek’, ..). 

I wrote an extension https://github.com/Florents-Tselai/pg_fts_greek that adds 
another ‘greek_ext’ regconfig 

Here’s how the results compare


t   to_tsvector('greek', t) to_tsvector('greek_ext', t)
'το τετράγωνο της υποτείνουσας ενός ορθογωνίου τριγώνου''εν':5 
'ορθογων':6 'τ':3 'τετραγων':2 'το':1 'τριγων':7 'υποτεινουσ':4  'εν':5 
'ορθογων':6 'τετραγων':2 'τριγων':7 'υποτεινουσ':4
'ο γιώργος είναι πονηρός'   'γιωργ':2 'εινα':3 'ο':1 'πονηρ':4  
'γιωργ':2 'πονηρ':4
'ο ήλιος ο πράσινος o ήλιος που ανατέλλει'  'o':5 'ανατελλ':8 'ηλι':2,6 
'ο':1,3 'π':7 'πρασιν':4'ανατελλ':8 'ηλι':2,6 'πρασιν':4

There’s another previous relevant patch [0] but was never merged. I’ve included 
these stop words and added some more (info in README.md).

For my personal projects looks like it yields much better results.

I’d like some feedback on the extension ; particularly on the installation 
infra (I’m not sure I’ve handled properly the permissions in the .sql files) 

I’ll then try to make a .patch for this. 



[0] 
https://www.postgresql.org/message-id/flat/e1c79330-48a5-abef-c309-8d4499e3180b%402ndquadrant.com#7431fdb9ae24b694155aef3f040b7b60