Re: CommandStatus from insert returning when using a portal.

2023-07-14 Thread chap

On 2023-07-14 15:49, Dave Cramer wrote:

On Fri, 14 Jul 2023 at 15:40,  wrote:

Perhaps an easy rule would be, if the driver itself adds RETURNING
because of a RETURN_GENERATED_KEYS option, it should also force the
fetch count to zero and collect all the returned rows before
executeUpdate returns, and then it will have the right count
to return?


Well that kind of negates the whole point of using a cursor in the case
where you have a large result set.

The rows are subsequently fetched in rs.next()


I guess it comes down, again, to the question of what kind of thing
the API client thinks it is doing, when it issues an INSERT with
the RETURN_GENERATED_KEYS option.

An API client issuing a plain INSERT knows it is the kind of thing
for which executeUpdate() is appropriate, and the complete success
or failure outcome is known when that returns.

An API client issuing its own explicit INSERT RETURNING knows it
is the kind of thing for which executeQuery() is appropriate, and
that some processing (and possibly ereporting) may be left to
occur while working through the ResultSet.

But now how about this odd hybrid, where the API client wrote
a plain INSERT, but added the RETURN_GENERATED_KEYS option?
The rewritten query is the kind of thing the server and the
driver need to treat as a query, but to the API client it still
appears the kind of thing for which executeUpdate() is suited.
The generated keys can then be examined--in the form of a
ResultSet--but one obtained with the special method
getGeneratedKeys(), rather than the usual way of getting the
ResultSet from a query.

Should the client then assume the semantics of executeUpdate
have changed for this case, the outcome isn't fully known yet,
and errors could come to light while examining the returned
keys? Or should it still assume that executeUpdate has its
usual meaning, all the work is done by that point, and the
ResultSet from getGeneratedKeys() is simply a convenient,
familiar interface for examining what came back?

I'm not sure if there's a firm answer to that one way or the
other in the formal JDBC spec, but the latter seems perhaps
safer to me.

Regards,
-Chap




Re: CommandStatus from insert returning when using a portal.

2023-07-14 Thread chap

On 2023-07-14 14:19, David G. Johnston wrote:

Because of the returning they all need a portal so far as the server is
concerned and the server will obligingly send the contents of the 
portal

back to the client.


Dave's pcap file, for the fetch count 0 case, does not show any
portal name used in the bind, describe, or execute messages, or
any portal close message issued by the client afterward. The server
may be using a portal in that case, but it seems more transparent
to the client when fetch count is zero.

Perhaps an easy rule would be, if the driver itself adds RETURNING
because of a RETURN_GENERATED_KEYS option, it should also force the
fetch count to zero and collect all the returned rows before
executeUpdate returns, and then it will have the right count
to return?

It seems that any approach leaving any of the rows unfetched at
the time executeUpdate returns might violate a caller's expectation
that the whole outcome is known by that point.

Regards,
-Chap




Re: CommandStatus from insert returning when using a portal.

2023-07-14 Thread chap

On 2023-07-14 14:19, David G. Johnston wrote:
Is there some magic set of arguments I should be using besides: tcpdump 
-Ar

filename ?


I opened it with Wireshark, which has a pgsql protocol decoder.

Regards,
-Chap




Re: CommandStatus from insert returning when using a portal.

2023-07-14 Thread chap

On 2023-07-12 21:30, David G. Johnston wrote:
Right, and executeUpdate is the wrong API method to use, in the 
PostgreSQL

world, when executing insert/update/delete with the non-SQL-standard
returning clause. ... ISTM that you are trying to make user-error less
painful.


In Dave's Java reproducer, no user-error has been made, because the user
supplied a plain INSERT with the RETURN_GENERATED_KEYS option, and the
RETURNING clause has been added by the JDBC driver. So the user expects
executeUpdate to be the right method, and return the row count, and
getGeneratedKeys() to then return the rows.

I've seen a possibly even more interesting result using pgjdbc-ng with
protocol.trace=true:

FetchSize=0

1.>t.>T$>Z*


2.>D.>D.>C.>Z*

executeUpdate result: 2
ids: 1 2

FetchSize=1

2.>D.>s*

executeUpdate result: -1
ids: 3 
D.>s*

4 
C*


3.>Z*


FetchSize=2

2.>D.>D.>s*

executeUpdate result: -1
ids: 5 6 
C*


3.>Z*


FetchSize=3

2.>D.>D.>C*


3.>Z*

executeUpdate result: 2
ids: 7 8


Unless there's some interleaving of trace and stdout messages happening
here, I think pgjdbc-ng is not even collecting all the returned rows
in the suspended-cursor case before executeUpdate returns, but keeping
the cursor around for getGeneratedKeys() to use, so executeUpdate
returns -1 before even having seen the later command complete, and would
still do that even if the command complete message had the right count.

Regards,
-Chap




Re: CommandStatus from insert returning when using a portal.

2023-07-14 Thread chap

On 2023-07-14 12:58, Dave Cramer wrote:

See attached pcap file


So if the fetch count is zero and no portal is needed,
or if the fetch count exceeds the row count and the command
completion follows directly with no suspension of the portal, then
it comes with the correct count, but if the portal gets suspended,
then the later command completion reports a zero count?

Regards,
-Chap




Re: CommandStatus from insert returning when using a portal.

2023-07-14 Thread chap

On 2023-07-12 20:57, Dave Cramer wrote:
Without a cursor it returns right away as all of the results are 
returned

by the server. However with cursor you have to wait until you fetch the
rows before you can get the CommandComplete message which btw is wrong 
as

it returns INSERT 0 0 instead of INSERT 2 0


To make sure I am following, was this describing a comparison of
two different ways in Java, using JDBC, to perform the same operation,
one of which behaves as desired while the other doesn't? If so, for
my curiosity, what do both ways look like in Java?

Or was it a comparison of two different operations, say one
an INSERT RETURNING and the other something else?

Regards,
-Chap




Re: CommandStatus from insert returning when using a portal.

2023-07-12 Thread chap

Dave Cramer  writes:

Obviously I am biased by the JDBC API which would like to have
PreparedStatement.execute() return the number of rows inserted
without having to wait to read all of the rows returned


Huh ... just how *is* PreparedStatement.execute() supposed
to behave when the statement is an INSERT RETURNING?

execute() -> true
getResultSet() -> the rows
getMoreResults() -> false
getUpdateCount() -> number inserted?

It seems that would fit the portal's behavior easily enough.

Or is the JDBC spec insisting on some other order?

Regards,
-Chap




Re: COPY table FROM STDIN via SPI

2023-07-12 Thread chap

On 2023-07-12 14:18, Joe Conway wrote:

On 7/11/23 22:52, James Sewell wrote:

What about running a COPY directly from C - is that possible?


https://www.postgresql.org/docs/current/libpq-copy.html


Or is the question about a COPY kicked off from server-side
C code (following up a question about SPI)?

If the idea is to kick off a COPY that reads from the connected
client's STDIN, the wire protocol doesn't really have a way to
work that out with the client, as Tom pointed out.

Or is the goal for some server-side code to quickly populate
a table from some file that's readable on the server and has
the same format that COPY FROM expects?

Regards,
-Chap




Re: When IMMUTABLE is not.

2023-06-15 Thread chap

On 2023-06-15 10:19, David G. Johnston wrote:
The failure to find and execute the function code itself is not a 
failure
mode that these markers need be concerned with.  Assuming one can 
execute
the function an immutable function will give the same answer for the 
same

input for all time.


That was the view I ultimately took, and just made PL/Java suppress that
SPI readonly flag when going to look for the function code.

Until that change, you could run into the not-uncommon situation
where you've just loaded a jar of new functions and try to use them
in the same transaction, and hey presto, the VOLATILE ones all work,
and the IMMUTABLE ones aren't there yet.

Regards,
-Chap




Re: When IMMUTABLE is not.

2023-06-15 Thread chap

On 2023-06-15 09:58, c...@anastigmatix.net wrote:

also influences what snapshot the
function is looking at, and therefore what it can see, which has
also struck me more as a tacked-on effect than something inherent
in the declaration's meaning.


I just re-read that and realized I should anticipate the obvious
response "but how can it matter what the function can see, if
it's IMMUTABLE and depends on no data?".

So, I ran into the effect while working on PL/Java, where the
code of a function isn't all found in pg_proc.prosrc; that just
indicates what code has to be fetched from sqlj.jar_entry.

So one could take a strict view that "no PL/Java function should
ever be marked IMMUTABLE" because every one depends on fetching
something (once, at least).

But on the other hand, it would seem punctilious to say that
f(int x, int y) { return x + y; } isn't IMMUTABLE, only because
it depends on a fetch /of its own implementation/, and overall
its behavior is better described by marking it IMMUTABLE.

Regards,
-Chap




Re: When IMMUTABLE is not.

2023-06-15 Thread chap

On 2023-06-15 09:21, Tom Lane wrote:

Yura Sokolov  writes:

not enough to be sure function doesn't manipulate data.


Of course not.  It is the user's responsibility to mark functions
properly.


And also, isn't it the case that IMMUTABLE should mark a function,
not merely that "doesn't manipulate data", but whose return value
doesn't depend in any way on data (outside its own arguments)?

The practice among PLs of choosing an SPI readonly flag based on
the IMMUTABLE/STABLE/VOLATILE declaration seems to be a sort of
peculiar heuristic, not something inherent in what that declaration
means to the optimizer. (And also influences what snapshot the
function is looking at, and therefore what it can see, which has
also struck me more as a tacked-on effect than something inherent
in the declaration's meaning.)

Regards,
-Chap




Re: Let's make PostgreSQL multi-threaded

2023-06-06 Thread chap

On 2023-06-06 12:24, Heikki Linnakangas wrote:

I'm afraid having multiple processes and JVMs doesn't help that.
If you can escape the one JVM in one backend process, it's game over.


So there's escape and there's escape, right? Java still prioritizes
(and has, in fact, strengthened) barriers against breaking module
encapsulation, or getting access to arbitrary native memory or code.

The features that have been deprecated, to eventually go away, are
the ones that offer fine-grained control over operations that there
are Java APIs for. Eventually it won't be as easy as it is now to say
"ok, your function gets to open these files or these sockets but
not those ones."

Even for those things, there may yet be solutions. There are Java
APIs for virtualizing the view of the file system, for example. It's
yet to be seen how things will shake out. Configuration may get
trickier, and there may be some incentive to to include, say,
sepgsql in the picture.

Sure, even access to a file API can be game over, depending on
what file you open, but that's already the risk for every PL
with an 'untrusted' flavor.

Regards,
-Chap




Re: Let's make PostgreSQL multi-threaded

2023-06-06 Thread chap

On 2023-06-06 08:06, Konstantin Knizhnik wrote:
7. It will be hard to support non-multithreaded PL languages (like 
python), but for example support of Java will be more natural and 
efficient.


To this I say ...

Hmm.

Surely, the current situation with a JVM in each backend process
(that calls for one) has been often seen as heavier than desirable.

At the same time, I am not sure how manageable one giant process
with one giant JVM instance would prove to be, either.

It is somewhat nice to be able to tweak JVM settings in a session
and see what happens, without disrupting other sessions. There may
also exist cases for different JVM settings in per-user or per-
database GUCs.

Like Python with the GIL, it is documented for JNI_CreateJavaVM
that "Creation of multiple VMs in a single process is not
supported."[1]

And the devs of Java, in their immeasurable wisdom, have announced
a "JDK Enhancement Proposal" (that's just what these things are
called, don't blame Orwell), JEP 411[2][3], in which all of the
Security Manager features that PL/Java relies on for bounds on
'trusted' behavior are deprecated for eventual removal with no
functional replacement. I'd be even more leery of using one big
shared JVM for everybody's work after that happens.

Might the work toward allowing a run-time choice between a
process or threaded model also make possible some
intermediate models as well? A backend process for
connections to a particular database, or with particular
authentication credentials? Go through the authentication
handshake and then sendfd the connected socket to the
appropriate process. (Has every supported platform got
something like sendfd?)

That way, there could be some flexibility to arrange how many
distinct backends (and, for Java purposes, how many JVMs) get
fired up, and have each sharing sessions that have something in
common.

Or, would that just require all the complexity of both
approaches to synchronization, with no sufficient benefit?

Regards,
-Chap

[1] 
https://docs.oracle.com/en/java/javase/17/docs/specs/jni/invocation.html#jni_createjavavm
[2] 
https://blogs.apache.org/netbeans/entry/jep-411-deprecate-the-security1

[3] https://github.com/tada/pljava/wiki/JEP-411




Re: Documentation about PL transforms

2022-07-29 Thread chap

On 2022-07-28 01:51, Pavel Stehule wrote:

Would it be clearer to say:

It also contains the OID of the intended procedural language and 
whether

that procedural language is declared as TRUSTED.
While these values are redundant if the inline handler is serving only
a single procedural language, they are necessary to allow one inline
handler to serve more than one PL.


ok


I will probably be unable to produce a new patch for this CF.
Family obligation.

Regards,
-Chap




Re: Documentation about PL transforms

2022-07-27 Thread chap

On 2022-07-13 00:26, Pavel Stehule wrote:

1. I am not sure if get_call_trftypes is a good name - the prefix 
get_call

is used when some runtime data is processed.


I guess I hadn't caught on that the prefix carried that meaning.
To me, it appeared to be a prefix used to avoid being specific to
'function' or 'procedure'.


This function just returns
reformatted data from the system catalogue. Maybe 
get_func_trftypes_list,
or just replace function get_func_trftypes (now, the list is an array, 
so
there should not be performance issues). For consistency, the new 
function
should be used in plperl and plpython too. Probably this function is 
not


If it is acceptable to replace get_func_trftypes like that, I can 
produce

such a patch.


2.

+It also contains the OID of the intended procedural language and
whether
+that procedural language is declared as 
TRUSTED,

useful
+if a single inline handler is supporting more than one procedural
language.

I am not sure if this sentence is in the correct place. Maybe can be
mentioned separately,
so generally handlers can be used by more than one procedural language. 
But

maybe
I don't understand this sentence.


My point was that, if the structure did /not/ include the OID of
the PL and its TRUSTED property, then it would not be possible
for a single inline handler to support more than one PL. So that
is why it is a good thing that those are included in the structure,
and why it would be a bad thing if they were not.

Would it be clearer to say:

It also contains the OID of the intended procedural language and whether
that procedural language is declared as TRUSTED.
While these values are redundant if the inline handler is serving only
a single procedural language, they are necessary to allow one inline
handler to serve more than one PL.

Regards,
-Chap




Re: timezones BCE

2022-04-13 Thread chap

On 2022-04-13 14:13, Dave Cramer wrote:


Oh please don't do something bespoke. I'm trying to make this work with 
the

JDBC driver.
So it has to be at least compatible with other libraries.


Looks like Java agrees with the offset, prior to Toronto's 1895 adoption
of the hour-wide zone:

jshell> java.time.ZoneId.of("America/Toronto").
   ...> getRules().
   ...> nextTransition(java.time.Instant.parse("0101-01-01T00:00:00Z"))
$1 ==> Transition[Gap at 1895-01-01T00:00-05:17:32 to -05:00]

Regards,
-Chap




Re: timezones BCE

2022-04-13 Thread chap

On 2022-04-13 12:33, Dave Cramer wrote:

test=# set timezone to 'America/Toronto';
SET
test=# select '0101-01-01'::timestamptz;
 timestamptz
--
 0101-01-01 00:00:00-05:17:32

Specifically why the -05:17:32


Timezones were regularized into their (typically hour-wide) chunks
during a period around the late nineteenth century IIRC.

If you decompile the zoneinfo database to look at America/Toronto,
you will probably find an entry for dates earlier than when the
regularized zones were established there, and that entry will have
an offset reflecting Toronto's actual longitude.

Regards,
-Chap




Re: Documentation about PL transforms

2022-04-07 Thread chap

On 2022-02-22 12:59, Chapman Flack wrote:

It would have been painful to write documentation of get_func_trftypes
saying its result isn't what get_transform_{from.to}sql expect, so
patch 1 does add a get_call_trftypes that returns a List *.

Patch 2 then updates the docs as discussed in this thread. It turned 
out

plhandler.sgml was kind of a long monolith of text even before adding
transform information, so I broke it into sections first. This patch 
adds

the section markup without reindenting, so the changes aren't obscured.

The chapter had also fallen behind the introduction of procedures, so
I have changed many instances of 'function' to the umbrella term
'routine'.

Patch 3 simply reindents for the new section markup and rewraps.
Whitespace only.

Patch 4 updates src/test/modules/plsample to demonstrate handling of
transforms (and to add some more comments generally).


Here is a rebase.From 9c726423d47a114a82ca703c0e03a62e3d74f1f6 Mon Sep 17 00:00:00 2001
From: Chapman Flack 
Date: Mon, 21 Feb 2022 20:56:28 -0500
Subject: [PATCH v2 1/4] Warmup: add a get_call_trftypes function

The existing get_func_trftypes function produces an Oid[], where
both existing get_transform_{from,to}sql functions that depend
on the result expect a List*.

Rather than writing documentation awkwardly describing functions
that won't play together, add a get_call_trftypes function that
returns List*. (The name get_call_... to distinguish from
get_func_... follows the naming used in funcapi.h for a function
returning information about either a function or a procedure.)
---
 src/backend/utils/cache/lsyscache.c | 18 ++
 src/include/funcapi.h   |  5 +
 src/include/utils/lsyscache.h   |  1 +
 3 files changed, 24 insertions(+)

diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 1b7e11b..3cd94db 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -2117,6 +2117,24 @@ get_transform_tosql(Oid typid, Oid langid, List *trftypes)
 		return InvalidOid;
 }
 
+/*
+ * get_call_trftypes
+ *
+ *		A helper function that does not itself query the transform cache, but
+ *		constructs the transform-type List expected by the functions above.
+ */
+List *
+get_call_trftypes(HeapTuple procTup)
+{
+	Datum		protrftypes;
+	bool		isNull;
+
+	protrftypes = SysCacheGetAttr(PROCOID, procTup,
+  Anum_pg_proc_protrftypes,
+  &isNull);
+	return isNull ?  NIL : oid_array_to_list(protrftypes);
+}
+
 
 /*-- TYPE CACHE --		 */
 
diff --git a/src/include/funcapi.h b/src/include/funcapi.h
index dc3d819..70c3e13 100644
--- a/src/include/funcapi.h
+++ b/src/include/funcapi.h
@@ -175,7 +175,12 @@ extern int	get_func_arg_info(HeapTuple procTup,
 extern int	get_func_input_arg_names(Datum proargnames, Datum proargmodes,
 	 char ***arg_names);
 
+/*
+ * A deprecated earlier version of get_call_trftypes (in lsyscache.h).
+ * That version produces a List, which is the form downstream functions expect.
+ */
 extern int	get_func_trftypes(HeapTuple procTup, Oid **p_trftypes);
+
 extern char *get_func_result_name(Oid functionId);
 
 extern TupleDesc build_function_result_tupdesc_d(char prokind,
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index b8dd27d..93b19e7 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -139,6 +139,7 @@ extern char get_rel_relkind(Oid relid);
 extern bool get_rel_relispartition(Oid relid);
 extern Oid	get_rel_tablespace(Oid relid);
 extern char get_rel_persistence(Oid relid);
+extern List *get_call_trftypes(HeapTuple procTup);
 extern Oid	get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
 extern Oid	get_transform_tosql(Oid typid, Oid langid, List *trftypes);
 extern bool get_typisdefined(Oid typid);
-- 
2.7.3

From 7fb3f8971c4c573903b1d08bcbeb126e31a6544c Mon Sep 17 00:00:00 2001
From: Chapman Flack 
Date: Mon, 21 Feb 2022 20:59:32 -0500
Subject: [PATCH v2 2/4] Update PL handler implementation docs

The original purpose was to add information on support for
CREATE TRANSFORM (which must be explicitly coded in any PL
implementation intending to support it). But the plhandler
section was about as long as a monolith of text ought to be,
even before adding transform information, so reorganized
first into sections.

Front-loaded with short descriptions of the three possible
functions (call handler, validator, inline handler) registered
with CREATE LANGUAGE. The latter two were afterthoughts in historical
sequence, but the docs don't need to present them that way.

The section had also fallen behind the introduction of procedures,
so updated to generally use the umbrella term 'routine' in place
of 'function'.

New section markup added here without indentation change, to avoid
obscuring changes. A follow-on commit will reindent and rewrap.
---
 doc/src/sgml/event-trigger.sgml|  16 ++
 doc/src/sgml/plhandler.sgml   

Re: test/isolation/expected/stats_1.out broken for me

2022-04-07 Thread chap

On 2022-04-07 15:04, Andres Freund wrote:

And done. Chap, could you confirm this fixes the issue for you?


Looks good from here. One installcheck-world with no failures; 
previously,

it failed for me every time.

Regards,
-Chap




Re: test/isolation/expected/stats_1.out broken for me

2022-04-07 Thread chap

On 2022-04-07 12:49, Tom Lane wrote:

So what non-default build options are you using?


The command that I've just been reusing from my bash_history without
thinking about it for some years is:

configure --enable-cassert --enable-tap-tests \
 --with-libxml --enable-debug \
 CFLAGS='-ggdb -Og -g3 -fno-omit-frame-pointer'

Regards,
-Chap




test/isolation/expected/stats_1.out broken for me

2022-04-07 Thread chap

Running installcheck-world on an unrelated patch, I noticed a failure
here in test/isolation/expected/stats_1.out (this is line 3102):

step s1_slru_check_stats:
SELECT current.blks_zeroed > before.value
  FROM test_slru_stats before
  INNER JOIN pg_stat_slru current
  ON before.slru = current.name
  WHERE before.stat = 'blks_zeroed';

?column?

t
(1 row)

This is built from bab588c. On my amd64/linux box the result is f.

The same mismatch is present if I build from 6392f2a (i.e., just before
a2f433f pgstat: add alternate output for stats.spec), along with
a bunch of others. So a2f433f seems to have silenced all the rest
of those, but not this one.

If I build from ad40166, installcheck-world passes. That's as far
as I have pursued it.

Regards,
-Chap




Re: trigger example for plsample

2022-04-07 Thread chap

On 2022-04-06 16:44, Mark Wong wrote:

I think I've applied all of these suggestions and attached a new patch.


That looks good to me, though I wonder about the pfree(source).
In the simplest case of a PL that uses no advance compilation or
augmentation step, the Code Execution block might naturally refer
to source, so perhaps the example boilerplate shouldn't include
a pfree that needs to be removed in that case.

In fact, I wonder about the need for any retail pfree()s here. Those
added in this patch are the only ones in plsample.c. They are small
allocations, and maybe it would both streamline the example to leave
out the pfree calls, and be an illustration of best practice in letting
the memory context machinery handle all the deallocation at once, where
there isn't a special need to free something, like an especially large
allocation, at retail.

Regards,
-Chap




Re: PostgreSQL shutdown modes

2022-04-02 Thread chap

On 2022-04-01 13:22, Robert Haas wrote:

I attach herewith a modest patch to rename these shutdown modes to
more accurately correspond to their actual characteristics.


I've waited for April 2nd to submit this comment, but it seemed to me 
that the
suggestion about the first-pass checkpoint in 'slow' mode is a 
no-foolin' good one.
Then I wondered whether there could be an option to accompany the 'dumb' 
mode that
would take a WHERE clause, to be implicitly applied to pg_stat_activity, 
whose
purpose would be to select those sessions that are ok to evict without 
waiting for
them to exit. It could recognize, say, backend connections in no current 
transaction
that are from your pesky app or connection pooler that holds things 
open. It could

also, for example, select things in transaction state but where
 current_timestamp - state_change > '5 minutes' (so it would be 
re-evaluated every

so often until ready to shut down).

For conciseness (and sanity), maybe the WHERE clause could be implicitly 
applied,
not to pg_stat_activity directly, but to a (virtual or actual) view that 
has
already been restricted to client backend sessions, and already has a 
column

for current_timestamp - state_change.

Regards,
-Chap