Re: CommandStatus from insert returning when using a portal.
On 2023-07-14 15:49, Dave Cramer wrote:
> On Fri, 14 Jul 2023 at 15:40, wrote:
>> Perhaps an easy rule would be, if the driver itself adds RETURNING
>> because of a RETURN_GENERATED_KEYS option, it should also force the
>> fetch count to zero and collect all the returned rows before
>> executeUpdate returns, and then it will have the right count to
>> return?
>
> Well that kind of negates the whole point of using a cursor in the
> case where you have a large result set. The rows are subsequently
> fetched in rs.next()

I guess it comes down, again, to the question of what kind of thing the
API client thinks it is doing when it issues an INSERT with the
RETURN_GENERATED_KEYS option.

An API client issuing a plain INSERT knows it is the kind of thing for
which executeUpdate() is appropriate, and that the complete success or
failure outcome is known when that returns.

An API client issuing its own explicit INSERT RETURNING knows it is the
kind of thing for which executeQuery() is appropriate, and that some
processing (and possibly ereporting) may be left to occur while working
through the ResultSet.

But now how about this odd hybrid, where the API client wrote a plain
INSERT but added the RETURN_GENERATED_KEYS option? The rewritten query
is the kind of thing the server and the driver need to treat as a query,
but to the API client it still appears to be the kind of thing for which
executeUpdate() is suited. The generated keys can then be examined--in
the form of a ResultSet--but one obtained with the special method
getGeneratedKeys(), rather than in the usual way of getting the
ResultSet from a query.

Should the client then assume that the semantics of executeUpdate have
changed for this case, that the outcome isn't fully known yet, and that
errors could still come to light while examining the returned keys? Or
should it still assume that executeUpdate has its usual meaning, that
all the work is done by that point, and that the ResultSet from
getGeneratedKeys() is simply a convenient, familiar interface for
examining what came back?

I'm not sure the formal JDBC spec gives a firm answer one way or the
other, but the latter seems perhaps the safer reading to me.

Regards,
-Chap
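P.S. For concreteness, here is a rough sketch (untested; the table and
column names are invented) of the two call patterns being contrasted:

    import java.sql.*;

    class GeneratedKeysSketch {
        // Explicit RETURNING: plainly a query, driven through
        // executeQuery(); the caller knows errors may still surface
        // while iterating over the ResultSet.
        static void explicitReturning(Connection conn) throws SQLException {
            try (PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO t (val) VALUES (?), (?) RETURNING id")) {
                ps.setInt(1, 1);
                ps.setInt(2, 2);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        long id = rs.getLong(1);   // process each returned key
                    }
                }
            }
        }

        // The hybrid: a plain INSERT plus RETURN_GENERATED_KEYS. The
        // driver rewrites it with RETURNING, but the caller still treats
        // it as an update and expects the complete count from
        // executeUpdate().
        static void hybrid(Connection conn) throws SQLException {
            try (PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO t (val) VALUES (?), (?)",
                     Statement.RETURN_GENERATED_KEYS)) {
                ps.setInt(1, 1);
                ps.setInt(2, 2);
                int count = ps.executeUpdate();    // caller expects 2 here
                try (ResultSet keys = ps.getGeneratedKeys()) {
                    while (keys.next()) {
                        long id = keys.getLong(1); // only key values expected here
                    }
                }
            }
        }
    }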
Re: CommandStatus from insert returning when using a portal.
On 2023-07-14 14:19, David G. Johnston wrote:
> Because of the returning they all need a portal so far as the server
> is concerned and the server will obligingly send the contents of the
> portal back to the client.

Dave's pcap file, for the fetch count 0 case, does not show any portal
name used in the bind, describe, or execute messages, or any portal
close message issued by the client afterward. The server may be using a
portal in that case, but it seems more transparent to the client when
the fetch count is zero.

Perhaps an easy rule would be: if the driver itself adds RETURNING
because of a RETURN_GENERATED_KEYS option, it should also force the
fetch count to zero and collect all the returned rows before
executeUpdate returns, and then it will have the right count to return.

It seems that any approach leaving any of the rows unfetched at the time
executeUpdate returns might violate a caller's expectation that the
whole outcome is known by that point.

Regards,
-Chap
Re: CommandStatus from insert returning when using a portal.
On 2023-07-14 14:19, David G. Johnston wrote:
> Is there some magic set of arguments I should be using besides:
> tcpdump -Ar filename ?

I opened it with Wireshark, which has a pgsql protocol decoder.

Regards,
-Chap
Re: CommandStatus from insert returning when using a portal.
On 2023-07-12 21:30, David G. Johnston wrote:
> Right, and executeUpdate is the wrong API method to use, in the
> PostgreSQL world, when executing insert/update/delete with the
> non-SQL-standard returning clause.
> ...
> ISTM that you are trying to make user-error less painful.

In Dave's Java reproducer, no user error has been made: the user
supplied a plain INSERT with the RETURN_GENERATED_KEYS option, and the
RETURNING clause was added by the JDBC driver. So the user expects
executeUpdate to be the right method and to return the row count, and
getGeneratedKeys() to then return the rows.

I've seen a possibly even more interesting result using pgjdbc-ng with
protocol.trace=true:

  FetchSize=0
    1.>t.>T$>Z* 2.>D.>D.>C.>Z* executeUpdate result: 2 ids: 1 2
  FetchSize=1
    2.>D.>s* executeUpdate result: -1 ids: 3 D.>s* 4 C* 3.>Z*
  FetchSize=2
    2.>D.>D.>s* executeUpdate result: -1 ids: 5 6 C* 3.>Z*
  FetchSize=3
    2.>D.>D.>C* 3.>Z* executeUpdate result: 2 ids: 7 8

Unless there's some interleaving of trace and stdout messages happening
here, I think pgjdbc-ng is not even collecting all the returned rows in
the suspended-cursor case before executeUpdate returns, but keeping the
cursor around for getGeneratedKeys() to use, so executeUpdate returns -1
before even having seen the later command complete, and would still do
that even if the command complete message had the right count.

Regards,
-Chap
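P.S. The reproducer is presumably something along these lines (my guess
at its shape, not the actual code from the thread; the table name is
invented). Note that, for pgjdbc at least, the fetch size only leads to
a cursor/portal when autocommit is off, if I recall correctly:

    import java.sql.*;

    class FetchSizeRepro {
        static void run(Connection conn, int fetchSize) throws SQLException {
            conn.setAutoCommit(false);       // needed for fetchSize to matter
            try (PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO test_gk (val) VALUES (?), (?)",
                     Statement.RETURN_GENERATED_KEYS)) {
                ps.setFetchSize(fetchSize);  // 0, 1, 2, 3 ... as in the traces
                ps.setInt(1, 42);
                ps.setInt(2, 43);
                System.out.println("executeUpdate result: " + ps.executeUpdate());
                System.out.print("ids:");
                try (ResultSet keys = ps.getGeneratedKeys()) {
                    while (keys.next()) {
                        System.out.print(" " + keys.getLong(1));
                    }
                }
                System.out.println();
            }
            conn.commit();
        }
    }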
Re: CommandStatus from insert returning when using a portal.
On 2023-07-14 12:58, Dave Cramer wrote:
> See attached pcap file

So if the fetch count is zero and no portal is needed, or if the fetch
count exceeds the row count and the command completion follows directly
with no suspension of the portal, then it comes with the correct count,
but if the portal gets suspended, then the later command completion
reports a zero count?

Regards,
-Chap
Re: CommandStatus from insert returning when using a portal.
On 2023-07-12 20:57, Dave Cramer wrote:
> Without a cursor it returns right away as all of the results are
> returned by the server. However with cursor you have to wait until you
> fetch the rows before you can get the CommandComplete message which
> btw is wrong as it returns INSERT 0 0 instead of INSERT 2 0

To make sure I am following, was this describing a comparison of two
different ways in Java, using JDBC, to perform the same operation, one
of which behaves as desired while the other doesn't? If so, for my
curiosity, what do both ways look like in Java?

Or was it a comparison of two different operations, say one an INSERT
RETURNING and the other something else?

Regards,
-Chap
Re: CommandStatus from insert returning when using a portal.
Dave Cramer writes:
> Obviously I am biased by the JDBC API which would like to have
> PreparedStatement.execute() return the number of rows inserted without
> having to wait to read all of the rows returned

Huh ... just how *is* PreparedStatement.execute() supposed to behave
when the statement is an INSERT RETURNING?

  execute()        -> true
  getResultSet()   -> the rows
  getMoreResults() -> false
  getUpdateCount() -> number inserted?

It seems that would fit the portal's behavior easily enough. Or is the
JDBC spec insisting on some other order?

Regards,
-Chap
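P.S. In code form, the sequence I have in mind would be roughly this (a
sketch only; statement text and table name invented):

    import java.sql.*;

    class ExecuteOrderSketch {
        static void run(Connection conn) throws SQLException {
            try (Statement st = conn.createStatement()) {
                // an INSERT ... RETURNING is, to the driver, a
                // row-returning statement
                boolean hasRows = st.execute(
                        "INSERT INTO t (val) VALUES (1), (2) RETURNING id");
                // hasRows -> expected true
                try (ResultSet rs = st.getResultSet()) {
                    while (rs.next()) {
                        long id = rs.getLong(1);     // the rows
                    }
                }
                boolean more = st.getMoreResults();  // -> expected false
                int inserted = st.getUpdateCount();  // -> the number inserted?
            }
        }
    }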
Re: COPY table FROM STDIN via SPI
On 2023-07-12 14:18, Joe Conway wrote:
> On 7/11/23 22:52, James Sewell wrote:
>> What about running a COPY directly from C - is that possible?
>
> https://www.postgresql.org/docs/current/libpq-copy.html

Or is the question about a COPY kicked off from server-side C code
(following up a question about SPI)?

If the idea is to kick off a COPY that reads from the connected client's
STDIN, the wire protocol doesn't really have a way to work that out with
the client, as Tom pointed out.

Or is the goal for some server-side code to quickly populate a table
from some file that's readable on the server and has the same format
that COPY FROM expects?

Regards,
-Chap
Re: When IMMUTABLE is not.
On 2023-06-15 10:19, David G. Johnston wrote:
> The failure to find and execute the function code itself is not a
> failure mode that these markers need be concerned with. Assuming one
> can execute the function an immutable function will give the same
> answer for the same input for all time.

That was the view I ultimately took, and just made PL/Java suppress that
SPI readonly flag when going to look for the function code.

Until that change, you could run into the not-uncommon situation where
you've just loaded a jar of new functions and try to use them in the
same transaction, and hey presto, the VOLATILE ones all work, and the
IMMUTABLE ones aren't there yet.

Regards,
-Chap
Re: When IMMUTABLE is not.
On 2023-06-15 09:58, c...@anastigmatix.net wrote:
> also influences what snapshot the function is looking at, and
> therefore what it can see, which has also struck me more as a
> tacked-on effect than something inherent in the declaration's meaning.

I just re-read that and realized I should anticipate the obvious
response: "but how can it matter what the function can see, if it's
IMMUTABLE and depends on no data?"

I ran into the effect while working on PL/Java, where the code of a
function isn't all found in pg_proc.prosrc; that just indicates what
code has to be fetched from sqlj.jar_entry. So one could take a strict
view that "no PL/Java function should ever be marked IMMUTABLE", because
every one depends on fetching something (once, at least).

But on the other hand, it would seem punctilious to say that

  f(int x, int y) { return x + y; }

isn't IMMUTABLE only because it depends on a fetch /of its own
implementation/, when overall its behavior is better described by
marking it IMMUTABLE.

Regards,
-Chap
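P.S. To be concrete, the kind of function I mean (a trivial sketch; the
class and method names are invented): the body touches no catalogs or
tables, and the only "data" involved is the one-time lookup of the
implementation itself in sqlj.jar_entry.

    public class Adder {
        // Nothing here depends on database state: given the same x and y,
        // the result is the same for all time. The only fetch involved is
        // PL/Java loading this class from sqlj.jar_entry on first use.
        public static int add(int x, int y) {
            return x + y;
        }
    }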
Re: When IMMUTABLE is not.
On 2023-06-15 09:21, Tom Lane wrote:
> Yura Sokolov writes:
>> not enough to be sure function doesn't manipulate data.
>
> Of course not. It is the user's responsibility to mark functions
> properly.

And also, isn't it the case that IMMUTABLE should mark not merely a
function that "doesn't manipulate data", but one whose return value
doesn't depend in any way on data (outside its own arguments)?

The practice among PLs of choosing an SPI readonly flag based on the
IMMUTABLE/STABLE/VOLATILE declaration seems to be a sort of peculiar
heuristic, not something inherent in what that declaration means to the
optimizer. (It also influences what snapshot the function is looking at,
and therefore what it can see, which has also struck me more as a
tacked-on effect than something inherent in the declaration's meaning.)

Regards,
-Chap
Re: Let's make PostgreSQL multi-threaded
On 2023-06-06 12:24, Heikki Linnakangas wrote:
> I'm afraid having multiple processes and JVMs doesn't help that. If
> you can escape the one JVM in one backend process, it's game over.

So there's escape and there's escape, right? Java still prioritizes (and
has, in fact, strengthened) barriers against breaking module
encapsulation, or getting access to arbitrary native memory or code. The
features that have been deprecated, to eventually go away, are the ones
that offer fine-grained control over operations for which there are Java
APIs. Eventually it won't be as easy as it is now to say "ok, your
function gets to open these files or these sockets but not those ones."

Even for those things, there may yet be solutions. There are Java APIs
for virtualizing the view of the file system, for example. It's yet to
be seen how things will shake out. Configuration may get trickier, and
there may be some incentive to include, say, sepgsql in the picture.

Sure, even access to a file API can be game over, depending on what file
you open, but that's already the risk for every PL with an 'untrusted'
flavor.

Regards,
-Chap
Re: Let's make PostgreSQL multi-threaded
On 2023-06-06 08:06, Konstantin Knizhnik wrote:
> 7. It will be hard to support non-multithreaded PL languages (like
> python), but for example support of Java will be more natural and
> efficient.

To this I say ... Hmm.

Surely, the current situation with a JVM in each backend process (that
calls for one) has often been seen as heavier than desirable. At the
same time, I am not sure how manageable one giant process with one giant
JVM instance would prove to be, either. It is somewhat nice to be able
to tweak JVM settings in a session and see what happens, without
disrupting other sessions. There may also exist cases for different JVM
settings in per-user or per-database GUCs.

Like Python with the GIL, it is documented for JNI_CreateJavaVM that
"Creation of multiple VMs in a single process is not supported."[1]

And the devs of Java, in their immeasurable wisdom, have announced a
"JDK Enhancement Proposal" (that's just what these things are called,
don't blame Orwell), JEP 411[2][3], in which all of the Security Manager
features that PL/Java relies on for bounds on 'trusted' behavior are
deprecated for eventual removal with no functional replacement. I'd be
even more leery of using one big shared JVM for everybody's work after
that happens.

Might the work toward allowing a run-time choice between a process or
threaded model also make possible some intermediate models? A backend
process for connections to a particular database, say, or with
particular authentication credentials? Go through the authentication
handshake and then sendfd the connected socket to the appropriate
process. (Has every supported platform got something like sendfd?) That
way, there could be some flexibility in arranging how many distinct
backends (and, for Java purposes, how many JVMs) get fired up, with each
one sharing sessions that have something in common.

Or would that just require all the complexity of both approaches to
synchronization, with no sufficient benefit?

Regards,
-Chap

[1] https://docs.oracle.com/en/java/javase/17/docs/specs/jni/invocation.html#jni_createjavavm
[2] https://blogs.apache.org/netbeans/entry/jep-411-deprecate-the-security1
[3] https://github.com/tada/pljava/wiki/JEP-411
Re: Documentation about PL transforms
On 2022-07-28 01:51, Pavel Stehule wrote:
>> Would it be clearer to say:
>>
>>   It also contains the OID of the intended procedural language and
>>   whether that procedural language is declared as TRUSTED. While
>>   these values are redundant if the inline handler is serving only a
>>   single procedural language, they are necessary to allow one inline
>>   handler to serve more than one PL.
>
> ok

I will probably be unable to produce a new patch for this CF. Family
obligation.

Regards,
-Chap
Re: Documentation about PL transforms
On 2022-07-13 00:26, Pavel Stehule wrote:
> 1. I am not sure if get_call_trftypes is a good name - the prefix
> get_call is used when some runtime data is processed.

I guess I hadn't caught on that the prefix carried that meaning. To me,
it appeared to be a prefix used to avoid being specific to 'function' or
'procedure'.

> This function just returns reformatted data from the system catalogue.
> Maybe get_func_trftypes_list, or just replace function
> get_func_trftypes (now, the list is an array, so there should not be
> performance issues). For consistency, the new function should be used
> in plperl and plpython too. Probably this function is not

If it is acceptable to replace get_func_trftypes like that, I can
produce such a patch.

> 2.
>
> +It also contains the OID of the intended procedural language and whether
> +that procedural language is declared as TRUSTED, useful
> +if a single inline handler is supporting more than one procedural language.
>
> I am not sure if this sentence is in the correct place. Maybe can be
> mentioned separately, so generally handlers can be used by more than
> one procedural language. But maybe I don't understand this sentence.

My point was that, if the structure did /not/ include the OID of the PL
and its TRUSTED property, then it would not be possible for a single
inline handler to support more than one PL. So that is why it is a good
thing that those are included in the structure, and why it would be a
bad thing if they were not.

Would it be clearer to say:

  It also contains the OID of the intended procedural language and
  whether that procedural language is declared as TRUSTED. While these
  values are redundant if the inline handler is serving only a single
  procedural language, they are necessary to allow one inline handler to
  serve more than one PL.

Regards,
-Chap
Re: timezones BCE
On 2022-04-13 14:13, Dave Cramer wrote:
> Oh please don't do something bespoke. I'm trying to make this work
> with the JDBC driver. So it has to be at least compatible with other
> libraries.

Looks like Java agrees with the offset, prior to Toronto's 1895 adoption
of the hour-wide zone:

  jshell> java.time.ZoneId.of("America/Toronto").
     ...>     getRules().
     ...>     nextTransition(java.time.Instant.parse("0101-01-01T00:00:00Z"))
  $1 ==> Transition[Gap at 1895-01-01T00:00-05:17:32 to -05:00]

Regards,
-Chap
Re: timezones BCE
On 2022-04-13 12:33, Dave Cramer wrote:
> test=# set timezone to 'America/Toronto';
> SET
> test=# select '0101-01-01'::timestamptz;
>          timestamptz
> -------------------------------
>  0101-01-01 00:00:00-05:17:32
>
> Specifically why the -05:17:32

Timezones were regularized into their (typically hour-wide) chunks
during a period around the late nineteenth century, IIRC. If you
decompile the zoneinfo database to look at America/Toronto, you will
probably find an entry for dates earlier than when the regularized zones
were established there, and that entry will have an offset reflecting
Toronto's actual longitude.

Regards,
-Chap
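P.S. A back-of-the-envelope check, pasteable into jshell (the longitude
value is my assumption, roughly 79.3833 degrees west for Toronto; local
mean time runs four minutes behind UTC per degree of west longitude):

  double longitudeWest = 79.3833;                        // assumed longitude
  long offsetSeconds = Math.round(longitudeWest * 240);  // 240 s/degree = 19052
  System.out.println(java.time.Duration.ofSeconds(-offsetSeconds));
  // prints PT-5H-17M-32S, i.e. 5:17:32 behind UTC, matching the
  // -05:17:32 shown above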
Re: Documentation about PL transforms
On 2022-02-22 12:59, Chapman Flack wrote:
> It would have been painful to write documentation of get_func_trftypes
> saying its result isn't what get_transform_{from,to}sql expect, so
> patch 1 does add a get_call_trftypes that returns a List *. Patch 2
> then updates the docs as discussed in this thread.
>
> It turned out plhandler.sgml was kind of a long monolith of text even
> before adding transform information, so I broke it into sections
> first. This patch adds the section markup without reindenting, so the
> changes aren't obscured. The chapter had also fallen behind the
> introduction of procedures, so I have changed many instances of
> 'function' to the umbrella term 'routine'.
>
> Patch 3 simply reindents for the new section markup and rewraps.
> Whitespace only.
>
> Patch 4 updates src/test/modules/plsample to demonstrate handling of
> transforms (and to add some more comments generally).

Here is a rebase.

From 9c726423d47a114a82ca703c0e03a62e3d74f1f6 Mon Sep 17 00:00:00 2001
From: Chapman Flack
Date: Mon, 21 Feb 2022 20:56:28 -0500
Subject: [PATCH v2 1/4] Warmup: add a get_call_trftypes function

The existing get_func_trftypes function produces an Oid[], where both
existing get_transform_{from,to}sql functions that depend on the result
expect a List*. Rather than writing documentation awkwardly describing
functions that won't play together, add a get_call_trftypes function
that returns List*.

(The name get_call_... to distinguish from get_func_... follows the
naming used in funcapi.h for a function returning information about
either a function or a procedure.)
---
 src/backend/utils/cache/lsyscache.c | 18 ++
 src/include/funcapi.h               |  5 +
 src/include/utils/lsyscache.h       |  1 +
 3 files changed, 24 insertions(+)

diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 1b7e11b..3cd94db 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -2117,6 +2117,24 @@ get_transform_tosql(Oid typid, Oid langid, List *trftypes)
 	return InvalidOid;
 }
 
+/*
+ * get_call_trftypes
+ *
+ * A helper function that does not itself query the transform cache, but
+ * constructs the transform-type List expected by the functions above.
+ */
+List *
+get_call_trftypes(HeapTuple procTup)
+{
+	Datum		protrftypes;
+	bool		isNull;
+
+	protrftypes = SysCacheGetAttr(PROCOID, procTup,
+								  Anum_pg_proc_protrftypes,
+								  &isNull);
+	return isNull ? NIL : oid_array_to_list(protrftypes);
+}
+
 /*-- TYPE CACHE -- */

diff --git a/src/include/funcapi.h b/src/include/funcapi.h
index dc3d819..70c3e13 100644
--- a/src/include/funcapi.h
+++ b/src/include/funcapi.h
@@ -175,7 +175,12 @@ extern int get_func_arg_info(HeapTuple procTup,
 extern int	get_func_input_arg_names(Datum proargnames,
									 Datum proargmodes,
									 char ***arg_names);
+/*
+ * A deprecated earlier version of get_call_trftypes (in lsyscache.h).
+ * That version produces a List, which is the form downstream functions expect.
+ */
 extern int	get_func_trftypes(HeapTuple procTup, Oid **p_trftypes);
+
 extern char *get_func_result_name(Oid functionId);
 
 extern TupleDesc build_function_result_tupdesc_d(char prokind,

diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index b8dd27d..93b19e7 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -139,6 +139,7 @@ extern char get_rel_relkind(Oid relid);
 extern bool get_rel_relispartition(Oid relid);
 extern Oid	get_rel_tablespace(Oid relid);
 extern char get_rel_persistence(Oid relid);
+extern List *get_call_trftypes(HeapTuple procTup);
 extern Oid	get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
 extern Oid	get_transform_tosql(Oid typid, Oid langid, List *trftypes);
 extern bool get_typisdefined(Oid typid);
-- 
2.7.3

From 7fb3f8971c4c573903b1d08bcbeb126e31a6544c Mon Sep 17 00:00:00 2001
From: Chapman Flack
Date: Mon, 21 Feb 2022 20:59:32 -0500
Subject: [PATCH v2 2/4] Update PL handler implementation docs

The original purpose was to add information on support for CREATE
TRANSFORM (which must be explicitly coded in any PL implementation
intending to support it). But the plhandler section was about as long as
a monolith of text ought to be, even before adding transform
information, so reorganized first into sections.

Front-loaded with short descriptions of the three possible functions
(call handler, validator, inline handler) registered with CREATE
LANGUAGE. The latter two were afterthoughts in historical sequence, but
the docs don't need to present them that way.

The section had also fallen behind the introduction of procedures, so
updated to generally use the umbrella term 'routine' in place of
'function'.

New section markup added here without indentation change, to avoid
obscuring changes. A follow-on commit will reindent and rewrap.
---
 doc/src/sgml/event-trigger.sgml | 16 ++
 doc/src/sgml/plhandler.sgml
Re: test/isolation/expected/stats_1.out broken for me
On 2022-04-07 15:04, Andres Freund wrote:
> And done. Chap, could you confirm this fixes the issue for you?

Looks good from here. One installcheck-world run with no failures;
previously, it failed for me every time.

Regards,
-Chap
Re: test/isolation/expected/stats_1.out broken for me
On 2022-04-07 12:49, Tom Lane wrote:
> So what non-default build options are you using?

The command that I've just been reusing from my bash_history without
thinking about it for some years is:

  configure --enable-cassert --enable-tap-tests \
    --with-libxml --enable-debug \
    CFLAGS='-ggdb -Og -g3 -fno-omit-frame-pointer'

Regards,
-Chap
test/isolation/expected/stats_1.out broken for me
Running installcheck-world on an unrelated patch, I noticed a failure
here in test/isolation/expected/stats_1.out (this is line 3102):

  step s1_slru_check_stats:
    SELECT current.blks_zeroed > before.value
    FROM test_slru_stats before
    INNER JOIN pg_stat_slru current ON before.slru = current.name
    WHERE before.stat = 'blks_zeroed';
  ?column?
  t
  (1 row)

This is built from bab588c. On my amd64/linux box the result is f.

The same mismatch is present if I build from 6392f2a (i.e., just before
a2f433f "pgstat: add alternate output for stats.spec"), along with a
bunch of others. So a2f433f seems to have silenced all the rest of
those, but not this one.

If I build from ad40166, installcheck-world passes. That's as far as I
have pursued it.

Regards,
-Chap
Re: trigger example for plsample
On 2022-04-06 16:44, Mark Wong wrote:
> I think I've applied all of these suggestions and attached a new
> patch.

That looks good to me, though I wonder about the pfree(source). In the
simplest case of a PL that uses no advance compilation or augmentation
step, the Code Execution block might naturally refer to source, so
perhaps the example boilerplate shouldn't include a pfree that needs to
be removed in that case.

In fact, I wonder about the need for any retail pfree()s here. Those
added in this patch are the only ones in plsample.c. They are small
allocations, and maybe it would both streamline the example to leave out
the pfree calls, and be an illustration of best practice in letting the
memory context machinery handle all the deallocation at once, where
there isn't a special need to free something, like an especially large
allocation, at retail.

Regards,
-Chap
Re: PostgreSQL shutdown modes
On 2022-04-01 13:22, Robert Haas wrote:
> I attach herewith a modest patch to rename these shutdown modes to
> more accurately correspond to their actual characteristics.

I've waited for April 2nd to submit this comment, but it seemed to me
that the suggestion about the first-pass checkpoint in 'slow' mode is a
no-foolin' good one.

Then I wondered whether there could be an option to accompany the 'dumb'
mode that would take a WHERE clause, to be implicitly applied to
pg_stat_activity, whose purpose would be to select those sessions that
are ok to evict without waiting for them to exit. It could recognize,
say, backend connections in no current transaction that are from your
pesky app or connection pooler that holds things open. It could also,
for example, select things in transaction state but where
current_timestamp - state_change > '5 minutes' (so it would be
re-evaluated every so often until ready to shut down).

For conciseness (and sanity), maybe the WHERE clause could be implicitly
applied, not to pg_stat_activity directly, but to a (virtual or actual)
view that has already been restricted to client backend sessions, and
already has a column for current_timestamp - state_change.

Regards,
-Chap