Re: Dropping all tables in a database

2023-08-06 Thread Thorsten Glaser
On Sun, 6 Aug 2023, H wrote:

>I am running PostgreSQL 13.11 and tried to drop all tables in a
>database without dropping the database or schema.

See:
https://evolvis.org/plugins/scmgit/cgi-bin/gitweb.cgi?p=useful-scripts/useful-scripts.git;a=tree;f=SQL;hb=HEAD

Comments welcome (especially a fix to the item still in TODO).

bye,
//mirabilos
-- 
Infrastrukturexperte • tarent solutions GmbH
Am Dickobskreuz 10, D-53121 Bonn • http://www.tarent.de/
Telephon +49 228 54881-393 • Fax: +49 228 54881-235
HRB AG Bonn 5168 • USt-ID (VAT): DE122264941
Geschäftsführer: Dr. Stefan Barth, Kai Ebenrett, Boris Esser, Alexander Steeg


/⁀\ The UTF-8 Ribbon
╲ ╱ Campaign against  Mit dem tarent-Newsletter nichts mehr verpassen:
 ╳  HTML eMail! Also, https://www.tarent.de/newsletter
╱ ╲ header encryption!





Re: speed up full table scan using psql

2023-05-31 Thread Thorsten Glaser
On Wed, 31 May 2023, Adrian Klaver wrote:

> Given that I would just COPY the data out as CSV.

I recently did something similar. I found the JSON functions not quite
satisfying and the extra spaces redundant, but it turns out that, for
a numerical table, exporting as CSV, loading that via AJAX then (on
xhr.responseText) substituting newlines with '],[' and prepending '[['
and appending ']]' was enough to let JSON.parse eat it.

With strings this is more complex, of course (though partial use of JSON
functions, e.g. to convert strings to JSONString first, might help).
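A minimal sketch of that transformation, here in Python for illustration (the original did it client-side in JavaScript on xhr.responseText); the sample data is made up:

```python
import json

# Hypothetical all-numeric CSV body as exported by COPY ... CSV (no header).
csv_text = "1,2.5,3\n4,5,6.25\n7,8,9"

# Newlines become "],[", and the whole thing is wrapped in "[[" ... "]]",
# turning the CSV body into a JSON array of arrays.
rows = json.loads("[[" + csv_text.replace("\n", "],[") + "]]")
```

This only works because every field is numeric; string fields would need proper JSON escaping first, as noted above.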

bye,
//mirabilos





Re: Would PostgreSQL 16 native transparent data encryption support database level encryption?

2023-05-22 Thread Thorsten Glaser
(please read http://deb.li/quote and don’t top-post)

On Mon, 22 May 2023, Tony Xu wrote:

>First question - do we, as a community, see the value of the proposal and
>do we believe that value is big enough for us to make any necessary changes

I’d rather like to see the energy, if there’s some invested in the
first place, pointed towards creating something like a “super
postmaster” that helps multiple clusters running on the same
machine (even different versions, starting at the first to support
such) manage the entire-machine resources (CPUs, RAM, …) better.

That has use cases beyond encryption (incidentally, multi-tenant,
but perhaps also keeping an older cluster around for stuff that
cannot yet be safely migrated to a newer PostgreSQL version).

Just my 2¢, uninformed wondering.

bye,
//mirabilos
-- 
15:41⎜ Somebody write a testsuite for helloworld :-)




Re: Would PostgreSQL 16 native transparent data encryption support database level encryption?

2023-05-18 Thread Thorsten Glaser
On Thu, 18 May 2023, Ron wrote:

>> Why not using multiple clusters then?
>
> Yet More Firewall Rules to get approved by the Security Team.  And then they
> balk at port 5433 because they've never heard of it.

But mixing multiple customers on one cluster is much more of a risk.

> And from a technical point of view, one Postgresql system can better manage
> the memory on a VM than two which don't know about each other.

Probably true. Is there something with which multiple clusters running
on the same server can communicate to do that better?

bye,
//mirabilos




Re: Would PostgreSQL 16 native transparent data encryption support database level encryption?

2023-05-18 Thread Thorsten Glaser
On Thu, 18 May 2023, Tony Xu wrote:

>Our use-case is for a multi-tenancy scenario - we are considering using
>different databases to store different customer's data, however, for

Why not using multiple clusters then?

Better isolation of the customers, but still on one server.

bye,
//mirabilos




Re: Window function for get the last value to extend missing rows

2023-05-12 Thread Thorsten Glaser
On Fri, 12 May 2023, GF wrote:

>"The SQL standard defines a RESPECT NULLS or IGNORE NULLS option for lead,
>lag, first_value, last_value, and nth_value. This is not implemented in
>PostgreSQL: the behavior is always the same as the standard's default,
>namely RESPECT NULLS".

Yeah, THAT caused no small amount of cursing earlier this year;
I’d have also used IGNORE NULLS somewhere…

bye,
//mirabilos




Re: Adding SHOW CREATE TABLE

2023-05-12 Thread Thorsten Glaser
On Fri, 12 May 2023, Nathaniel Sabanski wrote:

>I believe most users would anticipate a CREATE TABLE statement that aligns
>with the currently installed version- this is the practical solution for

The currently installed version of what, the server or the client?

bye,
//mirabilos




Re: Why not use the calloc to replace malloc?

2023-04-23 Thread Thorsten Glaser
On Sat, 22 Apr 2023, Tom Lane wrote:

>Wen Yi  writes:
>> [ use calloc to replace zeroing fields individually ]
[…]
>People have complained about this practice off-and-on, but no one has
>provided any evidence that there's a significant performance cost.
>The maintenance benefits are real though.

Oh, interesting ;-) Thanks for this explanation.

Another data point is: calloc is not correct for pointer fields,
you have to manually assign NULL to them afterwards still, because
NULL doesn’t have to be represented by all-zero bytes (e.g. TenDRA
supports having 0x as NULL pointer as an option).

bye,
//mirabilos




Re: "PANIC: could not open critical system index 2662" - twice

2023-04-13 Thread Thorsten Glaser
On Fri, 14 Apr 2023, Laurenz Albe wrote:

>So if your disk replaces a valid block with zeros (filesystem check
>after crash?), that could explain what you see.

Oh, I had that happen on a RAID 1 once. One of the two discs had an
intermittent error (write, I guess) but didn’t fail out of the RAID,
and some of the reads from it got zero-filled blocks in some
positions. It was a CVS repository, so I was able to identify all
affected blocks and restore them from the rsync slave (whose initial
population predated the HDD issue).

Hm, now that I think about it, it could even have been a read error
with subsequent block reassignment. Oh well.

Filesystem issues (ext3, and ext4 without/predating auto_da_alloc,
in particular) could be it just as well of course.

bye,
//mirabilos




Re: Guidance on INSERT RETURNING order

2023-04-12 Thread Thorsten Glaser
On Wed, 12 Apr 2023, Mike Bayer wrote:

>ascending values for "mytable_id_seq" will correspond to the ascending
>values for "num".

But, again, what if it’s uuid or something instead of a sequence?

bye,
//mirabilos




Re: Guidance on INSERT RETURNING order

2023-04-12 Thread Thorsten Glaser
On Wed, 12 Apr 2023, Kirk Wolak wrote:

>The solution exists.  Pre-fetch the IDs, assign them and insert them with
>the IDs.  Then you have 100% control.
>SELECT NEXTVAL('tbl_seq') from GENERATE_SERIES(1, );

This would be a nice solution… but the PK could be always generated,
or not even sequential (UUIDs for example). If you’re developing
schema-first the application would not even (need to) know about how
the IDs are generated as it’s the DB that generates them.

AIUI the OP’s a developer of SQLAlchemy, which is an ORM. Therefore he
needs a generic solution.

bye,
//mirabilos




Re: Guidance on INSERT RETURNING order

2023-04-11 Thread Thorsten Glaser
On Tue, 11 Apr 2023, Federico wrote:

>My understanding was that they are generated in select order

But are they? (I don’t know, but I’d not assume that.)

>If my understanding is incorrect, would this alternative guarantee the above

>INSERT INTO t(id, data)
>SELECT nextval(pg_get_serial_sequence('t', 'id')), data
>FROM (VALUES ('a', 1), ('b', 2), ('c', 3)) as vv(data, num)
>ORDER BY num
>RETURNING id

Wouldn’t it, at that point, be better to just send multiple
individual INSERT statements? The overhead (on both sides)
of all the mentioned… workarounds… is surely larger than that?

bye,
//mirabilos




Re: Guidance on INSERT RETURNING order

2023-04-11 Thread Thorsten Glaser
On Tue, 11 Apr 2023, Federico wrote:

>I was under the impression that when using INSERT SELECT ORDER BY the sequence
>ids were generated using the select order.

But someone said that’s not guaranteed, especially when INSERT will
be parallelised later.

bye,
//mirabilos




Re: Guidance on INSERT RETURNING order

2023-04-11 Thread Thorsten Glaser
On Tue, 11 Apr 2023, Federico wrote:

>Of course sorting the returned ids is only viable when using a serial

Yes, which is why I pointed out it doesn’t have to be.

>or identity column, that's why in the general case I've mentioned the
>insert with sentinel column

But it was pointed out that that’s not guaranteed either, unless you add
that sentinel column to the table itself…

bye,
//mirabilos




Re: Guidance on INSERT RETURNING order

2023-04-11 Thread Thorsten Glaser
On Tue, 11 Apr 2023, Federico wrote:

>The problem here is not having the auto increment id in a particular

The id might not even be auto-increment but UUID or something…
(I am surprised you would even try to insert multiple rows at once.)

bye,
//mirabilos




RE: pg_ctlcluster is not stopping cluster

2023-04-07 Thread Thorsten Glaser
On Fri, 7 Apr 2023, Telium Technical Support wrote:

>I’m assuming something is misconfigured on the host that’s causing this
>unusual behavior….and that’s what I need to understand

The mix between 13 and 15 here is what I’d consider a misconfiguration.

Also, please don’t top-post and full-quote.

bye,
//mirabilos




Re: pg_ctlcluster is not stopping cluster

2023-04-07 Thread Thorsten Glaser
On Fri, 7 Apr 2023, Telium Technical Support wrote:

>Notice: extra pg_ctl/postgres options given, bypassing systemctl for stop

>it might be 13.  Coincidentally, there is a postmaster.pid file in a
>directory OTHER than the data directory:
>
>/var/lib/postgresql/15/main/postmaster.pid
>
>(and notice the 15). Is this a clue?

Maybe the pidfile is written into the cluster version-based directory
instead of the data directory. Best figure out what exactly writes the
pidfile, whether systemd is used for starting (which would of course be
a prime suspect as it says it’s explicitly not used for stopping), etc.

Maybe this is indeed a bug in whatever determines the pidfile path,
perhaps not; is it supposed to live within the data directory?

bye,
//mirabilos




Re: COPY and custom datestyles. Or some other technique?

2023-03-29 Thread Thorsten Glaser
On Wed, 29 Mar 2023, Ron wrote:

> There are 550+ tables, so something that I can do once on this end would make
> my life a lot easier.

Some quick perl or awk or shell job to batch-change the field
to an accepted syntax is probably quicker.
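A quick sketch of such a batch job, here in Python rather than perl/awk; the `DD-MON-YYYY` input format and the month table are assumptions for illustration:

```python
import re

# Hypothetical source format: dates like 29-MAR-2023, rewritten to
# ISO 8601 (2023-03-29), which COPY accepts with the default datestyle.
MONTHS = {'JAN': '01', 'FEB': '02', 'MAR': '03', 'APR': '04',
          'MAY': '05', 'JUN': '06', 'JUL': '07', 'AUG': '08',
          'SEP': '09', 'OCT': '10', 'NOV': '11', 'DEC': '12'}

def fix_dates(line):
    # Rewrite every DD-MON-YYYY occurrence on the line in place.
    return re.sub(r'\b(\d{2})-([A-Z]{3})-(\d{4})\b',
                  lambda m: f'{m.group(3)}-{MONTHS[m.group(2)]}-{m.group(1)}',
                  line)
```

Run over each dump file before feeding it to COPY.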

bye,
//mirabilos




Re: Binding Postgres to port 0 for testing

2023-03-25 Thread Thorsten Glaser
Hi Markus,

>I am building a simple integration test framework for an application that I
>am building. For this project I am planning to use PostgreSQL.

Check pg_virtualenv(1), which apparently is a Debian thing.
It auto-creates a cluster (and auto-deletes it at the end, if
desired) and runs your program with suitable environment
variables to access that cluster, which includes choosing
an available port. This was in fact made for integration
testing, though I use it for quickly spinning up throwaway
DBs for one-shot tools as well.

bye,
//mirabilos





Re: pg_upgrade Only the install user can be defined in the new cluster

2023-03-20 Thread Thorsten Glaser
On Mon, 20 Mar 2023, David G. Johnston wrote:

>On Monday, March 20, 2023, Dávid Suchan wrote:
>
>> I installed both postgres versions on ubuntu machine with 'apt-get install
>> postgres', which installed both client and server packages. Is that where I
>> made a mistake?

>Anyway, you had to have done more than that to use those other roles to
>appear in the new cluster.

And the Debian instructions are pretty clear: they tell one to
first drop the new version’s default-installed cluster, then
upgrade the old cluster (into a fresh new one).

bye,
//mirabilos




Re: Uppercase version of ß desired

2023-03-14 Thread Thorsten Glaser
On Tue, 14 Mar 2023, Celia McInnis wrote:

>uc_alphabet = lc_alphabet.replace('ß', 'ẞ').upper()

That’s probably for the best. The uppercase Eszett was only added
to Unicode under the rule that the lowercase Eszett’s case rules
stay unchanged, and the former is normally expected to only ever
be typed manually.

Of course, the grammar rules about uppercasing ß have since changed,
but since there’s two valid ways, choosing is the application’s duty.
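For illustration, the quoted approach in full (sample strings made up); Python's str.upper() maps 'ß' to 'SS' per the default Unicode case rules, so the capital form has to be substituted beforehand:

```python
lc_alphabet = "straße und fußball"

# Default behaviour: ß uppercases to SS.
assert lc_alphabet.upper() == "STRASSE UND FUSSBALL"

# Pre-substituting the capital sharp S (U+1E9E) keeps it one character.
uc_alphabet = lc_alphabet.replace('ß', 'ẞ').upper()
```

Which of the two results is wanted is, as noted, the application's choice.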

bye,
//mirabilos




Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)

2023-03-04 Thread Thorsten Glaser
On Sat, 4 Mar 2023, Alban Hertroys wrote:

>> But isn’t that the same as with a regular LEFT JOIN?
>
>Similar, but not the same, I’d say.
>
>I do now notice that I made some copying errors there; I was a bit
>nauseous at the time.

np, I’m under pollen attack currently so also not at my best.

>That should have read:
>
>>> select jsonb_build_object('opening_times',
>>> jsonb_agg(obj
>>> ORDER BY
>>> obj->>'weekday',
>>> obj->>'from_hour',
>>> obj->>'to_hour')
>>> )
>>> from cot
>>> cross join lateral jsonb_build_object(
>>> 'weekday', cot.weekday,
>>> 'from_hour', cot.from_hour,
>>> 'to_hour', cot.to_hour) obj
>
>The lateral join applies the function to each row returned from the
>left side of the join and enriches that row with the function result. I
>used a cross join because there is no join condition to apply to the
>lateral, otherwise you could also use an inner join on true.

Okay, so I got it now. But that’s still identical to the LEFT JOIN
which I’m using in that example, because…

>A left join wouldn’t make much sense here, unless the function could

… I have a LEFT JOIN already and can just use the CTE there,
so I don’t have to add an extra lateral join.

But good to know for the future/when I don’t have that.

>return NULL - for example if it were a function marked as STRICT and
>some of the input parameter values (from the table) could be NULL.

OK.

>You need a sub-select, which in turn creates its own result set. It’s
>up to the planner whether the left or the right side gets executed
>first, after which the results of the other side of the join get merged
>to this, or whether this can all be collected in one go. That’s up to
>the query planner to decide though, and it could be right.

OK, but that shouldn’t make a difference here as it needs to run over
all rows of the cgwa table anyway (possibly reduced by filtering on
users).

While not the case here, I see that for other entries the lateral
join would cause more work: for the “qualification” kinds of tables,
for example, the individual qualification table has very few rows
(these are entered by the site admin), but the m:n connection table
(whatever the correct name for these is) has a lot because many of
the users have many of these qualifications. If I use a CTE to add
a JSON object to the individual qualification table first, it doesn’t
run on each qualification multiple times; if I use a lateral join,
it possibly, modulo planner optimisations, runs the jsonb_build_object
function many times per qualification despite them all giving the same
result. And, even if the optimisations catch that, it’s mentally not
the same.

>In my experience, lateral joins go well with the jsonb functions. They
>tend to reduce code repetition when referencing object members, such as
>in your case.

Right. In my case I can get the same by adding a CTE instead though,
and it’s hard to see which is better, performance-wise.

This is a lot to take in, and I r̲e̲a̲l̲l̲y̲ appreciate the detailed
explanations given alongside ☻

bye,
//mirabilos




Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)

2023-03-03 Thread Thorsten Glaser
On Fri, 3 Mar 2023, Alban Hertroys wrote:

>You can rewrite that into something like this:
>
>select jsonb_build_object('opening_times',
>   obj
>   ORDER BY
>   obj->>'weekday',
>   obj->>'from_hour',
>   obj->>'to_hour')
>)
>from cot
>cross join lateral jsonb_agg(jsonb_build_object(
>   'weekday', cot.weekday,
>   'from_hour', cot.from_hour,
>   'to_hour', cot.to_hour) obj

But isn’t that the same as with a regular LEFT JOIN?

>>cgwaj AS (
>>  SELECT cgwa.id AS id, jsonb_build_object(
>>  'weekday', cgwa.weekday,
>>  'forenoon', cgwa.forenoon,
>>  'afternoon', cgwa.afternoon,
>>  'evening', cgwa.evening) AS obj
>>  FROM core_generalworkavailability cgwa

plus

>>  LEFT JOIN cgwaj ON cgwaj.id=cppwt.generalworkavailability_id

With the addition that I can aggregate…

bye,
//mirabilos




Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)

2023-03-02 Thread Thorsten Glaser
On Tue, 28 Feb 2023, Alban Hertroys wrote:

>Perhaps you can use a lateral cross join to get the result of
>jsonb_build_object as a jsonb value to pass around?

I don’t see how. (But then I’ve not yet worked with lateral JOINs.)

But I c̲a̲n̲ just generate the objects first, I t̲h̲i̲n̲k̲, given one
of them corresponds to exactly one of the rows of an m:n-linked
table and nothing else. Something like…

WITH
cgwaj AS (
SELECT cgwa.id AS id, jsonb_build_object(
'weekday', cgwa.weekday,
'forenoon', cgwa.forenoon,
'afternoon', cgwa.afternoon,
'evening', cgwa.evening) AS obj
FROM core_generalworkavailability cgwa
),
-- … same for opening times
SELECT cp.email, …,
-- …
jsonb_build_object('possible_work_times', COALESCE(
jsonb_agg(DISTINCT cgwaj.obj ORDER BY cgwaj.obj->>'weekday',
cgwaj.obj->>'forenoon', cgwaj.obj->>'afternoon',
cgwaj.obj->>'evening')
FILTER (WHERE cgwaj.id IS NOT NULL))) ||
-- …
FROM core_person cp
-- …
LEFT JOIN core_person_possible_work_times cppwt ON cppwt.person_id=cp.id
LEFT JOIN cgwaj ON cgwaj.id=cppwt.generalworkavailability_id
-- …

That is, add a CTE for each m:n-attached table whose “value” is
an object, not a single field, keep the id field; LEFT JOIN that
(instead of the original table), then we have a field to use in
ORDER BY.

I think. I’ve not yet tried it (I don’t have access to that DB
normally, I was just helping out).

This avoids sub-SELECTs in the sense of needing to run one for
each user row, because the innermost JSON object building needs
to be done for each (connected (if the query is not filtering on
specific users)) row of the “property table”, anyway. (And even
if filtered, that can be passed down.)

bye,
//mirabilos
-- 
As long as you don’t do dirty tricks, and I mean *really* dirty
tricks, like XORing both pointers of a doubly-linked list and
storing them in a single word, Boehm works absolutely splendidly.
-- Andreas Bogk on boehm-gc in d.a.s.r





Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)

2023-02-27 Thread Thorsten Glaser
On Mon, 27 Feb 2023, David G. Johnston wrote:

>Consider this then as a jumping point to a more precise query form:
[…]
>the basic concept holds - produce single rows in subqueries then join those
>various single rows together to produce your desired json output.

Ouch. I’ll have to read up and experiment with that, I guess.
But wouldn’t I rather then do a CTE for each referenced table
that does the aggregating and GROUP BY person_id, then join that
(which has single rows for each user row) to users?

>--BAD cross joining going on with multiple one-to-many relationships
>--GOOD, only joining one-to-one relationships

Not quite. This is many-to-many created by Django…


On Mon, 27 Feb 2023, Tom Lane wrote:

>Sure, I was just trying to explain the rule.

Aaah, okay. Sorry, I misunderstood that, and the… general direction
of the replies, then.

>Well, that may be what you want, but it's not what you wrote in
>the query.  Follow David's advice and do
[…]
>I'm pretty sure that this will only incur one evaluation of the
>common subexpression, so even though it's tedious to type it's not
>inefficient.

Thanks. But I fear it’s not as simple as you wrote. More like:

jsonb_build_object('opening_times',
jsonb_agg(DISTINCT jsonb_build_object(
'weekday', cot.weekday,
'from_hour', cot.from_hour,
'to_hour', cot.to_hour)
ORDER BY
jsonb_build_object(
'weekday', cot.weekday,
'from_hour', cot.from_hour,
'to_hour', cot.to_hour)->>'weekday',
jsonb_build_object(
'weekday', cot.weekday,
'from_hour', cot.from_hour,
'to_hour', cot.to_hour)->>'from_hour',
jsonb_build_object(
'weekday', cot.weekday,
'from_hour', cot.from_hour,
'to_hour', cot.to_hour)->>'to_hour')
)

Isn’t that more like it?

(Though I guess at that point I could just drop at least
the to_hour fallback sort, hoping nobody inserts overlapping
hours with identical start times…)

Now that I see that, it sort of re-raises a question I had
during developing this but didn’t pursue.

How about, instead of creating a JSONB object here, I create
a (not-JSON) ARRAY['weekday', cot.weekday, 'from_hour',
cot.from_hour, 'to_hour', cot.to_hour] in the aggregate function.
The array is something I could naturally use to sort (its elements
already are in sort key order), and its elements *also* are in
the order jsonb_build_object expects its arguments (assuming I can
do something like passing the array to it instead of “spreading”
the arguments). Pseudocode:

… hmm, not that easy. The aggregate would return e.g. this…

ARRAY[['weekday',1,…],['weekday',2,…]]

… and there’s no array_map function that could be used to pass
each inner array, one by one, to jsonb_build_object; converting
the outer array to JSON gets us json_array_elements (same problem),
other ways I can think of also don’t seem to have matching functions
(jq has… interesting ones).

As I write this, I fear that won’t fly because nōn-JSON arrays
cannot contain mixed types (text and numbers) in PostgreSQL… at
least I vaguely remember running into that at some point in the past…

Oh well,
//mirabilos




Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)

2023-02-27 Thread Thorsten Glaser
On Mon, 27 Feb 2023, Tom Lane wrote:

>Well, yeah.  Simplify it to
>
>  SELECT array_agg(DISTINCT x ORDER BY y) FROM mytable;

That’s… a bit too simple for this case.

>If there are several rows containing the same value of x and different
>values of y, which y value are we supposed to sort the unique-ified x
>value by?  It's an ill-defined query.

The problem here is that I do not have an ‘x’.

>For the specific example you give, it's true that any specific
>possible output of jsonb_build_object() would correspond to
>a unique set of cot.weekday, cot.from_hour, cot.to_hour values.

Not necessarily (see directly below), but why would that matter?
It should sort the generated JSON objects within the array.

>, or maybe thinking harder
>about why you're getting duplicates in the first place.

The application developer informed me that it’s entirely possible
that some user entered matching information twice. I don’t have
the exact query that produced duplicates easily in the history
(the downside of working with \i) and tests on other users didn’t
produce duplicates.

So, yes, filtering them out is indeed part of the task here.


On Mon, 27 Feb 2023, David G. Johnston wrote:

>So long as the function call itself is at least stable...:
>
>DISTINCT func_call(...) ORDER BY func_call(...)

aieee really?

(I’d hope jsonb_build_object to be.)

Is that better or worse than using the extra functions to sort…?

>Order By is evaluated AFTER and over the contents of the distinct-ified
>expression

That’s right and good, but the problem is that I do not seem to
have a syntax with which to refer to the distinct-ified expression
to use in the ORDER BY clause.

>> place; I am somewhat proud I even managed to write this with
>> JOINs and without any subqueries in the first place so I’m
>> hesitant to go that route.
>
>That pride seems misplaced.  Related to Tom's comment, the presence of the
>DISTINCTs is telling you that what you did is not good.  DISTINCT is almost
>always a code smell

Not about the DISTINCTs. I haven’t used JOIN much (usually WHERE
as inner join) nor on tables this massive, and this is my second
foray into aggregate functions only.

>Lastly, if you do need to care about normalizing the output of JSON you
>should consider writing a function that takes arbitrary json input and
>reformats it, rather than trying to build up json from scratch where every
>individual component needs to be aware and take action.  i.e., get rid of
>the ORDER BY also.  Maybe this belongs in an application layer with tooling
>that already provides this capability.

Funnily enough, both here and in the other place where I tried to
use JSON output, PostgreSQL (with COPY TO STDOUT) *is* the application
layer. Here I’m generating a CSV file; in the other situation I was
going to put the generated JSON directly into an HTTP result filehandle.

In the latter, I dropped that approach, output CSV and converted that
(by replacing newlines with “],[” and prepending “[[” and appending
“]]”) to JSON myself, which worked there as it was all-numeric. But
the frustration there was about unnecessary whitespace instead.

Both cases have in common that a, possibly huge, result set can be
directly streamed from PostgreSQL to the consumer, but the former
lacks just that tiny bit of functionality that would make it really
rock :/

I was asking here because perhaps either that missing functionality
can be considered, or to find out if there’s better ways to produce
that output, due to my inexperience with SQL. The ways I’m using do
work, and I’m relatively happy, but…

bye,
//mirabilos