Re: re-novice coming back to pgsql: porting an SQLite update statement to postgres
> On 15 Sep 2024, at 11:07, Dan Kortschak wrote:
>
> I have come to hopefully my last stumbling point.
>
> I am unable to see a way to express something like this SQLite syntax
>
> select json_group_array(json_replace(value,
>     '$.a', case
>         when json_extract(value, '$.a') > 2 then
>             2
>         else
>             json_extract(value, '$.a')
>         end,
>     '$.b', case
>         when json_extract(value, '$.b') < -2 then
>             -2
>         else
>             json_extract(value, '$.b')
>         end
> ))
> from
>     json_each('[{"a":1, "b":-3},{"a":2, "b":-2},{"a":3, "b":-1}]');

What’s the result of that query in SQLite? I’m guessing it would be:
[{"a":1, "b":-2},{"a":2, "b":-2},{"a":2, "b":-1}]

I see basically two approaches. One is to take the objects apart and build them back together again, the other is to attempt to only replace the values that need replacing.

For the sake of showing how both approaches modify the original, I added an extra field “c” to your objects that should be in the result unmodified.

The first approach rebuilds the objects:

with t as (
    select jsonb($$[{"a":1, "b":-3, "c":1},{"a":2, "b":-2, "c":2},{"a":3, "b":-1, "c":3},{"a":3, "b":-3, "c":4}]$$) arr
)
select jsonb_agg(jsonb_build_object(
          'a', case when records.a > 2 then 2 else records.a end
        , 'b', case when records.b < -2 then -2 else records.b end
        , 'c', c
       ))
from t
cross join lateral jsonb_to_recordset(t.arr) records(a int, b int, c int)
;

                          jsonb_agg
--------------------------------------------------------------
 [{"a": 1, "b": -2, "c": 1}, {"a": 2, "b": -2, "c": 2}, {"a": 2, "b": -1, "c": 3}, {"a": 2, "b": -2, "c": 4}]
(1 row)

The drawback is that you have to specify all fields and types, but you don’t need to cast the values all the time either.

The replacement approach gets a bit trickier. I don’t see any good method that would replace both ‘a’ and ‘b’ values if they both go outside bounds in the same object. The jsonb_set function in PG doesn’t seem to be able to handle setting a value conditionally, let alone setting multiple values conditionally in one call, so I ended up with replacing either ‘a’ or ‘b’.
I did include a case where both ‘a’ and ‘b’ go out of bounds, replacing both values with their respective replacements, but the syntax for that approach doesn’t scale well to more combinations of fields and boundaries to check and replace. Hence I added the problematic case to the test string. As you can see from the previous query, that one does handle that case properly without much extra hassle.

with t as (
    select jsonb($$[{"a":1, "b":-3, "c":1},{"a":2, "b":-2, "c":2},{"a":3, "b":-1, "c":3},{"a":3, "b":-3, "c":4}]$$) arr
)
select jsonb_agg(
        case
            when (obj->>'a')::INTEGER > 2 and (obj->>'b')::INTEGER < -2 then
                jsonb_set(jsonb_set(obj, '{a}', '2'), '{b}', '-2')
            when (obj->>'a')::INTEGER > 2 then
                jsonb_set(obj, '{a}', '2')
            when (obj->>'b')::INTEGER < -2 then
                jsonb_set(obj, '{b}', '-2')
            else
                obj
        end) newArr
from (
    select jsonb_array_elements(arr) obj from t
) elements;

                          newarr
--------------------------------------------------------------
 [{"a": 1, "b": -2, "c": 1}, {"a": 2, "b": -2, "c": 2}, {"a": 2, "b": -1, "c": 3}, {"a": 2, "b": -2, "c": 4}]
(1 row)

For understanding both queries better, it probably helps to take out the jsonb_agg calls to see the separate objects from the array. Add the original obj back in for comparison, if you like.

I typically use the documentation pages for the JSON functions and the one on aggregate functions, where the JSONB aggregates are located:
https://www.postgresql.org/docs/16/functions-json.html
https://www.postgresql.org/docs/16/functions-aggregate.html

And if you’re not familiar with dollar quoting:
https://www.postgresql.org/docs/16/sql-syntax-lexical.html#SQL-SYNTAX-DOLLAR-QUOTING

Alban Hertroys
--
There is always an exception to always.
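As an aside, the two approaches above could be condensed (a sketch, not from the original thread; untested against the live data): least() and greatest() can take over the clamping that the CASE expressions do, which scales better when more fields and bounds get involved:

```sql
with t as (
    select jsonb($$[{"a":1, "b":-3, "c":1},{"a":2, "b":-2, "c":2},
                    {"a":3, "b":-1, "c":3},{"a":3, "b":-3, "c":4}]$$) arr
)
select jsonb_agg(
         -- clamp 'a' to at most 2 and 'b' to at least -2, leaving 'c' alone
         jsonb_set(
           jsonb_set(obj, '{a}', to_jsonb(least((obj->>'a')::int, 2))),
           '{b}', to_jsonb(greatest((obj->>'b')::int, -2))))
from t
cross join lateral jsonb_array_elements(t.arr) obj;
```

This handles the case where both ‘a’ and ‘b’ are out of bounds in the same object without the combinatorial CASE branches.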
Re: re-novice coming back to pgsql: porting an SQLite update statement to postgres
> On 14 Sep 2024, at 10:33, Dan Kortschak wrote:
(…)
> I'm still having difficulties with the second part which is to update
> the contents of the amend array in the JSON.
>
> So far I'm able to append the relevant details to the append array, but
> I'm unable to correctly select the corrects elements from the $6
> argument, which is in the form [{"start":,"end":,"data":}, ...].
> The first update statement gives me broadly what I want, but includes
> elements of the array that it shouldn't.
(…)
> If I filter on the start and end time, I end up with no element coming
> through at all and the "replace" field ends up null.
>
> update
>     events
> set
>     datastr = jsonb_set(
>         datastr,
>         '{amend}',
>         datastr->'amend' || jsonb_build_object(
>             'time', $2::TEXT,
>             'msg', $3::TEXT,
>             'replace', (
>                 select *
>                 from
>                     jsonb($6::TEXT) as replacement
>                 where
>                     (replacement->>'start')::TIMESTAMP WITH TIME ZONE < endtime and
>                     (replacement->>'end')::TIMESTAMP WITH TIME ZONE > starttime
>             )
>         )
>     )
> where
>     starttime < $5 and
>     endtime > $4 and
>     bucketrow = (
>         select rowid from buckets where id = $1
>     );

That’s because the replacement data is an array of objects, not a single object. You need to iterate through the array elements to build your replacement data, something like what I do here with a select (because that’s way easier to play around with):

with dollar6 as (
    select jsonb($$[
        {
            "data": { "foo": 1, "bar": 2 },
            "end": "2023-06-12T19:54:51Z",
            "start": "2023-06-12T19:54:39Z"
        }
    ]$$::text) replacement
)
select *
from dollar6
cross join lateral jsonb_array_elements(replacement) r
where (r->>'start')::timestamptz <= current_timestamp;

There are probably other ways to attack this problem, this is the one I came up with.

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
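Building on that, a hedged sketch of how the re-aggregated elements might feed the 'replace' field in the original UPDATE (column and parameter names as in the thread; untested here):

```sql
'replace', (
    select coalesce(jsonb_agg(r), '[]'::jsonb)
    from jsonb_array_elements($6::jsonb) r
    where (r->>'start')::timestamptz < endtime
      and (r->>'end')::timestamptz > starttime
)
```

The jsonb_agg turns the matching elements back into a single array value; the coalesce guards against the NULL that an empty match would otherwise produce.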
Re: Column type modification in big tables
> On 15 Aug 2024, at 14:15, Lok P wrote:
(…)
> Hello Greg,
>
> In terms of testing on sample data and extrapolating, as i picked the avg
> partition size of the table (which is ~20GB) and i created a non partitioned
> table with exactly same columns and populated with similar data and also
> created same set of indexes on it and the underlying hardware is exactly same
> as its on production. I am seeing it's taking ~5minutes to alter all the four
> columns on this table. So we have ~90 partitions in production with data in
> them and the other few are future partitions and are blank. (Note- I executed
> the alter with "work_mem=4GB, maintenance_work_mem=30gb,
> max_parallel_worker_per_gather=8, max_parallel_maintenance_worker =16")
>
> So considering the above figures, can i safely assume it will take
> ~90*5minutes= ~7.5hours in production and thus that many hours of downtime
> needed for this alter OR do we need to consider any other factors or activity
> here?

Are all those partitions critical, or only a relative few? If that’s the case, you could:

1) detach the non-critical partitions
2) take the system down for maintenance
3) update the critical partitions
4) take the system up again
5) update the non-critical partitions
6) re-attach the non-critical partitions

That could shave a significant amount of time off your down-time. I would script the detach and re-attach processes first, to save some extra time.

Admittedly, I haven’t actually tried that procedure, but I see no reason why it wouldn’t work. Apart perhaps, from inserts happening that should have gone to some of those detached partitions. Maybe those could be sent to a ‘default’ partition that gets detached in an extra step 7, after which you can insert+select those from the default into the appropriate partitions?

But you were going to test that first anyway, obviously.

Alban Hertroys
--
There is always an exception to always.
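For illustration, the detach/alter/re-attach steps might look like this (a sketch with made-up table, partition and column names, not taken from the thread):

```sql
-- 1) detach a non-critical partition
ALTER TABLE big_table DETACH PARTITION big_table_2020_01;

-- 5) alter the detached partition at leisure, as a standalone table
ALTER TABLE big_table_2020_01 ALTER COLUMN amount TYPE numeric(12,2);

-- 6) re-attach it with its original bounds
ALTER TABLE big_table ATTACH PARTITION big_table_2020_01
    FOR VALUES FROM ('2020-01-01') TO ('2020-02-01');
```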
Re: Insert works but fails for merge
> On 10 Aug 2024, at 22:23, yudhi s wrote:
> On Sat, Aug 10, 2024 at 8:22 PM Adrian Klaver wrote:
> > MERGE INTO tab1 AS target
> > USING (VALUES ('5efd4c91-ef93-4477-840c-a723ae212d99', 123,
> > '2024-08-09T11:33:49.402585600Z','2024-08-09T11:33:49.402585600Z')) AS
> > source(id, mid, txn_timestamp, cre_ts)
> > ON target.id = source.id
> > WHEN MATCHED THEN
> >     UPDATE SET mid = source.mid
> > WHEN NOT MATCHED THEN
> >     INSERT (id, mid, txn_timestamp, cre_ts)
> >     VALUES (source.id, source.mid, source.txn_timestamp, source.cre_ts);
>
> Actually, as per the business logic, we need to merge on a column which is
> not unique or having any unique index on it.

Then how is the database supposed to determine which of those duplicate rows it should update? In the best case, it would update all of the duplicates with the same values, which usually is not what you want.

> It's the leading column of a composite unique key though.

Which could be unique by itself, but I suppose that isn’t the case here? In that case, IMHO your best course of action is to do something about those duplicates first.

> And in such scenarios the "INSERT ON CONFLICT" will give an error. So we
> are opting for a merge statement here, which will work fine with the column
> having duplicate values in it.

I’m not so sure about that claim… At least on MSSQL, MERGE has this requirement: "A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times.”. I’ve seen that as an error message on occasion.

The MERGE documentation for PostgreSQL says this: "You should ensure that the join produces at most one candidate change row for each target row.”, which also seems to imply that you shouldn’t have duplicates.

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
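A quick way to check whether such duplicates exist before attempting the MERGE (a generic sketch against the tab1 table from the thread):

```sql
SELECT id, count(*) AS copies
FROM tab1
GROUP BY id
HAVING count(*) > 1
ORDER BY copies DESC;
```

Any rows returned are join keys that would produce more than one candidate change row per target row.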
Re: Column type modification in big tables
> On 8 Aug 2024, at 20:38, Lok P wrote:
>
> Thank you so much.
>
> Can anybody suggest any other possible way here. As, we also need to have the
> existing values be updated to the new column value here using update command
> (even if it will update one partition at a time). And as I see we have almost
> all the values in the column not null, which means it will update almost
> ~5billion rows across all the partitions. So my question is, is there any
> parameter (like work_mem, maintenance_work_mem etc) which we can set to make
> this update faster?
> or any other way to get this column altered apart from this method?

Just a wild thought here that I’m currently not able to check… Can you add views as partitions? They would be read-only of course, but that would allow you to cast the columns in your original partitions to the new format, while you can add any new partitions in the new format.

I suspect it’s not allowed, but perhaps worth a try.

Alban Hertroys
--
There is always an exception to always.
Re: Destination Table - Condition Amount 0
> On 8 Aug 2024, at 20:07, Anthony Apollis wrote:
>
> The same code bring in values for FY24, 23 etc. Dont understand why FY25's
> values are 0.

If you use the same code for FY24, then either there’s some filter being applied somewhere that excludes FY25 (and probably beyond), or something in your data changed. My bet is on the latter.

For example, in FY25 the value of NCD_EXCL."Order Quantity" is 'NCD Valid FY25’, which doesn’t match your LIKE expression. Even something like a trailing space to the value could be enough.

Alban Hertroys
--
There is always an exception to always.
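The trailing-space scenario is easy to demonstrate; LIKE without wildcards behaves as an exact match:

```sql
select 'NCD Valid FY25 ' like 'NCD Valid FY25';                      -- false: trailing space
select trim(trailing from 'NCD Valid FY25 ') like 'NCD Valid FY25';  -- true after trimming
```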
Re: Trigger usecase
> On 30 Jul 2024, at 17:16, sud wrote:
>
> Hello,
>
> We have a streaming application (using apache flink and kafka) which
> populates data in the tables of a postgres database version 15.4.
>
> Now while loading transactions data we also get some reference data
> information from source (for example customer information) and for these, we
> dont want to modify or override the existing customer data but want to keep
> the old data with a flag as inactive and the new record should get inserted
> with flag as active. So for such use case, should we cater this inside the
> apache flink application code or should we handle this using trigger on the
> table level which will execute on each INSERT and execute this logic?
>
> I understand trigger is difficult to debug and monitor stuff. But here in
> this case, team mates is saying, we shouldn't put such code logic into a
> streaming application code so should rather handle through trigger.

Is your data consistent if this operation doesn’t happen correctly? Is it okay to have no, or multiple, records where the flag is active for the same application transaction?

The benefit of doing this in a trigger is that the operations happen in a single database transaction, guaranteeing that there is only ever a single row that has the active flag set for every application transaction.

There are other ways to guarantee that, using exclusion constraints (which you should probably have on this table anyway), which would allow handling this in the application. Such constraints can raise exceptions in your code, that need handling.

So I say, at least put an exclusion constraint on that table if you didn’t already, and then decide what approach suits you best.

Alban Hertroys
--
Als je de draak wilt steken met iemand, dan helpt het, als die een punthoofd heeft.
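For reference, one way to guarantee at most one active row per customer is a partial unique index (a sketch with assumed table and column names, not taken from the thread):

```sql
-- at most one row per customer_id may have active = true
CREATE UNIQUE INDEX one_active_per_customer
    ON customers (customer_id)
    WHERE active;
```

With this in place, a second INSERT with active set for the same customer_id raises a unique-violation error that the trigger or application then has to handle.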
Re: Memory issues with PostgreSQL 15
> On 25 Jul 2024, at 12:58, Christian Schröder wrote:
>
> Hi all,
> I started this discussion in May and was then dragged into other topics, so I
> could never follow up. Sorry for that!
> Since then, the problem has resurfaced from time to time. Right now, we seem
> to have issues again, which gives me the opportunity to follow up on your
> various suggestions.
>
> The current error messages are similar to what we have seen before:
>
> <2024-07-25 12:27:38 CEST - > LOG: could not fork autovacuum worker process:
> Cannot allocate memory
> <2024-07-25 12:27:38 CEST - mailprocessor> ERROR: could not resize shared
> memory segment "/PostgreSQL.1226901392" to 189280 bytes: No space left on
> device

We sometimes encounter a similar issue, but with disk space - on a 1TB virtual disk of which usually only about 1/4th is in use.

Our hypothesis is that sometimes some long-running transactions need to process a lot of data and put so much of it in temporary tables that they fill up the remaining space. We’ve seen the disk space climb and hit the ’No space left on device’ mark - at which point the transactions get aborted and rolled back, putting us back at the 1/4th of space in use situation.

Have you been able to catch your shared memory shortage in the act? I suspect that the stats you showed in your message were those after rollback.

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
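To catch such temporary-file pressure in the act, the cumulative statistics can help; pg_stat_database tracks temporary file usage per database:

```sql
SELECT datname, temp_files, temp_bytes
FROM pg_stat_database
ORDER BY temp_bytes DESC;
```

These counters are cumulative since the last stats reset, so sampling them periodically shows when the spills happen.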
Re: Finding error in long input file
> On 10 Jul 2024, at 06:58, Adrian Klaver wrote:
>
> On 7/9/24 17:46, Craig McIlwee wrote:
>> Full error message from earlier in the thread:
>>
>> > psql:scripts/insert-addrs.sql:488: ERROR: syntax error at or near ";"
>> > LINE 488: ...2832,1,default,'85250 Red House Rd','Paisley','OR','97636');
>> >                                                                        ^
>>
>> The error:
>>
>> LINE 488: ...2832,1,default,'85250 Red House Rd','Paisley','OR','97636')
>>
>> is giving you the line number and the data:
>> a) Navigate to that line number using whatever method Joe has for that.
>> b) Search for '85250 Red House Rd'.
>>
>> The input file is 488 lines (presumably, since Rich said the file should
>> insert 488 rows). It seems like too much of a coincidence that the last
>> character of the last line is really the error. My guess is that there
>
> This assumes that there were only INSERT lines and that each INSERT was only
> one line. I have been bit by those assumptions before, hence my suggestion to
> actually find line 488.
>
>> is an unmatched character, perhaps a parenthesis, that is throwing off the
>> parser because it doesn't expect the statement to terminate yet. Maybe that
>> unmatched char really is on the last line, but '85250 Red House Rd' doesn't
>> seem like the issue. I don't know anything about the joe editor, but I'd
>> hope that any decent editor with syntax highlighting would make it apparent
>> where things went awry.
>>
>> Craig

Is this a single INSERT statement with multiple tuples after VALUES?

Then perhaps an earlier line (my bet would be on line 487) accidentally ends with a semi-colon instead of a comma? Something like this:

INSERT INTO table (col1, col2, ..., coln) VALUES
(..., ..., ),
(..., ..., ),
(..., ..., ); -- <-- This terminates the INSERT
(..., ..., ); -- <-- Now this line makes no sense

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: Transaction issue
> On 19 Jun 2024, at 19:56, Rich Shepard wrote:
>
> I now insert rows using a transaction. Sometimes psql halts with an error:
> ERROR: current transaction is aborted, commands ignored until end of
> transaction block

The error prior to those statements is what you need to look at. That’s what’s causing the transaction to fail.

> I issue a rollback; command but cannot continue processing. What is the
> appropriate way to respond to that error after fixing the syntax error?

I get the impression that you’re executing shell scripts that run the psql command-line utility. That’s a great way to execute known-to-be-good sequences of SQL statements, but in case of errors it can be difficult to debug (although PostgreSQL is quite concise about its errors).

If a rollback isn’t done from the same psql session, then you’re performing it from a different transaction - a different session even. It won’t affect the failed transaction from the original session, which would have rolled back automatically when that session closed.

Instead, I’d suggest to run those statements from within psql, using \i to import your SQL file. Comment out any COMMIT statements in the SQL, add (named) SAVEPOINTs where you’re unsure of the results so that you can roll back to those specific points in the transaction, so that you can figure out where the problem originates.

Alternatively, it may help to split your SQL file into chunks that you can run in sequence.

Unfortunately, there’s no mode in psql that allows you to import an SQL file and step through the statements one by one. That would be helpful in your case I think. But maybe someone on the list has ideas about that?

Regards,

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
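A sketch of such a psql session, with hypothetical chunk file names:

```sql
BEGIN;
\i chunk1.sql
SAVEPOINT after_chunk1;
\i chunk2.sql
-- if chunk2 raised an error, the transaction is still salvageable:
ROLLBACK TO SAVEPOINT after_chunk1;
-- fix chunk2.sql, \i it again, and finally:
COMMIT;
```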
Re: Reset sequence to current maximum value of rows
> company_nbr |     company_name
> -------------+----------------------
>            1 | Markowitz Herbold PC
>            2 | Markowitz Herbold PC
>            3 | Markowitz Herbold PC
>            4 | Markowitz Herbold PC
>            5 | Markowitz Herbold PC
>            6 | Markowitz Herbold PC
>            7 | Markowitz Herbold PC
>            8 | Markowitz Herbold PC
>            9 | Markowitz Herbold PC
>           10 | Markowitz Herbold PC
>           11 | Markowitz Herbold PC
>           12 | Markowitz Herbold PC
>           13 | Markowitz Herbold PC
>           14 | Markowitz Herbold PC
>           15 | Markowitz Herbold PC
>           16 | Markowitz Herbold PC
>           17 | Markowitz Herbold PC
>           18 | Markowitz Herbold PC
>           19 | Markowitz Herbold PC
>           20 | Markowitz Herbold PC
>           22 | Markowitz Herbold PC
>           23 | Markowitz Herbold PC
> --More--

Did those rows contain these values in some earlier transaction in your data-entry process perhaps? I’m thinking that perhaps you overwrote them in a later transaction with the correct values for the names, but forgot to commit that transaction?

It’s either that, or you did run an UPDATE statement against those rows without specifying a WHERE-clause, as others already suggested as a likely cause.

I think we can rule out the possibility of index corruption (a very rare occurrence, usually caused by factors external to PG) for your case. A data-set this limited would most likely result in an execution plan using a sequential scan instead of an index scan (an EXPLAIN ANALYZE of the above select statement would show proof).

> It might be quicker for me to restore the entire database from that backup
> and then insert all new table rows since I have saved all the scripts.

If you end up in the same situation again after doing that, then you know at least it’s repeatable and can analyse how you got there.

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
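For the sequence-reset part of the thread’s subject, a sketch with assumed table and column names:

```sql
-- set the sequence behind company_nbr to the current maximum
SELECT setval(
    pg_get_serial_sequence('companies', 'company_nbr'),
    (SELECT max(company_nbr) FROM companies)
);
```

pg_get_serial_sequence looks up the sequence owned by the column, so the statement keeps working even if the sequence itself gets renamed.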
Re: Rules and Command Status - update/insert/delete rule with series of commands in action
> On 31 May 2024, at 00:34, johnlu...@hotmail.com wrote:
>
> On 5/30/24 4:56 PM, David G. Johnston wrote:
(…)
>> If anything is done it would have to be new syntax.
>
> A much bigger task surely.
>
> On 5/30/24 5:19 PM, Adrian Klaver wrote:
>>
>> 2) Use INSTEAD OF triggers:
>
> Unfortunately the same functionality as in my example with the RULE is not
> supported for triggers on views: from the manual
>
>     INSTEAD OF triggers may only be defined on views, and only at row level;
>
> A RULE is essentially a statement-level operation which is what I need for
> this particular case. A row-level trigger would not work because it cannot
> "see" the query causing it to be fired, and also (most importantly) is not
> fired at all if no rows match the original query, whereas a RULE is always in
> effect regardless of which rows are involved, as mentioned before.
>
> I should add that the RULE I showed in my example is not the only RULE being
> used on this view - there are other conditional RULEs, and the combined
> effect is of being able to change the effect of the original statement into a
> set of new statements, one of which does what is needed.
>
> And if you are now inclined to say "well, maybe the application itself is
> poorly written and should be changed" - I would have to agree, but that is
> not mine to change.
>
> But I suppose that my next question, given what you both say about the RULE
> system being a dead-end, is whether there is any likelihood of supporting an
> INSTEAD OF trigger on a view at statement level? Maybe that stands more
> chance of going somewhere?

What you’re attempting to do boils down to adding a virtualisation layer over the database. Several middleware products exist that provide data virtualisation, products that are accessed as a database (or as a web service, or both) that pass on queries to connected systems.
The virtualisation layer rewrites those queries between the data sources and the user-visible virtual database connection, and between generalised SQL and native dialects and languages.

Whether existing products support your particular use-case, namely rewriting operational data-storage queries to data-source specific DML statements and then reporting the correct number of affected rows back, I don’t know.

However, an important reason that PG rules are deprecated (as I understand it) is that it is very hard to get right for generated columns, which are operations with side-effects (such as incrementing a sequence value, for example) that are included in those queries rewritten by the specified rules. I doubt that a data virtualisation layer would be able to solve that particular problem.

Nevertheless, considering what path you’re on, they may be worth looking at. I don’t think there are any open-source initiatives (unfortunately), they’re all commercial products AFAIK, and not cheap. With a suitable use-case they can be rather valuable tools too though.

Regards,

Alban Hertroys
--
Als je de draak wilt steken met iemand, dan helpt het, als die een punthoofd heeft.
Re: Finding "most recent" using daterange
> On 22 May 2024, at 09:58, Rob Foehl wrote:
>
> Coming back to PostgreSQL after a (decades-)long absence... If I have
> something like:
>
> CREATE TABLE example (
>     id integer NOT NULL,
>     value text NOT NULL,
>     dates daterange NOT NULL
> );
>
> INSERT INTO example VALUES
> (1, 'a', '[2010-01-01,2020-01-01)'),
> (1, 'b', '[2010-01-01,)'),
> (1, 'c', '[,2021-01-01)'),
> (2, 'd', '[2010-01-01,2021-01-01)'),
> (2, 'e', '[2015-01-01,2020-01-01)'),
> (3, 'f', '[2014-01-01,2016-01-01)'),
> (3, 'g', '[2013-01-01,)'),
> (3, 'h', '[2012-01-01,)'),
> (3, 'i', '[2013-01-01,2017-01-01)'),
> (4, 'j', '[2010-01-01,2015-01-01)');
>
> and I want to find the "most recent" value out of each group, meaning
> that having the greatest upper bound followed by the greatest lower
> bound, what I've managed to come up with thus far is:

Sounds like a good candidate for using EXISTS to prove that no more recent value exists for a given id:

SELECT e.id, e.value, e.dates
FROM example AS e
WHERE NOT EXISTS (
    SELECT 1
    FROM example AS i
    WHERE i.id = e.id
    AND (coalesce(upper(i.dates), 'infinity') > coalesce(upper(e.dates), 'infinity')
         OR (coalesce(upper(i.dates), 'infinity') = coalesce(upper(e.dates), 'infinity')
             AND coalesce(lower(i.dates), '-infinity') > coalesce(lower(e.dates), '-infinity'))
    )
);

 id | value |          dates
----+-------+-------------------------
  1 | b     | [2010-01-01,)
  2 | d     | [2010-01-01,2021-01-01)
  3 | g     | [2013-01-01,)
  4 | j     | [2010-01-01,2015-01-01)
(4 rows)

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
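An alternative sketch using DISTINCT ON, which keeps the first row per id according to the ORDER BY (same null-handling via coalesce):

```sql
SELECT DISTINCT ON (id) id, value, dates
FROM example
ORDER BY id,
         coalesce(upper(dates), 'infinity') DESC,
         coalesce(lower(dates), '-infinity') DESC;
```

Sorting the greatest upper bound first, then the greatest lower bound, makes the retained row per id the "most recent" one by the thread's definition.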
Re: Updating 457 rows in a table
> On 19 May 2024, at 20:37, Rich Shepard wrote:
>
> On Sun, 19 May 2024, Christophe Pettus wrote:
>
>> Of course, you can probably also shorten the query to:
>>
>> UPDATE people SET active=true WHERE ...
>>
>> Where ... is the predicate you would have used in the SELECT id WHERE ...
>
> Ah, yes. Hadn't thought of that. The statement would be
> UPDATE people SET active=true WHERE email is not null;

That aside: when you’re not absolutely 100% sure that an UPDATE or DELETE statement is going to do exactly what you intended (and for good measure, even when you are), wrapping such statements in a transaction allows you to ROLLBACK to get back to the state that you started from. So:

=> BEGIN;
=> UPDATE people SET active=true WHERE email is not null;
(497 rows affected)

If that does indeed read “497 rows affected”:

=> COMMIT;

But if it doesn’t read 497, instead of COMMITting the transaction you now have the opportunity to investigate which other rows changed that shouldn’t have, and how to change your predicates - and then simply type:

=> ROLLBACK;

Don’t forget to start a new transaction again for the next attempt.

In PostgreSQL this also works for almost all DDL statements (CREATE TABLE, DROP TABLE, TRUNCATE TABLE, etc.), which is one of the features of this database that I really appreciate - some big names don’t have that.

Regards,

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: could not open file "global/pg_filenode.map": Operation not permitted
On Fri, 22 Mar 2024 at 15:01, Nick Renders wrote:
>
> We now have a second machine with this issue: it is an Intel Mac mini
> running macOS Sonoma (14.4) and PostgreSQL 16.2.
> This one only has a single Data directory, so there are no multiple
> instances running.

I don't think that having a single Data directory prevents multiple instances from running. That's more a matter of how often pg_ctl was called with the start command for that particular data directory.

> I installed Postgres yesterday and restored a copy from our live database
> in the Data directory.

How did you restore that copy? Was that a file-based copy perhaps? Your files may have incorrect owners or permissions in that case.

> The Postgres process started up without problems, but after 40 minutes it
> started throwing the same errors in the log:
>
> 2024-03-21 11:49:27.410 CET [1655] FATAL: could not open file
> "global/pg_filenode.map": Operation not permitted
> 2024-03-21 11:49:46.955 CET [1760] FATAL: could not open file
> "global/pg_filenode.map": Operation not permitted
> 2024-03-21 11:50:07.398 CET [965] LOG: could not open file
> "postmaster.pid": Operation not permitted; continuing anyway

It's possible that some other process put a lock on these files. Spotlight perhaps? Or Time Machine?

> I stopped and started the process, and it continued working again until
> around 21:20, when the issue popped up again. I wasn't doing anything on
> the machine at that time, so I have no idea what might have triggered it.
>
> Is there perhaps some feature that I can enable that logs which processes
> use these 2 files?

IIRC, macOS comes shipped with the lsof command, which will tell you which processes have a given file open. See man lsof.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
Re: Inconsistent results in timestamp/interval comparison
On Mon, 4 Mar 2024 at 13:46, Francisco Olarte wrote:
> On Mon, 4 Mar 2024 at 13:10, wrote:
> > According to the documentation, Table 9.31, IMHO both comparisons should
> > produce the same results, as
> >
> > timestamp - timestamp → interval
> > timestamp + interval → timestamp
>
> Your problem may be due to interval comparison.
>
> Intervals are composed of months, days and seconds, as not every month
> has 30 days and not every day has 86400 seconds, so to compare them
> you have to normalize them somehow, which can lead to bizarre results.
>
> => select '2 years'::interval > '1 year 362 days'::interval;
>  ?column?
> ----------
>  f
> (1 row)
>
> => select '2 years'::interval > '1 year 359 days'::interval;
>  ?column?
> ----------
>  t
> (1 row)
>
> => select '2 years'::interval > '1 year 360 days'::interval;
>  ?column?
> ----------
>  f
> (1 row)
>
> => select '2 years'::interval = '1 year 360 days'::interval;
>  ?column?
> ----------
>  t
> (1 row)
>
> If you want to do point in time arithmetic, you will be better off by
> extracting epoch from your timestamps and subtracting that. Intervals
> are more for calendar arithmetic of the type "set me a date two
> months, three days and four hours from the last".
>
> Francisco Olarte.
To elaborate, justify_interval(t) shows how the length of the interval ends up when there is no timestamp to base the end of the interval on:

=> with testtab(t1) as (
       select cast(v as timestamp with time zone)
       from (values ('2022-02-27 11:46:33'),
                    ('2022-03-11 23:39:17'),
                    ('2022-03-21 17:49:02')) x(v)
   )
   select now(), t1, now() - t1 "now()-t1", justify_interval(now() - t1)
   from testtab;

             now               |           t1           |        now()-t1         |           justify_interval
-------------------------------+------------------------+-------------------------+---------------------------------------
 2024-03-04 13:00:31.00386+00  | 2022-02-27 11:46:33+00 | 736 days 01:13:58.00386 | 2 years 16 days 01:13:58.00386
 2024-03-04 13:00:31.00386+00  | 2022-03-11 23:39:17+00 | 723 days 13:21:14.00386 | 2 years 3 days 13:21:14.00386
 2024-03-04 13:00:31.00386+00  | 2022-03-21 17:49:02+00 | 713 days 19:11:29.00386 | 1 year 11 mons 23 days 19:11:29.00386
(3 rows)

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
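And the epoch-based comparison Francisco suggested, applied to the same test values (a sketch):

```sql
with testtab(t1) as (
    select cast(v as timestamp with time zone)
    from (values ('2022-02-27 11:46:33'),
                 ('2022-03-11 23:39:17'),
                 ('2022-03-21 17:49:02')) x(v)
)
select t1,
       extract(epoch from now()) - extract(epoch from t1) as elapsed_seconds
from testtab;
```

Elapsed seconds compare unambiguously, side-stepping the month/day normalisation that makes interval comparison surprising.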
Re: How to do faster DML
> On 3 Feb 2024, at 13:20, Lok P wrote:
>
> Hello All,
> A non partitioned table having ~4.8 billion rows in it and having data size
> as ~1.4TB, row size as ~313 bytes, having ~127 columns in it. This has got
> approx ~1billion+ duplicate rows inserted in it and we want to get the
> duplicate data removed for this table and create a PK/unique constraint back
> so as to not have the duplicate values in future. We are struggling to do the
> same.

~4.8 billion rows of which ~1 billion are duplicates… Wait a minute…

Did you verify that your ID column is larger than 32 bits? Because if that’s a 32-bit integer, the range of values it can hold is about 4.3 billion, after which it wraps around. With ~4.8 billion rows that would result in about ~0.5 billion repeated ID values, giving you the reported ~1 billion duplicate ID's.

If that’s the case, your duplicates obviously aren’t really duplicates and you require a different type of solution.

> Teammates suggested doing this using CTAS method, i.e. create a new table
> with the unique record set and then drop the main table. Something as below
>
> create table TAB1_New
> as
> SELECT * from TAB1 A
> where ID in
>     (select min(ID) from TAB1
>      group by ID having count(ID)>=1 );
>
> But for the above to work faster, they mentioned to have an index created on
> the column using which the duplicate check will be performed i.e. ID column.
> So, creating the index itself took ~2hrs+ and the index size now shows as
> ~116GB.
>
> Create index idx1 on TAB1(ID)

Are your duplicates exact duplicates? Or is there an order of preference among them? And if so, what really makes those rows unique? That matters for solutions on how to deduplicate these rows.

Regards,

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
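If the rows do turn out to be exact duplicates, a self-join on ctid is a common deduplication sketch (hedged: on a ~4.8 billion row table this needs batching and testing before use):

```sql
-- keeps the physically-first row per ID; assumes rows sharing an ID
-- are true duplicates
DELETE FROM tab1 a
USING tab1 b
WHERE a.id = b.id
  AND a.ctid > b.ctid;
```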
Re: Time zone offset in to_char()
> On 11 Jan 2024, at 18:27, Adrian Klaver wrote: > > On 1/11/24 08:48, Adrian Klaver wrote: >> On 1/11/24 08:04, Alban Hertroijs wrote: > >>> The drawback, as mentioned, being that we need to maintain those functions >>> in each deployment, which is a bit of a hassle (albeit a minor one) because >>> we need to customise both the TDV side and the PostgreSQL side in that >>> case. Our preferred solution would be to just add a few entries to the TDV >>> database-specific capabilities file (as described in my initial message) >> Are you referring to?: >> "It currently have this: >> ToDatetimeOffsetNL(~any) : ciscache.ToDatetimeOffsetNL($1) >> ToDatetimeOffset(~any,~any) : ciscache.ToDatetimeOffset($1, $2) >> " > > It finally dawned on me, you want to replace the user defined functions above > with Postgres builtins only. Try as I might I could not come with that > solution. Exactly. I was having the same problem of finding a solution, quite to my surprise. >> I thought the issue there was maintaining the two Postgres functions? Yup, those two functions in fact. There will be at least 3 separate deployments, while maintenance of the database(-schema) contents is the responsibility of the 3rd party application (TDV). PG is used as a caching DB here, we therefore intend to treat the data in it as volatile; it shouldn’t hurt if we decide to recreate the caches from scratch from source data. Having custom code in there not under control of the 3rd party application breaks that guideline. If they’re necessary, then so be it, but I can’t shake the feeling that we can achieve this without custom code in the database. Regards, Alban Hertroys -- There is always an exception to always.
Re: Time zone offset in to_char()
> On 11 Jan 2024, at 17:43, Adrian Klaver wrote: > > On 1/11/24 07:06, Alban Hertroijs wrote: >> Hi all, > >> In the above, I worked around the issue using a couple of user-defined >> functions in PG. That should give a reasonable idea of the desired >> functionality, but it's not an ideal solution to my problem: >> 1). The first function has as a drawback that it changes the time zone for >> the entire transaction (not sufficiently isolated to my tastes), while >> 2). The second function has the benefit that it doesn't leak the time zone >> change, but has as drawback that the time zone is now hardcoded into the >> function definition, while > > I don't think the set_config and SET are acting the way you think they are: > > set_config(https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-ADMIN-SET) > > " > set_config ( setting_name text, new_value text, is_local boolean ) → text > > Sets the parameter setting_name to new_value, and returns that value. If > is_local is true, the new value will only apply during the current > transaction. If you want the new value to apply for the rest of the current > session, use false instead. This function corresponds to the SQL command SET. > > set_config('log_statement_stats', 'off', false) → off" > " I tried this like so: select ToDatetimeOffset(current_timestamp, 'Europe/Amsterdam'), to_char(current_timestamp, 'YYYY-MM-DD HH24:MI:SS.US TZH:TZM'). The result of the second call was based on time zone 'Europe/Amsterdam', where it wasn’t when called outside the transaction (when it was based on UTC, corresponding to the server time zone). So the time zone set with set_config(…, …, true) appeared to leak out of function scope and applied to transaction scope (as described in the quoted text). To be certain, I can run that query again tomorrow when I’m back at work. 
> SET(https://www.postgresql.org/docs/current/sql-set.html) > > "If SET (or equivalently SET SESSION) is issued within a transaction that is > later aborted, the effects of the SET command disappear when the transaction > is rolled back. Once the surrounding transaction is committed, the effects > will persist until the end of the session, unless overridden by another SET. > > The effects of SET LOCAL last only till the end of the current transaction, > whether committed or not. A special case is SET followed by SET LOCAL within > a single transaction: the SET LOCAL value will be seen until the end of the > transaction, but afterwards (if the transaction is committed) the SET value > will take effect. It says transaction again here. > The effects of SET or SET LOCAL are also canceled by rolling back to a > savepoint that is earlier than the command. > > If SET LOCAL is used within a function that has a SET option for the same > variable (see CREATE FUNCTION), the effects of the SET LOCAL command > disappear at function exit; that is, the value in effect when the function > was called is restored anyway. This allows SET LOCAL to be used for dynamic > or repeated changes of a parameter within a function, while still having the > convenience of using the SET option to save and restore the caller's value. > However, a regular SET command overrides any surrounding function's SET > option; its effects will persist unless rolled back. > " I didn’t succeed in calling SET LOCAL TIMEZONE from within the function. Could be I missed something, then Google (stackoverflow) pointed me to set_config(). I did manage to apply it to the second function header, which I think behaves such that the time zone change stays within function scope. Right now I’m not 100% sure that I verified that. More to check tomorrow. Frankly, I do hope that you’re right here, that would make my work easier. 
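The transaction-level scope that the quoted documentation describes for set_config(…, …, true) can be seen in a minimal session experiment (a sketch, not from the original thread):

```sql
BEGIN;
-- is_local = true: the setting lasts only until the end of this transaction
SELECT set_config('timezone', 'Europe/Amsterdam', true);
SELECT to_char(now(), 'YYYY-MM-DD HH24:MI:SS TZH:TZM');  -- Amsterdam offset
COMMIT;

-- Back outside the transaction, the session's previous time zone applies:
SHOW timezone;
```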
Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: Issue in compiling postgres on latest macOS 14.1.1
> On 13 Nov 2023, at 18:00, Tom Lane wrote: (…) > * If you use MacPorts or Homebrew, maybe that's out of date? > Try removing the associated directories from your PATH to see > if it works better. Perhaps even worse: did you have old binaries from an Intel architecture that were migrated onto a new ARM-based architecture? In that case the Homebrew uninstall scripts won’t even work anymore - at least not w/o Rosetta 2 - as they’re Intel-based too. A migration assistant can also work too well, I found. Alban Hertroys -- There is always an exception to always.
Re: Strategies for converting UTC data to local windows for arbitrary resolutions and timezones
> On 4 Oct 2023, at 21:30, Lincoln Swaine-Moore wrote: > > > What I do in such cases is to add an extra column with the UTC timestamp to > > serve as a linear scale to the local timestamps. That also helps with > > ordering buckets in reports and such during DST changes (especially the > > ones where an hour repeats). > > > For hours and quarter hours I found it to be fairly convenient to base a > > view on a join between a date calendar and an (quarter of an) hour per UTC > > day table, but materialising that with some indexes may perform better (at > > the cost of disk space). I do materialise that currently, but our database > > server doesn’t have a lot of memory so I’m often not hitting the cache and > > performance suffers a bit (infrastructure is about to change for the better > > though). > > That's an interesting idea, but I'm not sure I fully understand. Assuming > you're aggregating data: what do you group by? For instance, at an hourly > resolution, if you group by both the UTC timestamp and the local one, you > might end up, say, dividing an hour-long bucket in two for time zones with > half-hour-based offsets, no? > > Thanks for the detailed writeup! Definitely helpful to learn more about what > people are using in production to handle this sort of thing. Frankly, I haven’t had to deal with half-hour-based offsets since I got this idea. I’m using it with whole-hour-offsets, where it doesn’t affect bin boundaries. I suppose you could enrich your data in a similar fashion by adding a (virtual) column with the (client) time zone offset, so you could group by local timestamp + offset. That’s not going to match index expressions though, I fear… For sorting, UTC timestamps would probably still be a useful addition, but they’re simple to add by either converting back from the local timestamps or by taking the min and max of the UTC-based column on the above grouping. Both solutions require that offset, obviously. 
Now of course there are only 2 hours a year where this happens. Our data scientists chose to ignore the problem for simplicity’s sake and be slightly off with their numbers on those dates. Regards, Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
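The offset-enriched grouping sketched above could look like this; measurements and ts_utc are made-up names for illustration:

```sql
-- Bucket by local wall-clock hour, keeping a UTC timestamp per bucket
-- for unambiguous ordering across DST transitions:
SELECT date_trunc('hour', ts_utc AT TIME ZONE 'Europe/Amsterdam') AS local_hour,
       min(ts_utc) AS utc_start,
       count(*) AS n
FROM measurements
GROUP BY local_hour
ORDER BY utc_start;
```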
Re: Strategies for converting UTC data to local windows for arbitrary resolutions and timezones
> On 4 Oct 2023, at 17:58, Lincoln Swaine-Moore wrote: > > > SELECT > > sub.gs AS ts_in_utc > > ,sub.gs AT TIME ZONE 'America/New_York' AS ts_in_local_tz > > ,date_bin('1 days', sub.gs AT TIME ZONE 'America/New_York', > > '2023-01-01') > > FROM (SELECT generate_series('2023-11-03 00:00:00Z'::timestamptz, > > '2023-11-07 00:00:00Z'::timestamptz, '5 minutes'::interval) AS gs) AS sub > > WHERE > > sub.gs >= '2023-11-05 00:00:00 America/New_York'::timestamptz AND > > sub.gs < '2023-11-06 00:00:00 America/New_York'::timestamptz > > I believe this query will be funky around DST borders, because `sub.gs AT > TIME ZONE 'America/New_York'` will be localized in a way that erases the > difference between hours with different offsets, which are genuinely > different. For instance, I ran this and there are two rows within it that > look like: > > ` 2023-11-05 05:00:00+00 | 2023-11-05 01:00:00 | 2023-11-05 00:00:00` > and > ` 2023-11-05 06:00:00+00 | 2023-11-05 01:00:00 | 2023-11-05 00:00:00` > > I think that the non-unique second column will pose an issue for the date > binning at a resolution finer than 1 day. What I do in such cases is to add an extra column with the UTC timestamp to serve as a linear scale to the local timestamps. That also helps with ordering buckets in reports and such during DST changes (especially the ones where an hour repeats). Filtering in the queries occurs on the UTC scale, with the local timestamps calculated back to UTC, so that it doesn’t matter whether the local time has 23, 24, 25 or 24.5 or 23.5 or whatever number of hours on a date-range - it all maps back because UTC always has 24 hours. Something that I also do is to create calendar tables and views for the buckets, with 2 timestamps per bucket: the start of the bucket and the start of the next bucket. That gives you a range to put actual timestamps between (not BETWEEN between, because that’s inclusive). 
You can store and index that, as opposed to generated results using generate_series - basically I materialise those. For hours and quarter hours I found it to be fairly convenient to base a view on a join between a date calendar and an hour (or quarter-hour) per UTC day table, but materialising that with some indexes may perform better (at the cost of disk space). I do materialise that currently, but our database server doesn’t have a lot of memory so I’m often not hitting the cache and performance suffers a bit (infrastructure is about to change for the better though). Regards, Alban Hertroys -- There is always an exception to always.
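The materialised bucket calendar described above might be built like this (all names invented for the sketch):

```sql
-- Quarter-hour buckets: the start of each bucket and the start of the next
CREATE TABLE bucket_calendar AS
SELECT gs AS bucket_start,
       gs + interval '15 minutes' AS bucket_next
FROM generate_series('2023-01-01 00:00+00'::timestamptz,
                     '2024-01-01 00:00+00'::timestamptz,
                     interval '15 minutes') AS gs;

CREATE INDEX ON bucket_calendar (bucket_start);

-- Half-open ranges instead of BETWEEN, because BETWEEN is inclusive:
SELECT b.bucket_start, count(m.ts) AS n
FROM bucket_calendar b
LEFT JOIN measurements m
       ON m.ts >= b.bucket_start AND m.ts < b.bucket_next
GROUP BY b.bucket_start;
```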
Re: Making Sure Primary and Secondary Keys Alligns
> On 11 Sep 2023, at 16:09, Anthony Apollis wrote: > > Fact Table: > CREATE TABLE IF NOT EXISTS system."IMETA_ZTRB_MP$F_ZTBR_TA_BW" > ( (…) > ) > and Dimension:CREATE TABLE IF NOT EXISTS dim."IMETA_Region_Mapping" > ( (…) > ) > How do i get that all these columns that are joined are aligned, meaning if > it starts with 1 in one column it must be 1 in the other columns. Or how > would you assign unique keys in Postgres? Are you perhaps asking how to define FOREIGN KEY CONSTRAINTs? https://www.postgresql.org/docs/15/ddl-constraints.html#DDL-CONSTRAINTS-FK Regards, Alban Hertroys -- There is always an exception to always.
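For reference, such a constraint would be declared along these lines; the linking column name is an assumption, since the thread didn't show the column lists:

```sql
ALTER TABLE system."IMETA_ZTRB_MP$F_ZTBR_TA_BW"
  ADD FOREIGN KEY ("Region_ID")
  REFERENCES dim."IMETA_Region_Mapping" ("Region_ID");
```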
Re: Even if I have a lot of free space PgSQL returns "ERROR: could not extend file - No space left on device"
> On 8 Sep 2023, at 13:25, Nature Conservation Geovista Space > wrote: > > Dear Pg-users, > I am coming back to Postgres/PostGIS after a few years. I am dealing with a > big database with a lot of geometries and too many vertices. So a lot of geometry going on then… That seems important in this case. > After hours running a query to Subdivide, I get this Postgres error > 2023-09-08 02:11:23.745 BST [328594] postgres@database ERROR: could not > extend file "base/16388/7985375.1020": No space left on device > 2023-09-08 02:11:23.745 BST [328594] postgres@database HINT: Check free disk > space. > 2023-09-08 02:11:23.745 BST [328594] postgres@database STATEMENT: CREATE > TABLE _gaul_administrative_subdivided100 AS ( > SELECT *, st_subdivide(geom,100) AS geom_subdivided100 > FROM gaul_administrative > ); That _looks_ like a query that could blow through space rather quickly. How large is gaul_administrative in GB? In rows? How many subdivisions does this generate per row on average? How many bytes are those subdivisions on average? Multiply those numbers, and you get a fair indication of how much space that table requires. Does that fit in 1.1TB? Frankly, I don’t see the point of repeating the geom column in that table after dividing it up, aren’t you just wasting space there? The original is still available in the source table, after all. And what about the other columns that you clone into this new table? My suggestion would be to minimise that with an operation like the above. This would be one of those situations where I’d introduce a (bigint) surrogate key and use that to link the two tables together, even though I’m a natural key person. 
Something like this (I had to make a few assumptions about your source table):

CREATE TABLE _gaul_administrative_subdivided100 (
	id bigint NOT NULL,
	geomid bigserial NOT NULL,
	geom_subdivided100 geometry NOT NULL
);

-- If you care about referential integrity here
ALTER TABLE _gaul_administrative_subdivided100
	ADD FOREIGN KEY (id) REFERENCES gaul_administrative (id)
		ON DELETE CASCADE ON UPDATE RESTRICT;

INSERT INTO _gaul_administrative_subdivided100 (id, geom_subdivided100)
SELECT id, st_subdivide(geom, 100)
FROM gaul_administrative;

(…) > It seems that it is not a problem of space. I wouldn’t be too sure of that, but I don’t have the numbers.

> Command df -h returns:
> Filesystem      Size  Used Avail Use% Mounted on
> tmpfs           6.3G  1.1M  6.3G   1% /run
> /dev/sda        1.3T  164G  1.1T  14% /
> tmpfs            32G  3.2M   32G   1% /dev/shm
> tmpfs           5.0M     0  5.0M   0% /run/lock
> tmpfs           6.3G  4.0K  6.3G   1% /run/user/1000

Is this after the error and after PG finished rolling back? What does this show while that query is going on? If gaul_administrative takes up a large part of those 164G, then you probably don’t have enough space for a 10x multiplication in size from the original table to the new table. And that happening looks entirely possible from the information you provided. Regards, Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
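The actual numbers are easy to check before rerunning; a sketch using standard catalog functions, plus a sample to estimate the subdivision multiplier:

```sql
-- Total on-disk size of the source table, including TOAST and indexes:
SELECT pg_size_pretty(pg_total_relation_size('gaul_administrative'));

-- Average number of subdivisions per row, estimated on a 1% sample:
SELECT count(*)::numeric / count(DISTINCT id) AS avg_subdivisions
FROM (
    SELECT id, st_subdivide(geom, 100)
    FROM gaul_administrative
    TABLESAMPLE SYSTEM (1)
) s;
```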
Re: ident auth does not works as usual
> On 26 Aug 2023, at 11:31, pan snowave wrote: (…) > pg_indent.conf > > test rootcce If that is indeed the name of the file, that would explain your problem. No doubt that it should be named pg_ident.conf instead, without the extra 'n'. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: How to improve the performance of my SQL query?
> On 29 Jul 2023, at 10:59, Peter J. Holzer wrote: > > On 2023-07-26 15:46:16 +0800, gzh wrote: >> SET enable_seqscan TO off; > [...] >>-> Parallel Bitmap Heap Scan on tbl_sha >> (cost=92112.45..2663789.14 rows=800650 width=18) (actual >> time=260.540..21442.169 rows=804500 loops=3) >> Recheck Cond: (ms_cd = 'MLD009'::bpchar) >> Rows Removed by Index Recheck: 49 >> Filter: (etrys = '0001'::bpchar) >> Rows Removed by Filter: 295500 >> Heap Blocks: exact=13788 lossy=10565 >> -> Bitmap Index Scan on index_search_04_mscd_cdate >> (cost=0.00..91632.06 rows=3402599 width=0) (actual time=249.718..249.718 >> rows=330 loops=1) >>Index Cond: (ms_cd = 'MLD009'::bpchar) > > So now it's using index_search_04_mscd_cdate which contains only ms_cd > (and - judging from the name, other fields not relevant to this query), > but it still doesn't use index_search_01 which would fit the query > exactly. I can understand that Postgres prefers a sequential scan over > an index scan (the number of matching rows is about 10% of the total > table size which is a lot), but why would it prefer a less specific > index to a more specific one? > > Can you get Postgres to use that index at all? > > Find a combination of ms_cd and etrys which doesn't cover millions of > rows and try that. > > Also try lowering random_page_cost. Wasn’t this an RDS server with just 4GB of memory? How large are those multi-column indices? Perhaps they don’t (all) fit into available cache memory and the server decided to use the one that it had cached? I’m frankly not at all certain how the server would behave around such resource shortage situations, but I suppose loading an uncached index into cache could get a higher cost than using a less optimal (costlier) index that’s already cached. 
Regarding lowering random_page_cost: if your index files are on SSD storage, lowering it to a realistic value could reduce the estimated cost of loading that uncached index into memory enough for the planner to pick it, evicting the index it was using in the above plan to make room (unless other active sessions are using it). Alban Hertroys -- There is always an exception to always.
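For experimenting, that cost parameter can be changed for just the current session, so it only affects plans generated there (query shape taken from the thread; the value is illustrative for SSDs):

```sql
-- Default is 4.0; SSD-realistic values are commonly in the 1.1-2.0 range
SET random_page_cost = 1.1;

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM tbl_sha
WHERE ms_cd = 'MLD009'
  AND etrys = '0001';

RESET random_page_cost;
```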
Re: Nu-B here
> On 20 Jul 2023, at 02:36, Amn Ojee Uw wrote: > > After this command 'sudo -u postgres psql' > I get this message : > could not change directory to "/home/my_account": Permission denied What’s the reason that you’re using the OS user postgres? If you’re simply trying to connect to the database named postgres as database user postgres, you can instead use the command 'psql -U postgres postgres’. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: How To: A large [2D] matrix, 100,000+ rows/columns
> On 9 Jun 2023, at 04:17, Pat Trainor wrote: (…) > Imagine something akin to stocks, where you have a row for every stock, and a > column for every stock. Except where the same stock is the row & col, a > number is at each X-Y (row/column), and that is the big picture. I need to > have a very large matrix to maintain & query, and if not (1,600 column > limit), then how could such data be broken down to work? If your matrix contains values that are all of the same type, as matrices usually do, then a matrix can be described as the Cartesian product of rows and columns, with values connecting those. For rows and columns you could enumerate them using generate_series() or a pair of recursive CTEs, or you could put them into their own table. For the values (or cells), a tuple of (row, column, value) would be sufficient. Then in the end, the matrix would be a presentation of the left joins of the Cartesian product of rows and columns with your cell values. The left joins are to account for missing cell values (empty cells), or you could explicitly add tuples for those with an ‘empty’ value. For presentation, I would use something like Python Pandas and the xlsxwriter. Data-entry is going to be a bit of a pain if you cannot automate it, and it’s not going to be very space-efficient, but it does fit the relational model this way and it would be easy to expand the matrix in either direction. > By wanting postgresql as a solution, am I round-hole, square-pegging myself? I expect that there are solutions that were explicitly designed for handling (large) matrices and that those would perhaps perform better. > I don't mind keeping, say, a 1,500 column max per table, but then adding new > items (stocks in the above analogy) might make it difficult to keep track of > things... That’s also a possibility, but that sort of pushes the column lookups down to the catalog level and induces overhead on all other catalog lookups as well. It’s not a huge amount though. 
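The (row, column, value) layout described above could be sketched like this, with invented names:

```sql
CREATE TABLE cell (
    row_id int NOT NULL,
    col_id int NOT NULL,
    value  numeric NOT NULL,
    PRIMARY KEY (row_id, col_id)
);

-- One matrix row; empty cells surface as NULL through the left join:
SELECT cols.col_id, c.value
FROM generate_series(1, 100000) AS cols(col_id)
LEFT JOIN cell c ON c.row_id = 42 AND c.col_id = cols.col_id
ORDER BY cols.col_id;
```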
An alternative approach would be a table of rows (or columns) with each an array of values, especially since you don’t expect many updates. That would be a first attempt at optimisation if the pure relational model doesn’t work out. > Hoping someone has tackled this before, and performance isn't too big a > concern, as the data changes seldom. Not in practice, AFAIR, but I was thinking of a solution like this for small matrices (Sudokus; I hate the things, but I need some solutions to a few to help me with a number of test scripts). Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
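The array-per-row alternative might look like this (again with invented names); a cell lookup then becomes an array subscript:

```sql
CREATE TABLE matrix_row (
    row_id int PRIMARY KEY,
    cells  numeric[] NOT NULL  -- one entry per matrix column
);

-- Read a single cell of row 42:
SELECT cells[12345] FROM matrix_row WHERE row_id = 42;
```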
Re: "PANIC: could not open critical system index 2662" - twice
> On 14 Apr 2023, at 9:38, Evgeny Morozov wrote: (…) > I don't know whether ZFS zero-fills blocks on disk errors. As I > understood, ZFS should have been able to recover from disk errors (that > were "unrecoverable" at the hardware level) using the data on the other > two disks (which did not report any errors). Thus, PG should not have > seen any corrupted data (if ZFS was working correctly). > https://unix.stackexchange.com/questions/341614/understanding-the-error-reporting-of-zfs-on-linux > seems to confirm this. Am I misunderstanding something? Your problem coincides with a thread at freebsd-current with very similar data corruption after a recent OpenZFS import: blocks of all zeroes, but also missing files. So, perhaps these problems are related? Apparently, there was a recent fix for a data corruption issue with the block_cloning feature enabled, but people are still seeing corruption even when they never enabled that feature. I couldn’t really find the start of the thread in the archives, so this one kind of jumps into the middle of the thread at a relevant-looking point: https://lists.freebsd.org/archives/freebsd-current/2023-April/003446.html Regards, Alban Hertroys -- There is always an exception to always.
Re: [EXTERNAL]: Re: UPSERT in Postgres
> On 7 Apr 2023, at 2:49, Louis Tian wrote: (…) > I am not expecting an error here. The problem is with no conflict it always > go down the insert path first and results in a not null constraint error. > While I am expecting the insert is never executed in the first place when > that row already exist (as identified by it primary key). So the update > execute without error. > I hope the pesudo code above is enough to clarify the difference? Your assumption on what the problem is, is not correct. The problem is not with the conflict resolution, it is with your statement violating a not null constraint. It doesn’t matter whether you insert first or update first, either operation is going to violate that constraint. You’re specifying a NULL value for a column that doesn’t accept that because it has a NOT NULL constraint. That is your problem. Alban Hertroys -- There is always an exception to always.
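A minimal reproduction of that behaviour, with a hypothetical table: the NOT NULL check applies to the proposed row itself, so the statement fails regardless of which arm the conflict resolution would have taken:

```sql
CREATE TABLE test (
    id   int  PRIMARY KEY,
    name text NOT NULL
);

INSERT INTO test VALUES (1, 'a');

-- Fails with a not-null violation even though id = 1 already exists:
INSERT INTO test (id, name) VALUES (1, NULL)
ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name;
```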
Re: Using CTID system column as a "temporary" primary key
> On 29 Mar 2023, at 21:11, Sebastien Flaesch wrote: > > Oh the use of default keyword is new to me, thanks for that. > > But to make PostgreSQL more Informix-compatible, zero should have been > considered as well. …No, I’m not going to be humble about this opinion… Postgres does a sane thing here. It’s Informix that you should be complaining about. Zero is not a sane value to specify special behaviour, it could mean zero and be just as valid. By consequence, Informix probably forbids zero as a sequence value, but that is an artefact (and a limitation) of its implementation, not a feature. The core of your problem however, is that you’re trying to get database-agnostic behaviour by relying on database-specific features. That is not going to work, you’ve just been lucky enough to get away with it until now. There’s really only one realistic answer here: Fix your design. Regards, Alban Hertroys -- There is always an exception to always.
Re: Idea: PostgreSQL equivalent to Oracle's KEEP clause
> On 7 Mar 2023, at 4:11, David G. Johnston wrote: > > On Mon, Mar 6, 2023 at 7:51 PM David Rowley wrote: > On Tue, 7 Mar 2023 at 12:40, Tom Lane wrote: > > > > Ben Clements writes: > > > As shown above, the following calculated column can bring in the city > > > name, > > > even though the city name isn't in the GROUP BY: > > >max(city) keep (dense_rank first order by population desc) > > > > You haven't really explained what this does, let alone why it can't > > be implemented with existing features such as FILTER and ORDER BY. > > (It wasn't clear to me until I watched the youtube video.) > > Likely KEEP is more flexible than just the given example but I think > that something similar to the example given could be done by inventing > a TOP() and BOTTOM() aggregate. Then you could write something like: > > select >country, >count(*), >max(population), >bottom(city, population) > from >cities > group by >country > having >count(*) > 1 > > the transfn for bottom() would need to remember the city and the > population for the highest yet seen value of the 2nd arg. > > BOTTOM() remembers the highest value? > > Where this wouldn't work would be if multiple columns were > required to tiebreak the sort. > > TOP(city, ROW(population, land_area)) ? What should be the expected behaviour on a tie though? Say that we count the number of districts or airfields or train stations per city and query for the one(s) with the most or least of them? There could well be multiple cities with the same max number, and there will be many cities with the same minimum number (namely 0). Should the result be just the first of the maximums (or minimums) through some selection criterium (such as their alphabetical order), should that give each of the tied results, or should there be a means to define that behaviour? I suppose a combination with FIRST and LAST could solve that issue? Regards, Alban Hertroys -- There is always an exception to always.
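For the single-tiebreak case, existing features already come close to the KEEP example; a DISTINCT ON sketch over a hypothetical cities table, where a tie on population resolves alphabetically by city:

```sql
-- The most populous city per country, exactly one row per country:
SELECT DISTINCT ON (country)
       country, city, population
FROM cities
ORDER BY country, population DESC, city;
```

Returning all tied maximums instead would need a window function such as rank(), rather than a forced single pick.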
Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)y
> On 3 Mar 2023, at 20:32, Thorsten Glaser wrote: > > On Fri, 3 Mar 2023, Alban Hertroys wrote: > >> You can rewrite that into something like this: >> >> select jsonb_build_object('opening_times', >> obj >> ORDER BY >> obj->>'weekday', >> obj->>'from_hour', >> obj->>'to_hour') >> ) >> from cot >> cross join lateral jsonb_agg(jsonb_build_object( >> 'weekday', cot.weekday, >> 'from_hour', cot.from_hour, >> 'to_hour', cot.to_hour) obj > > But isn’t that the same as with a regular LEFT JOIN? Similar, but not the same, I’d say. I do now notice that I made some copying errors there, I was a bit nauseous at that time. That should have read: >> select jsonb_build_object('opening_times', >> jsonb_agg(obj >> ORDER BY >> obj->>'weekday', >> obj->>'from_hour', >> obj->>'to_hour') >> ) >> from cot >> cross join lateral jsonb_build_object( >> 'weekday', cot.weekday, >> 'from_hour', cot.from_hour, >> 'to_hour', cot.to_hour) obj The lateral join applies the function to each row returned from the left side of the join and enriches that row with the function result. I used a cross join because there is no join condition to apply to the lateral, otherwise you could also use an inner join on true. I think you could also have used an implicit Cartesian product (using ‘,’ for the join), and that in that case the lateral would be implied. I prefer explicit notation though. A left join wouldn’t make much sense here, unless the function could return NULL - for example if it were a function marked as STRICT and some of the input parameter values (from the table) could be NULL. >>> cgwaj AS ( >>> SELECT cgwa.id AS id, jsonb_build_object( >>> 'weekday', cgwa.weekday, >>> 'forenoon', cgwa.forenoon, >>> 'afternoon', cgwa.afternoon, >>> 'evening', cgwa.evening) AS obj >>> FROM core_generalworkavailability cgwa > > plus There are some differences. You need a sub-select, which in turn creates its own result set. 
It’s up to the planner whether the left or the right side gets executed first, after which the results of the other side of the join get merged to this, or whether this can all be collected in one go. That’s up to the query planner to decide though, and it could be right. >>> LEFT JOIN cgwaj ON cgwaj.id=cppwt.generalworkavailability_id > > With the addition that I can aggregate… You can do so in both situations, but I guess that confusion stems from my copy/paste mistake. In my experience, lateral joins go well with the jsonb functions. They tend to reduce code repetition when referencing object members, such as in your case. Regards, Alban Hertroys -- There is always an exception to always.
Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)y
> On 3 Mar 2023, at 0:02, Thorsten Glaser wrote: > > On Tue, 28 Feb 2023, Alban Hertroys wrote: > >> Perhaps you can use a lateral cross join to get the result of >> jsonb_build_object as a jsonb value to pass around? > > I don’t see how. (But then I’ve not yet worked with lateral JOINs.) You posted this bit: > jsonb_build_object('opening_times', > jsonb_agg(DISTINCT jsonb_build_object( > 'weekday', cot.weekday, > 'from_hour', cot.from_hour, > 'to_hour', cot.to_hour) > ORDER BY > jsonb_build_object( > 'weekday', cot.weekday, > 'from_hour', cot.from_hour, > 'to_hour', cot.to_hour)->>'weekday', > jsonb_build_object( > 'weekday', cot.weekday, > 'from_hour', cot.from_hour, > 'to_hour', cot.to_hour)->>'from_hour', > jsonb_build_object( > 'weekday', cot.weekday, > 'from_hour', cot.from_hour, > 'to_hour', cot.to_hour)->>'to_hour') > ) You can rewrite that into something like this: select jsonb_build_object('opening_times', obj ORDER BY obj->>'weekday', obj->>'from_hour', obj->>'to_hour') ) from cot cross join lateral jsonb_agg(jsonb_build_object( 'weekday', cot.weekday, 'from_hour', cot.from_hour, 'to_hour', cot.to_hour) obj That’s off the top of my head and I did leave out the DISTINCT. Just to show the concept here. 
A bit of experimenting and reading should get you there, I’m keeping $work waiting :P (…) > WITH >cgwaj AS ( > SELECT cgwa.id AS id, jsonb_build_object( > 'weekday', cgwa.weekday, > 'forenoon', cgwa.forenoon, > 'afternoon', cgwa.afternoon, > 'evening', cgwa.evening) AS obj > FROM core_generalworkavailability cgwa >), >-- … same for opening times > SELECT cp.email, …, > -- … > jsonb_build_object('possible_work_times', COALESCE( > jsonb_agg(DISTINCT cgwaj.obj ORDER BY cgwaj.obj->>'weekday', > cgwaj.obj->>'forenoon', cgwaj.obj->>'afternoon', > cgwaj.obj->>'evening') > FILTER (WHERE cgwaj.id IS NOT NULL))) || > -- … >FROM core_person cp > -- … > LEFT JOIN core_person_possible_work_times cppwt ON cppwt.person_id=cp.id > LEFT JOIN cgwaj ON cgwaj.id=cppwt.generalworkavailability_id > -- … > > That is, add a CTE for each m:n-attached table whose “value” is > an object, not a single field, keep the id field; LEFT JOIN that > (instead of the original table), then we have a field to use in > ORDER BY. > > I think. I’ve not yet tried it (I don’t have access to that DB > normally, I was just helping out). > > This avoids sub-SELECTs in the sense of needing to run one for > each user row, because the innermost JSON object building needs > to be done for each (connected (if the query is not filtering on > specific users)) row of the “property table”, anyway. (And even > if filtered, that can be passed down.) > > bye, > //mirabilos > -- > As long as you don't play dirty tricks, and I mean *really* dirty > tricks, like XORing both pointers of a doubly linked list and storing > them in a single word, Boehm works outstandingly well. -- Andreas Bogk > about boehm-gc in d.a.s.r > Alban Hertroys -- There is always an exception to always.
Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)y
> On 28 Feb 2023, at 3:54, Thorsten Glaser wrote: (…) >> Well, that may be what you want, but it's not what you wrote in >> the query. Follow David's advice and do > […] >> I'm pretty sure that this will only incur one evaluation of the >> common subexpression, so even though it's tedious to type it's not >> inefficient. > > Thanks. But I fear it’s not as simple as you wrote. More like: > > jsonb_build_object('opening_times', > jsonb_agg(DISTINCT jsonb_build_object( > 'weekday', cot.weekday, > 'from_hour', cot.from_hour, > 'to_hour', cot.to_hour) > ORDER BY > jsonb_build_object( > 'weekday', cot.weekday, > 'from_hour', cot.from_hour, > 'to_hour', cot.to_hour)->>'weekday', > jsonb_build_object( > 'weekday', cot.weekday, > 'from_hour', cot.from_hour, > 'to_hour', cot.to_hour)->>'from_hour', > jsonb_build_object( > 'weekday', cot.weekday, > 'from_hour', cot.from_hour, > 'to_hour', cot.to_hour)->>'to_hour') > ) > > Isn’t that more like it? Perhaps you can use a lateral cross join to get the result of jsonb_build_object as a jsonb value to pass around? Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: WHERE col = ANY($1) extended to 2 or more columns?
> On 9 Feb 2023, at 18:35, Dominique Devienne wrote:
>
> On Thu, Feb 9, 2023 at 5:37 PM David G. Johnston wrote:
> On Thu, Feb 9, 2023 at 9:28 AM Alban Hertroys wrote:
> > On 9 Feb 2023, at 16:41, Dominique Devienne wrote:
> > Now we'd like to do the same for composite keys, and I don't know how to do that.
>
> This works:
> => select (1, 'one'::text) in ((1, 'two'::text), (2, 'one'::text), (1, 'one'::text), (2, 'two'::text));
>
> But you cannot write the right-side of the IN as a single parameter, which seems to be the primary constraint trying to be conformed to.
>
> Right. The goal is to (re)use a prepared statement (i.e. plan once), and bind the RHS (binary) array and do a single exec (single round-trip) to get the matching rows. AFAIK, this is the fastest way. If there's a better/faster way, I'm interested. --DD

How would an ORM like that push a list of tuples into a single query parameter though? Is that feasible?

Perhaps this is easier to use with an ORM then? It would need a list of placeholders for each item, but I suspect you would need that anyway…

=> with v(col1, col2) as (
    values (1, 'two'::text), (2, 'one'::text), (1, 'one'::text), (2, 'two'::text)
)
select * from v where (col1, col2) = (1, 'one'::text);

 col1 | col2
------+------
    1 | one
(1 row)

This could be written as a join to a table with fixed values in the OP’s case.

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
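One way to bind a set of composite keys as a small, fixed number of parameters is to pass one array per key column and expand them with the multi-argument form of unnest() in the FROM clause (available since PostgreSQL 9.4). A sketch, with made-up table and column names:

```sql
-- Assumed schema for illustration only:
--   my_table(col1 int, col2 text, payload text)
PREPARE fetch_rows(int[], text[]) AS
SELECT t.*
FROM my_table t
JOIN unnest($1, $2) AS k(col1, col2)       -- zips the two arrays row-wise
  ON (t.col1, t.col2) = (k.col1, k.col2);

-- Single round-trip for any number of composite keys:
EXECUTE fetch_rows(ARRAY[1, 2], ARRAY['one', 'two']);
```

The client still has to keep the arrays aligned (element N of each array together forms one key), but the statement is planned once and the parameter count stays constant.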
Re: WHERE col = ANY($1) extended to 2 or more columns?
> On 9 Feb 2023, at 16:41, Dominique Devienne wrote:
>
> Hi. We are implementing an API which takes a list of row keys, and must return info about those rows. To implement that efficiently, in as few round-trips as possible, we bind a (binary) array of keys (ints, uuids, or strings) and that works great, but only if the key is a scalar one.
>
> Now we'd like to do the same for composite keys, and I don't know how to do that.
> Is it possible? Could someone please help out or demo such a thing?
> We are doing it in C++ using libpq, but a pure SQL or PL/pgSQL demo would still help (I think).

This works:

=> select (1, 'one'::text) in ((1, 'two'::text), (2, 'one'::text), (1, 'one'::text), (2, 'two'::text));
 ?column?
----------
 t
(1 row)

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: Are ZFS snapshots unsafe when PGSQL is spreading through multiple zpools?
> On 16 Jan 2023, at 15:37, HECTOR INGERTO wrote: > > > The database relies on the data being consistent when it performs crash > > recovery. > > Imagine that a checkpoint is running while you take your snapshot. The > > checkpoint > > syncs a data file with a new row to disk. Then it writes a WAL record and > > updates > > the control file. Now imagine that the table with the new row is on a > > different > > file system, and your snapshot captures the WAL and the control file, but > > not > > the new row (it was still sitting in the kernel page cache when the > > snapshot was taken). > > You end up with a lost row. > > > > That is only one scenario. Many other ways of corruption can happen. > > Can we say then that the risk comes only from the possibility of a checkpoint > running inside the time gap between the non-simultaneous snapshots? I recently followed a course on distributed algorithms and recognised one of the patterns here. The problem boils down to a distributed snapshotting algorithm, where both ZFS filesystem processes each initiate their own snapshot independently. Without communicating with each other and with the database which messages (in this case traffic to and from the database to each FS) are part of their snapshots (sent or received), there are chances of lost messages, where either none of the process snapshots know that a 'message' was sent or none received it. Algorithms like Tarry, Lai-Yang or the Echo algorithm solve this by adding communication between those processes about messages in transit. Alban Hertroys -- There is always an exception to always.
Re: Regular expression for lower case to upper case.
> On 10 Dec 2022, at 12:00, Eagna wrote: > > > Hi, and thanks for your input. > > >> RegExp by itself cannot do this. You have to match all parts of the input >> into different capturing groups, then use lower() combined with format() to >> build a new string. Putting the capturing groups into an array is the most >> useful option. > > > OK - I *_kind_* of see what you're saying. > > There's a small fiddle here (https://dbfiddle.uk/rhw1AdBY) if you'd care to > give an outline of the solution that you propose. If you put all the regexes and their replacements into a table[1], you could use an aggregate over them to combine all the replacements into the final string. It would need some aggregate like regex_replace_agg, which would probably be a custom aggregate. [1]: If you stick to ASCII, you could just calculate them and even omit storing them in a physical table. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
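The custom aggregate suggested above could look roughly like this. The rules table, its columns, and the aggregate name are all made up; the state transition simply applies regexp_replace() once per rule, in rule order:

```sql
-- Hypothetical table of replacement rules, applied in prio order.
CREATE TABLE regex_rules (prio int, pattern text, replacement text);

-- State function: on the first call the state is NULL, so we start
-- from the subject string; afterwards we keep rewriting the state.
CREATE FUNCTION regexp_replace_step(state text, subject text,
                                    pattern text, replacement text)
RETURNS text LANGUAGE sql IMMUTABLE AS $$
  SELECT regexp_replace(coalesce(state, subject), pattern, replacement, 'g')
$$;

CREATE AGGREGATE regexp_replace_agg(subject text, pattern text, replacement text) (
  SFUNC = regexp_replace_step,
  STYPE = text
);

-- Apply every rule to one input string:
SELECT regexp_replace_agg('Hello World', pattern, replacement ORDER BY prio)
FROM regex_rules;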
Re: Stored procedure code no longer stored in v14 and v15, changed behaviour
> On 3 Dec 2022, at 20:55, Karsten Hilbert wrote: > >> You would need to wrap the function creation calls into some automation to >> generate and store those diffs, comparing it back, etc, but that may be >> doable. I would also generate new diffs right after major version updates of >> the database (a before and after of the output of pg_get_functiondef, >> applied to the stored diff?). > > I wonder whether that would tie the sanity check to a particular PG version. > > I mean, pg_get_functiondef output being a server runtime artifact it might > well change between server versions, no ? I meant to write: “I would also generate new diffs right _before and_ after…”, precisely for that reason. The before patch should get you the last ’sane’ situation to get back to the source code. Next, you can diff that to the newly tokenised version after the upgrade. It is a bit of a hassle, as you need to remember to do that before an upgrade, but at least you’d have something… Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
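The before-and-after snapshots could be captured with something along these lines; the snapshot table is made up, and the prokind filter assumes PostgreSQL 11 or later:

```sql
-- Hypothetical table for storing pg_get_functiondef output over time,
-- so the text before and after a major upgrade can be diffed.
CREATE TABLE IF NOT EXISTS func_def_snapshots (
  taken_at timestamptz NOT NULL DEFAULT now(),
  func     regprocedure NOT NULL,
  def      text NOT NULL
);

-- Run once right before the upgrade and once right after:
INSERT INTO func_def_snapshots (func, def)
SELECT p.oid::regprocedure, pg_get_functiondef(p.oid)
FROM pg_proc p
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND p.prokind = 'f';   -- plain functions only; pg_get_functiondef
                         -- does not handle aggregates
```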
Re: Stored procedure code no longer stored in v14 and v15, changed behaviour
> On 2 Dec 2022, at 15:10, DAVID ROTH wrote: > > Is there a way to reverse engineer the original code (or its equivalent) from > what is saved in the database? I’m dumping an idea here… Treading back quite a bit with information from later in this thread. With the original procedure source code under version control and assuming the tokenization converts that source code consistently (since it’s done by a computer), you could store another level of diffs: From the source code you pushed, against the output of pg_get_functiondef. Assuming that changes to the tokenised code, when converted back to text, only involve renames of database objects, the result of a reversely applied diff could very well be comparable to the original source code. I suspect that would be sufficient for telling whether a developer is responsible for the changes, or that they were caused by renaming of database artefacts. You would need to wrap the function creation calls into some automation to generate and store those diffs, comparing it back, etc, but that may be doable. I would also generate new diffs right after major version updates of the database (a before and after of the output of pg_get_functiondef, applied to the stored diff?). I’m not so sure that would work for auditing, but that seems to have been tackled down-thread. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: Duda sobre como imprimir un campo INTERVAL
> On 19 Nov 2022, at 4:58, Ken Tanzer wrote:
>
> On Thu, Nov 17, 2022 at 2:30 PM Alejandro Baeza Rangel wrote:
(…)
> don't fully understand it. But what really confuses me is the example below. How can these two intervals be equal and still yield different output in the to_char function? And as a practical matter, and for the OPs question, how can you convert from one to the other of these "equal" values?
>
> WITH inters AS (
>     SELECT
>         '1 day 2 hours'::interval AS i1,
>         '26 hours'::interval AS i2
> )
> SELECT
>     *,
>     to_char(i1,'HH24:MM:SS') AS i1_char,
>     to_char(i2,'HH24:MM:SS') AS i2_char,
>     i1=i2 AS "Equal?"
> FROM inters;
>
>        i1       |    i2    | i1_char  | i2_char  | Equal?
> ----------------+----------+----------+----------+--------
>  1 day 02:00:00 | 26:00:00 | 02:00:00 | 26:00:00 | t
>
> Cheers,
> Ken

Those intervals are not identical. I think the reasoning is that due to DST changes, ‘1 day 2 hours’ is more specific than its conversion to ’26 hours’ (or 25 or 27 at DST change). And since you’re not converting the number of days in to_char, that information gets lost.

That problem doesn’t seem to arise in the OP’s question (as far as I understand it): he does have dates to base the intervals on. However, converting the differences in dates to intervals decouples the difference from the dates (the intervals could, for example, subsequently be added to an entirely different date) and he ends up in the same boat.

It would seem that the way to do this is to convert the difference to (seconds since) epoch and do the math to convert that to a character string yourself. See for example:
https://stackoverflow.com/questions/341384/how-to-convert-an-interval-like-1-day-013000-into-253000

That seems unnecessarily complicated, perhaps there is/could be a more convenient method? I’m sort of thinking of a "relative timestamp offset" type, that tracks an exact difference relative to a given timestamp?

Alban Hertroys
--
There is always an exception to always.
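The epoch-based conversion can be sketched like this. Note that extract(epoch …) on a bare interval assumes 24-hour days, which is exactly the decoupling from real dates discussed above; negative intervals are also ignored here:

```sql
WITH i AS (
  SELECT extract(epoch FROM '1 day 2 hours'::interval)::bigint AS s
)
SELECT (s / 3600)::text
    || ':' || lpad(((s % 3600) / 60)::text, 2, '0')
    || ':' || lpad((s % 60)::text, 2, '0') AS hhmmss
FROM i;
-- 26:00:00
```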
Re: pg_restore creates public schema?
> On 6 Oct 2022, at 22:52, Ron wrote: > > On 10/6/22 12:46, Christophe Pettus wrote: >>> On Oct 6, 2022, at 10:44, Ron wrote: >>> Sadly, that VM doesn't have nearly enough disk space to hold the backup >>> folder. >> Use file mode, and stream the output via scp/ssh to a different machine? > > I thought of that, too. Unfortunately, the ssh version in RHEL 8.6 is > sufficiently old that "three way" ssh (person at HostA wanting to transfer a > file from Server1 to Server2) requires that port 22 be open from Server1 to > Server2. Bit of a wild idea, I haven’t used SSH tunnels in years, so a bit fuzzy on the details: Can you create an SSH tunnel to the new machine from the VM, then pipe that to an SSH connection from a machine that does have enough space to dump? And then vice versa to the new machine to restore? (Unless access to that one is easier of course) Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: Getting data from a record variable dynamically
> On 5 Jul 2022, at 1:02, Rhys A.D. Stewart wrote: > > Greetings All, > > I have a trigger that is attached to several different tables. In the > trigger function I retrieve a single row and I want the info from a > specific column. This column is dependent on the table in question. > and I have the column name stored in a variable as well. Without > writing a conditional for each table, what is the best way to > dynamically get the data from the record variable? I would create a simple trigger function for each of those tables that just extracts the value(s) from the field(s) you mentioned, and then pass those values on to a generic function that does the actual table-independent work. The usual trigger pseudo-columns and variables wouldn’t be available in that generic function, but considering that your triggers fire from different tables, you could either pass them on or they (probably) don’t make sense in the context of the generic function. Alban Hertroys -- There is always an exception to always.
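As a sketch of that split, with all names made up: a thin per-table trigger function knows which column matters for its table and passes the value on to one shared worker function.

```sql
-- Generic worker: all table-independent logic lives here.
CREATE FUNCTION process_value(val text, tbl text)
RETURNS void LANGUAGE plpgsql AS $$
BEGIN
  RAISE NOTICE 'got value % from table %', val, tbl;
END
$$;

-- Thin per-table trigger function; only this part is duplicated per table.
CREATE FUNCTION trg_orders() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
  PERFORM process_value(NEW.order_ref, TG_TABLE_NAME);
  RETURN NEW;
END
$$;

CREATE TRIGGER orders_trg
  BEFORE INSERT OR UPDATE ON orders
  FOR EACH ROW EXECUTE FUNCTION trg_orders();
```

Each additional table gets its own two-line trigger function naming its specific column, while the shared logic stays in one place.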
Re: '{"x": 42, "y": null}'::jsonb != '{"x": 42}'::jsonb ... Really?
> On 18 Jun 2022, at 2:14, Bryn Llewellyn wrote:
>
> I implemented two complementary functions:
>
> —"no_null_keys()" checks that a "jsonb" value has no occurrences of « "some key": null »
>
> —"strip_null_keys()" removes « "some key": null » occurrences from a "jsonb" value
>
> The code checks with "no_null_keys()" that, as expected, no ingested JSON document has an occurrence of « "some key": null ».
>
> And it uses "strip_null_keys()" on the output of "to_jsonb()" — and, as appropriate, any other built-in JSON function that produces a "jsonb" value.
>
> It was straightforward to implement these two functions by using REGEXP built-in functionality on the canonically formatted "text" value produced by the "jsonb::text" typecast.

In my experience, using regular expressions applied to document formats tends to get you false positives. I’d be worried about books with titles similar to 'How we wrote a regular expression to detect occurrences of "some key": null in our JSON documents', for example.

For stripping those null occurrences, you are aware of the json_strip_nulls(json) and jsonb_strip_nulls(jsonb) functions, right?

For detecting them on a recent PG, the @? operator or the jsonb_path_exists(jsonb, jsonpath) function would probably do the trick. I am not too familiar with JSONPATH expressions, but I expect (it passed some preliminary testing) this would detect your nulls just fine, while taking JSON semantics into account:

jsonb_path_exists(document, '$.** ? (@ == null)'::jsonpath)

For PG-specifics on JSONPATH, see section 9.16.2 on:
https://www.postgresql.org/docs/current/functions-json.html#FUNCTIONS-SQLJSON-OP-TABLE

A recursive query is another possible solution. It would probably perform far worse, but I find them more rewarding to write. Some people prefer Sudoku.

Regards,
Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
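Putting those built-ins side by side (jsonb_path_exists needs PostgreSQL 12 or later; jsonb_strip_nulls only removes nulls that are object values, not array elements):

```sql
-- Detect a null-valued key anywhere in the document:
SELECT jsonb_path_exists('{"x": 42, "y": null}'::jsonb, '$.** ? (@ == null)');
-- t
SELECT jsonb_path_exists('{"x": 42}'::jsonb, '$.** ? (@ == null)');
-- f

-- Strip null-valued keys, recursively:
SELECT jsonb_strip_nulls('{"x": 42, "y": null, "z": {"a": null}}'::jsonb);
-- {"x": 42, "z": {}}
```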
Re: Puzzline CROSS JOIN when doing RECURSIVE CTE
> On 18 Apr 2022, at 14:51, Pól Ua Laoínecháin wrote: > > Hi Alban, and many thanks for your input. > >> My first question is why you’re using a recursive CTE here? This doesn’t >> appear to be hierarchical data (such as a tree), unless perhaps you intended >> to actually traverse the HTML document hierarchy? > > This is basically an exercise on my part. > > The question that I'm trying to answer is here: > > https://stackoverflow.com/questions/70574881/how-can-get-html-inner-tag-in-posgresql > > I've already answered it in 3 different ways - but I was trying to do > it with RCTEs in order to improve my comprehension of them. > > So, basically, I want to pick out a subsection of text from a "passage". > > So then, I wanted to establish a true/false state for the lines that I > want and don't want, going through line by line. I know that the RCTE > is a very contrived way of doing this, but it's for learning really. Considering that you’re already looking at the elements of a parsed DOM tree, the exercise boils down to traversing that tree. Due to how xmlparse() is implemented, you probably already get them in the right order even when not using an explicit order by. That is, if you’re looking for a DFT (depth first traversal) as opposed to a BFT (breadth first). One of the difficulties here is that there are some CDATA sections involved with more XML in them. My guess is that that’s the data that you’re actually after, but that’s just a matter of entering the document with the correct path I suppose? > I wonder if you could be so kind as to give me a "skeleton" RCTE for > this - I've been staring at this for hours - and it's not that I'm > lazy or haven't studied RCTEs - I wrote this RCTE > > https://stackoverflow.com/a/71674990/470530 > > recently, so it's not as if I'm completely ignorant of RCTEs - I'm > just stuck in a rut. Any help would be appreciated. You would first need to determine the root node(s). 
Those are the ones w/o parents, or you may have some other way of determining those. Next is finding all nodes that have an earlier node as their parent.

You could go an extra step here with preserving the order of the siblings in the document, by numbering nodes (directly) under the same parent. I usually build an ltree structure with that information, while traversing the tree - that gets you an ltree with entries (1, 1.1, 1.1.1, 1.1.2, 1.2.1, etc) that you then can use for the final order by, for example.

In case you didn’t know, ltree is a module you can install. I find it still very useful in tree traversals. The one drawback I see is that for these scenario’s you’d ideally want an ltree based on integers, such that 10 sorts after 9 instead of between 1 and 2. Padding enough zeroes before the ltree text items is a bit of an extra hassle that I’d prefer to do without.

I haven’t actually looked at what DOM navigation functions exist for PG, so this is more or less pseudo code. Worse, my local copy of PG was compiled w/o XML support, so I don’t know what kind of result the query from that SO article produces. But then again, I don’t really know what you’re after anyway, so... This is basically how I would go about it.

with recursive
-- First we need to get the DOM-tree parsed (this is not actually recursive)
domtree as (
    select node
    from xmlparse(document('...'))
),
-- Next we can traverse it
cte (node, hierarchy, n) as (
    select node, 1::text::ltree, 1
    from domtree
    where parent(node) is null

    union all

    select node, cte.hierarchy || (cte.n+1)::text::ltree, n+1
    from domtree t
    join cte on parent(t.node) = cte.node
)
select * from cte order by hierarchy;

Function parent() is made-up. It would return the parent node of a node, so that there is some way to connect the different parts in the hierarchy. I guess xpath() could fulfil that purpose, but I have no way of testing that hypothesis.

I hope that’s a good enough starting point for you?
Alban Hertroys -- There is always an exception to always.
Re: Puzzline CROSS JOIN when doing RECURSIVE CTE
> On 18 Apr 2022, at 11:56, Pól Ua Laoínecháin wrote: (…) > All of the code below is available on the fiddle here: > > https://dbfiddle.uk/?rdbms=postgres_13&fiddle=0cc20c9081867131260e6e3550bd08ab (…) > OK, grand, now I wish to perform a RECURSIVE CTE on it. So, I start by > trying something (I thought was) very simple. Obviously, I plan to do > more, but I wanted to get the "mechanics" correct to start with. So, > my query is: > > WITH RECURSIVE cte1 (n, ln) AS > ( > SELECT 1 AS n, string > FROM line Here is your first problem, this will yield a result for each row in your line table, numbering it ‘1’. You seem to have expected just a single result here, but that is something that you need to take care of in your query. This part is called the base case, base step or initial step. > UNION ALL > SELECT n + 1, ln > FROM cte1 > WHERE n < (SELECT COUNT(*) FROM line) And then for each of those rows, it will add all those rows (from the same CTE!) again. This part is called the recursive step. You did add a termination condition here, which indeed manages to terminate, but it does so too late. It seems that you do understand some of the concepts of recursive CTE’s, but you appear to be missing some crucial knowledge. For example, it is actually possible to query multiple trees with a single recursive CTE. It is not limited to a single tree. How many trees the CTE will navigate depends on how you selected the rows in the base case. > ) > SELECT * FROM cte1; > > i.e. have a counter variable and a string from the line table My first question is why you’re using a recursive CTE here? This doesn’t appear to be hierarchical data (such as a tree), unless perhaps you intended to actually traverse the HTML document hierarchy? > > But, then to my horror, the result of this query is > > 1with t(x) as (values( XMLPARSE(DOCUMENT > (' AlarmCode="mail" AlarmStartTime="10:00:00" AlarmTime="0" Id ="2" >> Id="2">
Re: historical log of data records
> On 16 Nov 2021, at 10:20, Laurenz Albe wrote: > > On Tue, 2021-11-16 at 13:56 +0530, Sanjay Minni wrote: >> I need to keep a copy of old data as the rows are changed. >> >> For a general RDBMS I could think of keeping all the data in the same table >> with a flag >> to indicate older copies of updated / deleted rows or keep a parallel table >> and copy >> these rows into the parallel data under program / trigger control. Each has >> its pros and cons. >> >> In Postgres would i have to follow the same methods or are there any >> features / packages available ? > > Yes, I would use one of these methods. > > The only feature I can think of that may help is partitioning: if you have > one partition > for the current data and one for the deleted data, then updating the flag > would > automatically move the row between partitions, so you don't need a trigger. Are you building (something like) a data-vault? If so, keep in mind that you will have a row for every update, not just a single deleted row. Enriching the data can be really useful in such cases. For a data-vault at a previous employer, we determined how to treat new rows by comparing a (md5) hash of the new and old rows, adding the hash and a validity interval to the stored rows. Historic data went to a separate table for each respective current table. The current tables “inherited” the PK’s from the tables on the source systems (this was a data-warehouse DB). Obviously that same PK can not be applied to the historic tables where there _will_ be duplicates, although they should be at non-overlapping validity intervals. Alternatively, since this is time-series data, it would probably be a good idea to store that in a way optimised for that. TimescaleDB comes to mind, or arrays as per Pavel’s suggestion at https://stackoverflow.com/questions/68440130/time-series-data-on-postgresql. Regards, Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
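A minimal sketch of the hash-plus-validity-interval enrichment described above, with all table and column names made up:

```sql
-- Hypothetical history table: one row per version of a customer record.
CREATE TABLE customer_hist (
  customer_id  int NOT NULL,
  row_hash     text NOT NULL,
  valid_during tstzrange NOT NULL,   -- open upper bound = current version
  payload      jsonb NOT NULL
);

-- On load: only insert a new version when the row content actually
-- changed, detected by comparing hashes of the whole row.
-- (Closing the previous open interval is left out for brevity.)
INSERT INTO customer_hist (customer_id, row_hash, valid_during, payload)
SELECT s.id, md5(s::text), tstzrange(now(), NULL), to_jsonb(s)
FROM staging_customer s
WHERE NOT EXISTS (
  SELECT 1 FROM customer_hist h
  WHERE h.customer_id = s.id
    AND upper(h.valid_during) IS NULL
    AND h.row_hash = md5(s::text)
);
```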
Re: Determining if a table really changed in a trigger
> On 26 Oct 2021, at 16:16, Marcos Pegoraro wrote:
>
>> Don’t use this approach with JSON (as opposed to JSONB) type fields though, a single extra space in the JSON structure would already lead to a difference, as would other formatting differences.
>
> I don´t think two equal values being converted to json will be different in any way. If row_to_json of both are different, I suppose both record really are different, no ?

For row_to_json, yes, as it’s the system that combines the fields in a row into a JSON structure and it probably would do that in the same way each time.

The OP however has a field of type JSON in their table, and that can contain the same information between the OLD and NEW fields formatted in a slightly different way. For example:

=> with x as (
    select '{ "x": 1, "y": 2 }'::json
    union all
    select '{ "y": 2, "x": 1 }'::json
)
select row(x.json)::text, md5(row(x.json)::text) from x;
            row             |               md5
----------------------------+----------------------------------
 ("{ ""x"": 1, ""y"": 2 }") | 84df40e8660dcf371d89dbf5d6a61c3d
 ("{ ""y"": 2, ""x"": 1 }") | abd6db88c2526be6ea97570aeec7e020
(2 rows)

Whereas:

=> with x as (
    select '{ "x": 1, "y": 2 }'::jsonb
    union all
    select '{ "y": 2, "x": 1 }'::jsonb
)
select row(x.jsonb)::text, md5(row(x.jsonb)::text) from x;
           row            |               md5
--------------------------+----------------------------------
 ("{""x"": 1, ""y"": 2}") | d5a6dbdec7a5bfe0dc99e090db30322e
 ("{""x"": 1, ""y"": 2}") | d5a6dbdec7a5bfe0dc99e090db30322e
(2 rows)

Alban Hertroys
--
There is always an exception to always.
Re: Determining if a table really changed in a trigger
> On 26 Oct 2021, at 9:05, Mitar wrote: > > Hi! > > I have a trigger like: > > CREATE TRIGGER update_trigger AFTER UPDATE ON my_table REFERENCING NEW > TABLE AS new_table OLD TABLE AS old_table FOR EACH STATEMENT EXECUTE > FUNCTION trigger_function; > > I would like to test inside trigger_function if the table really > changed. I have tried to do: > > PERFORM * FROM ((TABLE old_table EXCEPT TABLE new_table) UNION ALL > (TABLE new_table EXCEPT TABLE old_table)) AS differences LIMIT 1; > IF FOUND THEN > ... changed ... > END IF; > > But this fails if the table contains a JSON field with the error: > > could not identify an equality operator for type json Perhaps if you store an extra column containing a hash (for example MD5) of the row contents (minus the hash column, obviously)? You can put an index on the hash and match between OLD and NEW tables which ones changed. When calculating the hash, you would have to specify the column names to exclude the hash itself, so something like this: md5(row(col1, col2, col3)::text) The row-to-text conversion already takes care of converting JSONB(!) to text. Don’t use this approach with JSON (as opposed to JSONB) type fields though, a single extra space in the JSON structure would already lead to a difference, as would other formatting differences. Regards, Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
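The hash-column approach could be wired up roughly like this. All names are made up, and the statement-level trigger assumes the same REFERENCING NEW TABLE AS new_table OLD TABLE AS old_table clause as in the original trigger; the JSON column is cast to jsonb inside the hash so formatting differences don’t count as changes:

```sql
ALTER TABLE my_table ADD COLUMN row_hash text;

-- Row-level BEFORE trigger keeps the hash up to date.
CREATE FUNCTION my_table_hash() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
  NEW.row_hash := md5(row(NEW.col1, NEW.col2, NEW.json_col::jsonb)::text);
  RETURN NEW;
END
$$;

CREATE TRIGGER my_table_hash_trg
  BEFORE INSERT OR UPDATE ON my_table
  FOR EACH ROW EXECUTE FUNCTION my_table_hash();

-- The statement-level trigger function can now compare hashes only,
-- avoiding the missing equality operator for json.
CREATE FUNCTION trigger_function() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
  PERFORM 1 FROM (
    (SELECT row_hash FROM old_table EXCEPT SELECT row_hash FROM new_table)
    UNION ALL
    (SELECT row_hash FROM new_table EXCEPT SELECT row_hash FROM old_table)
  ) differences LIMIT 1;
  IF FOUND THEN
    RAISE NOTICE 'table really changed';
  END IF;
  RETURN NULL;
END
$$;
```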
Re: Where is the tsrange() function documented?
> On 19 Oct 2021, at 7:11, Bryn Llewellyn wrote:
>
> By the way, I was surprised when I tried this:
>
> with c as (
>     select
>         '2000-01-01'::timestamp as t1,
>         '2000-01-10'::timestamp as t2,
>         '2000-01-20'::timestamp as t3)
> select
>     ( tsrange(t1, t3, '[)') && tsrange(t2, t2, '[)') )::text as "range result",
>     ( (t1, t3) overlaps (t2, t2) )::text as "overlaps result"
> from c;
>
> and got this:
>
>  range result | overlaps result
> --------------+-----------------
>  false        | true
>
> I can't find anything, neither on the page in question here on Range Types nor in the doc on the overlaps operator, about the semantics for when a duration collapses to an instant. Am I missing this too?

Your mistake is in how you defined an instant as a range:

with c as (
    select
        '2000-01-01'::timestamp as t1,
        '2000-01-10'::timestamp as t2,
        '2000-01-20'::timestamp as t3)
select
    tsrange(t2, t2, '[)') as empty
,   tsrange(t2, t2, '[]') as instant
,   ( tsrange(t1, t3, '[)') && tsrange(t2, t2, '[)') )::text as "empty range result"
,   ( tsrange(t1, t3, '[)') && tsrange(t2, t2, '[]') )::text as "instant range result"
from c;

 empty |                    instant                    | empty range result | instant range result
-------+-----------------------------------------------+--------------------+----------------------
 empty | ["2000-01-10 00:00:00","2000-01-10 00:00:00"] | false              | true
(1 row)

As I read it, an empty range is not considered to overlap anything, regardless of ‘when’ it was defined; it gets normalised to ‘empty’.

See also the examples in section 8.17.5 for the difference between an empty range and a single point range (what you call an instant).

Regards,
Alban Hertroys
--
There is always an exception to always.
Re: Growth planning
> On 4 Oct 2021, at 18:22, Israel Brewster wrote: (…) > the script owner is taking about wanting to process and pull in “all the > historical data we have access to”, which would go back several years, not to > mention the probable desire to keep things running into the foreseeable > future. (…) > - The largest SELECT workflow currently is a script that pulls all available > data for ONE channel of each station (currently, I suspect that will change > to all channels in the near future), and runs some post-processing machine > learning algorithms on it. This script (written in R, if that makes a > difference) currently takes around half an hour to run, and is run once every > four hours. I would estimate about 50% of the run time is data retrieval and > the rest doing its own thing. I am only responsible for integrating this > script with the database, what it does with the data (and therefore how long > that takes, as well as what data is needed), is up to my colleague. I have > this script running on the same machine as the DB to minimize data transfer > times. I suspect that a large portion of time is spent on downloading this data to the R script, would it help to rewrite it in PL/R and do (part of) the ML calculations at the DB side? Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: SELECT FOR UPDATE returns zero rows with CTE
> On 17 Sep 2021, at 8:32, Roman Guryanov wrote:
>
> Hello, could you check my problem.
> Why does SELECT FOR UPDATE return 0 rows in the scenario below? (execution in transaction)
>
> If I delete 'FOR UPDATE', 1 row is returned.
>
> Test case:
> DROP TABLE IF EXISTS t1;
>
> CREATE TABLE t1 (_pk serial, t1c1 integer, t1c2 integer, t1c3 text);
>
> insert into t1 (t1c1, t1c2, t1c3) values (123456789, 100, 'string_value_1');

(…cut everything related to unused t2…)

> WITH cte1 AS (
>     UPDATE t1
>     SET t1c3 = 'string_value_1'
>     WHERE t1c1 = 123456789
>     returning t1c1, t1c2
> ),
> cte2 AS (
>     SELECT * FROM t1
>     WHERE t1c1 = 123456789
>     AND t1c2 = (SELECT t1c2 FROM cte1)
>     FOR UPDATE
> )
> SELECT * FROM cte2;

Most likely the outer select returns 0 rows because you locked the rows you expected in cte2 and didn’t perform an update on those locked rows yet.

I suspect your intention for this query is to first lock the rows, then update them and then select them, but instead you start with updating them, then lock those rows after the fact and then you try to select those locked rows.

Also, selecting the updated rows by t1c2 in cte2 seems rather risky, as that is a rather different selection criterium than you use for the actual update. It’s okay for this single-row example, but if you had a table full of data, you would now have locked all rows with the value t1c2 = 100 for update. If that update never happens (or the locking doesn’t get rolled back), well…

Regards,
Alban Hertroys
--
There is always an exception to always.
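A sketch of that reordering, locking first and updating afterwards (the cross-CTE data dependencies force the intended execution order here, since the UPDATE reads from locked and the outer query reads from updated):

```sql
WITH locked AS (
    SELECT _pk
    FROM t1
    WHERE t1c1 = 123456789
    FOR UPDATE
), updated AS (
    UPDATE t1
    SET t1c3 = 'string_value_1'
    WHERE _pk IN (SELECT _pk FROM locked)
    RETURNING *
)
SELECT * FROM updated;
```

Note that the lock and the update now also use the same selection criterium (the primary key), avoiding the t1c2 mismatch discussed above.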
Re: Help with writing a generate_series(tsmultirange, interval)
> On 1 Aug 2021, at 3:30, Tom Lane wrote:
>
> François Beausoleil writes:
>> While mowing the lawn, I thought that since the syntax of multi ranges is similar to arrays, maybe I could use unnest(), but sadly, that was not to be the case:
>> # select unnest('{[2021-08-02,2021-08-04],[2021-08-07,2021-08-09)}'::tsmultirange);
>> ERROR:  function unnest(tsmultirange) does not exist
>
> That's fixed for beta3:
>
> regression=# select unnest('{[2021-08-02,2021-08-04],[2021-08-07,2021-08-09)}'::tsmultirange);
>                     unnest
> -----------------------------------------------
>  ["2021-08-02 00:00:00","2021-08-04 00:00:00"]
>  ["2021-08-07 00:00:00","2021-08-09 00:00:00")
> (2 rows)
>
> regards, tom lane

If what you need is behaving similar to arrays, perhaps arrays of ranges suit your problem?

development=> select unnest(array['[2021-08-02,2021-08-04]'::tsrange, '[2021-08-07,2021-08-09)'::tsrange]);
                    unnest
-----------------------------------------------
 ["2021-08-02 00:00:00","2021-08-04 00:00:00"]
 ["2021-08-07 00:00:00","2021-08-09 00:00:00")
(2 rows)

The drawback of that approach is probably with the operators and functions you have to your avail. It seems to me though that several of those not available for arrays could be emulated using array functions such as array_position(…) for the contains operator, unnest with tsrange functions for others, etc.

Another approach could be to store the “rules” of the schedule and generate the relevant portion of the multirange as a set of tsrange rows on-the-fly. That may well perform better than storing the entire range in a table of tsrange records. I’ve done something like that for a hierarchical query on versioned items, where I had to base how to slice through the hierarchy on a reference timestamp. That performed adequately on a production data warehouse, as long as you sufficiently constrained the inputs. You can join such a function (laterally) to some other data set too.

Regards,
Alban Hertroys
--
There is always an exception to always.
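A sketch of the rules-based approach: a made-up schedule_rules table (ISO weekday plus start and end time) and a set-returning function that expands it into tsranges for a requested window only.

```sql
-- Hypothetical rules table: e.g. (1, '09:00', '17:00') = Mondays 9-17.
CREATE TABLE schedule_rules (dow int, start_t time, end_t time);

CREATE FUNCTION schedule_ranges(from_d date, to_d date)
RETURNS SETOF tsrange LANGUAGE sql STABLE AS $$
  SELECT tsrange(d::date + r.start_t, d::date + r.end_t, '[)')
  FROM generate_series(from_d, to_d, interval '1 day') AS g(d)
  JOIN schedule_rules r ON r.dow = extract(isodow FROM d)
  ORDER BY 1
$$;

-- Only materialises the ranges for the window you ask about:
SELECT * FROM schedule_ranges('2021-08-02', '2021-08-08');
```

This keeps the stored data small and can be joined laterally against other data sets, as mentioned above.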
Re: Recursive Parent-Child Function Bottom Up
> On 26 Jul 2021, at 17:52, Alban Hertroys wrote: > Something like this: > > with recursive foo (id, parent, children_ids) as ( > select id, parent, null::text > from tree t >where not exists ( > select 1 from tree c where c.parent = t.id >) > union all > select t.id, t.parent > , f.id || case f.children_ids when '' then '' else ',’ end || > f.children_ids > from foo f > join tree t on f.parent = t.id >where f.parent <> 0 > ; Almost, the null::text in the initial select should of course be '’ in your case, and a unicode quote slipped into the last string of that case statement. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: Recursive Parent-Child Function Bottom Up
> On 26 Jul 2021, at 17:19, Avi Weinberg wrote:
>
> Hi,
>
> I would like to populate the children_ids column with all the ids of the children recursively (+ grandchildren etc.)
> If I do it top-bottom I will end up doing extra work since there is no need to go all levels down if I can just compute my IMMEDIATE children "children_ids" and just concatenate all their lists.

(…)

> create table tree(id int primary key, parent int, children_ids text);
> insert into tree (id, parent) values
> (273, 0),
> (274, 273),
> (275, 273),
> (277, 273),
> (278, 277),
> (280, 275),
> (281, 280),
> (282, 281),
> (283, 282),
> (284, 282),
> (285, 282),
> (286, 282),
> (287, 282),
> (288, 282),
> (289, 282),
> (290, 281),
> (291, 290),
> (292, 290),
> (293, 290),
> (294, 290),
> (295, 290);

First you need to figure out what your starting set of nodes is, and since you’re going to go bottom-up, those are your leaf nodes. Without any indicators for that though, you’ll have to determine that from a sub-query.

Something like this:

with recursive foo (id, parent, children_ids) as (
    select id, parent, null::text
    from tree t
    where not exists (
        select 1 from tree c where c.parent = t.id
    )
    union all
    select t.id, t.parent
         , f.id || case f.children_ids when '' then '' else ',’ end || f.children_ids
    from foo f
    join tree t on f.parent = t.id
    where f.parent <> 0
;

Regards,

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: Getting "ERROR: upper bound of FOR loop cannot be null" when trying to use a stored proc in a check constraint def.
> On 17 Jul 2021, at 0:26, David Gauthier wrote:

(…)

> dvdb=# select validate_proj_csv('sabin,strix2,ipu1.0');
>  validate_proj_csv
> -------------------
>                  1
> (1 row)
>
> dvdb=# select validate_proj_csv('sabin,strix2,ipu1.00');
> NOTICE: Project "ipu1.00" in project csv "sabin,strix2,ipu1.00" is not a valid project.
>  validate_proj_csv
> -------------------
>                  0
> (1 row)
>
> But when I try to use it in a check constraint
>
> dvdb=# alter table projects add constraint validate_sibling_project_csv check (validate_proj_csv(sibling_project_csv) = 0);
> ERROR: upper bound of FOR loop cannot be null
> CONTEXT: PL/pgSQL function validate_proj_csv(character varying) line 14 at FOR with integer loop variable
>
> What's going on ?
> How to get this to work ?

As people advised you previously, you would probably be better off normalising your table. For example, add a table for the links:

create table project_sibling (
    project text not null references public.projects(project) on update cascade on delete cascade
,   sibling text not null references public.projects(project) on update cascade on delete cascade
,   primary key (project, sibling)
);

-- Populate it from public.projects initially
insert into project_sibling(project, sibling)
select p.project, s.sibling
from public.projects p
cross join lateral regexp_split_to_table(project_csv, ',') s(sibling)
;

I had to make a few guesses there, as I don’t have your original table structure, but that’s the gist of it.

If that project_csv column gets populated by some external application, you could keep the link-table updated with insert/update/delete triggers. Alternatively, a writable view replacing public.projects may be a possibility.

Regards,

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
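The trigger idea mentioned above could be sketched like this. The table and column names follow the example in the message (the thread uses both project_csv and sibling_project_csv, so the column name is a guess), and the rebuild-on-every-write logic is an assumption about the intended behaviour; untested:

```sql
create or replace function sync_project_siblings() returns trigger as $$
begin
    -- Rebuild the link rows for this project from the CSV column.
    delete from project_sibling where project = new.project;
    insert into project_sibling (project, sibling)
    select new.project, s.sibling
    from regexp_split_to_table(new.project_csv, ',') s(sibling);
    return new;
end;
$$ language plpgsql;

create trigger projects_sync_siblings
    after insert or update of project_csv on projects
    for each row execute procedure sync_project_siblings();
```

A delete trigger is not needed here if the foreign keys carry ON DELETE CASCADE, as in the link-table definition above.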
Re: On partitioning, PKs and FKs
On 2021-07-08 13:30, Ron wrote:
> Thus, the bigTable PK must be on id, columnX. (No, I don't like it either.)

That's not entirely true. You can keep the PK on id if you additionally create a unique constraint on (id, columnX). That way, you can at least be certain that the uniqueness of the PK remains intact, even if that is a surrogate key that has nothing to do with the uniqueness of the actual records.

Regards,

Alban Hertroys

Alban Hertroys
Apollo Tyres (NL) B.V.
Ir. E.L.C. Schiffstraat 370, 7547 RD Enschede, The Netherlands
+31 (0)53 4888 776
alban.hertr...@apollotyres.com
Chamber of Commerce number: 34223268
CIN: L25111KL1972PLC002449 Registered Office: Apollo Tyres Ltd, 3rd Floor, Areekal Mansion, Panampilly Nagar, Kochi 682036, India
Disclaimer: The information contained in this e-mail is intended solely for the use of the individual or entity to whom it is addressed. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or action in relation to the contents of this information is strictly prohibited and may be unlawful and request you to delete this message and any attachments and advise the sender by return e-mail. The confidentiality of this message is not warranted. Apollo Tyres and its subsidiaries rule out any and every liability resulting from this or any other electronic transmission.
Re: Overlapping timestamptz ranges with priority
> On 28 Jun 2021, at 0:41, Ray O'Donnell wrote: > > Hi all, > (…) > create table bookings ( >booking_id bigint not null, >booking_time tstzrange not null, > >constraint bookings_pk primary key (booking_id) > ); > > insert into bookings (booking_id, booking_time) values > (1, tstzrange('2021-06-20 12:00+01', '2021-06-20 14:00+01', '[)')), > (2, tstzrange('2021-06-20 13:00+01', '2021-06-20 16:00+01', '[)')); > > > And what I'd like to be able to do is pull out the following: > > > booking_id |slot_time > +- > 1 | ["2021-06-20 12:00:00+01","2021-06-20 14:00:00+01") > 2 | ["2021-06-20 14:00:00+01","2021-06-20 16:00:00+01") You could probably achieve this by using window function lag() over (order by booking_id), in combination with a case statement when the range from the previous row overlaps the current range. That would only solve the case for immediately subsequent rows though, if you have multiple rows overlapping you will need to track the first range in that list. Another possible route is a recursive CTE, with a similar approach. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
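The lag() approach could look roughly like this (an untested sketch against the bookings table from the message; as noted, it only resolves overlap with the immediately preceding row):

```sql
-- Clip the start of each booking to the end of the previous one.
-- greatest() ignores NULLs, so the first row keeps its own lower bound.
select booking_id
     , tstzrange(
           greatest(lower(booking_time)
                  , lag(upper(booking_time)) over (order by booking_id))
         , upper(booking_time)
         , '[)') as slot_time
from bookings;
```

For the sample data this yields ["…12:00","…14:00") for booking 1 and ["…14:00","…16:00") for booking 2, which matches the requested output.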
Re: Is there a way to replace select * fields in result ?
> On 12 Jun 2021, at 10:00, Condor wrote: > > select qw.*, case whem length(qw.mm) > 0 THEN COALESCE(SUBSTRING(qw.mm, 1, > 1), '') ELSE qw.mm END AS qw.mm A little off-topic, but isn’t that a roundabout way of writing just this?: select qw.*, coalesce(substring(qw.mm, 1, 1), '') as mm Or even: select qw.*, coalesce(left(qw.mm, 1), '') as mm Regards, Alban Hertroys -- There is always an exception to always.
Re: index unique
> On 8 Jun 2021, at 22:50, Thomas Kellerer wrote: > > Marc Millas schrieb am 03.06.2021 um 22:51: >> on a table we need a primary key and to get a unique combinaison, we need 3 >> columns of that table: >> 1 of type integer, >> 1 of type text, >> 1 of type geometry >> > > How do you define the "uniqueness" of the geometry? That is actually the big question here. Multiple “unique” geometries can specify the same geometry! A geom as simple as a line from (0,0) - (1,0) can just as easily be specified as (1,0) - (0,0). That’s the simplest case, and one could argue that the point of origin is different, but the next example would be a triangle starting at the same origin but traversed in different directions. It gets harder the more vertices a polygon has. I would argue that a geometry type is ill-suited as a primary key column candidate. Now, of course, the OP could have a case where their geometries are guaranteed to be unique regardless, but they’d better make sure before adding them to the PK. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
WARNING: oldest xmin is far in the past
Good day,

We have a PG 11.11 instance here that serves as a data-warehouse for us. This morning I was investigating an issue with our ETL's and discovered this error in the logs, that keeps repeating:

2021-05-28 15:01:54.094 CEST [20164] WARNING: oldest xmin is far in the past
2021-05-28 15:01:54.094 CEST [20164] HINT: Close open transactions soon to avoid wraparound problems. You might also need to commit or roll back old prepared transactions, or drop stale replication slots.

We don't have any idle in transaction sessions, but we do have a replication slot that turns out to have been inactive for an unknown while. The current situation around our xid's is this:

avbv=# select slot_name, slot_type, database, active, catalog_xmin, restart_lsn, confirmed_flush_lsn from pg_replication_slots;
   slot_name   | slot_type | database | active | catalog_xmin |  restart_lsn  | confirmed_flush_lsn
---------------+-----------+----------+--------+--------------+---------------+---------------------
 debezium_prod | logical   | avbv     | t      |    616648922 | 1166/C45B5140 | 1167/65C7AA0
(1 row)

avbv=# select datname, datfrozenxid from pg_database;
    datname    | datfrozenxid
---------------+--------------
 postgres      |    610128180
 speeltuin     |    610128180
 template1     |    610128180
 template0     |    591773830
 reportinfo    |    610128180
 avbv_20190314 |    610128180
 avbv          |    610128180
 ensfocus-tst  |    610128180
 ensfocus      |    610128180
 ensfocuswf8   |    610128180
 portal_prd    |    610128180
 portal_tst    |    610128180
(12 rows)

Clearly, the gap between the higher frozen xid's (610128180) and the replication slot's xmin (616648922) is rather small; a mere 6.5M xid's apart.

We have that single logical replication slot that Debezium subscribes to, to push committed records for some tables to Kafka. Those are tables that get frequent inserts; a batch of new records arrives about every 15 minutes, 24/7.

As mentioned, initially when I detected this problem, the Debezium connector (the subscriber) had failed to attach. Restarting it fixed that (that's a known issue that was recently discovered in the current version 1.4.0).
I had hopes the xmin issue would be gone once it caught up, but it did catch up earlier today and the issue remains...

I did already take several actions in attempts to solve the issue, so far to little avail:
* I restarted the database, closing any idle in transaction sessions that might have gone unnoticed otherwise
* I ran vacuumdb -a -U postgres, which printed a number of repetitions of the same error message on the console
* I ran vacuumdb -a -F -U postgres
* I added a heartbeat interval of 10000 ms (10 s) to the Debezium connector, although I didn't think that was necessary

Should I just wait for the replication slot xmin to increase into a safe area? It is slowly increasing, while the frozen xid's have remained the same while monitoring this issue. Or is there some action I should take?

For the record:

avbv=# select version();
                                                 version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 11.11 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
(1 row)

Regards,

Alban Hertroys

P.S. Sorry about below company disclaimer, there is nothing I can do about that.
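As an aside, the kind of slot lag described in this message can be watched with a query along these lines (a sketch; all functions used exist in PG 11, the interpretation of "safe" thresholds is up to the reader):

```sql
-- How far back each replication slot is holding the horizon:
-- age() counts xids between the slot's xmin and the current xid,
-- pg_wal_lsn_diff() measures WAL retained for the slot.
select slot_name
     , active
     , age(catalog_xmin) as catalog_xmin_age
     , pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) as retained_wal
from pg_replication_slots;
```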
Re: How to get CASE statement to recognize null ?
> On 10 Mar 2021, at 21:00, David Gauthier wrote: > > Hey, Thanks Tom ! > > It's actually a little more complicated than the simple example, so I'm not > sure it can be shoehorned into coalesce... > > CASE >WHEN sr.nightly_cl_display_suffix is null THEN cast (d.p4_changelist > as varchar) > ELSE >cast (d.p4_changelist as varchar)||'-'||sr.nightly_cl_display_suffix > END as changelist Sure it can: cast(d.p4_changelist as varchar) || coalesce('-'||sr.nightly_cl_display_suffix, '') > On Wed, Mar 10, 2021 at 2:46 PM Tom Lane wrote: > David Gauthier writes: > > dvdb=# select > > CASE col1 > > WHEN null THEN 'z' > > ELSE col1 > >END as col1, > >col2 > > from foo; > > This test is equivalent to "col1 = null" which will always fail. > You could try something like > > CASE WHEN col1 IS NULL THEN ... ELSE ... END > > Although I think the particular thing you're doing here would > be better solved with COALESCE(col1, 'z'). > > regards, tom lane Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
How to check for existence of nested JSONB property key?
Hello all,

We have a table with material properties stored as JSONB objects, as the available properties differ wildly between different kinds of materials. It's semi-structured data.

I am in a situation where I want to filter our materials based on the existence of a certain property, regardless of its value.

A query for such a property with a specific value is easy, that even uses a GIST index we defined on that table. For example:

select obj -> 'top' -> 'next' -> 'key' as key_value
from (values('{"top":{"next":{"key":3}}}'::jsonb)) x(obj)
where obj @> '{"top":{"next":{"key":3}}}';

 key_value
-----------
 3
(1 row)

I'd like to do the same for an "incomplete" search object, where the "value" is an object instead of a literal:

select obj -> 'top' -> 'next' -> 'key' as key_value
from (values('{"top":{"next":{"key":3}}}'::jsonb)) x(obj)
where obj @> '{"top":{"next":"key"}}';

 key_value
-----------
(0 rows)

I was hoping for the same result as above. Apparently the jsonb contains operator requires an exact match? Is there an approach that can do this, preferably making use of the GIST index?

Mind that the nesting depth of the search object can vary, like the material properties vary; I'm looking for a generic approach here.

We're on PG 11(.9), so the enhancements made to JSONB operations in PG 12 are not (yet) available to us. This is a 3TB database w/o replicas, so upgrading it is a little scary and requires a maintenance window that would be sufficient to restore a backup in case things go wrong.

Regards,

Alban Hertroys
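For a fixed path, one PG 11 option is to extract the containing object with #> and test key existence with the ? operator. This answers the literal example, but note the caveats: it does not use the containment index, and it does not satisfy the varying-nesting-depth requirement from the question, so treat it as a partial workaround:

```sql
-- Does the object at top -> next have a key named 'key', whatever its value?
select obj -> 'top' -> 'next' -> 'key' as key_value
from (values ('{"top":{"next":{"key":3}}}'::jsonb)) x(obj)
where obj #> '{top,next}' ? 'key';
```

On PG 12 and later, jsonb_path_exists(obj, '$.top.next.key') expresses the same test directly, and the path can be built dynamically.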
Re: How to keep format of views source code as entered?
> On 12 Jan 2021, at 20:54, Alex Williams wrote:
>
> Hi Ingolf,
>
> For comments in views, I create an unused CTE and put my comments there, e.g.
>
> WITH v_comments AS (
>   SELECT 'this is my comment' AS comment
> )
>
> Alex

You do know about COMMENT ON VIEW v_comments IS 'this is my comment', right?

> ‐‐‐ Original Message ‐‐‐
> On Thursday, January 7, 2021 11:19 AM, Markhof, Ingolf wrote:
>
>> Hi!
>>
>> Switching from Oracle SQL to PostgreSQL I am facing the issue that the SQL code the system returns when I open a view's source code is different from the code I entered. The code is formatted differently, comments are gone and e.g. all text constants got an explicit cast to ::text added. (see sample below).
>>
>> I want the SQL code of my views stored as I entered it. Is there any way to achieve this? Or will I be forced to maintain my views' SQL code outside of PostgreSQL views?
>>
>> Any hints welcome!
>>
>> Here is an example. I enter this code to define a simple view:
>>
>> create or replace view myview as
>> select
>>     product_id,
>>     product_acronym
>> from
>>     products -- my comment here
>> where
>>     product_acronym = 'ABC'
>> ;
>>
>> However, when I open the view in my SQL client (DBeaver) again, this is what I get:
>>
>> CREATE OR REPLACE VIEW myview
>> AS SELECT product_id,
>>     product_acronym
>>    FROM products
>>   WHERE product_acronym = 'ABC'::text;
>>
>> So, the formatting changed, keywords are capitalized, the comment I added in the from-part has gone and the text constant 'ABC' changed to 'ABC'::text.
>>
>> Verizon Deutschland GmbH - Sebrathweg 20, 44149 Dortmund, Germany - Amtsgericht Dortmund, HRB 14952 - Geschäftsführer: Detlef Eppig - Vorsitzender des Aufsichtsrats: Francesco de Maio

Alban Hertroys
--
There is always an exception to always.
Re: Define hash partition for certain column values
> On 12 Jan 2021, at 16:51, Голубева Яна wrote:
>
> Values for the key partitioning column are generated randomly and I can't predict their distribution between ranges.
> If I just create some ranges I won't have any guarantee that partitions will have a similar amount of data. It is possible that I will have 2 or 3 extremely big partitions and a bit of data in others.

A hash of a random number is also random, so when using hashes for partitioning you will get the same problem.

If you want to distribute values equally over a fixed number of partitions, I suggest you partition on a modulo of a monotonically increasing number (a sequence for example), instead of relying on a random number.

> 12.01.2021, 17:55, "Michael Lewis":
> On Tue, Jan 12, 2021 at 1:21 AM Голубева Яна wrote:
>> List or range partitioning isn't suitable for my case.
>> I am using a column of numeric(20) type as a base for partitioning. The values of the column are generated randomly.
>> So there will be too many partitions if I use list partitioning as is.
>
> Sorry, but why is range not suited for this? It would seem fairly trivial to create 50 or 1000 partitions to break up the range of values allowed by your field definition.

Alban Hertroys
--
There is always an exception to always.
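The modulo suggestion can be sketched as list partitioning on an expression. The table name, partition count, and the assumption that id is fed from a sequence are all invented for illustration:

```sql
-- Four equally-filled partitions, keyed on a sequence-fed id.
-- PostgreSQL accepts an expression (in extra parentheses) as a partition key.
create table events (
    id      bigint not null,   -- assumed to come from a sequence
    payload text
) partition by list ((id % 4));

create table events_p0 partition of events for values in (0);
create table events_p1 partition of events for values in (1);
create table events_p2 partition of events for values in (2);
create table events_p3 partition of events for values in (3);
```

With a monotonically increasing id, consecutive rows cycle through the partitions, so each partition receives the same number of rows regardless of how any other column is distributed.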
Re: Trigger with conditional predicates
> On 4 Jan 2021, at 20:02, Dirk Mika wrote: > >>> On 1 Jan 2021, at 16:56, Dirk Mika wrote: >>> >>> Hi all and a happy new Year! >>> >>> We have an Oracle schema that is to be converted to PostgreSQL, where >>> conditional predicates are used in some triggers. >>> >>> In particular, columns are populated with values if they are not specified >>> in the update statement which is used. >>> Usually with an expression like this: >>> >>> IF NOT UPDATING('IS_CANCELED') >>> THEN >>>:new.is_canceled := ...; >>> END IF; >>> >>> I have not found anything similar in PostgreSQL. What is the common >>> approach to this problem? > >> Can't you use column defaults to handle these cases? > > That would work for inserts, but not for updates. Usually, if you don’t mention a column in an UPDATE, you want the value to remain as it was, which is precisely what happens by default. That certainly makes sense to me when you’re dealing with an application that doesn’t know about the existence of said column; overwriting an existing value that some other application put there looks like a problem to me. But of course, that depends a lot on what you’re trying to achieve. What is your use-case that that is not the desired behaviour? Or are we talking about a mixed problem here, where this approach works for some fields, but other fields (such as a status change date) always need to be updated (regardless of whether a value was specified)? Regards, Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
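For completeness: a PostgreSQL trigger cannot tell whether a column was listed in the UPDATE statement, so the closest approximation to Oracle's UPDATING('IS_CANCELED') compares OLD and NEW. A sketch, where the table name and the fallback expression are placeholders; note the limitation that it also fires when the column was explicitly set to its old value:

```sql
create or replace function fill_is_canceled() returns trigger as $$
begin
    -- Value unchanged: treat as "not updated" and (re)compute it,
    -- mimicking Oracle's IF NOT UPDATING('IS_CANCELED') branch.
    if new.is_canceled is not distinct from old.is_canceled then
        new.is_canceled := false;  -- placeholder for the actual expression
    end if;
    return new;
end;
$$ language plpgsql;

create trigger fill_is_canceled
    before update on mytable
    for each row execute procedure fill_is_canceled();
```

IS NOT DISTINCT FROM is used instead of = so that NULLs compare as equal.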
Re: Trigger with conditional predicates
> On 1 Jan 2021, at 16:56, Dirk Mika wrote: > > Hi all and a happy new Year! > > We have an Oracle schema that is to be converted to PostgreSQL, where > conditional predicates are used in some triggers. > > In particular, columns are populated with values if they are not specified in > the update statement which is used. > Usually with an expression like this: > > IF NOT UPDATING('IS_CANCELED') > THEN > :new.is_canceled := ...; > END IF; > > I have not found anything similar in PostgreSQL. What is the common approach > to this problem? > > BR > Dirk Can't you use column defaults to handle these cases? Alban Hertroys -- There is always an exception to always.
Re: SQL group by help
> On 11 Dec 2020, at 18:24, Chris Stephens wrote: > > I'm trying to create a visual representation of a 6x8 grid of samples on a > rack using the following SQL format: > > with rack_display as ( > select sr.ts rack_ts > , sr.rack_id > , r.rack_barcode > , 1 as row_pos > , max(case when rack_well = 0 then 'A1: '||sample_barcode end) as col1 > , max(case when rack_well = 1 then 'A2: '||sample_barcode end) as col2 > , max(case when rack_well = 2 then 'A3: '||sample_barcode end) as col3 > , max(case when rack_well = 3 then 'A4: '||sample_barcode end) as col4 > , max(case when rack_well = 4 then 'A5: '||sample_barcode end) as col5 > , max(case when rack_well = 5 then 'A6: '||sample_barcode end) as col6 > from rack r > , sample_rack sr > , sample s > where r.rack_id = sr.rack_id >and sr.sample_id = s.sample_id >and sr.rack_well < 6 > group by sr.ts, sr.rack_id, r.rack_barcode, row_pos > union all > select sr.ts rack_ts > , sr.rack_id > , r.rack_barcode > , 2 as row_pos > , max(case when rack_well = 6 then 'B1: '||sample_barcode end) as col1 > , max(case when rack_well = 7 then 'B2: '||sample_barcode end) as col2 > , max(case when rack_well = 8 then 'B3: '||sample_barcode end) as col3 > , max(case when rack_well = 9 then 'B4: '||sample_barcode end) as col4 > , max(case when rack_well = 10 then 'B5: '||sample_barcode end) as > col5 > , max(case when rack_well = 11 then 'B6: '||sample_barcode end) as > col6 > from rack r > , sample_rack sr > , sample s > where r.rack_id = sr.rack_id >and sr.sample_id = s.sample_id >and sr.rack_well >= 6 >and sr.rack_well < 12 > group by sr.ts, sr.rack_id, r.rack_barcode, row_pos > union all > ... > ) > select * from rack_display order by rack_ts, rack_id, row_pos; > > the "union all"s continue for another 6 blocks. reports would filter on > rack_id and timestamp. Is time really what groups these batches? I would double-check whether you may be omitting to store some data relevant to this process. 
> if timestamps for each load of a rack were guaranteed to be the same, this > would work. however, the "sr.ts" values may vary by a few seconds so there is > potential for the "group by" to break. ts differences will be a minimum of 5 > minutes for each distinct load of a rack. > > what i think i need is to manufacture a group by column based off rows in > "sample_rack" that have "ts" values that are < 1 minute from each other and > rack_id is the same. i'm coming up blank on how to accomplish that though. > my first thought was to create an interval of +/- 1 min then find all rows > that overlap and assign a group number but i'm not sure how to accomplish > that. You could date_trunc those timestamps to the minute and group on that. > there's also no guarantee an entire rack is full of samples so some "cells" > of display might be null. i think that makes the use of tablefunc crosstab a > little harder. if i remember correctly, it does not handle missing values > well. i'm open to any pivoting strategy. Many reporting tools have features to support just that. We use WebFOCUS, which calls those ACROSS columns. It’s a common requirement in reporting. Alban Hertroys -- There is always an exception to always.
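The date_trunc suggestion would replace the raw timestamp in both the select list and the group by. A fragment based on the first block of the original query (untested):

```sql
select date_trunc('minute', sr.ts) as rack_ts
     , sr.rack_id
     , r.rack_barcode
     , 1 as row_pos
     , max(case when rack_well = 0 then 'A1: '||sample_barcode end) as col1
     -- ... col2 through col6 as in the original query ...
from rack r
join sample_rack sr on r.rack_id = sr.rack_id
join sample s on sr.sample_id = s.sample_id
where sr.rack_well < 6
group by date_trunc('minute', sr.ts), sr.rack_id, r.rack_barcode, row_pos;
```

The caveat is that two timestamps a few seconds apart can still straddle a minute boundary; grouping on gaps instead (a running group number that increments whenever lag(sr.ts) is more than, say, one minute behind sr.ts) is more robust, since the message states distinct loads are at least 5 minutes apart.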
Re: Accessing Postgres Server and database from other Machine
> On 5 Dec 2020, at 10:05, Paul Förster wrote:
>
> Hi Hemil,
>
>> On 05. Dec, 2020, at 07:50, Hemil Ruparel wrote:
>>
>> Did you restart postgres after changing pg_hba.conf?
>
> that shouldn't be necessary for changes in pg_hba.conf. Just do either on the command line:
>
> $ pg_ctl reload

While you’re in there, also verify that something is listening on the port (see below):

$ netstat -an

> or from psql:
>
> postgres=# select pg_reload_conf();
>
> You can then see the effective result immediately in pg_hba_file_rules:
>
> postgres=# table pg_hba_file_rules;

Also:

postgres=# show listen_addresses;
postgres=# show port;

Those will tell you whether the server is listening on the network and on the expected port.

Alban Hertroys
--
There is always an exception to always.
Re: Is it possible to write a generic UPSERT?
> On 12 Nov 2020, at 14:58, Mario Emmenlauer wrote: (…) > But the statement is slightly complex to type, and I find me and my > colleagues often spend more time on this than I would hope. Our two > main challenges are: > (1) we have to look up the uniqueness constraints on the table, and > (2) we have to duplicate the insert statement in the UPDATE section >again, because virtually all fields should get overwritten >(except for the conflicting ones). On long inserts this can be >quite annoying and error-prone. > > I can see how "ON CONFLICT" is very powerful. But that power seems > often a burden for us. We would prefer something that is less manual > effort for the specific use case. Basically, we would like: >INSERT if not exist, and >UPDATE _all_ non-conflicting fields in case of _any_ conflict > > In my (naiive) thinking, such a construct would cover 99% of our > use cases. Or did other people make very different experiences? (…) > Has anybody ever done something like this? Is there an SQL way to > achieve this? Or another programmatic way? We generate the SQL @work based on the definitions in, IIRC, the information_schema. It has tables for both the column lists per table and the primary key definitions. With that, an SQL statement that returns the required SQL statement is easy to generate, after which you can execute it either from a plpgsql execute statement in a function or in a do-block. We do this in plpgsql, but that’s mostly because this code is part of our ETL process and it has to perform some other logic on the same data anyway. I could look up our code for you tomorrow, but that looks to be a busy day, so I can’t promise. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
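The generation approach described above could be sketched like this. It is a rough, untested sketch: it builds the statement text for a single table, assumes the public schema and a table named mytable (placeholder), and for simplicity lists the key columns in the SET clause as well, which is harmless since they are equal on conflict:

```sql
-- Build a generic "upsert" statement from the catalogs:
-- column list from information_schema.columns,
-- conflict target from the table's PRIMARY KEY.
select format(
         'insert into %I (%s) values (%s) on conflict (%s) do update set %s',
         c.table_name,
         string_agg(quote_ident(c.column_name), ', ' order by c.ordinal_position),
         string_agg('$' || c.ordinal_position, ', ' order by c.ordinal_position),
         (select string_agg(quote_ident(kcu.column_name), ', ')
            from information_schema.table_constraints tc
            join information_schema.key_column_usage kcu
              on kcu.constraint_name = tc.constraint_name
             and kcu.table_schema = tc.table_schema
           where tc.table_schema = c.table_schema
             and tc.table_name = c.table_name
             and tc.constraint_type = 'PRIMARY KEY'),
         string_agg(format('%I = excluded.%I', c.column_name, c.column_name),
                    ', ' order by c.ordinal_position)
       ) as upsert_sql
from information_schema.columns c
where c.table_schema = 'public'
  and c.table_name = 'mytable'   -- placeholder table name
group by c.table_schema, c.table_name;
```

The resulting text can then be run with EXECUTE ... USING from plpgsql, with one parameter per column, which matches the ETL setup described in the message.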
Re: Check constraints do not seem to be working!!!
On 11 Nov 2020, at 11:15, Jitendra Loyal wrote: > > > Thanks Nikolay > > I read that but is there a way to meet the above requirement. And I will like > to add that IS NULL and IS NOT NULL should evaluate to true/false. These > operators are made for this and should not be returning NULL. That is exactly what they do. Your problem is with the equality operator and its behaviour with NULL values, which is described in the referenced document. -- If you can't see the forest for the trees, Cut the trees and you'll find there is no forest. >> On Wed 11 Nov, 2020, 14:18 Nikolay Samokhvalov, >> wrote: >>> On Wed, Nov 11, 2020 at 12:26 AM Jitendra Loyal >>> wrote: >>> Despite the above two constraints, the following rows get into the table: >>> insert into t (b , c) values (null, true), (null, false); >> >> This behavior is described in the docs >> https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-CHECK-CONSTRAINTS: >> >> > It should be noted that a check constraint is satisfied if the check >> > expression evaluates to true or the null value. Since most expressions >> > will evaluate to the null value if any operand is null, they will not >> > prevent null values in the constrained columns. To ensure that a column >> > does not contain null values, the not-null constraint described in the >> > next section can be used.
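The point about CHECK constraints can be seen directly: a CHECK is satisfied when its expression evaluates to NULL, and most expressions over a NULL operand yield NULL.

```sql
-- null = true yields NULL (not false), so check (b = true) lets NULLs through.
select (null = true)         as equals_result,   -- NULL
       (null = true) is null as is_null_result;  -- true

-- To actually reject NULLs, make it explicit:
-- check (b is not null and b = true), or simply declare the column NOT NULL.
```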
Re: How to update a table with the result of deleting rows in another table
> On 6 Oct 2020, at 7:37, Hemil Ruparel wrote: > > I am trying to delete orders for a given customer on a given date and add the > cost of those orders to credit for the customer. > > So far, I came up with this: > ``` > with data as ( > delete from orders > where customer_id = > and date = '2020-10-05' returning price > ), total as ( > select sum(price) from data > ) > update paymentdetail > set temp_credit = temp_credit + (select * from total) > where customer_id = > ``` > > which works. but is there a better way to update one table using the result > of deleting rows from another table given that I only want the aggregate of > the result? Adding the customer id to your returning clause and using update..from could help: with data as ( delete from orders where customer_id = returning customer_id, price ), total as ( select customer_id, sum(price) as total_price from data group by customer_id ) update paymentdetail set temp_credit = temp_credit + total.total_price from total where customer_id = total.customer_id You could also do this using subqueries instead of CTE’s, that may perform better as CTE’s act as optimisation fences. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Betr: Re: FATAL: terminating connection due to administrator command
"Srinivasa T N" wrote on 01/10/2020 11:47:33:
> On Thu, Oct 1, 2020 at 2:47 PM Alban Hertroys <alban.hertr...@apollovredestein.com> wrote:
> Hi all,
>
> We're seeing the FATAL error message from the subject pop up in our logs at regular intervals, but I haven't been able to pinpoint what is causing it. I'm hoping for some insights here.
>
> We run a PostgreSQL 11.9 server on CentOS 7, within a vmware environment:
> PostgreSQL 11.9 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
>
> The package was installed from the PGDG repository.
>
> I'm not even sure I should be worried, there doesn't appear to be any impact on the servers' functioning, but it does say 'FATAL'. What we're seeing are lines like these two instances:
>
> 2020-09-30 22:27:56.446 CEST [30659] STATEMENT: select count(*) from "dm_b2b"."prlwytzkofskiv1"
> 2020-09-30 22:27:56.446 CEST [30658] FATAL: terminating connection due to administrator command
> 2020-09-30 22:27:56.446 CEST [30658] STATEMENT: select count(*) from "dm_b2b"."prlwytzkofskiv1"
> 2020-09-30 22:27:56.446 CEST [30657] FATAL: terminating connection due to administrator command
> 2020-09-30 22:27:56.446 CEST [30657] STATEMENT: select count(*) from "dm_b2b"."prlwytzkofskiv1"
> 2020-09-30 22:27:56.446 CEST [30656] FATAL: terminating connection due to administrator command
> 2020-09-30 22:27:56.446 CEST [30656] STATEMENT: select count(*) from "dm_b2b"."prlwytzkofskiv1"
> 2020-09-30 22:27:56.446 CEST [30655] FATAL: terminating connection due to administrator command
> 2020-09-30 22:27:56.446 CEST [30655] STATEMENT: select count(*) from "dm_b2b"."prlwytzkofskiv1"
> 2020-09-30 22:27:56.459 CEST [6482] LOG: background worker "parallel worker" (PID 30655) exited with exit code 1
> 2020-09-30 22:27:56.459 CEST [6482] LOG: background worker "parallel worker" (PID 30656) exited with exit code 1
> 2020-09-30 22:27:56.459 CEST [6482] LOG: background worker "parallel worker" (PID 30657) exited with exit code 1
> 2020-09-30 22:27:56.459 CEST [6482] LOG: background worker "parallel worker" (PID 30658) exited with exit code 1
> 2020-09-30 22:27:56.459 CEST [6482] LOG: background worker "parallel worker" (PID 30659) exited with exit code 1
> 2020-09-30 22:43:08.459 CEST [8055] 172.30.2.25 selfservice_prd ERROR: schema "somethingelse" does not exist at character 71
>
> I am guessing that 6 background workers are started, 1 worker had the result and hence killing the other 5 workers. Maybe, some more pg experts can comment. Anyway, explain of your query helps.

I think you may have the right of it:

                                    QUERY PLAN
----------------------------------------------------------------------------------
 Finalize Aggregate  (cost=3065970.74..3065970.75 rows=1 width=8)
   ->  Gather  (cost=3065970.21..3065970.72 rows=5 width=8)
         Workers Planned: 5
         ->  Partial Aggregate  (cost=3064970.21..3064970.22 rows=1 width=8)
               ->  Nested Loop Left Join  (cost=2772.30..2743631.23 rows=128535594 width=0)
                     Join Filter: ((avl.xx)::text <> ''::text)
                     ->  Parallel Hash Left Join  (cost=2772.01..943286.00 rows=5574292 width=13)
                           Hash Cond: (avl.x = (dc.x)::integer)
                           ->  Parallel Seq Scan on avl  (cost=0.00..596772.71 rows=5574171 width=21)
                           ->  Parallel Hash  (cost=2262.01..2262.01 rows=40800 width=8)
                                 ->  Parallel Index Only Scan using  on dc  (cost=0.42..2
                     ->  Index Scan using ix_xxxx on xxx dm  (cost=0.29..0.31 rows=1 width=19)
                           Index Cond: ((avl.xx)::text = (xx)::text)
                           Filter: ((xx)::text <> ''::text)
(14 rows)

So, apparently these FATAL errors are just caused by parallel workers being aborted because they're no longer needed. Good to know.

Regards,
Alban.

Alban Hertroys
D: +31 (0)53 4 888 888 | T: +31 (0)53 4888 888 | E: alban.hertr...@apollovredestein.com
Apollo Vredestein B.V. | Ir. E.L.C. Schiffstraat 370, 7547 RD Enschede, The Netherlands
Chamber of Commerce number: 34223268
FATAL: terminating connection due to administrator command
2020-09-30 22:27:56.459 CEST [6482] LOG: background worker "parallel worker" (PID 30658) exited with exit code 1
2020-09-30 22:27:56.459 CEST [6482] LOG: background worker "parallel worker" (PID 30659) exited with exit code 1
2020-09-30 22:43:08.459 CEST [8055] 172.30.2.25 selfservice_prd ERROR: schema "somethingelse" does not exist at character 71

Apparently, something is sending SIGTERM to our pg processes. I know that I'm not doing that, certainly not at those hours, and I'm the one who set up this system and am the only DBA of it.

Advice I found on the Internet is to use systemtap with some tap-script, but the scripts that I found just displayed the PID's of processes without telling me their names, which I didn't find all that useful in figuring out who was responsible, so I made an attempt (I have no experience with stap) at modifying it to print process names of signal sender and target:

/*
 * killsnoop-nd.stp   Trace process signals.
 *                    For Linux, uses SystemTap (non-debuginfo).
 *
 * Copyright (C) 2015 Brendan Gregg. (etc)
 */

global target;
global signal;

probe begin
{
    printf("%-6s %-12s %-5s %-6s %6s\n", "FROM", "COMMAND", "SIG", "TO", "COMMAND");
}

probe nd_syscall.kill
{
    target[tid()] = uint_arg(1);
    signal[tid()] = uint_arg(2);
}

probe nd_syscall.kill.return
{
    if (signal[tid()] == 15 && target[tid()] != 0) {
        printf("%-6d %-12s %-5d %-6d %12s\n"
            , pid(), execname()
            , signal[tid()]
            , target[tid()], pid2execname(target[tid()]));
    }
    delete target[tid()];
    delete signal[tid()];
}

The output of last night was:

FROM   COMMAND       SIG   TO     COMMAND
30068  systemd-udevd  15   14151  systemd-udevd
30068  systemd-udevd  15   14836  systemd-udevd
(...)
6482   postmaster     15   30649  postmaster
6482   postmaster     15   30648  postmaster
6482   postmaster     15   30647  postmaster
6482   postmaster     15   30646  postmaster
6482   postmaster     15   30645  postmaster
6482   postmaster     15   30659  postmaster
6482   postmaster     15   30658  postmaster
6482   postmaster     15   30657  postmaster
6482   postmaster     15   30656  postmaster
6482   postmaster     15   30655  postmaster
6482   postmaster     15   2065   postmaster
6482   postmaster     15   2064   postmaster
6482   postmaster     15   2063   postmaster

Several of these TO-pid's match those in the PG log.

$ ps aux | grep 6482
postgres 6482 0.0 0.5 51755776 1043872 ? Ss Sep28 3:25 /usr/pgsql-11/bin/postmaster -D /data/11/data

Am I correct in concluding that postmaster is killing its own processes? If so, what is going on here? And more importantly, what do I do about it?

Regards,

Alban Hertroys.

P.S. I'm mailing from my work account, as this is work related. You may remember me from my private account.
Re: How to enumerate partitions from a window function?
On Thu, 3 Sep 2020 at 20:59, Michael Lewis wrote:
> It seems like you are maybe wanting this- If the previous row is the same, then get the previous row's run_nr. If it is different, then increment.
>
> case when lag( property_A ) over() = property_A and lag( property_B ) over() = property_B then coalesce( lag( run_nr ) over(), 1 ) else lag( run_nr ) over() + 1 end
>
> Perhaps there is a much simpler implementation though.

That would work were it not that the very column we're defining is the one to be aliased run_nr. The data does not contain that information; it's what I'm trying to enrich it with and what I'm having trouble wrapping my head around. Your query (adapted a tiny bit) unfortunately results in:

select datetime, property_A, property_B
,      first_value(datetime::time) over run_win as swap_time
,      case when lag(property_A) over time_win = property_A
            and  lag(property_B) over time_win = property_B
       then coalesce(lag(run_nr) over time_win, 1)
       else lag(run_nr) over time_win + 1
       end
,      value
from process_data
window time_win as (order by datetime)
,      run_win as (partition by property_A, property_B order by datetime)
order by datetime
;

ERROR: column "run_nr" does not exist
LINE 6: then coalesce(lag(run_nr) over time_win, 1)
                          ^
SQL state: 42703
Character: 221

I turned my example into a proper test-case (better late than never):

CREATE TABLE process_data (
    datetime   timestamp without time zone NOT NULL,
    property_a text NOT NULL,
    property_b text NOT NULL,
    value      numeric(12,3)
);

COPY process_data (datetime, property_a, property_b, value) FROM stdin;
2020-09-03 15:06:00	tea	earl grey	0.230
2020-09-03 15:07:00	tea	earl grey	0.220
2020-09-03 15:08:00	tea	ceylon	0.340
2020-09-03 15:09:00	coffee	cappucino	0.450
2020-09-03 15:10:00	coffee	cappucino	0.430
2020-09-03 15:11:00	tea	earl grey	0.230
\.
With the desired result (note that swap_time and run_nr are calculated columns):

      datetime       | property_a | property_b | swap_time | run_nr | value
---------------------+------------+------------+-----------+--------+-------
 2020-09-03 15:06:00 | tea        | earl grey  | 15:06:00  |      1 | 0.230
 2020-09-03 15:07:00 | tea        | earl grey  | 15:06:00  |      1 | 0.220
 2020-09-03 15:08:00 | tea        | ceylon     | 15:08:00  |      2 | 0.340
 2020-09-03 15:09:00 | coffee     | cappucino  | 15:09:00  |      3 | 0.450
 2020-09-03 15:10:00 | coffee     | cappucino  | 15:09:00  |      3 | 0.430
 2020-09-03 15:11:00 | tea        | earl grey  | 15:06:00  |      4 | 0.230
(6 rows)

I've been looking around on the Internet in the meantime, and it seems people either solve this with a recursive CTE (referencing the previous row by row_number() over (...)) or by writing a set-returning function that walks over the data in datetime order using a cursor.

Since the actual query is growing more and more state-tracking flags, using a function has the added benefit that referencing state columns from the previous row gets a lot easier (lots of repeated window functions otherwise). It would become a procedural solution instead of a set-based one, but considering that this data is order-sensitive (on datetime), that's probably what a set-based solution would also end up doing anyway.

Regards, Alban.

--
If you can't see the forest for the trees, Cut the trees and you'll see there is no forest.
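For comparison, this can also stay set-based with plain window functions, using the usual "gaps and islands" trick: flag rows where the property pair changes in a subquery, then take a running sum of the flags. A sketch against the process_data test-case above (untested against the real data):

```sql
-- Sketch: run_nr as a running count of property changes over time.
select datetime, property_a, property_b
,      first_value(datetime::time) over run_win as swap_time
,      sum(chg) over (order by datetime) as run_nr
,      value
from (
    select pd.*
    ,      case when lag((property_a, property_b)) over (order by datetime)
                is distinct from (property_a, property_b)
           then 1 else 0 end as chg
    from process_data pd
) flagged
window run_win as (partition by property_a, property_b order by datetime)
order by datetime;
```

The first row's lag() is NULL, which IS DISTINCT FROM any row value, so the running sum starts at 1 as desired.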
Re: How to enumerate partitions from a window function?
On Thu, 3 Sep 2020 at 16:01, Tom Lane wrote:
> Alban Hertroys writes:
> > As stated above, I want to enumerate the runs, starting at 1 and incrementing by 1 every time a partition from the 'run' window closes,
> > Is there a way to achieve this through window functions, or do we need to wrap the thing in a subquery to achieve this?
>
> I think this'll work:
>
> select datetime, property_A, property_B
> , first_value(datetime)::time over run as swap_time
> , dense_rank() over (order by property_A, property_B)
> , value
> from process_data
> window run as (partition by property_A, property_B order by datetime)
> ;
>
> You can't do it with a window function over the "run" window because no window function ever looks outside the current partition. But that's easy to fix by using a different window definition. The planner is smart enough to see that these windows are compatible and only need one sort to be performed.
>
> regards, tom lane

Thanks Tom,

That gets us close, but it ignores the order of the runs over time. I think it also reassigns the same number to later runs at the same 'day' that happen to have the same values for property_A and _B. That's some crucial information that I forgot to include.

To expand on my original example:

     datetime     | property_A | property_B | swap_time | run_nr | value
------------------+------------+------------+-----------+--------+-------
 2020-09-03 15:06 | tea        | earl grey  | 15:06     |      1 |  0.23
 2020-09-03 15:07 | tea        | earl grey  | 15:06     |      1 |  0.22
 2020-09-03 15:08 | tea        | ceylon     | 15:08     |      2 |  0.34
 2020-09-03 15:09 | coffee     | cappucino  | 15:09     |      3 |  0.45
 2020-09-03 15:10 | coffee     | cappucino  | 15:09     |      3 |  0.43
 2020-09-03 15:11 | tea        | earl grey  | 15:11     |      4 |  0.23
etc.

Where the last row has the same characteristic properties as the first 2 rows (from run 1), but is in run 4 due to it having started after run 3. The runs normally start at 1 hour before midnight, with run 1, and continue 24h from there (it's a shifted day-schedule).
The above example starting at 15:06 is unlikely to occur in reality, although possible (with long downtime). It is mostly meant to clarify how the run numbers should function; a real solution would at least need to partition run_nr by a date shifted 1 hour back, as long as the runs get numbered correctly along the time axis.

Regards, Alban.

--
If you can't see the forest for the trees, Cut the trees and you'll see there is no forest.
How to enumerate partitions from a window function?
We are trying to add some information to a query over data from a continuous process. Most of what we want can be done quite nicely using window functions, but I got stuck on this particular problem:

The data has records with a timestamp and a few properties that make subsequent rows belong to the same group. Say we have:

create table process_data (
    datetime   timestamp not null,
    property_A text not null,
    property_B text not null,
    value      numeric(12, 3)
);

And a query like:

select datetime, property_A, property_B
,      first_value(datetime)::time over run as swap_time
--,    here I want to enumerate the runs themselves
,      value
from process_data
window run as (partition by property_A, property_B order by datetime)
;

As stated above, I want to enumerate the runs, starting at 1 and incrementing by 1 every time a partition from the 'run' window closes, which would result in something like this:

     datetime     | property_A | property_B | swap_time | run_nr | value
------------------+------------+------------+-----------+--------+-------
 2020-09-03 15:06 | tea        | earl grey  | 15:06     |      1 |  0.23
 2020-09-03 15:07 | tea        | earl grey  | 15:06     |      1 |  0.22
 2020-09-03 15:08 | tea        | ceylon     | 15:08     |      2 |  0.34
 2020-09-03 15:09 | coffee     | cappucino  | 15:09     |      3 |  0.45
 2020-09-03 15:10 | coffee     | cappucino  | 15:09     |      3 |  0.43
etc.

Is there a way to achieve this through window functions, or do we need to wrap the thing in a subquery to achieve this?

Regards,

Alban Hertroys
--
If you can't see the forest for the trees, Cut the trees and you'll see there is no forest.
Re: How to properly query lots of rows based on timestamps?
> On 29 Aug 2020, at 10:24, Thorsten Schöning wrote: > > Hi all, > > I have a table containing around 95 million rows, pretty much only > storing a timestamp and further IDs of related tables containing the > actual data in the end. > >> CREATE TABLE clt_rec >> ( >> id BIGSERIAL NOT NULL, >> oms_rec BIGINT NOT NULL, >> captured_at TIMESTAMP(6) WITH TIME ZONE NOT NULL, >> rssiSMALLINT NOT NULL, >> CONSTRAINT pk_clt_rec PRIMARY KEY (id), >> CONSTRAINT fk_clt_rec_oms_rec FOREIGN KEY (oms_rec) REFERENCES "oms_rec" >> ("id"), >> CONSTRAINT uk_clt_rec_oms_rec UNIQUE (oms_rec) >> ); > > In many use cases I need to search all of those rows based on their > timestamp to find rows arbitrary in the past: Sometimes it's only 15 > minutes into the past, sometimes it's 2 years, sometimes it's finding > the first day of each month over 15 months for some of those telegrams > etc. In the end, I pretty often need to compare those timestamps and > some queries simply take multiple seconds in the end, especially > adding up if multiple, but slightly different queries need to be > executed one after another. The following are two abstracts of > Postgres' query plans: > > Plan 1: > >> -> Nested Loop (cost=1.14..343169.49 rows=43543 width=20) (actual >> time=0.313..113.974 rows=34266 loops=3) >>-> Parallel Index Scan using idx_clt_rec_captured_at on clt_rec >> (cost=0.57..3437.90 rows=43543 width=24) (actual time=0.153..20.192 >> rows=34266 loops=3) >>Index Cond: ((captured_at >= ('2020-08-01 00:00:00+02'::timestamp >> with time zone - '00:00:00'::interval)) AND (captured_at <= ('2020-08-01 >> 00:00:00+02'::timestamp with time zone + '1 day'::interval))) >>-> Index Scan using pk_oms_rec on oms_rec (cost=0.57..7.80 rows=1 >> width=12) (actual time=0.002..0.002 rows=1 loops=102799) >>Index Cond: (id = clt_rec.oms_rec) What happens here is that the planner looks up the lower and upper boundaries, everything in between those index nodes is a candidate record. 
Next, it loops over those to match the other condition of your query (id = clt_rec.oms_rec). You didn't say whether there's an index on that column.

You'd probably see a performance improvement were you to create an index on (captured_at, id). If your Postgres version is somewhat recent, that could even lead to an Index Only Scan.

> Plan 2:
>
>> -> Nested Loop (cost=1.14..836381.50 rows=111934 width=20) (actual time=0.379..911.697 rows=334465 loops=3)
>>    -> Parallel Index Scan using idx_clt_rec_captured_at on clt_rec (cost=0.57..8819.57 rows=111934 width=24) (actual time=0.193..154.176 rows=334465 loops=3)
>>       Index Cond: ((captured_at >= ('2020-08-28 10:21:06.968+02'::timestamp with time zone - '14 days'::interval)) AND (captured_at <= ('2020-08-28 10:21:06.968+02'::timestamp with time zone + '00:00:00'::interval)))
>>    -> Index Scan using pk_oms_rec on oms_rec (cost=0.57..7.39 rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=1003394)
>>       Index Cond: (id = clt_rec.oms_rec)

And this situation is very much the same issue, apart from the larger number of candidate records.

> Postgres seems to properly use available indexes, parallel workers and stuff like that. But looking at the actual times and compared to all the other parts of the query, comparing those timestamps simply takes the most time.

It only needs to compare 2 timestamps.

> I've looked into this topic and found statements about that one shouldn't put too many rows into the index[1] and stuff like that or it will be ignored at all. But that doesn't seem to be the case for me according to the plan. OTOH, my index really simply is about the column containing the timestamp, no function reducing things to dates or stuff like that to reduce the number of rows.
>> CREATE INDEX idx_clt_rec_captured_at ON clt_rec USING btree ( captured_at );

Try this:

CREATE INDEX idx_clt_rec_captured_at ON clt_rec USING btree ( captured_at, id );

Alban Hertroys
--
There is always an exception to always.
Re: Postgres and alias
> On 28 Aug 2020, at 2:14, Fontana Daniel C (Desartec S.R.L.) wrote:
>
> Perfect.
>
> now let's imagine that '1234567890' is a function f_art_get_price(id_code), which returns in a string like the following 'XXXZMMM1234567890123/mm/dd' where 1234567890123 is the price and /mm/dd the date it was last changed price. How would you do in this case to obtain these values separately? without calling the function 2 times avoiding overloading the base?
>
> something like this
>
> select art.description,
>        f_art_get_price_str( art.id ) as ls_price_and_date
>        SUBSTRING( ls_price_and_date, 7, 13 )
> from articulos;

Let's assume art is supposed to be an alias for articulos. Something like this?:

select art.description
,      p.ls_price_and_date
,      SUBSTRING( p.ls_price_and_date, 7, 13 )
from articulos art
cross join lateral f_art_get_price_str( art.id ) p(ls_price_and_date);

Alban Hertroys
--
There is always an exception to always.
Re: table name
> On 11 Jun 2020, at 20:58, Paul Förster wrote:
>
> Hi Marc,
>
>> On 11. Jun, 2020, at 20:54, Marc Millas wrote:
>> sorry if my question is tooo simple :-)
>
> it's not. :-)
>
>> obviously if I ask:
>> select * from regions-20180101;
>> I get a syntax error.
>> if I try select * from $$regions_20180101$$;
>> I get another syntax error.
>> If I try to rename that table, same thing.
>> if I try a cte, same thing.
>>
>> What should I do ?
>
> you can just quote its name:
>
> select * from "regions-20180101";
>
> Cheers,
> Paul

The background here is that '' and $$ are quoting of literals (strings, integers, JSON objects, etc.), while "" is identifier quoting (tables, indices, types, etc.).

Identifier quoting not only allows you to include special symbols, it also makes the identifier case-sensitive. That's probably why Paul suggested renaming the table so that it no longer requires identifier quoting - many people consider quoting a PITA, but it can be used to get out of trouble like yours. Some people insist on it, for example because it makes using camel-caps in identifiers meaningful.

Regards,

Alban Hertroys
--
There is always an exception to always.
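To illustrate the case-sensitivity side of identifier quoting, a quick sketch (table name invented for the example):

```sql
-- Unquoted identifiers fold to lower case; quoted ones are taken literally.
create table "Regions-20180101" (id int);

select * from "Regions-20180101";    -- works
-- select * from Regions-20180101;   -- fails: '-' is parsed as a minus operator
-- select * from "regions-20180101"; -- fails: no table with that lower-case name
```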
Re: Using JSONB with nested key-value structures, and skipping/wildcarding the key
> On 2 Jun 2020, at 9:30, Shaheed Haque wrote:
>
>> I do something like this to get a set of sub-paths in a JSONB field (no idea how to write that in Django):
>>
>> select snapshot->'pay_definition'->k.value->'name'
>> from MyModel
>> join lateral jsonb_object_keys(snapshot->'pay_definition') k(value) on true
>>
> I was unaware of the LATERAL keyword, so thanks. After a bit of Googling however, it seems that it is tricky/impossible to use from the ORM (barring a full scale escape to a "raw" query). One question: as a novice here, I think I understand the right hand side of your JOIN "... k(value)" is shorthand for:
>
> ... AS table_name(column_name)
>
> except that I don't see any clues in the docs that jsonb_object_keys() is a "table function". Can you kindly clarify?

Correct. Thomas already explained the return type, but the plural form of the function name is also an indication that it returns multiple results.

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: Using JSONB with nested key-value structures, and skipping/wildcarding the key
> On 1 Jun 2020, at 20:18, Shaheed Haque wrote:
>
> Hi,
>
> I'm using Django's ORM to access Postgres12. My "MyModel" table has a JSONB column called 'snapshot'. In Python terms, each row's 'snapshot' looks like this:
>
> ==
> snapshot = {
>     'pay_definition' : {
>         '1234': {..., 'name': 'foo', ...},
>         '99': {..., 'name': 'bar', ...},
>     }
> ==
>
> I'd like to find all unique values of 'name' in all rows of MyModel. I have this working using native JSON functions from the ORM like this:
>
> =
> class PayDef(Func):
>     function='to_jsonb'
>     template="%(function)s(row_to_json(jsonb_each(%(expressions)s->'pay_definition'))->'value'->'name')"
>
> MyModel.objects.annotate(paydef=PayDef(F('snapshot'))).order_by().distinct('paydef').values_list('paydef', flat=True)
> =
>
> So, skipping the ordering/distinct/ORM parts, the core looks like this:
>
> to_jsonb(row_to_json(jsonb_each('snapshot'->'pay_definition'))->'value'->'name')

I do something like this to get a set of sub-paths in a JSONB field (no idea how to write that in Django):

select snapshot->'pay_definition'->k.value->'name'
from MyModel
join lateral jsonb_object_keys(snapshot->'pay_definition') k(value) on true

I don't know how that compares performance-wise to using jsonb_each, but perhaps worth a try. Obviously, the way it's written above it doesn't return distinct values of 'name' yet, but that's fairly easy to remedy.

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
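The "fairly easy to remedy" part could be just a DISTINCT, e.g. (untested sketch against the same hypothetical table; note ->> to extract text rather than jsonb, so equal names compare as plain strings):

```sql
select distinct snapshot->'pay_definition'->k.value->>'name' as name
from MyModel
join lateral jsonb_object_keys(snapshot->'pay_definition') k(value) on true;
```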
Re: Advice request : simultaneous function/data updates on many databases
> On 4 Mar 2020, at 23:42, Guyren Howe wrote:
>
> On Mar 4, 2020, at 14:33 , Rory Campbell-Lange wrote:
>>
>> Essentially we wish to reduce the window where the frontend and backend aren't synchronised.
>>
>> If we have (for example) 200 databases which each take 2 seconds to update, a client could be on the wrong frontend code for over 6 minutes.
>
> Send each of the servers a PL/PGSQL method that executes all the things in a transaction and then waits until the same clock time to commit. Then all the servers are committing at the same moment. They will still be out of synch somewhat, but this would reduce the degree.

I'm wondering whether this could be done with a more generic event-based approach, where each server sends a 'done' event to a central machine once it's ready to commit, and the central machine returns an 'acknowledged' once the last server sent its 'done' event. The challenge there is that the 'ack' needs to be caught and processed within the same waiting transaction... Not sure how to do that right now - maybe through web services, MQTT or similar.

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
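A heavier but built-in alternative for the "commit at the same moment" part is two-phase commit. A hypothetical sketch of what a coordinator could run against each of the 200 databases (assumes max_prepared_transactions > 0 on each server; the UPDATE stands in for the real per-database changes):

```sql
-- Phase 1, on every database: do the work, then park the transaction.
BEGIN;
UPDATE app_config SET frontend_version = '2.4.0';  -- hypothetical change
PREPARE TRANSACTION 'release_20200304';
-- The work is now durable but not yet visible to other sessions.

-- Phase 2, once every database has prepared successfully:
COMMIT PREPARED 'release_20200304';
-- or, if any database failed to prepare:
-- ROLLBACK PREPARED 'release_20200304';
```

This shrinks the out-of-sync window from the duration of the updates down to the duration of the COMMIT PREPARED round, at the cost of the coordinator having to clean up orphaned prepared transactions after a crash.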
Re: Trigger
> On 25 Feb 2020, at 17:53, Adrian Klaver wrote: > > On 2/25/20 12:01 AM, Sonam Sharma wrote: >> I have a trigger, like many other triggers that fire after >> update and checks a field of the OLD set. For some reason this trigger throw >> this error: >> ERROR: record "old" has no field "ivo_sts_cd" CONTEXT: SQL statement > >> if exc_count = 0 then >> UPDATE pps.T8071_CAI_IVO_HDR SET IVO_STS_CD = 1 where >> T616_VBU_NBR=old.T616_VBU_NBR and T617_FNC_TYP_CD=old.T617_FNC_TYP_CD and >> T8071_CAI_IVO_ID=old.T8071_CAI_IVO_ID and T8071_ADD_DM= old. T8071_ADD_DM >> and old.ivo_sts_cd != 10 and old.ivo_sts_cd != 3; > > Realized I went through the above to quickly. I do not see a SET, nor am I > clear what table you are trying to UPDATE. I’m pretty sure that if the OP were to format their query in a more readable and consistent way, they would spot their error pretty quickly. It’s a simple typo. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: UPDATE many records
> On 6 Jan 2020, at 21:15, Israel Brewster wrote: > >> On Jan 6, 2020, at 10:08 AM, Christopher Browne wrote: >> >> On Mon, 6 Jan 2020 at 13:36, Israel Brewster wrote: >> Thanks to a change in historical data, I have a need to update a large >> number of records (around 50 million). The update itself is straight >> forward, as I can just issue an "UPDATE table_name SET >> changed_field=new_value();" (yes, new_value is the result of a stored >> procedure, if that makes a difference) command via psql, and it should work. >> However, due to the large number of records this command will obviously take >> a while, and if anything goes wrong during the update (one bad value in row >> 45 million, lost connection, etc), all the work that has been done already >> will be lost due to the transactional nature of such commands (unless I am >> missing something). >> >> Given that each row update is completely independent of any other row, I >> have the following questions: >> >> 1) Is there any way to set the command such that each row change is >> committed as it is calculated? >> 2) Is there some way to run this command in parallel in order to better >> utilize multiple processor cores, other than manually breaking the data into >> chunks and running a separate psql/update process for each chunk? Honestly, >> manual parallelizing wouldn’t be too bad (there are a number of logical >> segregations I can apply), I’m just wondering if there is a more automatic >> option. >> >> Yeah, I'd be inclined to do this in batches. I think you’re overcomplicating the matter. I’d just do it as a single update in one transaction. It’s only 50M rows. It may take half an hour or so on decent hardware, depending on how resource-intensive your function is. If that fails[1], only then would I start looking into batching things. 
But then you still need to figure out why it fails and what to do about that; if it fails it will probably fail fast, and if not, then you’re looking at a one-off situation that won’t require more than a few workarounds - after which you can just run the update again. Ad 1). No harm has been done, it’s a single transaction that rolled back. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: SQL operator '*='
> On 23 Dec 2019, at 15:33, Matthias Apitz wrote:
>
> #ifdef DBSPOS
> EXEC SQL DECLARE land_cursor CURSOR FOR
>     SELECT stammprio, lkz, landbez, plkz, postbez, karenz1, karenz2,
>            karenz3, land.wkz, webez, we, kurs, land.del
>     FROM land
>     LEFT JOIN devisen ON land.wkz = devisen.wkz AND land.brgroup = devisen.brgroup
>     WHERE land.brgroup = :brgroupHost_for_helpland_cursor
>     ORDER BY stammprio, landbez;
> #endif

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: Conditional return of aggregated data
On Mon, 2 Dec 2019 at 12:11, Laura Smith wrote:
>
> My initial idea was something along the lines of :
> select (select sum(statcount) from stats_residence) as aggstat,statcount,short_name_en from stats_residence where aggstat>some_number;
>

One option is to move the aggregate to the where-clause. If you also need the value in your select-list, you can just repeat the subselect there; usually the planner is smart enough to figure out that it can just re-use the result.

select short_name_en
from stats_residence
where (select sum(statcount) from stats_residence) > some_number;

--
If you can't see the forest for the trees, Cut the trees and you'll see there is no forest.
Re: Range contains element filter not using index of the element column
> On 27 Nov 2019, at 10:32, Lauri Kajan wrote:
>
> Hi all,
> I'm wondering if there are anything to do to utilize a index when doing a range contains element query. I have tested this with 9.6 and 12.0.
>
> I have a table with a timestamp column that has a btree index.
> I would like to do a query:
> SELECT * FROM table WHERE ts <@ tsrange($1, $2, '(]');
> The index is not used and a seq scan is done instead.
>
> To use the index correctly I have to do the query like this:
> SELECT * FROM table WHERE ($1 IS null OR $1 < ts) AND ($2 IS null OR ts <= $2);
> I like the <@ syntax more. Is there something I can do differently? Maybe a different type of index instead?

Does it help to use timestamps -infinity and infinity instead of nulls in your case?

=> select t, t < current_timestamp, current_timestamp <= t
   from (values ('-infinity'::timestamp), ('infinity'::timestamp)) x(t);

     t     | ?column? | ?column?
-----------+----------+----------
 -infinity | t        | f
 infinity  | f        | t
(2 rows)

Regards,

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
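If the infinity substitution fits the data, the explicit null-checks could collapse to something like this sketch (untested; the coalesce over the parameters is evaluated once per query, so the btree index on ts should stay usable):

```sql
SELECT *
FROM some_table  -- hypothetical table name
WHERE ts >  coalesce($1, '-infinity'::timestamp)
  AND ts <= coalesce($2, 'infinity'::timestamp);
```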
Re: Finding out about the dates of table modification
> On 23 Nov 2019, at 3:24, Martin Mueller wrote:
>
> I've moved happily from MySQL to Postgres but miss one really good feature of MYSQL: the table of tables that let you use SQL queries to find out metadata about your table. Thus looking at the table of tables and sorting it by last change, lets you quickly look at the most recently modified table. Which is really useful if you have a bad memory, as I do, and can't remember the name of a particular table that I worked on last Wednesday.
>
> Are those capabilities hidden somewhere in Postgres? There isn't an obvious section in the documentation. At least I can't find it.

AFAIK, there's nothing like that built-in, but it's not impossible to deduce. You could start with getting a list of files in $PG_DATA/base that were modified in that period (provided you have sufficient privileges on that directory):

find base/ -mtime -2 -type f -print

To figure out which tables these files belong to [1]:

pg_filenode_relation(0, );

and:

pg_relation_filepath();

For example, I did:

# create table creation_date(test text);

[/home/postgres/10/data]$ find base/ -mtime -2 -type f -print
base/16403/2608
base/16403/29784
base/16403/2659
base/16403/29789
base/16403/2678
base/16403/29787
base/16403/2662
base/16403/2703
base/16403/2679
base/16403/2673
base/16403/2658
base/16403/1249
base/16403/2610
base/16403/2704
base/16403/2674
base/16403/3455
base/16403/2663
base/16403/1247
base/16403/1259

The lower numbers are probably core tables, such as pg_depend:

# SELECT pg_filenode_relation(0, 2608); -- 0 being the default table-space
 pg_filenode_relation
----------------------
 pg_depend
(1 row)

But!:

# SELECT pg_filenode_relation(0, 29784);
 pg_filenode_relation
----------------------
 creation_date
(1 row)

And indeed:

# select pg_relation_filepath('creation_date');
 pg_relation_filepath
----------------------
 base/16403/29784
(1 row)

I was looking for the inverse function pg_filepath_relation(), but that does not appear to exist; that would have been useful in combination with file listings like those from `find`. Mind that larger tables consist of multiple files. I'm sure this would become a head-ache quickly on a larger database. Having an actual creation-date of a file would be nice too, but that doesn't necessarily mean much when growing tables create extra files too.

Apparently, someone already turned the process into a number of queries[2]. As they mention, it's not 100% reliable, as there are operations that recreate table files, such as CLUSTER. Then again, if you're just looking for the table you created last Wednesday, that's probably not a major concern.

Another option is to add a DDL Event trigger on create table statements and log that to some table[3].

Regards, Alban Hertroys

[1]: https://www.2ndquadrant.com/en/blog/postgresql-filename-to-table/
[2]: https://stackoverflow.com/questions/18849756/automatically-drop-tables-and-indexes-older-than-90-days/18852752#18852752
[3]: https://www.postgresql.org/docs/current/event-triggers.html

--
There is always an exception to always.
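The event-trigger option [3] might look something like this sketch (names invented for the example; EXECUTE FUNCTION needs PG 11+, older versions spell it EXECUTE PROCEDURE):

```sql
-- Log each CREATE TABLE with a timestamp.
CREATE TABLE ddl_log (
    created_at  timestamptz NOT NULL DEFAULT now(),
    command_tag text,
    object_id   text
);

CREATE FUNCTION log_create_table() RETURNS event_trigger
LANGUAGE plpgsql AS $$
DECLARE
    r record;
BEGIN
    -- pg_event_trigger_ddl_commands() exposes what the DDL statement did.
    FOR r IN SELECT * FROM pg_event_trigger_ddl_commands() LOOP
        INSERT INTO ddl_log (command_tag, object_id)
        VALUES (r.command_tag, r.object_identity);
    END LOOP;
END $$;

CREATE EVENT TRIGGER track_create_table
    ON ddl_command_end
    WHEN TAG IN ('CREATE TABLE')
    EXECUTE FUNCTION log_create_table();
```

After that, "what did I create last Wednesday" becomes an ordinary query against ddl_log, ordered by created_at.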
Re: The connection to the server was lost. Attempting reset: Failed.
> On 10 Oct 2019, at 17:55, Yessica Brinkmann wrote:
>
> I really thought a lot, but I don't understand why but the function fails after the expression is executed:
> appendStringInfo (&cols, "%s a.attnum = %d", (i > 0 ? "OR" : ""), idxcd->varattno[i]);
> The error appears only to me when entering the cycle:
> foreach (cell, candidates) /* foreach cell in candidates */
> more than once, that is, when you have more than one candidate index. If the cycle is entered only once, the function works correctly.
> The error that appears to me is that the connection to the PostgreSQL server is directly lost. I proved that the error occurs in that statement, printing some values.

There is probably an error in the Postgres log-file providing you more info.

That said, at least the below bit in your code is dangerous:

foreach( cell, candidates ) /* foreach cell in candidates */
{
    idxcd = (IndexCandidate*)lfirst( cell );

    if( !idxcd->idxused )
        continue;

    if (idxcd != NULL)
    {

You should at least check for NULL before referencing an attribute of that structure. Personally, I would invert the test like so (and then move it before the idxused test):

if (idxcd == NULL)
{
    elog( INFO, "idxcd IS NULL" );
    continue; /* Or is that fatal enough to break instead? */
}

if (!idxcd->idxused)
    continue;

Alban Hertroys
--
There is always an exception to always.
Re: Arrays and ANY problem
> On 25 Sep 2019, at 22:50, Alban Hertroys wrote: > > >> On 25 Sep 2019, at 22:25, David Salisbury wrote: >> db=# select name from table_name_ds_tmp where ARRAY[categoryid] = ANY ( >> select string_to_array( '200,400', ',')::bigint[] ); >> name >> -- >> (0 rows) > > You are comparing two arrays for equality. Since the left-hand array has only > 1 item and the right-hand one has two, there’s not much equality between them. > > You probably meant: > select name from table_name_ds_tmp where categoryid = ANY ( select > string_to_array( '200,400', ',')::bigint[] ); Or rather: select name from table_name_ds_tmp where categoryid = ANY ( string_to_array( '200,400', ',')::bigint[] ); Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
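To spell out the difference between the two forms (easy to verify in psql): with an array expression, `= ANY` tests membership against the array’s elements; with a subquery, the left-hand value is compared against each result row as a whole.

```sql
-- Membership test against the elements of an array expression: true.
SELECT 200 = ANY (string_to_array('200,400', ',')::bigint[]);

-- With a subquery, each result row (here: a whole bigint[]) is compared,
-- so a one-element array does not equal the two-element array: false.
SELECT ARRAY[200]::bigint[] = ANY (SELECT string_to_array('200,400', ',')::bigint[]);
```

That is why `ARRAY[categoryid] = ANY (select …)` in the original question found no rows, while comparing `categoryid` directly against the array expression does what was intended.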
Re: Arrays and ANY problem
> On 25 Sep 2019, at 22:25, David Salisbury wrote: > db=# select name from table_name_ds_tmp where ARRAY[categoryid] = ANY ( > select string_to_array( '200,400', ',')::bigint[] ); > name > -- > (0 rows) You are comparing two arrays for equality. Since the left-hand array has only 1 item and the right-hand one has two, there’s not much equality between them. You probably meant: select name from table_name_ds_tmp where categoryid = ANY ( select string_to_array( '200,400', ',')::bigint[] ); Alban Hertroys -- There is always an exception to always.
Re: Question about password character in ECPG's connection string
> 2) "tcp:postgresql://localhost?user=myuser&password=password" looks like > > "tcp:postgresql://localhost?user=myuser&password=my&pwd" > > and password is parsed on the & and you also end up with an extra parameter > pwd Perhaps it helps to URL-encode the & in the password as %26? Alban Hertroys -- There is always an exception to always.
Re: A GROUP BY question
> On 13 Aug 2019, at 15:19, David G. Johnston wrote: > > On Tuesday, August 13, 2019, Alban Hertroys wrote: > > > On 13 Aug 2019, at 13:10, stan wrote: > > > > select > > project.proj_no , > > Removed columns that get in the way of your desired result. You can’t have > both details and the sum over them in a meaningful way. > > Sure you can, at least generally, with Window Functions/Expressions (i.e., > OVER) That’s why I added “in a meaningful way” ;) Repeating the same SUM result on every line in a group is not what I’d call a meaningful result; the SUM has no bearing on the detail line and leads to the kind of mistakes I already mentioned. (For the record; I do this kind of grouping in a hierarchical database regularly, but there the grouped SUM is at a different level in the hierarchy, so I consider it sufficiently separated from the detail rows.) Besides, I figured the OP was already struggling with the query syntax; adding window functions into the mix didn’t seem a good idea in that context. Possible? Sure. Desirable? I would say not. Alban Hertroys -- There is always an exception to always.
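For reference, the window-function variant under discussion would look something like this (column and table names borrowed from the thread; an illustrative sketch, not tested against the OP’s schema):

```sql
-- The grouped SUM is repeated on every detail row of its partition.
SELECT project.proj_no,
       task_instance.hours,
       SUM(rate.rate * task_instance.hours)
           OVER (PARTITION BY project.proj_no) AS project_total
FROM task_instance
JOIN rate ON rate.employee_key = task_instance.employee_key
         AND rate.work_type_key = task_instance.work_type_key
JOIN project ON project.project_key = task_instance.project_key;
```

This produces exactly the repeated-per-row totals that risk being summed again downstream, which is the concern raised above.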
Re: A GROUP BY question
> On 13 Aug 2019, at 13:10, stan wrote: > > select > project.proj_no , Removed columns that get in the way of your desired result. You can’t have both details and the sum over them in a meaningful way. > SUM (rate.rate * task_instance.hours) > from > task_instance > join rate on > rate.employee_key = task_instance.employee_key > AND > rate.work_type_key = task_instance.work_type_key (break) > inner join employee on > rate.employee_key = employee.employee_key > inner join work_type on > rate.work_type_key = work_type.work_type_key These are now probably redundant, you don’t need them unless they filter your results. > inner join project on > project.project_key = task_instance.project_key And this JOIN could be dropped if project_key and proj_no weren’t different fields. If both are unique in project, you could drop one of them and keep the same functionality with fewer joins. That said, in the “war” between surrogate and natural keys I’m on the natural keys side. Clearly, not everyone agrees on that. > GROUP BY > project.project_key , Same columns removed here too. > ORDER BY > project.proj_no > ; That should give you the total cost for each project. You could get the same result repeated per employee and per work type as you tried originally, by putting the above revised query as a subquery and joining that back into the full query in the place of your project-related tables (add the project_key so you have something to join against). The repeated sum risks getting multiplied in the final output though, especially if unaware people will be putting the results in an Excel sheet or something. From experience, that either results in people reporting the wrong financial results (several orders too high) or blaming your query. Regards, Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
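Putting the advice above together, the revised query would look roughly like this (a sketch against the OP’s table names; untested):

```sql
-- Total cost per project; the employee and work_type joins are dropped
-- since rate already carries both keys.
SELECT project.proj_no,
       SUM(rate.rate * task_instance.hours) AS total_cost
FROM task_instance
JOIN rate ON rate.employee_key = task_instance.employee_key
         AND rate.work_type_key = task_instance.work_type_key
JOIN project ON project.project_key = task_instance.project_key
GROUP BY project.project_key, project.proj_no
ORDER BY project.proj_no;
```

If project_key is the primary key of project, PostgreSQL’s functional-dependency rule would also allow grouping by project_key alone.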
Re: adding more space to the existing server
> On 5 Aug 2019, at 17:27, Julie Nishimura wrote: > > Thanks for your reply Alban. Currently we only have A->B replication. Is > adding B->C replication difficult? I remember in the past I tried to seed > pg_basebackup from a hot standby, and it was erroring out after a while, so I > needed to switch to running it from the master. I’ve never used multi-level replication (or cascading replication, as it’s called in the PG docs). I expect that replication slots (with streaming replication) would be resilient to streaming to multiple levels, provided that you have enough disk space left to retain the WAL back to the point your farthest-behind replica is at, but I do not know that for certain. I’m pretty sure it’s being done though, and not just theoretically possible. The basic problem you have is a bit like the Towers of Hanoi. The details of setting up each replica may be a little complicated, but if you look at it from a high-level design phase, you’re just shuffling around clusters. The details come later, when you’re actually designing how to apply those replicas/clusters. One of the main problems is that creating a new replica takes a lot of time, and you want to minimise the total time that takes. Apparently, you have neither C nor D ready yet, so you need to create two replicas - being able to do those in parallel instead of sequentially would save you time. The other problem is that, to be on the safe side, you want to have a full replica of A at any point in time. If you disconnect B from A before either C or D is complete, you run a risk: if A fails, you don’t have the data that accumulated while B was ‘offline’. So that’s not the best scenario. That is why I think your initial set of replicas should look like:

A — B — C
 \
  D

IIRC, streaming replication uses pg_basebackup to create the initial replica and then streams what’s needed to catch up. With replication slots, the master knows what the slaves still need, so it won’t clean up too early.
Apparently, the slave (B) knows that it needs to retain data for C as well. It looks perfectly safe on paper, except for the replication lag between A — B. You can, according to the docs, cascade replicas from each other and that even allows replication (among the slaves) to go on after the master gets disconnected - quite what you need for the B — C chain, I would think. Take a look at: https://www.postgresql.org/docs/11/warm-standby.html#STREAMING-REPLICATION And note the sections on Replication Slots and on Cascading Replication. The replication lag between A — B can be solved by pausing the clients connecting to A (half of them need to be changed to B anyway) while B is catching up on its lag. You probably have at least that much down-time to change the connections anyway. Regards, Alban. > From: Alban Hertroys > Sent: Monday, August 5, 2019 5:01 AM > To: Julie Nishimura > Cc: Adrian Klaver ; > pgsql-general@lists.postgresql.org ; > pgsql-general > Subject: Re: adding more space to the existing server > > >> On 5 Aug 2019, at 0:39, Julie Nishimura wrote: >> >> Alban, thank you for your reply. Your suggestion makes sense, and I will be >> talking to our engineers about it. Currently we need to understand: >> >> a) How do we break A -> B replication such that both can become >> independent primaries > > That is pretty much like normal failover from A to B, except that you don’t > reverse replication. > > You will need to stop your clients from sending data for a bit (if it’s > continuous data, having a buffer in between is a big help - at our company > we’re looking into Apache Kafka for that), so that you can switch half of > them to connect to B instead of A. > > Next, you promote B to master. I used the docs for that last time, and they > were pretty clear on the subject. > > >> b) How do we reassign C from B->C replication to A->C replication > > I don’t think you need to. 
If you indeed already have A->B->C, after > promoting B to master, you end up with B->C, which is alright. > You just need to add A->D for the other set. > >> c) Thoughts on why this isn’t a good plan > > That depends on your clients and how you decide which database in the current > cluster they connect to. If you connect specific clients to specific > databases, then all you need to do is to configure half your clients to > connect to B instead. > > Another option is to put a virtual database layer in front, such that both > clusters still look like a single database to the outside world. We have some > experience with Dremio for similar purposes (although for read-only > reporting). Mind that the community edition doesn’t do authorization. > >> Current: >> A replicates to B >> all requests go to A >>
Re: adding more space to the existing server
On 5 Aug 2019, at 0:39, Julie Nishimura wrote: Alban, thank you for your reply. Your suggestion makes sense, and I will be talking to our engineers about it. Currently we need to understand: a) How do we break A -> B replication such that both can become independent primaries That is pretty much like normal failover from A to B, except that you don’t reverse replication. You will need to stop your clients from sending data for a bit (if it’s continuous data, having a buffer in between is a big help - at our company we’re looking into Apache Kafka for that), so that you can switch half of them to connect to B instead of A. Next, you promote B to master. I used the docs for that last time, and they were pretty clear on the subject. b) How do we reassign C from B->C replication to A->C replication I don’t think you need to. If you indeed already have A->B->C, after promoting B to master, you end up with B->C, which is alright. You just need to add A->D for the other set. c) Thoughts on why this isn’t a good plan That depends on your clients and how you decide which database in the current cluster they connect to. If you connect specific clients to specific databases, then all you need to do is to configure half your clients to connect to B instead. Another option is to put a virtual database layer in front, such that both clusters still look like a single database to the outside world. We have some experience with Dremio for similar purposes (although for read-only reporting). Mind that the community edition doesn’t do authorization. Current: A replicates to B all requests go to A Soon: A replicates to B -> cascading to C and D Transition: break A replication to B such that both can become primary Correct. 
stop B replication to C, then setup A to replicate to C

I would change this in: setup A to replicate to D

End state:
A replicates to C
B replicates to D

With that change, the end state becomes:
A replicates to D
B replicates to C

we remove some of the dbs from A and B, then reassign the traffic based on db selections I hope it all makes sense... Thank you It does to me. Now would be a good time for people to chime in if they don't agree ;) From: Alban Hertroys Sent: Saturday, August 3, 2019 3:15 AM To: Julie Nishimura Cc: Adrian Klaver ; pgsql-general@lists.postgresql.org ; pgsql-general Subject: Re: adding more space to the existing server > On 2 Aug 2019, at 21:45, Julie Nishimura wrote: > 1) We use streaming replication, and due to hardware limitation, we cannot add more drives to the existing host. That is why we thought by breaking the existing streaming replication (from a->b), instead of currently identical standby (b), we can introduce twice larger host, then start the replication to the newly larger host, and when it is caught up, break it again. Then break rep again, make modification to 'a" host, making it larger, then replicate b->a. After it is caught up, break the rep again, switch master->standby (if necessary). Let’s be clear, I’m in no way an expert on replication. In fact, I’ve only done (streaming) replication once and I managed to mess that up in a minor way (disabled the wrong service during failover, so data still went to the database I was attempting to replace for, like, 15 minutes). > 2) I am not sure about the time, but it is understood it is required 2 full replication cycles, and might be up to 2 weeks with no standby situation No standby situation? 
Murphy is probably just waiting for that to strike… I recall a fairly recent story on the FreeBSD ML about someone on Malta doing a migration of a couple dozen terabytes from her main server (because of some failing disks in her RAID set) using her backup server to move data around (with backups removed to make room), when, due to an accident outside the building, an aerial 10KV power line hit another power line in the ground, causing a fire in one UPS and frying the other one. Losing power at that point meant that the file systems (ZFS) on both servers ended up in an unrecoverable state with no backups. It didn’t help that the UPS’s were at the bottom of the rack, with the heat and smoke going up into the servers. What are the chances, right? (And then it turned out that it is really hard to try to recover data from a ZFS file system in such a state, which is what her actual inquiry was about) I would definitely prefer to add a 3rd machine into the mix, even if it were just a temporary machine - a rental perhaps? From there, I’m certain Adrian knows more about replication than I do. I’d go with the approach he suggested. > 4) by pg_basebackup and restore > > As of now, we are thinking about possibly other solutions, as of splitting existing 37 databases on the cluster into 2 hosts with their own standbys. This solution requires breaking up existing replication as well. Can you please point me to some document which lists a
Re: adding more space to the existing server
> On 2 Aug 2019, at 21:45, Julie Nishimura wrote: > 1) We use streaming replication, and due to hardware limitation, we cannot > add more drives to the existing host. That is why we thought by breaking the > existing streaming replication (from a->b), instead of currently identical > standby (b), we can introduce twice larger host, then start the replication > to the newly larger host, and when it is caught up, break it again. Then > break rep again, make modification to 'a" host, making it larger, then > replicate b->a. After it is caught up, break the rep again, switch > master->standby (if necessary). Let’s be clear, I’m in no way an expert on replication. In fact, I’ve only done (streaming) replication once and I managed to mess that up in a minor way (disabled the wrong service during failover, so data still went to the database I was attempting to replace for, like, 15 minutes). > 2) I am not sure about the time, but it is understood it is required 2 full > replication cycles, and might be up to 2 weeks with no standby situation No standby situation? Murphy is probably just waiting for that to strike… I recall a fairly recent story on the FreeBSD ML about someone on Malta doing a migration of a couple dozen terabytes from her main server (because of some failing disks in her RAID set) using her backup server to move data around (with backups removed to make room), when, due to an accident outside the building, an aerial 10KV power line hit another power line in the ground, causing a fire in one UPS and frying the other one. Losing power at that point meant that the file systems (ZFS) on both servers ended up in an unrecoverable state with no backups. It didn’t help that the UPS’s were at the bottom of the rack, with the heat and smoke going up into the servers. What are the chances, right? 
(And then it turned out that it is really hard to try to recover data from a ZFS file system in such a state, which is what her actual inquiry was about) I would definitely prefer to add a 3rd machine into the mix, even if it were just a temporary machine - a rental perhaps? From there, I’m certain Adrian knows more about replication than I do. I’d go with the approach he suggested. > 4) by pg_basebackup and restore > > As of now, we are thinking about possibly other solutions, as of splitting > existing 37 databases on the cluster into 2 hosts with their own standbys. > This solution requires breaking up existing replication as well. Can you > please point me to some document which lists all steps describing breaking up > the existing replication properly? we are using 9.6 postgres I’m going to assume that you will have data coming in while this split is taking place and that you therefore cannot offline the entire set of databases for as long as this takes. If not, that would probably allow for a simpler (faster) scenario. I think the easiest for this scenario would be to add two more machines (c and d) and replicate them off the current setup. You want that to happen as parallel as possible, so perhaps replicate c off a and d off b. If you aren’t already using “replication slots”, I found that to make things both easier to understand and more reliable. You can query their status, for one thing. Those replicas will take extra time of course (about double) because you’re replicating twice what you need, but I don’t think you can replicate parts of a cluster with your setup unless you go for a different replication approach (I think per database replication requires statement level replication?). After that, decouple both sets into: a —> b (your current machine) c —> d (the new ones) (Although any order should be fine, really, as long as they have caught up.) 
At that point I would probably (temporarily) pause replication in at least one set and create a backup of that. This is the point to start removing superfluous databases from a and c (so that a+c make up the complete set again). After verifying that no databases are missing, unpause replication. If instead you find that you accidentally removed a database from both a and c, you still have replicas to recover it from. And the backups, of course, but that will not contain the data that came in after replication was paused. I do hope the remaining 3% disk space is enough to cover all that, though... Regards, Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
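As an aside on querying replication slot status, mentioned above: the pg_replication_slots view exists as of PostgreSQL 9.4, so it applies to the 9.6 cluster discussed here.

```sql
-- Which slots exist, whether a consumer is currently connected, and how
-- far back the master must retain WAL for each of them:
SELECT slot_name, slot_type, active, restart_lsn
FROM pg_replication_slots;
```

An inactive slot with an old restart_lsn is the thing to watch for during these long rebuilds; it means WAL is piling up on the master.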
Re: Request for resolution || Support
> On 24 Jul 2019, at 10:08, jay chauhan wrote: > > Hi Thomas, David/Team, > > Thank you for your response. However we need your confirmation whether my > Error/issue as mentioned below will be resolved if we upgrade our PostgreSQL > Version. It won’t; you are talking about Oracle-specific features. You need to change the code. PostgreSQL is not Oracle; some features are quite RDBMS-specific. > < compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4), 64-bit">> > David response: Use a newer version > Tomas response: Yeah, you should use release 11 for a new project. > > My Issue while migrating procedure/function from Oracle to PostgreSQL: > Error-1) > ERROR: cannot begin/end transactions in PL/pgSQL > HINT: Use a BEGIN block with an EXCEPTION clause instead. > CONTEXT: PL/pgSQL function > icmsuatnew.eload_commission_payout_active(text,text,text,text,text,text) line > 486 at SQL statement > SQL state: 0A000 > David Response on it : Rewrite your code as instructed How to handle these depends on your use of sub-transactions, but the HINT gives a pretty good general approach. > Error-2) > ERROR: schema "utl_http" does not exist > LINE 38: L_HTTP_REQUEST UTL_HTTP.REQ; > ^ > SQL state: 3F000 > Character: 1785 > Thomas response: That's an Oracle thing for doing HTTP requests from PL/SQL. > To do that from plpgsql you could try an extension like this one: > https://github.com/pramsey/pgsql-http > Or you could write your own function in Python or your favourite PL. That's what I'd probably do. > https://www.postgresql.org/docs/11/plpython-funcs.html Initiating TCP/IP from the database means that a database process needs to wait for a response. In the meantime, it cannot do anything else. You’re effectively blocking it for other transactions and keeping that particular transaction ‘waiting in transaction’ until, in the worst case, a time-out. That means that no maintenance can be done on records touched by this transaction, which can lead to bloat. 
This is generally considered a bad idea, at least in this community. You’re usually better off handing the connection over to an external process that reports back to the database when appropriate. The exception to that is if your transaction cannot be allowed to commit without a response from the other peer. In such cases it is appropriate to use plpython, plperl, etc. All that said, I am talking about PostgreSQL here. If you’re instead using EnterpriseDB, which does have an Oracle compatibility layer that could perhaps support these features (I don’t know), you should talk to the EnterpriseDB guys. Alban Hertroys -- There is always an exception to always.
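To illustrate the HINT for Error-1 above: instead of Oracle-style COMMIT/ROLLBACK inside the procedure body, PL/pgSQL wraps the statements in a BEGIN ... EXCEPTION block, which runs them in a sub-transaction that is rolled back on error. The table and function names here are invented for the example:

```sql
CREATE OR REPLACE FUNCTION transfer_with_fallback() RETURNS void AS $$
BEGIN
    BEGIN
        -- These statements run inside an implicit sub-transaction.
        UPDATE accounts SET balance = balance - 100 WHERE id = 1;
        UPDATE accounts SET balance = balance + 100 WHERE id = 2;
    EXCEPTION WHEN others THEN
        -- On any error, only the sub-transaction is rolled back;
        -- the surrounding transaction continues.
        RAISE NOTICE 'transfer failed and was rolled back: %', SQLERRM;
    END;
END;
$$ LANGUAGE plpgsql;
```

This gives partial-rollback behaviour without ever issuing COMMIT or ROLLBACK from within the function, which is what the 0A000 error forbids.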