Re: Help. The database was created using collation version 2.17, but the operating system provides version 2.34.
Dmitry O Litvintsev wrote:

> Just want to make clear (sorry I am slow on uptake). I should first
> REINDEX and then ALTER DATABASE xxx REFRESH COLLATION VERSION, or
> first ALTER and then REINDEX or does the order of these action
> matter at all?

The order does not matter. The ALTER DATABASE command will simply
update the pg_database.datcollversion field with the current version
of libc. That will stop the warning being issued, but it doesn't have
any other concrete effect.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
Re: Help. The database was created using collation version 2.17, but the operating system provides version 2.34.
Dmitry O Litvintsev wrote:

> WARNING: database "xxx" has a collation version mismatch
> DETAIL: The database was created using collation version 2.17, but the
> operating system provides version 2.34.
> HINT: Rebuild all objects in this database that use the default collation
> and run ALTER DATABASE xxx REFRESH COLLATION VERSION, or build PostgreSQL
> with the right library version.

This upgrade spans the major collation change in GNU libc 2.28, so
indeed text indexes created by 2.17 are very likely unsafe to use on
your new server. See
https://wiki.postgresql.org/wiki/Locale_data_changes

> REINDEX database xxx
> ALTER DATABASE xxx REFRESH COLLATION VERSION
> But this defeats the whole idea of having short downtime because REINDEX
> will take forever.

The indexes that don't involve collatable types (text, varchar), and
those that use the C collation, don't need to be reindexed. Maybe you
can reduce the downtime significantly by including only the ones that
matter. The wiki page gives the query to obtain the list of affected
indexes:

  SELECT DISTINCT indrelid::regclass::text,
         indexrelid::regclass::text,
         collname,
         pg_get_indexdef(indexrelid)
  FROM (SELECT indexrelid, indrelid, indcollation[i] coll
        FROM pg_index,
             generate_subscripts(indcollation, 1) g(i)) s
  JOIN pg_collation c ON coll = c.oid
  WHERE collprovider IN ('d', 'c')
    AND collname NOT IN ('C', 'POSIX');

> I do not recall having similar issue when going from RH6 to RH7.

This warning was added relatively recently, in Postgres 15 (October 2022).

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
Re: Time zone offset in to_char()
Alban Hertroijs wrote:

> 1). The first function has as a drawback that it changes the time zone for
> the entire transaction (not sufficiently isolated to my tastes)

But if you add in the function declaration

  SET timezone TO 'Europe/Amsterdam'

like in your 2nd function, or simply

  SET timezone FROM CURRENT

doesn't that solve this problem? Because, as the doc says:

  If a SET clause is attached to a function, then the effects of a SET
  LOCAL command executed inside the function for the same variable are
  restricted to the function: the configuration parameter's prior value
  is still restored at function exit.

The actual value to which the timezone is set through this statement
does not matter, as the first instruction in the function overwrites it:

  perform set_config('timezone', tz_, true /* local */);

The point is that since the prior value of timezone is restored
automatically at function exit, the temporary setting does not leak out
of the function.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
Re: Import csv to temp table
arun chirappurath wrote:

> Do we have any scripts that create a temp table with column names
> from the first row of csv files?

csvkit [1] does that.

[1] https://csvkit.readthedocs.io/en/latest/

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
Re: how can I fix my accent issues?
Igniris Valdivia Baez wrote:

> 3. After the revision the data is loaded to the destiny database and
> here is were I believe the issue is, because the data is reviewed in
> Windows and somehow Pentaho is not understanding correctly the
> interaction between both operating systems.

On Windows, a system in Spanish would plausibly use Windows-1252
(https://en.wikipedia.org/wiki/Windows-1252) as the default codepage.
On Unix, it might use UTF-8, with a locale like es_CU.UTF-8.

Now if a certain component of your data pipeline assumes that the
input data is in the default encoding of the system, and the input
data is in fact always encoded in Windows-1252, then only the version
running on Windows will get it right. The one that runs on Unix might
translate the bytes that do not meet its encoding expectations into
the U+FFFD code point. At least that's a plausible explanation for the
result you're seeing in the Postgres database.

A robust solution is to not rely on encoding defaults, and to
explicitly declare the encoding of every input throughout the data
pipeline.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
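To see the failure mode described above concretely, here is a short Python sketch (the sample word is made up for illustration) of how Windows-1252 bytes read under a UTF-8 assumption degrade into U+FFFD:

```python
# Bytes as produced on Windows: "café" encoded in Windows-1252.
data = "café".encode("windows-1252")   # b'caf\xe9'

# A consumer that assumes UTF-8 and substitutes undecodable bytes
# ends up storing the U+FFFD replacement character instead of 'é'.
wrong = data.decode("utf-8", errors="replace")
print("\ufffd" in wrong)  # True

# Declaring the real encoding avoids the corruption entirely.
right = data.decode("windows-1252")
print(right == "café")    # True
```

The same substitution happens in any tool that decodes with a "replace on error" policy, which is why the corrupted rows end up full of U+FFFD rather than raising an error.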
Re: how can I fix my accent issues?
Igniris Valdivia Baez wrote:

> hello, thank you for answering, it's not a typo, in the attachments
> you can see that this is actually my collation, algo a pic of the
> problem for more clarification,

This character is meant to replace undisplayable characters. From
https://en.wikipedia.org/wiki/Specials_(Unicode_block):

  U+FFFD � REPLACEMENT CHARACTER
  used to replace an unknown, unrecognised, or unrepresentable character

It would be useful to know whether:

- this code point U+FFFD is in the database contents, in places where
  accented characters should be. In this case the SQL client is just
  faithfully displaying it, and the problem is not on its side.

- or whether the database contains the accented characters normally
  encoded in UTF-8. In this case there's a configuration mismatch on
  the SQL client side when reading.

To break down a string into code points to examine it, a query like
the following can be used, where you replace SELECT 'somefield' with a
query that selects a suspicious string from your actual table:

  WITH string(x) AS (SELECT 'somefield')
  SELECT c, to_hex(ascii(c)) AS codepoint
  FROM string
  CROSS JOIN LATERAL regexp_split_to_table(x, '') AS c;

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
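The same character-by-character inspection can be done client-side; a small Python sketch equivalent to the query above (the helper name is ours, not an established API):

```python
def codepoints(s: str):
    """Return (char, hex codepoint) pairs, like the SQL query above."""
    return [(c, format(ord(c), "x")) for c in s]

# U+FFFD in the data shows up unambiguously as codepoint 'fffd':
print(codepoints("\ufffd"))   # [('\ufffd', 'fffd')]

# A properly stored accented character shows its real codepoint:
print(codepoints("é"))        # [('é', 'e9')]
```

Running this over a suspicious value fetched from the database tells you immediately whether U+FFFD is stored or merely displayed.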
Re: Feature request for INITCAP() function
Jeff Gerbracht wrote:

> It would be great if there was a way to set exceptions to the 'word'
> delimiter list used in the INITCAP() function. For example, I have
> hyphenated words like blue-green or possessives and contractions like
> don't and cat's tail
> These become Blue-Green, Don'T and Cat'S Tail.

An ICU collation may produce better results for this:

  select initcap('blue-green don''t cat''s tail' COLLATE "en-x-icu");
            initcap
  -----------------------------
   Blue-Green Don't Cat's Tail

With hyphens, it's not clear what the results should be. Looking at
the IMDb movies database [1] for instance, titles have a mix of upper
and lower case following the hyphen, even within the same English
title:

  Hell-to-Pay Austin
  The Co-respondent
  A Pit-boy's Romance
  A Cowboy's Mother-in-Law
  Our Parents-In-Law
  The Good-for-Nothing
  When the Fire-Bells Rang
  The post-millennial wave
  ...

[1] https://datasets.imdbws.com/

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
Re: Thanks! Re: Who adds the "start transaction" and "commit" to the intended SQL statement in "autocommit" mode?
Bryn Llewellyn wrote:

> 2023-02-20 12:42:44.993 PST [2540504] d0$u0@d0 LOG: 0: statement:
> insert into s.t(v) values(17); insert into s.t(v) values(42);
> 2023-02-20 12:42:44.993 PST [2540504] d0$u0@d0 LOCATION: exec_simple_query,
> postgres.c:971
>
> It seems a bit odd that psql has no syntax to ask for this in its
> interactive mode.

Backslash-semicolon is the syntax. Quoted from the doc:

  \;

  Backslash-semicolon is not a meta-command in the same way as the
  preceding commands; rather, it simply causes a semicolon to be added
  to the query buffer without any further processing.

  Normally, psql will dispatch an SQL command to the server as soon as
  it reaches the command-ending semicolon, even if more input remains
  on the current line. Thus for example entering

    select 1; select 2; select 3;

  will result in the three SQL commands being individually sent to the
  server, with each one's results being displayed before continuing to
  the next command. However, a semicolon entered as \; will not
  trigger command processing, so that the command before it and the
  one after are effectively combined and sent to the server in one
  request. So for example

    select 1\; select 2\; select 3;

  results in sending the three SQL commands to the server in a single
  request, when the non-backslashed semicolon is reached. The server
  executes such a request as a single transaction, unless there are
  explicit BEGIN/COMMIT commands included in the string to divide it
  into multiple transactions. (See Section 55.2.2.1 for more details
  about how the server handles multi-query strings.)

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
Re: Q: error on updating collation version information
Karsten Hilbert wrote:

> The database encoding is UTF8. That br_FR@euro.LATIN9 had
> _not_ been added manually. It is also not actively used in my
> database(s).

br_FR@euro.LATIN9 cannot be used actively in a UTF-8 database, because
it's for a different encoding than the database's. It was probably
available in the OS at initdb time. Every available locale (see
locale -a) gets imported into the template databases, and then into
the other databases, since CREATE DATABASE copies a template without
filtering out the locales that are incompatible with the target
database.

There's no need to upgrade anything in the OS to get the
ALTER COLLATION ... REFRESH error you mention. It's sufficient to have
a mix of collations for different encodings and to try to refresh
collations whose encoding is not compatible with the current database.

> What is the canonical advice on the way forward here ? Is
> the _suggested_ solution to delete the collation or am I
> missing to see the "proper" approach to fixing it ?

ISTM that dropping that collation from any non-LATIN9 database is the
more sensible action. Maybe it could be discussed as a possible
improvement to have ALTER COLLATION ... REFRESH ignore the database
encoding and still refresh the version number.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
Re: Q: fixing collation version mismatches
Karsten Hilbert wrote:

> Which is why my question still stands: does the above
> three-strikes operation safely take care of any collation
> issues that may currently exist in a database ?

For the indexes, yes. But theoretically, all constraints involving
collatable types need a recheck.

For foreign key constraints with non-deterministic collations, there
might be equality tests that pass with an older Unicode version and
fail with a newer one. For check constraints as well, checks applied
to strings containing recent Unicode characters can give different
results after an upgrade.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
Re: How to store "blobs" efficiently for small and large sizes, with random access
Dominique Devienne wrote:

> PostgreSQL bytea is much better and simpler, except limited to 1GB...
> Bytea also has no direct random access, except via substr[ing], but
> how efficient and "random access" is that?

Bytea contents are compressed before being sliced (in chunks of
TOAST_MAX_CHUNK_SIZE bytes, typically 2000, IIRC), so it's not
possible to access a piece of data without decompressing the contents
before it. By contrast, large objects are sliced before compression,
so the performance of random access is likely to be completely
different.

> Here are the main requirement I need to fulfil:
> 1) store literally millions of rows, 1 "blob" per row. (scientific data).
> 2) most "blobs" are smallish, below a few KBs.

bytea's perfect for that.

> 3) yet many blobs are on the dozens of MBs.

bytea's still OK.

> 4) but some blobs exceed the 1GB byte limit. Found at least a dozen
> just in our test data, so clients will have them.
> 5) for accessing larger blobs, the API I must implement accesses
> contiguous chunks of the blobs. Thus I need random access.

In your case I would probably opt for bytea (as opposed to large
objects), and slice the blobs in the application in chunks of a fixed
size much larger than what TOAST does (for instance, somewhere between
128 kB and 32 MB). That is, having a table like:

  create table blobs (
    blob_id  some_type,
    chunk_no int,   /* 0 -> N */
    chunk    bytea
  );

It's not as simple as using a single bytea field or large objects, but
overall it avoids the management difficulties of both large objects
and very large contents in bytea columns. Random access is achieved by
skipping the chunks before the requested piece of data.

If the app is able to read/write the chunks in binary mode, its
queries should perform as well as the large object functions.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
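The random-access arithmetic suggested above is simple enough to sketch client-side. In this hypothetical Python sketch, CHUNK_SIZE and the helper names are our assumptions (the blobs table is the one proposed in the message); given a byte range, it computes which chunk_no values to fetch and how to trim the result:

```python
CHUNK_SIZE = 1 << 20  # 1 MB per chunk; an arbitrary pick in the suggested 128 kB - 32 MB range

def chunks_for_range(offset: int, length: int):
    """For a byte range (length >= 1), return (first_chunk, last_chunk, skip).

    The fetch would then be something like:
      SELECT chunk_no, chunk FROM blobs
      WHERE blob_id = $1 AND chunk_no BETWEEN first AND last
      ORDER BY chunk_no;
    """
    first = offset // CHUNK_SIZE
    last = (offset + length - 1) // CHUNK_SIZE
    skip = offset - first * CHUNK_SIZE   # bytes to discard in the first chunk
    return first, last, skip

def reassemble(chunks, skip: int, length: int) -> bytes:
    """Concatenate the fetched chunks and trim to the requested range."""
    data = b"".join(chunks)
    return data[skip:skip + length]
```

For example, a read at offset CHUNK_SIZE - 1 of 2 bytes straddles chunks 0 and 1, and reassemble() trims both ends after concatenation.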
Re: How to store "blobs" efficiently for small and large sizes, with random access
Dominique Devienne wrote:

> the fact the lo table is unique for the whole database would allow
> users to see blobs from any schema, as I understand it.

Direct access to pg_largeobject is only possible for superusers. If
lo_compat_privileges is on, any user can read any large object with
the lo* functions. If it's off, they can read a large object only if
they're the owner or have been granted permissions with:

  GRANT { { SELECT | UPDATE } [, ...] | ALL [ PRIVILEGES ] }
    ON LARGE OBJECT loid [, ...]
    TO role_specification [, ...] [ WITH GRANT OPTION ]
    [ GRANTED BY role_specification ]

Each large object has its own set of permissions. This is a
significant difference with bytea, since every creation of a new large
object may need to be followed by GRANT statements. Also, if the roles
and access policies change during the lifetime of the app, that might
imply massive REVOKE/GRANT statements to apply to existing objects.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
Re: Number of updated rows with LibPQ
Laurenz Albe wrote:

> > Or is there a libpq API on PGresult that would allow to get the type
> > of statement the result is from?
>
> The command tag is not what you look at.

Yet that's what psql does, from PrintQueryResult():

  /* if it's INSERT/UPDATE/DELETE RETURNING, also print status */
  if (last || pset.show_all_results)
  {
      cmdstatus = PQcmdStatus(result);
      if (strncmp(cmdstatus, "INSERT", 6) == 0 ||
          strncmp(cmdstatus, "UPDATE", 6) == 0 ||
          strncmp(cmdstatus, "DELETE", 6) == 0)
          PrintQueryStatus(result, printStatusFout);
  }

> You simply check the result from an INSERT statement. If it is
> PGRES_TUPLES_OK, it must have been INSERT ... RETURNING.

Or a SELECT, or an UPDATE RETURNING, or a DELETE RETURNING. The code
still needs to look at PQcmdStatus() to learn which kind it is, if it
needs to know.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
Re: ICU is not supported in this build. install from source code.
jian he wrote:

> ./configure --with-perl --with-python --with-icu
> ICU_CFLAGS='-I/usr/include/unicode' ICU_LIBS='-L/usr/lib/icu'
> --enable-debug --with-pgport=5440

The -L location alone is not sufficient in ICU_LIBS. The list of
libraries to link with should be given as well. For instance:

  ICU_LIBS="-L/usr/lib/icu -licui18n -licuuc -licudata"

When using the version of ICU that ships with the distribution
(libicu-dev package on Ubuntu), adding only --with-icu should be
enough: the locations of the includes and libraries are found
automatically.

I'm not sure that's your case, though. On the one hand,
/usr/include/unicode is where the package puts the ICU includes, but
on the other hand it doesn't create any /usr/lib/icu directory. And
Ubuntu 22.04 LTS ships ICU 70, not ICU 71.

If you need a self-built version of ICU, it might be simpler to use
pkg-config as an alternative to ICU_CFLAGS/ICU_LIBS, for instance:

  ./configure --with-icu PKG_CONFIG_PATH=/path/to/icu/lib/pkgconfig

where /path/to/icu was passed as the prefix to the ICU4C configure
invocation.

It's recommended to install custom builds into specific locations
instead of overwriting the libraries that are part of the distribution
(otherwise maintenance becomes unmanageable).

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
Re: Differences in Escaped bytea's when creating a plain pg_dump
WR wrote:

> First run worked.
> Second run worked.
> Then I changed to SET standard_conforming_strings = off;
> Third run worked.
> Fourth run throw the error
> Then I changed back to SET standard_conforming_strings = on;
> Fifth run throw the error too.
> And only adding E and second backslash helped.

This kind of unstable behavior can be seen when the SET is not
necessarily executed by the same session (same connection to the
server) as the subsequent queries. SET affects only the session it's
run in. For instance, a connection pooler configured in statement mode
may produce that behavior. The solution in the case of a connection
pooler is to group related statements into a transaction.

Maybe pgAdmin has a pooler like that, but if you're issuing the
statements in the same SQL window, I would find it quite surprising
that they're not run by the same session. Or maybe you're mixing
queries from different SQL windows that each have their own
connection, and in some cases you do the SET in one window and the
INSERT in another. Or maybe it's a pgAdmin bug.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
Re: Differences in Escaped bytea's when creating a plain pg_dump
WR wrote:

> But one question is left.
>
> When I read this valid data into a c++ std::string (and I checked that
> the single backslashes are still there). Why can't I put this
> SQL-command to a pqxx-transaction and execute it. It looks like the
> pqxx-transaction unescapes the bytea-string and then it finds the 0x00
> bytes, which are not allowed in text-strings but should be in bytea-strings.

You may check your server logs. They would have the error messages
along with the queries as received by the server.

Note that passing '\000'::bytea with a single backslash can be
incorrect if standard_conforming_strings is set to off (not
recommended since version 8.1). In that case the backslashes need to
be doubled. Example:

  postgres=# set standard_conforming_strings to off;
  SET
  postgres=# set escape_string_warning to off;
  SET
  postgres=# select '\000'::bytea;
  ERROR:  invalid byte sequence for encoding "UTF8": 0x00
  postgres=# select '\\000'::bytea;
   bytea
  -------
   \x00
  (1 row)

There might also be a problem with how the string is being fed in the
C++ code, but you'd need to show us the code to get feedback on it.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
Re: gawk extension linux missing stuff in package manager fedora 36
Jim McNamara wrote:

> The package doesn't contain the sql or control file in
> /usr/share/pgsql/extensions. It did install .so files though. I ran sudo
> dnf repoquery -l package name and looked at the files.

It's probably this library:
http://gawkextlib.sourceforge.net/pgsql/gawk-pgsql.html

Quote:

  This API can be used by either invoking gawk with a command-line
  argument of -l pgsql or by inserting @load "pgsql" in your script.

It's not a PostgreSQL server-side extension. It's a client-side gawk
plugin to query a postgres database from inside a gawk script.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
Re: link that explain ICU_LOCALE locale concept about local variant code and keywords.
jian he wrote:

> So which website can I get the info about the "kf" and "u".

https://unicode.org/reports/tr35/#u_Extension
https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Settings

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
Re: COPY TO STDOUT WITH (FORMAT CSV, HEADER), and embedded newlines
Dominique Devienne wrote:

> so I can easily do that "\n" encoding myself, as a post-processing on
> the buffer I get back.

Alternatively, it might be easier to use the default TEXT format of
COPY rather than CSV, as the TEXT format already produces \n for line
feeds, along with half a dozen other special backslash sequences. See
https://www.postgresql.org/docs/current/sql-copy.html

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
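As a rough client-side illustration of what the TEXT format does with special characters, here is a Python sketch of the documented backslash escaping (a sketch of the rules, not the actual COPY implementation; the function name is ours):

```python
# COPY's TEXT format escapes backslash itself plus a handful of
# control characters; these are the sequences listed in the COPY docs.
_ESCAPES = {
    "\\": "\\\\",
    "\b": "\\b",
    "\f": "\\f",
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
    "\v": "\\v",
}

def copy_text_escape(value: str) -> str:
    # Escaping character by character avoids double-escaping backslashes.
    return "".join(_ESCAPES.get(c, c) for c in value)

print(copy_text_escape("ab\ncd"))  # ab\ncd  (literally backslash + 'n')
```

With this behavior, an embedded newline never breaks the one-line-per-row property of the TEXT format, which is what makes post-processing unnecessary.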
Re: COPY TO STDOUT WITH (FORMAT CSV, HEADER), and embedded newlines
Dominique Devienne wrote:

> These values are 'normal'. I'm not use to CSV, but I suppose such newlines
> must be encoded, perhaps as \n, since AFAIK CSV needs to be 1 line per row,
> no?

No, but such fields must be enclosed in double quotes, as documented
in RFC 4180: https://datatracker.ietf.org/doc/html/rfc4180

Consider this output:

  psql> COPY (values (1, E'ab\ncd'), (2,'efgh')) TO STDOUT CSV;
  1,"ab
  cd"
  2,efgh

That's 2 records on 3 lines. If you feed this to a parser and it
chokes on it, then it's not a valid CSV parser.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
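Any conforming CSV parser handles the quoted embedded newline. Python's csv module, for instance, reads the 3-line output above back as exactly 2 records (a sketch using the same sample data):

```python
import csv
import io

# The exact text produced by the COPY ... TO STDOUT CSV example above.
output = '1,"ab\ncd"\n2,efgh\n'

# csv.reader treats the newline inside the quoted field as data,
# not as a record separator.
rows = list(csv.reader(io.StringIO(output)))
print(rows)  # [['1', 'ab\ncd'], ['2', 'efgh']]
```

So no \n encoding is needed on the producer side; the consumer just has to use a real CSV parser rather than splitting on newlines.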
Re: varchar::bytea fails when varchar contains backslash
Matthias Apitz wrote:

> I wanted to look into a column of type varchar converting the content
> with ::bytea to a hex string, but this fails when the column contains a
> backslash:

Yes, casting from text to bytea cannot be used for that. The
convert_to() function must be used instead. From the doc at
https://www.postgresql.org/docs/current/functions-binarystring.html

  convert_to ( string text, dest_encoding name ) → bytea

  Converts a text string (in the database encoding) to a binary string
  encoded in encoding dest_encoding (see Section 24.3.4 for available
  conversions).

  convert_to('some_text', 'UTF8') → \x736f6d655f74657874

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite
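The equivalence is easy to check client-side. In Python, for instance, encoding the text as UTF-8 and hex-printing the bytes gives the same digits as the convert_to() doc example above, and a backslash in the data poses no problem:

```python
# Same result as convert_to('some_text', 'UTF8'), which displays
# as \x736f6d655f74657874 in hex output format.
hexval = "some_text".encode("utf-8").hex()
print(hexval)  # 736f6d655f74657874

# A backslash is just byte 0x5c here, unlike with the text-to-bytea
# cast, where it is interpreted as an escape introducer.
backslash_hex = "a\\b".encode("utf-8").hex()
print(backslash_hex)  # 615c62
```

This mirrors why convert_to() is the right tool: it treats the string purely as character data to re-encode, with no escape interpretation.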
Re: Can we go beyond the standard to make Postgres radically better?
Peter J. Holzer wrote:

> > My use case for such a feature are tables which contain one column (or a
> > small number of columns) which you usually don't want to select: A bytea
> > column or a very wide text column. In a program I don't mind (in fact I
> > prefer) listing all the columns explicitely, but exploring a database
> > interactively with psql typing lots of column names is tedious
> > (especially since autocomplete doesn't work here).
>
> Forgot to add: I think that the syntax would have to be more explicit.
> It's too easy to mix up
>   SELECT * - b.a_id FROM ...
> and
>   SELECT *, - b.a_id FROM ...
>
> Maybe
>   SELECT * EXCEPT b.a_id FROM ...

The solution to this in the SQL standard might be that it can be done
with a Polymorphic Table Function, introduced in SQL:2016.
https://webstore.iec.ch/preview/info_isoiec19075-7%7Bed1.0%7Den.pdf

A practical example with the Oracle implementation can be seen here:
https://blogs.oracle.com/sql/post/select-star-except-queries-in-oracle-database

Excluding all columns of a given type from a relation is also
possible, with both the relation and the type as parameters of the
PTF.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite
Re: Force re-compression with lz4
Florents Tselai wrote:

> I have a table storing mostly text data (40M+ rows) that has
> pg_total_relation_size ~670GB.
> I’ve just upgraded to postgres 14 and I’m now eager to try the new LZ4
> compression.

You could start by experimenting with data samples rather than the
full contents.

FWIW, in my case I've found that the compression ratio of lz4 was only
marginally better than pglz (like 2% on text). As for decompression
time, it doesn't seem to differ significantly from pglz, so overall,
recompressing existing data did not seem worth the trouble.

However, lz4 appears to be much faster than pglz at compressing, so
its benefit is clear in terms of CPU usage for future insertions.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite
Re: JWT decoder
Masih Tavassoli wrote:

> But there doesn't seem to be a way doing it in postgres.
> Has anyone got any suggesions?

RFC 7519 says:

  A JWT is represented as a sequence of URL-safe parts separated by
  period ('.') characters. Each part contains a base64url-encoded
  value. The number of parts in the JWT is dependent upon the
  representation of the resulting JWS using the JWS Compact
  Serialization or JWE using the JWE Compact Serialization.

base64url is similar to base64, except that the two characters + and /
are replaced by - and _.

Postgres provides decode(..., 'base64'), but it's stricter than the
Oracle version shown in your sample code (which seems to ignore the
dot character, illegal in base64, whereas Postgres would reject it).

The JWT may be decoded with built-in Postgres functions by splitting
the dot-separated parts with regexp_split_to_table(), converting them
from base64url into binary, then into UTF-8; the results could then be
cast to the json type if needed. So the SQL code could be:

  create function decode_base64url(text) returns bytea as $$
    select decode(
      rpad(translate($1, '-_', '+/')
        -- pad to the next multiple of 4 bytes
        ,4*((length($1)+3)/4)
        ,'=')
      ,'base64');
  $$ language sql strict immutable;

  with parts(x,n) as (
    select * from regexp_split_to_table('', '\.') with ordinality
  )
  select n, convert_from(decode_base64url(x), 'utf-8')
  from parts
  where n in (1,2);

"n" in the query is the part number: 1 for the header, 2 for the
payload, 3 for the signature, which is ignored here.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite
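The same translate-and-repad trick works in any client language. A Python sketch, using a minimal hard-coded token for illustration (the signature part is a placeholder, not a real signature):

```python
import base64
import json

def decode_base64url(s: str) -> bytes:
    # base64url -> base64: swap the URL-safe characters back, then
    # re-pad to a multiple of 4 bytes, as in the SQL function above.
    s = s.translate(str.maketrans("-_", "+/"))
    return base64.b64decode(s + "=" * (-len(s) % 4))

token = ("eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9"   # header
         ".eyJzdWIiOiIxMjM0NTY3ODkwIn0"           # payload
         ".not-a-real-signature")                 # signature (ignored)

header, payload = (json.loads(decode_base64url(p))
                   for p in token.split(".")[:2])
print(header)   # {'alg': 'HS256', 'typ': 'JWT'}
print(payload)  # {'sub': '1234567890'}
```

As with the SQL version, only parts 1 and 2 are decoded; verifying the signature is a separate concern and needs the signing key.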
Re: unicode match normal forms
Hamann W wrote:

> in unicode letter ä exists in two versions - linux and windows use a
> composite whereas macos prefers the decomposed form. Is there any way
> to make a semi-exact match that accepts both variants?

Aside from normalizing the strings into the same normal form before
comparing them, non-deterministic ICU collations will recognize them
as identical (they're "canonically equivalent" in Unicode terms). For
instance:

  CREATE COLLATION nd (
    provider = 'icu',
    locale = '',
    deterministic = false
  );

  SELECT nfc_form, nfd_form,
    nfc_form = nfd_form COLLATE nd AS equal1,
    nfc_form = nfd_form COLLATE "C" AS equal2  -- or any deterministic collation
  FROM (VALUES (E'j\u00E4hrlich', E'j\u0061\u0308hrlich'))
    AS s(nfc_form, nfd_form);

   nfc_form | nfd_form | equal1 | equal2
  ----------+----------+--------+--------
   jährlich | jährlich | t      | f
  (1 row)

Normalizing is available as a built-in function since Postgres 13, and
non-deterministic collations appeared in Postgres 12.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite
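The canonical equivalence can be checked with any Unicode library. In Python, for instance, mirroring the NFC/NFD pair used in the example above:

```python
import unicodedata

nfc_form = "j\u00E4hrlich"         # precomposed ä (U+00E4)
nfd_form = "j\u0061\u0308hrlich"   # 'a' + combining diaeresis (U+0308)

# Binary comparison, i.e. what a deterministic collation does:
print(nfc_form == nfd_form)  # False

# After normalizing both to the same form, the strings compare equal,
# which is what a non-deterministic ICU collation effectively recognizes:
print(unicodedata.normalize("NFC", nfd_form) == nfc_form)  # True
print(unicodedata.normalize("NFD", nfc_form) == nfd_form)  # True
```

This is the client-side counterpart of the two server-side options: normalize before comparing, or let a non-deterministic collation do the equivalence.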
Re: CROSSTAB( .. only one column has values... )
(resent to the list, previous post was rejected)

Pavel Stehule wrote:

> > *That* is a function of how Postgres set returning functions work, and
> > not specific to crosstab(). It is not easily fixed. Patches to fix that
> > would be welcomed!
>
> https://www.postgresql.org/message-id/flat/CAFj8pRC%2BhNzpH%2B0bPRCnqNncUCGjEvpwX%2B0nbhb1F7gwjYZZNg%40mail.gmail.com#9b3fbdd968c271668049a103bfc93880

Oracle has Polymorphic Table Functions, but still it doesn't work
seamlessly for dynamic pivots. This blog post illustrates why:
https://blog.sqlora.com/en/dynamic-pivot-with-polymorphic-table-function/
The "related posts" at the end are also interesting about this
feature.

Even before considering more dynamic SRFs in Postgres, having a
variable number of columns for a DML statement is problematic in
general. When an SQL statement is parsed/prepared, it's not supposed
to change its output structure, unless a DDL statement intervenes. A
dynamic pivot must break this promise, since a simple INSERT or UPDATE
or DELETE in the pivoting rows may cause the number of output columns
to change across invocations of the same statement. That would mean
that PREPARE would be basically unusable or unreliable for such
statements.

I think the query parser is also not supposed to read data outside of
the catalogs to determine the output structure of a query. This
promise would also need to be broken by dynamic pivots implemented as
a single-pass DML query.

On the other hand, dynamic pivots can be done by generating the SQL
dynamically and getting at the results in a second step, or by
returning a resultset embedded in a scalar (json).

Speaking of features that don't exist but might someday, ISTM that
CALL crosstab_dynamic(...) would be more plausible than
SELECT * FROM crosstab_dynamic(...), since CALL doesn't have the same
contingencies as SELECT.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite
Re: Multiple result set to be returned in procedure/function
Thomas Kellerer wrote:

> arthur=> \set AUTOCOMMIT off

Alternatively, start an explicit transaction block with BEGIN. The
point is that the lifespan of the cursor is the transaction block in
which it's instantiated.

> arthur=> select * from get_results();
> get_results
>
>

Friendlier names may be used by assigning them in the function; i.e.,
plpgsql does support:

  declare c1 refcursor := 'mycursorname';

Then the caller might simply hardcode the cursor names in the FETCH
statements rather than building them dynamically at runtime. Also, it
allows putting the server-side code into an anonymous DO block instead
of creating a function, because it doesn't have to return any
refcursor when the caller already knows the cursor names.

  BEGIN;

  DO $$
  declare
    c refcursor := 'c_pgclass';
  begin
    open c for select relname from pg_class;
  end $$ LANGUAGE plpgsql;

  FETCH ALL from c_pgclass;
  ...
  COMMIT;

This form might be closer to how inline blocks of code are written
with some other db engines, when they produce resultsets without an
explicit cursor interface.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite
Re: passing linux user to PG server as a variable ?
David Gauthier wrote:

> I can avoid the error by just throwing a namespace in there...
> atletx7-reg036:/home/dgauthie[ 214 ] --> setenv PGOPTIONS "-c
> 'os.user=$USER' "
> But once in, "show os.user" is undefined.

It's documented to work [1], but you need to remove the single quotes.
For instance:

  $ env PGOPTIONS="-c os.user=$USER" psql -d postgres
  psql (12.4 (Debian 12.4-1.pgdg90+1))
  Type "help" for help.

  postgres=> show "os.user";
   os.user
  ---------
   daniel
  (1 row)

If characters need to be quoted in the value, backslash should be used
as the quote character. Since it's also a quote character for the
shell, two levels of quoting are needed:

  $ env PGOPTIONS="-c os.user=user\\ is\\ $USER" psql -d postgres
  psql (12.4 (Debian 12.4-1.pgdg90+1))
  Type "help" for help.

  postgres=> show "os.user";
      os.user
  ----------------
   user is daniel
  (1 row)

[1] https://www.postgresql.org/docs/current/config-setting.html

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite
Re: Switching Primary Keys to BigInt
Mohamed Wael Khobalatte wrote:

> We lock the table as a precaution, with the understanding that we are
> undergoing a "small" downtime to finish replacing the int id by the new
> bigint

Ah, sorry, I overlooked that most row updates are done in a
preliminary step, before the final transaction:

  /* in batches, we update all the rows to new_id = id */

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite
Re: Switching Primary Keys to BigInt
Mohamed Wael Khobalatte wrote:

> > alter table change_seq alter COLUMN id set data type bigint;
>
> This is significant downtime, since it locks exclusively, no? We want to
> avoid that.

Well, in the steps you mentioned upthread, the transaction starts by
doing LOCK TABLE some_table, so it will hold an exclusive lock on the
table for the rest of the transaction.

If you can test how ALTER TABLE ... SET TYPE ... compares to your
procedure in terms of downtime, that would be interesting. To me, it's
not clear why the procedure in multiple steps would be better overall
than a single ALTER TABLE.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite
RE: psql: FATAL: database "postgres" does not exist or ERROR: 23505: duplicate key value violates unique constraint "pg_namespace_nspname_index"
TALLURI Nareshkumar wrote: > [0]postgres@axmdevhkdb008$ [PHKGAXMD008] psql > psql: FATAL: database "postgres" does not exist That's not necessarily a problem. The "postgres" database is generally empty and some installations don't even have one. Use "psql -d template1" instead, or start postgres in single-user mode. > Since we are not able to connect to DB , we did a new initialization and > copy the directories from base directory to new base directory(in new data > location) This can't work. If you have a backup from just before that point, you should restore it and work from that without writing anything into the data directory. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: https://www.manitou-mail.org Twitter: @DanielVerite
RE: autovacuum failing on pg_largeobject and disk usage of the pg_largeobject growing unchecked
Jim Hurne wrote: > "Daniel Verite" wrote on 06/22/2020 06:00:37 PM: > > If you can arrange a maintenance window, a faster way to rebuild > > pg_largeobject when it contains mostly empty pages can be to: > > Thanks Daniel. If we cannot arrange for a maintenance window, is this the > sort of thing that pg_repack[1] could be used to do? Unfortunately it can't work with pg_largeobject: https://github.com/reorg/pg_repack/issues/163 Best regards, -- Daniel Vérité PostgreSQL-powered mailer: https://www.manitou-mail.org Twitter: @DanielVerite
RE: autovacuum failing on pg_largeobject and disk usage of the pg_largeobject growing unchecked
Jim Hurne wrote: > We are of course going to continue to try different things, but does > anyone have any other suggestions on what we should be looking at or what > settings we might want to adjust? If you can arrange a maintenance window, a faster way to rebuild pg_largeobject when it contains mostly empty pages can be to: - export the contents into files: for id in select oid from pg_largeobject_metadata loop perform lo_export(id, '/tmp-path/'||id::text); end loop; - SET allow_system_table_mods to ON; (needs a restart) - truncate table pg_largeobject, pg_largeobject_metadata; - reimport the files with the same OIDs for id in select pg_ls_dir('/tmp-path/') loop perform lo_import('/tmp-path/' || id::text, id::oid); end loop; - remove the files in /tmp-path - Set allow_system_table_mods back to OFF and restart again, unless you don't need that safety check and prefer to leave it permanently to ON to avoid the restarts. With less than 60MB of actual contents, all this might take no more than a few minutes, as these operations don't need to fully scan pg_largeobject, which is what is problematic with vacuum. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: https://www.manitou-mail.org Twitter: @DanielVerite
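Spelled out as runnable statements, the two loops above could look like this (a sketch; it assumes the server can read and write /tmp-path and that the files fit there):

```sql
-- export every large object to a file named after its OID
DO $$
DECLARE id oid;
BEGIN
  FOR id IN SELECT oid FROM pg_largeobject_metadata LOOP
    PERFORM lo_export(id, '/tmp-path/' || id::text);
  END LOOP;
END $$;

-- after the TRUNCATE, reimport each file under its original OID
DO $$
DECLARE fname text;
BEGIN
  FOR fname IN SELECT pg_ls_dir('/tmp-path/') LOOP
    PERFORM lo_import('/tmp-path/' || fname, fname::oid);
  END LOOP;
END $$;
```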
Re: Concurrently running CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]
Thorsten Schöning wrote: > > The caveat you mention about IF NOT EXISTS does not apply to > > temporary tables, as they're not shared across sessions.[...] > > That's what I understood as well, but I'm creating those concurrently > WITHIN one and the same session and transaction. :-) But a SQL session on the server takes its statements from a FIFO queue and processes them serially, so there's no intra-session concurrency. In fact multi-threaded SQL clients *must* make sure that they don't send concurrent queries to the same connection. The best they can do in terms of throughput is to queue up a new query while the server is busy executing a previous one, but that's pipelining, not parallelism. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: https://www.manitou-mail.org Twitter: @DanielVerite
Re: Concurrently running CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]
Thorsten Schöning wrote: > I'm using "CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]" and the > associated queries can take a long time. So the following lists some > questions about executing those concurrently, even though I've > already read threads like the following: > > The bottom line is that CREATE TABLE IF NOT EXISTS doesn't pretend > > to handle concurrency issues any better than regular old CREATE > > TABLE, which is to say not very well.[...] > > https://www.postgresql.org/message-id/ca+tgmozadyvtwbfp1fl2smzbihcwt4uprzrlnnx1nb30ku3...@mail.gmail.com The caveat you mention about IF NOT EXISTS does not apply to temporary tables, as they're not shared across sessions. That is, if two concurrent transactions execute CREATE TEMP TABLE IF NOT EXISTS foo(...) at the same time, it can't fail as described above because that creates two distinct tables, each private to its session. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: https://www.manitou-mail.org Twitter: @DanielVerite
Re: psql show me the : and ask user input, when running one sql file
Adrian Klaver wrote: > On 4/5/20 9:46 AM, Adrian Klaver wrote: > > On 4/5/20 5:50 AM, arden liu wrote: > > > 4) I don't see anything wrong with the statements, so I am wondering if it is > > a shell issue? > > Seems to be. I removed the RETURNING *_id from the INSERT statements and > the file ran without interruption: Presumably these results are being displayed with a pager, and it's the pager that is asking for keyboard input. You may add -P pager=off to the psql options to suppress this, or permanently remove the RETURNING clauses, which seem pointless in that context. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Could postgres12 support millions of sequences? (like 10 million)
pabloa98 wrote: > When I have a medium number of sequences I will report how it behaves. It > will take some time though. Be aware that creating the sequences on the fly has the kind of race condition that you wanted to avoid in the first place. For instance consider this execution in two concurrent sessions: S1: BEGIN; S1: CREATE SEQUENCE IF NOT EXISTS seq1; S2: BEGIN; S2: CREATE SEQUENCE IF NOT EXISTS seq1; S2: (now blocked waiting for S1) S1: COMMIT; S2: ERROR: duplicate key value violates unique constraint "pg_type_typname_nsp_index" DETAIL: Key (typname, typnamespace)=(seq1, 36434) already exists. The client could catch these errors and retry, but then it could also do that with serializable transactions on serialization failures (SQLSTATE 40001), and you'd get the guarantee of consecutive numbering without creating all these sequences, not to mention the protection against other potential concurrency anomalies. See https://www.postgresql.org/docs/current/transaction-iso.html Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
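The serializable alternative mentioned above might be sketched like this (the counters table and its key column are hypothetical; the point is that the client retries on SQLSTATE 40001 instead of catching duplicate-key errors):

```sql
-- one counter row per group instead of one sequence per group
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE counters
   SET last_value = last_value + 1
 WHERE group_id = 42
RETURNING last_value;
COMMIT;
-- on "could not serialize access..." (SQLSTATE 40001),
-- roll back and rerun the whole transaction
```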
Re: SQL Query Syntax help
srikkanth wrote: > Can you please help me in writing the syntax for the below mentioned This looks like an UNPIVOT operation. Here's a generic method that does this without having to specify the columns individually, with the help of json functions: SELECT ID, key, value FROM (SELECT ID, row_to_json(t.*) AS line FROM PivotTableName t) AS r JOIN LATERAL json_each_text(r.line) on (key <> 'ID'); Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
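For instance, with a small made-up table (note that row_to_json emits the actual column names, so the key filter must match their case):

```sql
CREATE TABLE pivottablename (id int, jan int, feb int, mar int);
INSERT INTO pivottablename VALUES (1, 10, 20, 30);

-- unpivot: one output row per non-ID column
SELECT id, key, value
FROM (SELECT id, row_to_json(t.*) AS line FROM pivottablename t) AS r
JOIN LATERAL json_each_text(r.line) ON (key <> 'id');
-- returns (1,jan,10), (1,feb,20), (1,mar,30)
```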
Re: Inexplicable duplicate rows with unique constraint
Richard van der Hoff wrote: > synapse=> select count(*), room_id, type, state_key from > current_state_events group by 2,3,4 order by count(*) DESC LIMIT 2; > count | room_id | type | state_key > ---+---+---+- > 3 | !ueLfVrSWYGMnFnoCbT:darkfasel.net | m.room.member | > @irc_ebi_:darkfasel.net > 3 | !HwocBmCtBcHQhILtYQ:matrix.org| m.room.member | > @freenode_AlmarShenwan_:matrix.org Looking at these columns which are of type text but do not contain words of any particular language, there's probably no point in using a linguistic-aware collation for them. If you maintain the database schema, what you could do to avoid the dependency on the OS collation and stay clear of the particular upgrade difficulty of collations is to use COLLATE "C" for this kind of field, as opposed to the default collation of the database. As a bonus, operations with the "C" collations tend to be faster, sometimes even much faster. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
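If you control the schema, switching existing columns is a single statement (table and column names taken from the query above; this locks the table exclusively and rebuilds the indexes involving those columns):

```sql
ALTER TABLE current_state_events
  ALTER COLUMN room_id TYPE text COLLATE "C",
  ALTER COLUMN state_key TYPE text COLLATE "C";
```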
Re: Inexplicable duplicate rows with unique constraint
Richard van der Hoff wrote: > So, question: what could we be doing wrong to get ourselves into this > situation? OS/libc upgrades without reindexing come to mind. See https://wiki.postgresql.org/wiki/Collations > * At least one user reports that he has recently migrated his database > from one server to another via a `pg_dump -C` and later piping into psql. This kind of migration recreates indexes (since it recreates everything) so it's not subject to the same hazard as an OS/libc upgrade without reindexing. In fact it would detect the problem if it existed, as the creation of the unique constraint in the new db would fail if the data in the dump did not satisfy it. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: replace single char for string using regexp_replace
PegoraroF10 wrote: > I have a > Replace(Replace(Replace(Replace($$Text,with;On'It"$$,',','chr(59)'),';','chr(44)'),'"','chr(34)'),,'chr(39)') > It works but I would like to call just one Replace. For performance, it might be the fastest method, despite the lack of elegance and the multiple passes on the string. In a recent discussion on the list [1] I've asked how best to do multiple string replacements. That's a generalized version of your question and the answer might be heavier, but anyway here's some code on the wiki with plperl and plpgsql versions: https://wiki.postgresql.org/wiki/Multi_Replace_Perl https://wiki.postgresql.org/wiki/Multi_Replace_plpgsql If you're okay with plperl, see the first version, as it's much more efficient, with Perl implementing the multiple replacement natively in its regexp engine (plus implementing the search with a trie since 5.10). The plpgsql version tries to do its best with regexp_replace, but it scales much worse as the number of replacements grows. But it does work in the situations where a stack of nested replace() calls wouldn't work, like replace foo with bar and bar with foo. In your case, as the strings to replace always consist only of one character, you might also split the string by characters, replace them with a CASE WHEN... construct, and reassemble the result with string_agg, as in: select string_agg(c, '') from (select case c when ',' then 'chr(59)' when ';' then 'chr(44)' ... other substitutions... else c end from regexp_split_to_table($$The Original String$$, '') as s1(c) ) as s2(c); If the strings are not too large and there are many occurrences of the characters to replace, I would expect this to be more efficient than the more generic plpgsql-based solution above. Compared to the Perl version, I don't know. There is a per-call overhead with plperl that can't be ignored if you're focused on performance.
[1] https://www.postgresql.org/message-id/306b726b-f185-4668-bffe-ac8e7f788...@manitou-mail.org Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Mixing greediness in regexp_matches
Tom Lane wrote: > regression=# select regexp_split_to_array('junkfoolbarfoolishfoobarmore', > 'foo|bar|foobar'); > regexp_split_to_array > --- > {junk,l,"",lish,more} > (1 row) > > The idea would be to iterate over the array elements, tracking the > corresponding position in the source string, and re-discovering at > each break which of the original alternatives must've matched. > > It's sort of annoying that we don't have a simple "regexp_location" > function that would give you back the starting position of the > first match. It occurred to me too that regexp_split_to_table or array would make this problem really easy if only it had a mode to capture and return the matched parts too. FWIW, in plperl, there's a simple solution: $string =~ s/(foobar|foo|...)/$replace{$1}/g where %replace is a hash of the substitutions (foo => 'baz', ...). The strings in the alternation are tested in their order of appearance, so you can choose to be greedy or not by just sorting them by length. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Mixing greediness in regexp_matches
Tom Lane wrote: > I'd try forcing the match to be the whole string, ie > > ^(.*?)(foo|bar|foobar)(.*)$ > > which would also save some work for restarting the iteration, > since you'd have already captured the all-the-rest substring. In that case regexp_matches will return 0 or 1 row. In the above-mentioned example, that would be: => select regexp_matches('the string has foo and foobar and bar and more', '^(.*?)(foo|foobar|bar)(.*)$', 'g'); regexp_matches {"the string has ",foo," and foobar and bar and more"} So the next iteration would consist of calling regexp_matches() on result[3], and so on until no match is found. I think it would work as desired, but probably much less efficiently on large strings/large number of matches than if a single call of regexp_matches() could return all matches. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
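That iteration could be packaged as a small set-returning plpgsql function, for example (a sketch; the function name is made up and regexp_match requires PostgreSQL 10 or later):

```sql
CREATE FUNCTION iter_matches(str text)
RETURNS SETOF text[] LANGUAGE plpgsql AS $$
DECLARE m text[];
BEGIN
  LOOP
    -- regexp_match returns NULL when there is no match left
    m := regexp_match(str, '^(.*?)(foo|foobar|bar)(.*)$');
    EXIT WHEN m IS NULL;
    RETURN NEXT ARRAY[m[1], m[2]];  -- (non-matching segment, match)
    str := m[3];                    -- continue on the rest of the string
  END LOOP;
END $$;
```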
Mixing greediness in regexp_matches
Hi, When looking into how to implement a global replace of multiple substrings (each with their own replacement) in sql or plpgsql, I'm wondering if/how an RE with an alternation can be used. The basic idea is to iterate on the rows produced by regexp_matches(string, '(.*?)(foo|bar|foobar)', 'g') to break down the string into pairs of (non-matching segment, matching segment) so that a final result can be assembled from that (setting aside the last non-matching segment, that can be retrieved in a final step). The difficulty is that the longest strings in the alternation should be prioritized, but the starting (.*?) makes the RE non-greedy so "foo" is chosen over "foobar". The doc at [1] doesn't make me optimistic when it mentions that: "...when an RE contains both greedy and non-greedy subexpressions, the total match length is either as long as possible or as short as possible, according to the attribute assigned to the whole RE. The attributes assigned to the subexpressions only affect how much of that match they are allowed to “eat” relative to each other." Also it gives this example of forcing the RE as a whole to be greedy despite it having a non-greedy sub-RE: regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}') but it doesn't seem to be able to produce the desired result in the case of the RE in the middle being an alternation with strings of different lengths. The ideal RE with a (foo|foobar|bar) alternation, when applied globally to a string like 'the string has foo and foobar and bar and more' would produce something like: {"the string has ","foo"} {" and ","foobar"} {" and ","bar"} Is that possible with regexp_matches? [1] https://www.postgresql.org/docs/current/functions-matching.html
Re: Tuple concurrency issue in large objects
Shalini wrote: > Could you also please state the reason why is it happening in case > of large objects? Because concurrent transactions are very well > handled for other data types, but the same is not happening for > lobs. Is it because the fomer are stored in toast table and there is > no support for concurrent txns in pg_largeobject table? Keeping in mind that large objects are not a datatype, but rather a functionality that is built on top of the bytea and oid datatypes plus a set of functions, I wouldn't say that concurrent writes would be better handled if you had a table: document(id serial, contents bytea) with "contents" being indeed toastable. To illustrate with a basic example: transactions Tx1 and Tx2 want to update the contents of the same document concurrently, with this order of execution: Tx1: begin Tx1: update document set contents=... where id=... Tx2: begin Tx2: update the same document (gets blocked) Tx1: commit Tx2: commit If using the read committed isolation level, Tx2 will be put to wait until Tx1 commits, and then the update by Tx1 will be overwritten by Tx2. That's a well known anomaly known as a "lost update", and probably not what you want. If using a better isolation level (repeatable read or serializable), the update by Tx2 will be rejected with a serialization failure, which, to me, seems the moral equivalent of the "Tuple concurrently updated" error you're reporting with large objects. When this occurs, your application can fetch the latest value in a new transaction and see how it can apply its change to the new value, unless another conflict arises and so on. In short, the best the database can do in case of conflicting writes is to inform the application. It can't know which write should be prioritized or if the changes should be merged before being written. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Tuple concurrency issue in large objects
Shalini wrote: > > Is there a workaround to this concurrency issue without creating a > > new large object? The transaction failing with the "Tuple concurrently updated" error could be resubmitted by the client, as if it was a serialization failure. Or the failure could be prevented by using advisory locks: https://www.postgresql.org/docs/current/explicit-locking.html Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
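With advisory locks, every writer would take a transaction-scoped lock keyed on the large object's OID before touching it, e.g. (the key scheme, here simply the OID itself, is just one possible convention):

```sql
BEGIN;
-- blocks until any other writer of this large object commits
SELECT pg_advisory_xact_lock(12345);  -- 12345 = the large object's OID
SELECT lo_put(12345, 0, '\x48656c6c6f');  -- or lo_open/lowrite, etc.
COMMIT;  -- the advisory lock is released here
```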
Re: Issue upgrading from V11 to V12 on Debian
stan wrote: > BUT, I went to the directory where I have the exentsion's source, did a > make clean ; make ; make install, and the files were still installed in the > V11 tree. How can I instruct the system to put these in the V12 tree? With the Debian packaging, /usr/bin/pg_config is a shell script that has this comment on top: # If postgresql-server-dev-* is installed, call pg_config from the latest # available one. Otherwise fall back to libpq-dev's version. According to this comment, invoking /usr/bin/pg_config refers to the v11 server probably because you didn't install postgresql-server-dev-12 yet. When you'll have both postgresql-server-dev-11 and postgresql-server-dev-12 installed, independently of which clusters are actually running, you can still force a specific target version when compiling an extension with : $ make PG_CONFIG=/usr/lib/postgresql/11/bin/pg_config [install] or $ make PG_CONFIG=/usr/lib/postgresql/12/bin/pg_config [install] Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: SQL Error [0A000]: ERROR: nondeterministic collations are not supported for LIKE
Jeff Lanzarotta wrote: > I have a question about nondeterministic collations in PostgreSQL 12. I > have created a new collation that is nondeterministic and created several > columns which use this collation. Querying these columns works great until > I use LIKE. When I do, I get the following error: > > SQL Error [0A000]: ERROR: nondeterministic collations are not supported for > LIKE > > Is there any plan to allow this functionality? PostgreSQL development is conducted without a roadmap [1]. Maybe someone will submit a patch to enable LIKE with nondeterministic collations, but so far it did not happen according to the current set of patches at https://commitfest.postgresql.org Such matches can be weirder than you might think (not to mention much slower). Consider for instance a collation that ignores punctuation: CREATE COLLATION "nd3alt" ( provider = 'icu', locale='und@colAlternate=shifted', deterministic = false ); In the icu_ext extension, icu_strpos [2] can match a substring with a nondeterministic collation, which is one part of what LIKE would need to do for such collations. The implementation uses the string search facility of the ICU library. With the above-defined collation, we can have for instance: SELECT icu_strpos('abc. ...de', 'c,d' COLLATE nd3alt); icu_strpos 3 So even though 'c,d' is not a substring of 'abc. ...de' in the common sense, it is recognized as such by this collation, by design. A LIKE operator for nondeterministic collations should be able to recognize this too, but with an arbitrary number of substrings to match in the pattern, plus it should handle the underscore wildcard in a way that hopefully makes sense. With the example above, 'abc. ...de' LIKE '%c,d%' COLLATE nd3alt should certainly be a match, but in the case of this variant: 'abc. ...de' LIKE '%c_d%' COLLATE nd3alt it's not necessarily clear how (or even if) it should work. 
[1] https://www.postgresql.org/developer/roadmap/ [2] https://github.com/dverite/icu_ext#icu_strpos Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Text search lexer's handling of hyphens and negatives
raylu wrote: > to_tsvector('simple', 'UVW-789-XYZ') is > 'uvw':1 '-789':2 'xyz':3 > because -789 is a negative integer. If we turn the query '789-XYZ' > into the tsquery as before, we get to_tsquery('simple', '789 <-> xyz') > which doesn't match it. > > Are we missing something here? Is there either a way to > 1. generate tsvectors without this special (negative) integer behavior or As an ad-hoc solution, you could add a dictionary that turns a negative integer into its positive counterpart. There's a dictionary in contrib that can be used as a starting point: https://www.postgresql.org/docs/current/dict-int.html It's a matter of ~10 lines of C code to add an "abs" parameter to that dictionary that would, when true, produce "789" as a lexem when fed "-789" as input. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
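For context, the stock contrib dictionary is set up with plain SQL, and a patched version would be configured the same way; the ABS parameter below is the suggested addition, not something the then-current dict_int shipped (PostgreSQL 14 did eventually add it under the name absval):

```sql
CREATE EXTENSION dict_int;

-- intdict_template, maxlen and rejectlong are dict_int's real knobs
CREATE TEXT SEARCH DICTIONARY intdict_abs (
    TEMPLATE = intdict_template,
    MAXLEN = 6,
    REJECTLONG = false
    -- , ABS = true   -- the hypothetical ~10-line addition
);
```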
Re: Perl DBI converts UTF-8 again to UTF-8 before sending it to the server
Christoph Moench-Tegeder wrote: > And then it doesn't know that your terminal expects UTF-8 (perl > just dumps the binary string here), because you didn't tell it: > "binmode(STDOUT, ':encoding(utf8)')" would fix that. Or use perl -C, so that it gets that from the environment. From https://perldoc.perl.org/perlrun.html : -C on its own (not followed by any number or option list), or the empty string "" for the PERL_UNICODE environment variable, has the same effect as -CSDL. In other words, the standard I/O handles and the default open() layer are UTF-8-fied but only if the locale environment variables indicate a UTF-8 locale. Now for what the OP is doing, I'd suggest to use Dump() from the Devel::Peek module instead of print. To see the difference between a literal "ä" and "\xc3\xa4" from the point of view of Perl: use Devel::Peek; use utf8; $str = "\xc3\xa4"; Dump($str); $str = "ä"; Dump($str); Result: SV = PV(0x55af63beeda0) at 0x55af63c185d0 REFCNT = 1 FLAGS = (POK,IsCOW,pPOK) PV = 0x55af63c3c230 "\303\244"\0 CUR = 2 LEN = 10 COW_REFCNT = 1 SV = PV(0x55af63beeda0) at 0x55af63c185d0 REFCNT = 1 FLAGS = (POK,IsCOW,pPOK,UTF8) PV = 0x55af63c58dc0 "\303\244"\0 [UTF8 "\x{e4}"] CUR = 2 LEN = 10 COW_REFCNT = 1 Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Case Insensitive Comparison with Postgres 12
Igal Sapir wrote: > > Out of curiosity is there a eason not to use the citext type for th? > > > > > Using the collation seems like a much cleaner approach, and I trust ICU to > do a better job at comparing strings according to language rules etc. One notable difference between citext and case-insensitive collations by ICU is that the latter recognizes canonically equivalent sequences of codepoints [1] as equal, while the former does not. For instance: =# CREATE COLLATION ci (locale='und@colStrength=secondary', provider='icu', deterministic=false); =# SELECT E'E\u0302TES'::citext = 'Êtes'::citext AS "citext-equal", E'E\u0302TES' = 'Êtes' collate "ci" AS "ci-equal"; citext-equal | ci-equal --+-- f| t Another significant difference is that building or rebuilding an index on a text column with a CI collation appears to be way faster than with citext (I've seen 10:1 ratios, but do your own tests). On the minus side, substring matching with LIKE or other methods is not possible with CI collations whereas it does work with citext. [1] https://en.wikipedia.org/wiki/Unicode_equivalence Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: psql \copy hanging
Arnaud L. wrote: > Anyway, it hung using this syntax during last night's run. > I'll give it another try tonight just to be sure. When psql.exe is hanging, maybe you could use a tool like Process Monitor [1] or Process Explorer [2] to get insights about what it's stuck on or what it's doing exactly. [1] https://docs.microsoft.com/en-us/sysinternals/downloads/procmon [2] https://docs.microsoft.com/en-us/sysinternals/downloads/process-explorer Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: psql \copy hanging
Arnaud L. wrote: > As a side note, COPY (...) TO STDOUT \g 'somefile' does not work in a > script file (i.e. it does not work if the command is passed in a file > via the -f argument). > The command runs fine, no error is raised either by the client or the > server, but no file is written. Testing this with 11.5, it works for me. Make sure you're running the latest minor release (on the client side in this case), because a related fix was issued last February. For the 11 branch it was in version 11.2. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Postgres 12: backend crashes when creating non-deterministic collation
Thomas Kellerer wrote: > > To get a case insensitive collation you'd have to use something like > > > > LOCALE = 'de-DE-u-ks-level2' > > Creating works, but apparently on Windows ICU does not support this. After installing v12 on windows with the EDB installer, I notice that it ships with ICU 53, a relatively old version (2014). Concerning the problem just above (not the crash), ICU 53 is too old to support BCP47 tags as collation attributes, as mentioned at https://www.postgresql.org/docs/12/collation.html : "The first example selects the ICU locale using a “language tag” per BCP 47. The second example uses the traditional ICU-specific locale syntax. The first style is preferred going forward, but it is not supported by older ICU versions. With ICU 53 or older, instead of the locale above, we must use the old-style syntax: locale = 'de-DE@colStrength=secondary' If you use that in your example, the case insensitive lookups should work. But it's unfortunate that the EDB build did not switch to a recent ICU version for PostgreSQL 12. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Postgres 12: backend crashes when creating non-deterministic collation
Thomas Kellerer wrote: > It also works on Windows when I specify "correct" locale names - the above > seems to be an edge case. > Is it worth the effort to report that through the bug reporting form? Sure. Both the crash with 'de-x-icu' and the difference in behavior between Linux and Windows on locale = 'de-DE-u-ks-level2' look like "must-fix" bugs to me. Both are valid ICU locales and should work the same in all operating systems. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: "Failed to connect to Postgres database"
Marco Ippolito wrote: > (base) postgres@pc:~$ psql --cluster 11/fabmnet -h localhost > Password for user postgres: > psql: FATAL: password authentication failed for user "postgres" > FATAL: password authentication failed for user "postgres" Did you set a password for the postgres user in that newly created cluster? If not, try psql --cluster 11/fabmnet (without -h localhost), it should connect you without a password, then set a password with the \password command in psql, then try again with -h localhost. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Issues with inconsistent COLLATION installation
Cory Nemelka wrote: > ERROR: 22023: could not create locale "fr_FR.utf8": No such file or > directory > DETAIL: The operating system could not find any locale data for the locale > name "fr_FR.utf8". > LOCATION: report_newlocale_failure, pg_locale.c:1312 I can reproduce this by creating a new locale *after* starting PostgreSQL and trying to use it before a restart. Example with Ubuntu 18.04 and Postgres 11.5: # locale-gen es_ES.utf8 Generating locales (this might take a while)... es_ES.UTF-8... done Generation complete. Immediately referring to the locale will fail: $ sudo -u postgres psql -d test -U postgres test=# create collation es (provider=libc, locale='es_ES.utf8'); ERROR: could not create locale "es_ES.utf8": No such file or directory DÉTAIL : The operating system could not find any locale data for the locale name "es_ES.utf8". Now restart PostgreSQL: $ sudo /etc/init.d/postgresql restart [ ok ] Restarting postgresql (via systemctl): postgresql.service. And now it works: $ sudo -u postgres psql -d test -U postgres test=# create collation es (provider=libc, locale='es_ES.utf8'); CREATE COLLATION test=# select * from pg_collation where collname='es' \gx -[ RECORD 1 ]-+--- collname | es collnamespace | 2200 collowner | 10 collprovider | c collencoding | 6 collcollate | es_ES.utf8 collctype | es_ES.utf8 collversion | Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: database "cdf_100_1313" does not exist
nikhil raj wrote: > /usr/lib/postgresql/11/bin/psql -p 5432 -U postgres -d $DBLIST -c "GRANT > CONNECT ON DATABASE \"$DBLIST\" TO cpupdate" > ERROR: database " CDF_100_1313" does not exist That's because you don't use the unaligned format to get the list of results: you should add -A to the psql options. Or better yet, use a simpler method that does not use shell variables at all and works with a single psql call, along these lines:

$ psql -U postgres -d postgres <<EOF
select format('GRANT CONNECT ON DATABASE %I TO cpupdate', datname)
  from pg_database where not datistemplate
\gexec
EOF

That's possible because you don't need to be connected to a database to grant the right to connect to that database. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Permissions on postgresql.conf, psql and patroni
Peter J. Holzer wrote: > 2) Why does psql need to read postgresql.conf, and more specifically, > why does it care about the location of the data directory? It > shouldn't access files directly, just talk to the server via the > socket. It's not psql itself, it's pg_wrapper. $ ls -l /usr/bin/psql lrwxrwxrwx 1 root root 37 Aug 8 12:48 /usr/bin/psql -> ../share/postgresql-common/pg_wrapper See http://manpages.ubuntu.com/manpages/bionic/man1/pg_wrapper.1.html Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Why must AUTOCOMMIT be ON to do txn control in plpgsql procedure?
Bryn Llewellyn wrote: > B.t.w., I’m guessing that the “begin” SQL command that you see in the log > that I mentioned is actually issued by (some) clients—at least psql and > Python-on-psycopg2—as an explicit call from the client. In other words, it > isn’t the server that generates this. Does anyone know for sure how this > works? Well, that's the point of Autocommit, and moreover it does nothing else. Maybe you're still confused about this. * Autocommit off = the client automatically adds a "BEGIN" when it thinks a transaction must be started on behalf of the user. * Autocommit on = the client does nothing. The fact that "off" implies doing something and "on" implies not interfering is counterintuitive, but that's how it is. Autocommit is for compatibility with other databases. If it was only for Postgres, I guess either it wouldn't exist in the first place or it would be called "AutoStartTransactionBlock" or something like that, because that's just what it really does. Anyway, the server only knows whether a BEGIN has been issued. It never knows or cares whether it has been added implicitly or explicitly, which is why it can be quite confusing to reason about server-side differences in terms of Autocommit, as you do in some of your previous messages. It should be stressed that Autocommit is not a parameter of the session between Postgres and the SQL client, but rather a parameter of the session between the user and their SQL client. So when you're hypothesizing that a plpgsql block in a procedure would look at this parameter or change it temporarily (your points #2 and #5 in your analysis of p2's execution), you should see that it's impossible, because on the server side, this parameter just does not exist. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
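The client-side nature of the setting is easy to check in psql, where AUTOCOMMIT is an ordinary psql variable (the table t is made up):

```sql
-- in psql:
\set AUTOCOMMIT off
-- from now on, psql silently sends BEGIN before the first statement
-- of each transaction; the server just sees an ordinary BEGIN
INSERT INTO t(x) VALUES (1);
\echo :AUTOCOMMIT
COMMIT;
```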
Re: Inserting into the blob
Igor Korot wrote: > It is not a problem in general, but just curious - is there a more > generic solution (to get the file from the client)? With psql: \lo_import /path/to/file It creates a large object with the contents of the file from the client file system and returns its unique ID, in the output and into the :LASTOID variable in psql. There's no equivalent for bytea contents. Bytea contents need to be either injected into the query as text, or passed separately as parameters, but psql does not provide helper methods for this, and it also lacks binary support for variables. So it's easier to implement "upload bytea to server" in a script language than in psql. Alternatively, if you don't care about the contents being written twice, a file can be imported as a large object, copied as bytea into a row of the target table, and the large object purged immediately. In psql, a sequence like this should work: \lo_import /path/to/file \set tmp_oid :LASTOID insert into tablename ( bytea_col, [other columns] ) values ( lo_get(:tmp_oid), [other values] ) \lo_unlink :tmp_oid Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
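For the "injected into the query as text" route mentioned above, PostgreSQL accepts bytea input in its hex format: a \x prefix followed by two hex digits per byte. A minimal sketch in Python of building such a literal (the helper name is mine, not part of any driver API):

```python
def bytea_hex_literal(data: bytes) -> str:
    # PostgreSQL hex input format for bytea: \x then 2 hex digits per byte
    return "\\x" + data.hex()

# The resulting string can then be quoted and embedded in an INSERT
print(bytea_hex_literal(b"hello"))  # \x68656c6c6f
```

Note that passing bytea as a bound parameter through a driver avoids both the quoting and the doubling in size that this text form implies.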
Re: delimeters psql /CSV
Paul Malm wrote: > C:\Users\an\bin>C:\Tmp\psql -h 10.211.43.22 -p 5432 -U postgres -d aatest -w > -c "\copy public.""Bayern"" FROM 'C:\Users\an\test\Bayern.csv' WITH > DELIMITER ';' CSV" > Error: extra data after expected last column > CONTEXT: COPY Bayern, row 1: > "48.456;11.8008333;Anglberg/Amper;Industrial > plant;722;220;220;2133;0;0;Undefined..." > > I have a table (Bayern) with all columns in the right format. There is a > column-value which has a '/' in the string (Anglberg/Amper). I think that > '/' is taken for a delimeter, since when I replace / with 'white space' it > works. There's no reason for '/' to be taken as a delimiter. Can you share the table definition and the offending line as an attachment, so that someone can try to reproduce this with the exact same data? Also mention your PostgreSQL version and the shell the command is invoked from (cmd.exe, powershell, something else?) just in case it matters. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: pldbgapi error
Prakash Ramakrishnan wrote: > which code > will be support for postgresql 11 can you please share me the link or file. It appears indeed from the error messages that you're trying to compile an old version. The latest version here: git://git.postgresql.org/git/pldebugger.git does compile successfully for me with PG11 (Debian 9). Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Upgrading locale issues
rihad wrote: > On 05/03/2019 05:35 PM, Daniel Verite wrote: > > For non-English text, I would recommend C.UTF-8 over "C" because of > > BTW, there's no C.UTF-8 inside pg_collation, and running select > pg_import_system_collations('pg_catalog') doesn't bring it in, at least > not on FreeBSD 11.2. Yes, aside from "C", locales are quite system-dependent, unfortunately. It looks like FreeBSD 13 does provide C.UTF-8: https://unix.stackexchange.com/questions/485073/how-to-backport-freebsd-13-current-c-utf-8-locale-to-11-2-release/485077#485077 Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Upgrading locale issues
rihad wrote: > Thanks, I'm a bit confused here. AFAIK indexes are used for at least two > things: for speed and for skipping the ORDER BY step (since btree > indexes are already sorted). Will such an "upgrade-immune" C.UTF-8 index > still work correctly for table lookups? If the lookup is based on an equality test or a pattern match with LIKE or a regexp, it makes no difference. But the locale makes a difference with inequality tests, such as < > or BETWEEN. Around version 9.1 and in the pre-ICU days, Robert Haas wrote this post that explained it pretty well, I think: http://rhaas.blogspot.com/2012/03/perils-of-collation-aware-comparisons.html Quote: If you happen to need the particular sorting behavior that collation-aware sorting and comparisons provide, then you may find this price worth paying, but I suspect there are a lot of people out there who are paying it more or less accidentally and don't really care very much about the underlying sorting behavior. If, for example, all of your queries are based on equality, and you don't use greater-than or less-than tests, then it doesn't matter what collation is in use. You might as well use "C" instead of whatever your local default may be, because it's faster. For non-English text, I would recommend C.UTF-8 over "C" because of its better handling of Unicode characters. For instance: =# select upper('été' collate "C"), upper('été' collate "C.UTF-8"); upper | upper ---+--- éTé | ÉTÉ The "price" of linguistic comparisons that Robert mentioned was about performance, but the trouble caused by the lack of long-term immutability of these locales deserves to be added to it. > And can the existing en_US.UTF-8 fields' definition be altered in > place, without a dump+restore? Changing the COLLATE clause of a column with ALTER TABLE does implicitly rebuild an index on this column if there is one. A dump+restore is not needed, nor an explicit REINDEX. 
The dump+restore is needed in another scenario, where you would decide to change the LC_COLLATE and LC_CTYPE of the database, instead of doing it only for some columns. This scenario makes perfect sense if the locale of the database has been set implicitly and it uses linguistic sorts without really needing them ("accidentally", as said in the post). > en_US.UTF-8 is the default encoding+locale+collation, it > isn't set explicitly for any of our string columns. I assume there's > some "catch-all" ordering taking place even for the C locale, so there > won't be any bizarre things like b coming before a, or generally for any > language, the second letter of its alphabet coming before the first? 'b' < 'a' is never true for any locale AFAIK; it is 'B' < 'a' that is true for some locales such as C or C.UTF-8. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
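The 'B' < 'a' behavior described above can be illustrated outside of Postgres: Python compares strings by code point, which matches the byte-wise ordering that the C locale applies to ASCII text.

```python
# Code-point comparison, as in the C locale: uppercase sorts before lowercase
words = ["banana", "Apple", "cherry"]
print(sorted(words))  # ['Apple', 'banana', 'cherry']
print("B" < "a")      # True: ord('B') = 66 < ord('a') = 97
print("b" < "a")      # False, in any locale
```

A linguistic collation such as en_US.UTF-8 would instead interleave the cases and sort 'apple' before 'Banana'.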
Re: Upgrading locale issues
rihad wrote: > Thanks for the reply. Do you know what would a "decent" ICU collation be > to bind to a field's schema definition so it would mimic a UTF-8 > encoding for a multilingual column? Maybe und-x-icu? We aren't as much > concerned about their sortability in most cases, we just want indexes to > better handle future PG/ICU upgrades. But what does und(efined) even > mean with respect to collations? "undefined" in this context means unspecified language and unspecified country or region. It implies that no language-specific or regional rules will be applied to compare strings. Using C.UTF-8 as the collation for text fields to index may be the best trade-off in your case. It should be immune to libc and ICU upgrades. With C.UTF-8, a string like 'BC' will sort before 'ab', and punctuation and accents will also sort differently than with a linguistically-aware collation. If your applications care about that, it can be fixed by simply adding COLLATE "default" to the ORDER BY clause of the queries that are meant to present data to users. COLLATE "default" means the collation of the database, which presumably would be something like "language_REGION.UTF-8" in your case. If you never specified it explicitly, it came from initdb, which itself got it from the environment of the server. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
CVE-2019-9193 about COPY FROM/TO PROGRAM
Hi, I've noticed this post being currently shared on social media: https://www.trustwave.com/en-us/resources/blogs/spiderlabs-blog/cve-2019-9193-authenticated-arbitrary-command-execution-on-postgresql-9-3/ The claim that COPY FROM PROGRAM warrants a CVE seems groundless because you need to be superuser in the first place to do that. Apparently these guys have not figured out that a superuser can also inject arbitrary code with CREATE EXTENSION or even CREATE FUNCTION since forever, or maybe that will be for a future post? The CVE itself has not been published, in the sense that it's not on https://cve.mitre.org, but the ID is reserved. I don't know if there are precedents of people claiming CVE entries on Postgres without seemingly reaching out to the community first. Should something be done proactively about that particular claim? Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
RE: Oracke BLOB to Postgres BYTEA using ora2pg
Mihalidesová Jana wrote: > nipjd=> select distinct encode(serializable_value, 'escape') from > alf_node_properties_zaloha where serializable_value is not null; > > encode > > decode(E'aced00057e72002c6f72672e616c66726573636f2e736572766963652e636d722e76657273696f6e2e56657273696f6e54797065127872000e6a6176612e6c616e672e456e756d1278707400054d494e4f52', > 'hex') These contents are code, in the form of function calls, instead of data. Possibly, if you ran these function calls, they would output the original data, but that's a pretty weird form of storage. If the insert was done by Ora2Pg, you may want to report this as a bug. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Creating 2D arrays for pg_copy_from, reading tab-delimted text file that contains comma and double quotes
wrote: > When I save that Excel as a tab delimited text file, I get this:rec_no > item1item2item3item4item5 > 1Denny'sorange juice"1,500 yen""""Dear John""""32"" > TV"(As seen when I opened that file with Notepad) This looks good. Fields are properly enclosed and double quotes in contents are doubled, as expected in CSV. > 5.while (($line = fgetcsv($fileRead, 0, "\t")) !== FALSE) { //0 means I can > read row whatever its length > 6.if($row == 1){ $row++; continue; } //skip header > 7.$line = implode(" ",$line). "\n"; > 8.$twoDarray[] = $line; > ... > 14.if (pg_copy_from($con, $tableName, $twoDarray) !== FALSE) { It goes wrong at line 7. pg_copy_from() expects lines in the COPY "text format" documented at https://www.postgresql.org/docs/current/sql-copy.html It implies that: - since your call to pg_copy_from() doesn't specify a delimiter it uses tab, not a space, so implode() must be passed a tab, not a space. - if there are backslashes in the contents they must be quoted by doubling them. - if there are newline or carriage return characters in the contents they must be replaced by \n and \r respectively, so as to not be confused with an end of record. - if there are tabs in the contents they must be replaced by \t. These replacements can all be done by a single strtr() call in php. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
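The replacements listed above can be sketched as follows (in Python rather than PHP, since the single-strtr() idea translates to chained replaces; the helper name and sample fields are mine). The order matters: backslashes must be doubled before the control characters are rewritten, otherwise the backslash introduced by \t would itself get doubled.

```python
def copy_text_escape(field: str) -> str:
    # Escape one field for COPY "text format": backslash first,
    # then tab, newline and carriage return become \t, \n, \r
    return (field.replace("\\", "\\\\")
                 .replace("\t", "\\t")
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))

# Build one COPY line from fields, tab-delimited, newline-terminated
fields = ["Denny's", "orange juice", "1,500 yen", '32"\nTV']
line = "\t".join(copy_text_escape(f) for f in fields) + "\n"
print(repr(line))
```

Rows built this way can be fed to pg_copy_from() without the delimiter ambiguity that caused the original error.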
RE: pg_dump out of memory for large table with LOB
Jean-Marc Lessard wrote: > Another area where LOB hurts is the storage. LOB are broken and stored in 2K > pieces. > Due to the block header, only three 2k pieces fit in an 8k block wasting 25% > of space (in fact pgstattuple reports ~ 20%). Yes. bytea stored as TOAST is sliced into pieces of 2000 bytes, versus 2048 bytes for large objects. And that makes a significant difference when packing these slices, because 2000*4 + (page overhead) + 4*(row overhead) is just under the default size of 8192 bytes per page, whereas 2048*4 + (page overhead) + 4*(row overhead) is obviously a bit over 8192, since 2048*4 = 8192. If the data is compressible, the difference may be less obvious because the slices in pg_largeobject are compressed individually (as opposed to bytea, which gets compressed as a whole), so more than 3 slices can fit in a page inside pg_largeobject. The post-compression size can be known with pg_column_size(), versus octet_length() that gives the pre-compression size. > Would you recommend bytea over LOB considering that the max LOB size is well > bellow 1GB? > Are bytea preferable in terms of support by the community, performance, > feature, etc? For the storage and pg_dump issues, bytea seems clearly preferable in your case. As for the performance aspect, large objects are excellent because their API never requires a binary<->text conversion. This may be different with bytea. The C API provided by libpq allows retrieving and sending bytea in binary format, for instance through PQexecParams(), but most drivers implemented on top of libpq use only the text representation for all datatypes, because it's simpler for them. So you may want to check the difference in sending and retrieving your biggest binary objects with your particular app/language/framework, stored in a bytea column versus large objects. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
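The packing arithmetic above can be checked with rough numbers. The overhead figures below are assumptions of mine (approximate page header and per-row costs), not exact PostgreSQL constants, but they show why four 2000-byte slices fit in a page while only three 2048-byte slices do:

```python
PAGE_SIZE = 8192
PAGE_HEADER = 24        # assumed rough size of the page header
TUPLE_OVERHEAD = 40     # assumed per-row cost (tuple header + item pointer)

def slices_per_page(slice_size: int) -> int:
    # How many fixed-size slices fit in one heap page
    usable = PAGE_SIZE - PAGE_HEADER
    return usable // (slice_size + TUPLE_OVERHEAD)

print(slices_per_page(2000))  # bytea TOAST slices: 4 per page
print(slices_per_page(2048))  # large-object slices: 3 per page
```

With 3 slices of 2048 bytes per page, 8192 - 3*2048 = 2048 bytes go mostly to waste, which is the ~25% figure quoted above.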
Re: Trouble Upgrading Postgres
Adrian Klaver wrote: > > So there's no way it can deal with the contents over 500MB, and the > > ones just under that limit may also be problematic. > > To me that looks like a bug, putting data into a record you cannot get out. Strictly speaking, the data could probably get out with COPY in binary format, but pg_dump doesn't use that. It's undoubtedly very annoying that a database can end up with non-pg_dump'able contents, but it's not an easy problem to solve. Some time ago, work was done to extend the 1GB limit, but eventually it got scratched. The thread in [1] discusses many details of the problem and why the proposed solutions were mostly a band-aid. Basically, the specs of COPY and other internal aspects of Postgres are from the 32-bit era, when putting the size of an entire CDROM in a single row/column was not anticipated as a valid use case. It's still a narrow use case today, and applications that need to store big pieces of data like that should slice them in chunks, a bit like pg_largeobject does, except in much larger chunks, like 1MB. [1] pg_dump / copy bugs with "big lines" ? https://www.postgresql.org/message-id/1836813.YmyOrS99PX%40ronan.dunklau.fr Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
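The chunking approach suggested above can be sketched like this: split a blob into 1MB pieces keyed by chunk number, each piece becoming one row of a chunks table. The helper and layout are mine, a sketch rather than an established recipe:

```python
CHUNK_SIZE = 1024 * 1024  # 1MB per row, comfortably under COPY's 1GB row limit

def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    # Yield (chunk_no, piece) pairs; each pair maps to one row in a
    # hypothetical table like (doc_id, chunk_no, piece bytea)
    for i in range(0, len(data), chunk_size):
        yield i // chunk_size, data[i:i + chunk_size]

# e.g. a blob of 2MB + 512 bytes becomes 3 rows
pieces = list(split_into_chunks(b"x" * (2 * CHUNK_SIZE + 512)))
print(len(pieces), [len(p) for _, p in pieces])
```

Reassembly is then an ORDER BY chunk_no concatenation, and no single row ever approaches the varlena limit.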
Re: Trouble Upgrading Postgres
Charles Martin wrote: > but the second one returned this: > > 0 "623140" > 1 "53" > 2 "12" > 3 "10" > 4 "1" > 5 "1" > 7 "1" > [null] "162" > > Not quite sure what that means, but if there is just a small number of > overly-large records, I might be able to delete them. If I can find them. The query was: SELECT octet_length(docfilecontents)/(1024*1024*100), count(*) FROM docfile GROUP BY octet_length(docfilecontents)/(1024*1024*100); The results above show that there is one document weighing over 700 MB (the first column being the multiple of 100MB), one between 500 and 600 MB, one between 400 MB and 500 MB, 10 between 300 and 400 MB, and so on. The hex expansion performed by COPY must allocate twice that size, plus the rest of the row, and if that resulting size is above 1GB, it will error out with the message you mentioned upthread: ERROR: invalid memory alloc request size . So there's no way it can deal with the contents over 500MB, and the ones just under that limit may also be problematic. A quick and dirty way of getting rid of these contents would be to nullify them. For instance, nullify anything over 400MB: UPDATE docfile SET docfilecontents=NULL WHERE octet_length(docfilecontents) > 1024*1024*400; Or a cleaner solution would be to delete them with the application if that's possible. You may turn the above query into a SELECT that retrieves the fields of interest (avoid SELECT * because of the huge column). Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
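That doubling comes from the hex output format COPY uses for bytea by default: every byte expands to two hex characters, plus a \x prefix. A quick check of the arithmetic:

```python
data = bytes(range(256)) * 4       # 1024 bytes of sample binary content
hex_text = "\\x" + data.hex()      # the textual form COPY emits for bytea
print(len(data), len(hex_text))    # 1024 2050
```

So a 500MB bytea value needs a text buffer of just over 1GB, which is why contents above roughly that size cannot be emitted as a single COPY row.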
Re: Trouble Upgrading Postgres
Charles Martin wrote: > SELECT max(length(docfilecontents::text)) FROM docfile; > and after a very long time, got: > ERROR: invalid memory alloc request size 1636085512 SQL state: XX000 It would mean that at least one row has a "docfilecontents" close to 0.5GB in size. Or that the size fields in certain rows are corrupted, although that's less plausible if you have no reason to suspect hardware errors. Does the following query work: SELECT max(octet_length(docfilecontents)) FROM docfile; or maybe a histogram by size in hundreds of megabytes: SELECT octet_length(docfilecontents)/(1024*1024*100), count(*) FROM docfile GROUP BY octet_length(docfilecontents)/(1024*1024*100); Note that the error message above does not say that there's not enough free memory; it says that it won't even try to allocate that much, because 1636085512 is over the "varlena limit" of 1GB. AFAICS I'm afraid that this table as it is now cannot be exported by pg_dump, even if you had enough free memory, because any individual row in COPY cannot exceed 1GB in text format. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
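For reference, the "varlena limit" corresponds to MaxAllocSize in the PostgreSQL source, which is 1GB minus one byte, and the allocation in the error message is indeed beyond it:

```python
MAX_ALLOC_SIZE = 0x3FFFFFFF        # MaxAllocSize: 1GB - 1 byte
request = 1636085512               # size from the error message above
print(request > MAX_ALLOC_SIZE)    # True: the palloc is refused outright
print(round(request / (1024 ** 3), 2))  # ~1.52 (GB requested)
```

Any expression whose result would exceed that size, such as the hex text of a large bytea, fails with this error regardless of how much RAM the machine has.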
Re: Trouble Upgrading Postgres
Charles Martin wrote: > >So where is the server located relative to the pg_dump client? > >On the same machine? > >If so is it a virtual machine e.g AWS? > >Across a local or remote network? > > > I gave the command in a terminal session after SSHing to the server from > the same network. It is not a virtual machine. That means that pg_dump runs on the same machine as the server. It's plausible that, with only 4GB of RAM, the table that fails to dump has some very large rows that can't be allocated, especially since both the backend and pg_dump need to have them simultaneously in memory. > pg_dump: The command was: COPY public.docfile (docfile_pkey, > docfileoriginalname, ordernumber, versionnum, docfilecontents, > docfilepath, docfileextension, enddatetime, endby, editnum, insby, > insdatetime, modby, moddatetime, active, doc_fkey) TO stdout; The "docfilecontents" column suggests that it might contain large contents. If its type is bytea, it's going to be expanded to twice its size to build the hex representation. You may get a sense of how big the biggest row is when expressed as text with this query: SELECT max(length(d.*::text)) FROM public.docfile d; If it's big enough that it might cause the OOM issue, try to run pg_dump remotely through an SSH tunnel [1], which you can already do in terms of network permissions since you log in with SSH, so pg_dump itself does not use any memory on the server. Also, if the machine doesn't have swap space, it might be that just adding a few GBs of swap would make the operation succeed. [1] https://www.postgresql.org/docs/current/static/ssh-tunnels.html Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: PostgreSQl, PHP and IIS
Mark Williams wrote: > Can anyone please shed any light on what more I need to do? Overall, what you did looks good. I've never used IIS, but when faced with similar problems with Apache, ProcessExplorer [1] has been quite effective to figure out which dependent DLLs were missing, where PHP was searching for them, and the precise errors it encountered. [1] https://docs.microsoft.com/en-us/sysinternals/downloads/process-explorer Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: pg_sample
Naveen Dabas wrote: > Creating sample schema sampledb1 > DBD::Pg::db do failed: ERROR: cannot execute CREATE SCHEMA in a read-only > transaction at ./pg_sample line 296. Maybe you ran this on a hot standby? It can't work because this program needs to write data into the database, which is not allowed outside of the primary node. Or you may have default_transaction_read_only set to true for the particular db/user of the connection. https://www.postgresql.org/docs/current/static/runtime-config-client.html#GUC-DEFAULT-TRANSACTION-READ-ONLY Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Linker errors while creating a PostgreSQL C extension function.
TalGloz wrote: > Do I have to replace my -shared in the link command with -fPIC? No, but -fPIC should go into your CXXFLAGS. The pgxs makefile handles CFLAGS, but as it doesn't do C++, you're on your own for CXXFLAGS. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Linker errors while creating a PostgreSQL C extension function.
TalGloz wrote: > If yes then the -lseal is added with the $(LDFLAGS) at the end of the > command. But it doesn't happen, because LDFLAGS is overridden by the makefile included just after you set it. The relevant part copy-pasted from your mail: LDFLAGS = -L$(INCLUDE_SEAL_LIB) -llibseal.a -lpthread include $(PGXS) seal_diff_cpp.so: seal_diff_cpp.o $(CXX) -Wl,--no-undefined -shared -o seal_diff_cpp.so seal_diff_cpp.o $(LDFLAGS) As a solution, you could leave LDFLAGS alone and use a different variable, or include $(PGXS) first and *append* your options to LDFLAGS if you prefer. For instance (replacing -llibseal.a with -lseal while at it): SEAL_LDFLAGS = -L$(INCLUDE_SEAL_LIB) -lseal -lpthread include $(PGXS) seal_diff_cpp.so: seal_diff_cpp.o $(CXX) -Wl,--no-undefined -shared -o seal_diff_cpp.so seal_diff_cpp.o $(LDFLAGS) $(SEAL_LDFLAGS) Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Postgres - search for value throughout many tables?
Adrien NAYRAT wrote: > On 08/08/2018 04:09 PM, czezz wrote: > > Hi everyone, > > I want to aks if anyone knows is there a way to search for specific > > "value" throughout list of tables OR all tables in databse? > > > > Cheers, > > czezz > > Hello, > > Maybe this article by Daniel could help you : > https://blog-postgresql.verite.pro/2017/06/06/global-search.html > > Unfortunately it is not translated, you have to play with a translate tool. You may also get the code and README from here: https://github.com/dverite/postgresql-functions/tree/master/global_search Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: pg_upgrade and wraparound
Alexander Shutyaev wrote: > Is there any hope the issue with pg_upgrade can be resolved? If not, > could you give me some hints as to how can I decrease time needed > for pg_dumpall | psql? Not sure about the larger problem, but for the part about having more than 1 million large objects that get committed individually, setting fsync=off for the reload phase is likely to help a lot. Just don't forget to turn it on again when it's done. See https://blog.2ndquadrant.com/postgresql-fsync-off-warning-in-config-file/ Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: pg_upgrade and wraparound
Andres Freund wrote: > I'm not entirely clear why pg_restore appears to use a separate > transaction for each large object, surely exascerbating the problem. To make sure that per-object locks don't fill up the shared lock table? There might be hundreds of thousands of large objects. If it had to restore N objects per transaction, would it know how to compute N that is large enough to be effective and small enough not to exhaust the shared table? Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Problem compiling PostgreSQL.
Paul Linehan wrote: > Now, I've installed the icu libraries using the classic ./configure, make > and sudo make install. So it got installed under /usr/local > collationcmds.c:(.text+0xe36): undefined reference to `uloc_getAvailable_61' > collationcmds.c:(.text+0xe5b): undefined reference to > `uloc_toLanguageTag_61' 61 is ICU's major version number. You have two ICU versions installed, one from Fedora, with its icu*.pc config files located in /usr/lib64/pkgconfig, the other (presumably v61, as it's the latest) in /usr/local/lib/pkgconfig/ Without specifying which one should be used, it looks like Postgres gets them mixed up between compilation and linking. To use a self-compiled ICU, I've had success configuring postgres with: PKG_CONFIG_PATH=/path/to/icu/lib/pkgconfig \ ./configure --with-icu [other flags] I tend to install ICU versions into their own directories rather than /usr/local, and use Debian rather than Fedora, but you might try PKG_CONFIG_PATH=/usr/local/lib/pkgconfig Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Adding AVG to a JOIN
Alexander Farber wrote: > Here is such a query for the best player > > # SELECT AVG(score) FROM words_moves WHERE uid = 1201; > avg > - > 18.4803525523319868 > > However I am not sure, how to "marry" the 2 queries? > > I have tried to add words_moves through another JOIN, but that does not > work: You may use a correlated subquery in the SELECT clause, like this: SELECT u.elo, u.uid, (SELECT AVG(score) FROM words_moves WHERE uid=u.uid), s.given, s.photo Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Conflict between JSON_AGG and COPY
Đỗ Ngọc Trí Cường wrote: > I want to export it to a file in JSON format so I run the query as below: > COPY (SELECT...) TO '/home/postgres/test1.json' COPY TO applies its own format rules on top of the data, which include, among other things, quoting the newline characters. What you seem to expect is to copy a single-row, single-column piece of data verbatim to the output file, but COPY does not offer that. Such an option has already been discussed among developers, for instance in this thread as a 'RAW' format: https://www.postgresql.org/message-id/CAFj8pRAfer%2Bip3JCMTnvzra2QK7W9hk0hw1YqE8%2B%2BPZA1OqSJw%40mail.gmail.com but eventually it was not implemented. The unformatted output can be obtained client-side without COPY: psql -Atc "SELECT row_to_json(...)" > /path/to/file If you really need it server-side, a function that can write a result to a file is required. I don't think it exists as such out of the box, but you may either: - write your own function in any untrusted language to do just that (open the file, write the piece of text into it, close it). - or see pg_file_write() from the adminpack contrib module https://www.postgresql.org/docs/current/static/adminpack.html With that, you may call, as a superuser: select pg_file_write('path/to/file.json', (select row_to_json(...))::text, false); But note that the path must be relative to the PGDATA directory. - or export the data as an intermediary large object, at the cost of some data churn in the large objects. And in that case, the path is not constrained to the postgres data directory. do $$ declare id oid; j json; begin j := (select row_to_json(t) from tablename t); id := lo_from_bytea(0, convert_to(j::text, 'UTF-8')); perform lo_export(id, '/path/to/file.json'); perform lo_unlink(id); end; $$ language plpgsql; For all these server-side methods, you need to be superuser, just like for COPY TO file anyway. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Must re-connect to see tables
Blake McBride wrote: > data.sql is an untouched dump of a database from PostgreSQL 9.5.12. > > The import seems to have worked. I just need another \c after the \i. An effect of reconnecting is to reset the search_path to what it was before including your dump file. You're most certainly affected by the search_path-related changes in pg_dump done in the latest versions following CVE-2018-1058 See https://www.postgresql.org/support/security/ or https://www.postgresql.org/about/news/1834/ Manually issuing RESET search_path; after \i data.sql should get "public" back in the search_path without reconnecting and after that \dt would work as usual. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Dynamic PL/pgSQL select query: value association propblem
Thiemo Kellner, NHC Barhufpflege wrote: > > Why you don't create query like > > > > EXECUTE 'SELECT xxx FROM TAB WHERE A = $1.x AND B = $1.y' USING NEW; > > I shall try. This would be the direct way, but I doubt the placeholder > $1 can be a record. It could be written without referring to any individual column: IF EXISTS (select 1 from tablename where tablename.* is not distinct from NEW) THEN -- do something END IF; But since the select cannot see not-yet-committed changes from other sessions, such a trigger cannot reliably detect duplicates, unless you make sure that there are no concurrent writes to the table. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: execute block like Firebird does
PegoraroF10 wrote: > Another approach to solve my problem would be a function that receives a > dynamic SQL, runs it and returns a XML or JSON and on client side I convert > that XML back to a recordset. Is that possible ? Yet another tool that can be handy for transferring polymorphic results is a cursor, through plpgsql REFCURSOR variables: https://www.postgresql.org/docs/10/static/plpgsql-cursors.html Here's an actual example with a DO block: BEGIN; DO $$ DECLARE c1 refcursor := 'cursor1'; c2 refcursor := 'cursor2'; BEGIN OPEN c1 FOR select 'result #1 column 1', 'result #1 column 2'; OPEN c2 FOR select 'result #2 column 1', 'result #2 column 2'; END; $$ LANGUAGE plpgsql; FETCH cursor1; FETCH cursor2; COMMIT; The interface is a bit weird because the value of the refcursor variable is the name of the underlying SQL cursor object. The name can also be auto-generated by postgres; the above code uses fixed names instead. Anyway that cursor, once instantiated in the DO block, can be fetched from with FETCH statements initiated client-side or by other server-side code. The above code will retrieve two independent result sets: postgres=# FETCH cursor1; ?column? | ?column? + result #1 column 1 | result #1 column 2 (1 row) postgres=# FETCH cursor2; ?column? | ?column? + result #2 column 1 | result #2 column 2 (1 row) These cursors disappear at transaction end, or they can be explicitly closed with CLOSE statements. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: To all who wish to unsubscribe
Magnus Hagander wrote: > So do you have any suggestions for actually fixing that? Given that we have > more lists to migrate, if you can figure out a way to make those changes > without peoples filters not matching, we'd be happy to hear it.. I guess it's a bit late at this point, but in theory keeping the same List-Id headers as before may help with how many people have to change their filters. Using lists.postgresql.org as the new domain was explained in the announce, but it's not clear why List-Id had to follow the other List-* fields. Unless it was the reason mentioned in the last sentence below. Quoting RFC-2919: 4. Persistence of List Identifiers Although the list identifier MAY be changed by the mailing list administrator this is not desirable. (Note that there is no disadvantage to changing the description portion of the List-Id header.) A MUA may not recognize the change to the list identifier because the MUA SHOULD treat a different list identifier as a different list. As such the mailing list administrator SHOULD avoid changing the list identifier even when the host serving the list changes. On the other hand, transitioning from an informal unmanaged-list-id-namespace to a domain namespace is an acceptable reason to change the list identifier. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite