Re: Help. The database was created using collation version 2.17, but the operating system provides version 2.34.

2024-06-24 Thread Daniel Verite
Dmitry O Litvintsev wrote:

> Just want to make clear (sorry I am slow on uptake). I should first
> REINDEX and then ALTER DATABASE xxx REFRESH COLLATION VERSION, or
> first ALTER and then REINDEX or does the order of these action
> matter at all?

The order does not matter. The ALTER DATABASE command will simply
update the pg_database.datcollversion field with the current version
of libc. That will stop the warning being issued, but it doesn't have
any other concrete effect.
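
If you want to double-check afterwards, on Postgres 15 or newer the stored
and OS-provided versions can be compared directly (after the REFRESH they
should be identical):

  SELECT datcollversion,
         pg_database_collation_actual_version(oid) AS os_version
  FROM pg_database WHERE datname = 'xxx';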

Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: Help. The database was created using collation version 2.17, but the operating system provides version 2.34.

2024-06-20 Thread Daniel Verite
Dmitry O Litvintsev wrote:

> WARNING:  database "xxx" has a collation version mismatch
> DETAIL:  The database was created using collation version 2.17, but the
> operating system provides version 2.34.
> HINT:  Rebuild all objects in this database that use the default collation
> and run ALTER DATABASE xxx REFRESH COLLATION VERSION, or build PostgreSQL
> with the right library version.

This upgrade crosses the major collation change that happened in GNU
libc 2.28, so text indexes created under 2.17 are indeed very likely
unsafe to use on your new server.
See https://wiki.postgresql.org/wiki/Locale_data_changes

>  REINDEX database xxx 
> ALTER DATABASE xxx REFRESH COLLATION VERSION
> But this defeats the whole idea of having short downtime because REINDEX
> will take forever. 

Indexes that don't involve collatable types (text, varchar), and those
that use the C collation, don't need to be reindexed.
You may be able to reduce the downtime significantly by reindexing
only the ones that matter.

The wiki page gives the query to obtain the list of affected indexes:

SELECT DISTINCT indrelid::regclass::text, indexrelid::regclass::text,
collname, pg_get_indexdef(indexrelid) 
FROM (SELECT indexrelid, indrelid, indcollation[i] coll FROM pg_index,
generate_subscripts(indcollation, 1) g(i)) s 
  JOIN pg_collation c ON coll=c.oid
WHERE collprovider IN ('d', 'c') AND collname NOT IN ('C', 'POSIX');
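
If it helps, the same query can be made to emit ready-to-run statements
(a sketch; REINDEX ... CONCURRENTLY needs v12 or newer and cannot run
inside a transaction block):

SELECT DISTINCT format('REINDEX INDEX CONCURRENTLY %s;', indexrelid::regclass)
FROM (SELECT indexrelid, indrelid, indcollation[i] coll FROM pg_index,
  generate_subscripts(indcollation, 1) g(i)) s
  JOIN pg_collation c ON coll=c.oid
WHERE collprovider IN ('d', 'c') AND collname NOT IN ('C', 'POSIX');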

> I do not recall having similar issue when going from RH6 to RH7.

This warning was added relatively recently, in Postgres 15 (October 2022).


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: Time zone offset in to_char()

2024-01-11 Thread Daniel Verite
Alban Hertroijs wrote:

> 1). The first function has as a drawback that it changes the time zone for
> the entire transaction (not sufficiently isolated to my tastes)

But if you add in the function declaration
   SET timezone TO  'Europe/Amsterdam' 
like in your 2nd function, or simply
  SET timezone FROM CURRENT
doesn't that solve this problem?

Because as the doc says

  If a SET clause is attached to a function, then the effects of a SET
  LOCAL command executed inside the function for the same variable are
  restricted to the function: the configuration parameter's prior
  value is still restored at function exit

The actual value to which the timezone is set through this statement
does not matter, as the first instruction in the function overwrites
it:
   perform set_config('timezone', tz_, true /* local */); 

The point is that, since the prior value of timezone is restored
automatically at function exit, the temporary setting will not
leak out of the function.
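
To illustrate, a minimal sketch of such a function (the name and the output
format are made up; the SET clause and the set_config() call are the pieces
discussed above):

  create function format_ts_at_zone(ts_ timestamptz, tz_ text)
   returns text
   set timezone from current   -- guarantees the prior value is restored at exit
   language plpgsql as $$
  begin
    perform set_config('timezone', tz_, true);  -- local; reverted thanks to the SET clause
    return to_char(ts_, 'YYYY-MM-DD HH24:MI:SS TZH:TZM');
  end $$;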


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: Import csv to temp table

2024-01-02 Thread Daniel Verite
arun chirappurath wrote:

> Do we have any scripts that create a temp table with column names
> from the first row of csv files?

csvkit [1] does that.

[1] https://csvkit.readthedocs.io/en/latest/


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: how can I fix my accent issues?

2023-12-13 Thread Daniel Verite
Igniris Valdivia Baez wrote:

> 3. After the revision the data is loaded to the destiny database and
> here is were I believe the issue is, because the data is reviewed in
> Windows and somehow Pentaho is not understanding correctly the
> interaction between both operating systems.

On Windows, a system in spanish would plausibly use
https://en.wikipedia.org/wiki/Windows-1252
as the default codepage.
On Unix, it might use UTF-8, with a locale like maybe es_CU.UTF-8.

Now if a certain component of your data pipeline assumes that
the input data is in the default encoding of the system, and
the input data appears to be always encoded with Windows-1252,
then only the version running on Windows will have it right.
The one that runs on Unix might translate the bytes that
do not meet its encoding expectations into the U+FFFD
code point.
At least that's a plausible explanation for the result you're seeing
in the Postgres database.

A robust solution is to not use defaults for encodings and explicitly
declare the encoding of every input throughout the data pipeline.
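
For instance, if the reviewed files were loaded with psql, the client
encoding could be declared explicitly instead of relying on the OS default
(table and file names below are made up):

  SET client_encoding TO 'WIN1252';   -- the input bytes are Windows-1252
  \copy destination_table FROM 'reviewed_data.csv' WITH (FORMAT csv, HEADER)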


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: how can I fix my accent issues?

2023-12-12 Thread Daniel Verite
Igniris Valdivia Baez wrote:

> hello, thank you for answering, it's not a typo, in the attachments
> you can see that this is actually my collation, algo a pic of the
> problem for more clarification,

This character is meant to replace undisplayable characters:

From https://en.wikipedia.org/wiki/Specials_(Unicode_block):

  U+FFFD � REPLACEMENT CHARACTER used to replace an unknown,
  unrecognised, or unrepresentable character

It would be useful to know whether:

- this code point U+FFFD is in the database contents in places
where accented characters should be. In this case the SQL client is
just faithfully displaying it and the problem is not on its side.

- or whether the database contains the accented characters normally
encoded in UTF8. In this case there's a configuration mismatch on the
SQL client side when reading.

To break down a string into code points to examine it, a query like
the following can be used, where you replace SELECT 'somefield'
with a query that selects a suspicious string from your actual table:

WITH string(x) AS (
   SELECT 'somefield'
)
SELECT
  c,
  to_hex(ascii(c)) AS codepoint
FROM
  string CROSS JOIN LATERAL regexp_split_to_table(x, '') AS c
;


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: Feature request for INITCAP() function

2023-12-01 Thread Daniel Verite
Jeff Gerbracht wrote:

> It would be great if there was a way to set exceptions to the 'word'
> delimiter list used in the INITCAP() function.For example, I have
> hyphenated words like blue-green or possessives and contractions like
> don't and cat's tail
> These become Blue-Green, Don'T and Cat'S Tail.

An ICU collation may produce better results for this:
 
select initcap('blue-green don''t cat''s tail' COLLATE "en-x-icu");
           initcap            
------------------------------
 Blue-Green Don't Cat's Tail

With hyphens, it's not clear what the results should be.
Looking at the IMDb movies database [1] for instance, titles have
a mix of upper and lower case following the hyphen, even within the
same English title:

Hell-to-Pay Austin
The Co-respondent
A Pit-boy's Romance
A Cowboy's Mother-in-Law
Our Parents-In-Law
The Good-for-Nothing
When the Fire-Bells Rang
The post-millennial wave
...

[1] https://datasets.imdbws.com/


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: Thanks! Re: Who adds the "start transaction" and "commit" to the intended SQL statement in "autocommit" mode?

2023-02-20 Thread Daniel Verite
Bryn Llewellyn wrote:

> 2023-02-20 12:42:44.993 PST [2540504] d0$u0@d0 LOG:  0: statement:
> insert into s.t(v) values(17); insert into s.t(v) values(42);
> 2023-02-20 12:42:44.993 PST [2540504] d0$u0@d0 LOCATION:  exec_simple_query,
> postgres.c:971
> 
> It seems a bit odd that psql has no syntax to ask for this in its
> interactive mode. 

Backslash-semicolon is the syntax. Quoted from the doc:



\;

Backslash-semicolon is not a meta-command in the same way as the
preceding commands; rather, it simply causes a semicolon to be added to the
query buffer without any further processing.

Normally, psql will dispatch an SQL command to the server as soon as it
reaches the command-ending semicolon, even if more input remains on the
current line. Thus for example entering

select 1; select 2; select 3;

will result in the three SQL commands being individually sent to the
server, with each one's results being displayed before continuing to the next
command. However, a semicolon entered as \; will not trigger command
processing, so that the command before it and the one after are effectively
combined and sent to the server in one request. So for example

select 1\; select 2\; select 3;

results in sending the three SQL commands to the server in a single
request, when the non-backslashed semicolon is reached. The server executes
such a request as a single transaction, unless there are explicit
BEGIN/COMMIT commands included in the string to divide it into multiple
transactions. (See Section 55.2.2.1 for more details about how the server
handles multi-query strings.)




Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: Q: error on updating collation version information

2022-12-05 Thread Daniel Verite
Karsten Hilbert wrote:

> The database encoding is UTF8. That br_FR@euro.LATIN9 had
> _not_ been added manually. It is also not actively used in my
> database(s).

br_FR@euro.LATIN9 cannot be used actively in a UTF-8 database
because it's for a different encoding than the database.

It was probably available in the OS at initdb time. Every available
locale (see locale -a) gets imported into the template databases,
and then into the other databases, since CREATE DATABASE copies a
template without filtering out the locales that are incompatible
with the target database.

There's no need to upgrade anything in the OS to get the
ALTER COLLATION... REFRESH error you mention.
It's sufficient to have a mix of collations for different encodings
and try to refresh collations whose encoding is not compatible
with the current database.


> What is the canonical advice on the way forward here ?  Is
> the _suggested_ solution to delete the collation or am I
> missing to see the "proper" approach to fixing it ?

ISTM that dropping that collation from any non-LATIN9 database
is the more sensible action.
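
To see which collations are in that situation in the current database
(a sketch; collencoding = -1 means the collation is usable with any
encoding):

SELECT collname, pg_encoding_to_char(collencoding) AS collation_encoding
FROM pg_collation
WHERE collencoding NOT IN (-1,
   (SELECT encoding FROM pg_database WHERE datname = current_database()));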

 
Maybe it could be discussed as a possible improvement to have
ALTER COLLATION... REFRESH ignore the database encoding and still
refresh the version number.


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: Q: fixing collation version mismatches

2022-11-14 Thread Daniel Verite
Karsten Hilbert wrote:

> Which is why my question still stands: does the above
> three-strikes operation safely take care of any collation
> issues that may currently exist in a database ?

For the indexes, yes, but theoretically all constraints involving
collatable types need a recheck.

For foreign key constraints with non-deterministic collations, there
might be equality tests that pass with an older Unicode version and fail
with a newer Unicode version.

For check constraints as well, checks applied to strings with recent
Unicode characters can give different results after an upgrade.
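
As a sketch of what such a recheck can look like (table and constraint
names are made up; the same drop / add-as-NOT-VALID / VALIDATE dance also
works for CHECK constraints):

ALTER TABLE orders DROP CONSTRAINT orders_customer_name_fkey;
ALTER TABLE orders ADD CONSTRAINT orders_customer_name_fkey
  FOREIGN KEY (customer_name) REFERENCES customers (name) NOT VALID;
-- VALIDATE rescans the table and errors out on any row whose reference
-- no longer matches under the new Unicode version
ALTER TABLE orders VALIDATE CONSTRAINT orders_customer_name_fkey;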


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: How to store "blobs" efficiently for small and large sizes, with random access

2022-10-19 Thread Daniel Verite
Dominique Devienne wrote:

> PostgreSQL bytea is much better and simpler, except limited to 1GB...
> Bytea also has no direct random access, except via substr[ing], but
> how efficient and "random access" is that?

Bytea contents are compressed before being sliced (in chunks of
TOAST_MAX_CHUNK_SIZE bytes, typically it's 2000 IIRC), so it's not
possible to access a piece of data without decompressing the contents
before it.

By contrast large objects are sliced before compression, so the
performance of random access is likely to be completely different.

> Here are the main requirement I need to fulfil:
> 1) store literally millions of rows, 1 "blob" per row. (scientific data).
> 2) most "blobs" are smallish, below a few KBs. bytea's perfect for that.
> 3) yet many blobs are on the dozens of MBs. bytea's still OK.
> 4) but some blobs exceed the 1GB byte limit. Found at least a dozen
> just in our test data, so clients will have them.
> 5) for accessing larger blobs, the API I must implement accesses
> contiguous chunks of the blobs. Thus I need random access.

In your case I would probably opt for bytea (as opposed to large
objects), and slicing the blobs in the application in chunks of a
fixed size much larger than what TOAST does (for instance, somewhere
between 128 kB and 32 MB).

That is, having a table like:

create table blobs (
  blob_id some_type,   /* whatever identifies a blob, e.g. bigint or uuid */
  chunk_no int,        /* 0->N */
  chunk bytea,
  primary key (blob_id, chunk_no)  /* needed for efficient chunk lookups */
);

It's not as simple as using a single bytea field or large objects,
but overall it avoids the management difficulties of both large
objects and very large contents in bytea columns.
Random access is achieved by skipping the chunks before the
requested piece of data.
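
For example, assuming 1 MB chunks as suggested above, reading an arbitrary
byte range only has to fetch the chunks that overlap it (the numbers below
are just an illustration):

-- bytes [1000000, 3000000) of blob 42, with 1048576-byte chunks
SELECT chunk_no, chunk
FROM blobs
WHERE blob_id = 42
  AND chunk_no BETWEEN 1000000 / 1048576          -- first overlapping chunk
                   AND (3000000 - 1) / 1048576    -- last overlapping chunk
ORDER BY chunk_no;
-- the application then trims the extra bytes at both ends of the
-- reassembled result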

If the app is able to read/write the chunks in binary mode, its queries
should perform as well as the large objects functions.


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: How to store "blobs" efficiently for small and large sizes, with random access

2022-10-19 Thread Daniel Verite
Dominique Devienne wrote:

> the fact the lo table is unique for the whole database would allow
> users to see blobs from any schema, as I understand it.

Direct access to pg_largeobject is only possible for superusers.
If lo_compat_privileges is on, any user can read any large
object with the lo* functions.
If it's off, they can read a large object only if they're the owner
or they have been granted permissions with

GRANT { { SELECT | UPDATE } [, ...] | ALL [ PRIVILEGES ] }
ON LARGE OBJECT loid [, ...]
TO role_specification [, ...] [ WITH GRANT OPTION ]
[ GRANTED BY role_specification ]

Each large object has its own set of permissions. This is a significant
difference from bytea, since every creation of a new large object
may need to be followed by GRANT statements.
Also if the roles and the access policies are changed in the
lifetime of the app, that might imply massive REVOKE/GRANT
statements to apply to existing objects.


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: Number of updated rows with LibPQ

2022-10-14 Thread Daniel Verite
Laurenz Albe wrote:

> > Or is there a libpq API on PGresult that would allow to get the type
> > of statement the result is from?
> 
> The command tag is not what you look at.

Yet that's what psql does. from PrintQueryResult():

/* if it's INSERT/UPDATE/DELETE RETURNING, also print status */
if (last || pset.show_all_results)
{
    cmdstatus = PQcmdStatus(result);
    if (strncmp(cmdstatus, "INSERT", 6) == 0 ||
        strncmp(cmdstatus, "UPDATE", 6) == 0 ||
        strncmp(cmdstatus, "DELETE", 6) == 0)
        PrintQueryStatus(result, printStatusFout);
}



> You simply check the result from an INSERT statement.  If it is
> PGRES_TUPLES_OK, it must have been INSERT ... RETRUNING.

Or a SELECT, or an UPDATE RETURNING, or a DELETE RETURNING.
The code still needs to look at PQcmdStatus() to learn which kind it is,
if it needs to know this.


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: ICU is not supported in this build. install from source code.

2022-08-03 Thread Daniel Verite
jian he wrote:

> ./configure --with-perl --with-python  --with-icu
> ICU_CFLAGS='-I/usr/include/unicode'  ICU_LIBS='-L/usr/lib/icu'
> --enable-debug --with-pgport=5440

The location with -L alone is not sufficient in ICU_LIBS:
the libraries to link with must be listed as well.
For instance:
ICU_LIBS="-L/usr/lib/icu -licui18n -licuuc -licudata"

When using the version of ICU that ships with the distribution (libicu-dev
package on Ubuntu), adding only "--with-icu" should be enough.
The locations of includes and libraries will be automatically found.
I'm not sure that's your case though. On one hand, /usr/include/unicode is
where the package puts the ICU includes, but on the other hand
it doesn't create any /usr/lib/icu directory. 
And Ubuntu 22.04 LTS brings ICU 70, not ICU 71.

If you need a self-built version of ICU, it might be simpler to use pkgconfig
as an alternative to ICU_CFLAGS/ICU_LIBS, for instance:

./configure --with-icu PKG_CONFIG_PATH=/path/to/icu/lib/pkgconfig

where /path/to/icu was passed as the prefix to ICU4C configure
invocation. It's recommended to install custom builds to specific
locations instead of overwriting the libraries that are part of
the distribution (otherwise it becomes unmaintainable).


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: Differences in Escaped bytea's when creating a plain pg_dump

2022-06-27 Thread Daniel Verite
WR wrote:

> First run worked.
> Second run worked.
> Then I changed to SET standard_conforming_strings = off;
> Third run worked.
> Fourth run throw the error
> Then I changed back to  SET standard_conforming_strings = on;
> Fifth run throw the error too.
> And only adding E and second backslash helped.

This kind of unstable behavior can be seen if the SET may not be
executed by the same session (same connection to the server) as the
subsequent queries.
SET affects only the session it's being run in.

For instance a connection pooler configured in statement mode may
produce that behavior. The solution in the case of a connection pooler
is to group related statements into a transaction.

Maybe pgAdmin has a pooler like that, but if you're issuing the statements
in the same SQL window, I would find it quite surprising that it doesn't
run them in the same session.
Or maybe you're mixing queries from different SQL windows that each
have their own connection, and in some cases you do the SET
in a window and the INSERT in another window.
Or maybe it's a pgAdmin bug.


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: Differences in Escaped bytea's when creating a plain pg_dump

2022-06-23 Thread Daniel Verite
WR wrote:

> But one question is left.
> 
> When I read this valid data into a c++ std::string (and I checked that 
> the single backslashes are still there). Why can't I put this 
> SQL-command to a pqxx-transaction and execute it. It looks like the 
> pqxx-transaction unescapes the bytea-string and then it finds the 0x00 
> bytes, which are not allowed in text-strings but should be in bytea-strings.

You may check your server logs. They would have the error messages
with the queries as received by the server.

Note that passing '\000'::bytea with a single backslash can be
incorrect if standard_conforming_strings is set to off (not
recommended since version 8.1).
In that case the backslashes need to be doubled.

Example:

postgres=# set standard_conforming_strings to off;
SET

postgres=# set escape_string_warning to off;
SET

postgres=# select '\000'::bytea;
ERROR:  invalid byte sequence for encoding "UTF8": 0x00

postgres=# select '\\000'::bytea;
 bytea 
-------
 \x00
(1 row)


There might also be a problem with how the string is being fed with
the C++ code, but you'd need to show us the code to get feedback
on it.


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: gawk extension linux missing stuff in package manager fedora 36

2022-06-08 Thread Daniel Verite
Jim McNamara wrote:

> The package doesn't contain the sql or control file in
> /usr/share/pgsql/extensions. It did install .so files though. I ran sudo
> dnf repoquery -l package name and looked at the files.

It's probably this library:
http://gawkextlib.sourceforge.net/pgsql/gawk-pgsql.html

Quote:
  This API can be used by either invoking gawk with a command-line
  argument of -l pgsql or by inserting @load "pgsql" in your script.

It's not a PostgreSQL server-side extension. It's a client-side
gawk plugin to query a postgres database from inside a gawk
script.



Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: link that explain ICU_LOCALE locale concept about local variant code and keywords.

2022-05-25 Thread Daniel Verite
jian he wrote:

> So which website can I get the info about the "kf" and "u".

https://unicode.org/reports/tr35/#u_Extension

https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Settings



Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: COPY TO STDOUT WITH (FORMAT CSV, HEADER), and embedded newlines

2022-03-11 Thread Daniel Verite
Dominique Devienne wrote:


> so I can easily do that "\n" encoding myself, as a post-processing on
> the buffer I get back.

Alternatively, it might be easier to use the default TEXT format of
COPY rather than CSV, as the TEXT format already produces \n for
line feeds, along with half a dozen other special backslash sequences.
See https://www.postgresql.org/docs/current/sql-copy.html
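
For instance, with the default TEXT format the embedded newline comes out
as a literal \n, with tab as the column separator:

psql> COPY (values (1, E'ab\ncd')) TO STDOUT;
1	ab\ncd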


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: COPY TO STDOUT WITH (FORMAT CSV, HEADER), and embedded newlines

2022-03-11 Thread Daniel Verite
Dominique Devienne wrote:

> These values are 'normal'. I'm not use to CSV, but I suppose
> such newlines
> must be encoded, perhaps as \n, since AFAIK CSV needs to be 1 line per row,
> no?

No, but such fields must be enclosed by double quotes, as documented
in RFC 4180 https://datatracker.ietf.org/doc/html/rfc4180

Consider this output:

psql> COPY (values (1, E'ab\ncd'), (2,'efgh')) TO STDOUT CSV;
1,"ab
cd"
2,efgh

That's 2 records on 3 lines.
If you feed this to a parser and it chokes on it, it means that it's
not a valid CSV parser.

Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: varchar::bytea fails when varchar contains backslash

2022-02-21 Thread Daniel Verite
Matthias Apitz wrote:

> I wanted to look into a column of type varchar converting the content
> with ::bytea to a hex string, but this fails when the column contains a
> backslash:

Yes, casting from text to bytea cannot be used for that.
The convert_to() function must be used instead.

From the doc at
https://www.postgresql.org/docs/current/functions-binarystring.html

convert_to ( string text, dest_encoding name ) → bytea

Converts a text string (in the database encoding) to a binary string encoded
in encoding dest_encoding (see Section 24.3.4 for available conversions).

convert_to('some_text', 'UTF8') → \x736f6d655f74657874



Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: Can we go beyond the standard to make Postgres radically better?

2022-02-11 Thread Daniel Verite
Peter J. Holzer wrote:

> > My use case for such a feature are tables which contain one column (or a
> > small number of columns) which you usually don't want to select: A bytea
> > column or a very wide text column. In a program I don't mind (in fact I
> > prefer) listing all the columns explicitely, but exploring a database
> > interactively with psql typing lots of column names is tedious
> > (especially since autocomplete doesn't work here).
> 
> Forgot to add: I think that the syntax would have to be more explicit.
> It's too easy to mix up
>SELECT * - b.a_id FROM ...
> and
>SELECT *, - b.a_id FROM ...
> 
> Maybe
>SELECT * EXCEPT b.a_id FROM ...

The solution to this by the SQL standard might be that it can be done with
a Polymorphic Table Function, introduced in SQL:2016.

https://webstore.iec.ch/preview/info_isoiec19075-7%7Bed1.0%7Den.pdf

A practical example with the Oracle implementation can be seen here:
https://blogs.oracle.com/sql/post/select-star-except-queries-in-oracle-database

Excluding all columns of a given type from a relation is also possible,
with both the relation and the type as parameters of the PTF.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: Force re-compression with lz4

2021-10-17 Thread Daniel Verite
Florents Tselai wrote:

> I have a table storing mostly text data (40M+ rows) that has
> pg_total_relation_size ~670GB.
> I’ve just upgraded to postgres 14 and I’m now eager to try the new LZ4
> compression.

You could start experimenting with data samples rather than the
full contents.

FWIW, in my case I've found that the compression ratio of lz4 was only
marginally better than pglz (like 2% on text).
As for decompression time, it doesn't seem to differ significantly
from pglz, so overall, recompressing existing data did not seem
worth the trouble.

However lz4 appears to be much faster to compress than pglz, so its
benefit is clear in terms of CPU usage for future insertions.
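
For that use case, switching a column is a one-liner (names below are made
up; only values stored afterwards are affected), and pg_column_compression()
shows what the existing values use:

ALTER TABLE docs ALTER COLUMN body SET COMPRESSION lz4;

SELECT pg_column_compression(body) AS method, count(*)
FROM docs GROUP BY 1;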


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: JWT decoder

2021-08-11 Thread Daniel Verite
Masih Tavassoli wrote:

> But there doesn't seem to be a way doing it in postgres.
> Has anyone got any suggesions?

RFC 7519 says:

   A JWT is represented as a sequence of URL-safe parts separated by
   period ('.') characters.  Each part contains a base64url-encoded
   value.  The number of parts in the JWT is dependent upon the
   representation of the resulting JWS using the JWS Compact
   Serialization or JWE using the JWE Compact Serialization.

base64url is similar to base64 except that the two characters
+ and / are replaced by - and _

Postgres provides decode(..., 'base64') but it's stricter than the
Oracle version showed in your sample code (which seems to ignore the
dot character that is illegal in base64 whereas Postgres would reject
it).

The JWT may be decoded with built-in Postgres functions by
splitting the dot-separated parts with regexp_split_to_table(),
converting them from base64url into binary, then into UTF-8,
and then the results could be cast into the json type if needed.

So the SQL code could be:

create function decode_base64url(text) returns bytea as $$
  select decode(
rpad(translate($1, '-_', '+/')   -- pad to the next multiple of 4 bytes
 ,4*((length($1)+3)/4)
 ,'=')
,'base64');
$$ language sql strict immutable;

with parts(x,n) as (
 select * from regexp_split_to_table('<the JWT string>', '\.')
   with ordinality
)
select n, convert_from(decode_base64url(x), 'utf-8')
from parts where n in (1,2);

"n" in the query is the part number, 1 for the header, 2 for the
payload, 3 for the signature which is ignored here.



Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: unicode match normal forms

2021-05-17 Thread Daniel Verite
Hamann W wrote:

> in unicode letter ä exists in two versions - linux and windows use a
> composite whereas macos prefers
> the decomposed form. Is there any way to make a semi-exact match that
> accepts both variants?

Aside from normalizing the strings into the same normal form
before comparing, non-deterministic ICU collations will recognize them as
identical (they're "canonically equivalent" in Unicode terms)

For instance,

CREATE COLLATION nd (
   provider = 'icu',
   locale='', 
   deterministic = false
);

SELECT 
 nfc_form, 
 nfd_form,
 nfc_form = nfd_form COLLATE nd AS equal1,
 nfc_form = nfd_form COLLATE "C" AS equal2 -- or any deterministic collation
FROM
  (VALUES 
  (E'j\u00E4hrlich', 
   E'j\u0061\u0308hrlich'))
  AS s(nfc_form, nfd_form);


 nfc_form | nfd_form | equal1 | equal2 
----------+----------+--------+--------
 jährlich | jährlich | t      | f
(1 row)

Normalizing is available as a built-in function since Postgres 13 and
non-deterministic collations appeared in Postgres 12.
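
For instance, in a UTF-8 database:

SELECT normalize(E'j\u0061\u0308hrlich', NFC) = E'j\u00E4hrlich' AS same;
-- returns true, since both sides are byte-identical after normalization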


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: CROSSTAB( .. only one column has values... )

2021-01-06 Thread Daniel Verite
(resent to the list, previous post was rejected)

   Pavel Stehule wrote:

> > *That* is a function of how Postgres set returning functions work, and not
> > specific to crosstab(). It is not easily fixed. Patches to fix that would
> > be
> > welcomed!
> >
> 
> https://www.postgresql.org/message-id/flat/CAFj8pRC%2BhNzpH%2B0bPRCnqNncUCGjEvpwX%2B0nbhb1F7gwjYZZNg%40mail.gmail.com#9b3fbdd968c271668049a103bfc93880

Oracle has Polymorphic Table Function but still it doesn't work
seamlessly for dynamic pivots.
This blog post illustrates why:
https://blog.sqlora.com/en/dynamic-pivot-with-polymorphic-table-function/
The "related posts" at the end are also interesting about this feature.

Even before considering more dynamic SRFs in postgres, having a
variable number of columns for a DML statement is problematic
in general.

When an SQL statement is parsed/prepared, it's not supposed to
change its output structure, unless a DDL statement intervenes.
The dynamic pivot must break this promise, since a simple INSERT or
UPDATE or DELETE in the pivoting rows may cause the number of
output columns to change across invocations of the same statement.
That would mean that PREPARE would be basically unusable or
unreliable for such statements.

I think the query parser is also not supposed to read data outside
of the catalogs to determine the output structure of a query.
This promise would also need to be broken by dynamic pivots
implemented as a single-pass DML query.

On the other hand, dynamic pivots can be done by generating the SQL
dynamically and  getting at the results in a second step, or returning a
resultset embedded in a scalar (json).
Speaking of features that don't exist but might someday, ISTM that
CALL crosstab_dynamic(...) was more plausible than
SELECT * FROM crosstab_dynamic(...), since CALL
doesn't have the same contingencies as SELECT.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: Multiple result set to be returned in procedure/function

2020-11-19 Thread Daniel Verite
Thomas Kellerer wrote:

>arthur=> \set AUTOCOMMIT off

Alternatively, start an explicit transaction block with BEGIN.
The point is that the lifespan of the cursor is the transaction block
in which it's instantiated.
 
>arthur=> select * from get_results();
>    get_results    
>--------------------
> <unnamed portal 1>
> <unnamed portal 2>

Friendlier names may be used by assigning them in the function,
i.e. plpgsql does support:

 declare
  c1 refcursor := 'mycursorname';

Then the caller might simply hardcode the cursor names in the FETCH
statements rather than building them dynamically at runtime.

Also it allows to put the server-side code into an anymous DO block
instead of creating a function, because it doesn't have to return
any refcursor when the caller already knows the cursor names.

BEGIN;
DO $$
declare c refcursor := 'c_pgclass';
begin
  open c for select relname from pg_class;
end $$ LANGUAGE plpgsql;

FETCH ALL from c_pgclass;
...
COMMIT;

This form might be closer to how inline blocks of code are
written with some other db engines, when they produce resultsets 
without an explicit cursor interface.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: passing linux user to PG server as a variable ?

2020-08-18 Thread Daniel Verite
David Gauthier wrote:

> I can avoid the error by just throwing a namespace in there...
> atletx7-reg036:/home/dgauthie[ 214 ] --> setenv PGOPTIONS "-c
> 'os.user=$USER' "
> But once in, "show os.user" is undefined.

It's documented to work [1], but you need to remove these single
quotes. For instance:

$ env PGOPTIONS="-c os.user=$USER" psql -d postgres
psql (12.4 (Debian 12.4-1.pgdg90+1))
Type "help" for help.

postgres=> show "os.user";
 os.user 
---------
 daniel
(1 row)

If characters needed to be quoted in the value, backslash
should be used as the quote character. Since it's also
a quote character for the shell, two levels of quoting
are needed:

$ env PGOPTIONS="-c os.user=user\\ is\\ $USER" psql -d postgres
psql (12.4 (Debian 12.4-1.pgdg90+1))
Type "help" for help.

postgres=> show "os.user";
    os.user     
----------------
 user is daniel
(1 row)


[1] https://www.postgresql.org/docs/current/config-setting.html


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: Switching Primary Keys to BigInt

2020-07-22 Thread Daniel Verite
Mohamed Wael Khobalatte wrote:

> We lock the table as a precaution, with the understanding that we are
> undergoing a "small" downtime to finish replacing the int id by the new
> bigint

Ah, sorry I overlooked that most row updates are done
pre-transaction in a preliminary step:
  /* in batches, we update all the rows to new_id = id */


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: Switching Primary Keys to BigInt

2020-07-22 Thread Daniel Verite
Mohamed Wael Khobalatte wrote:

> > alter table change_seq alter COLUMN id set data
> > type bigint;

> This is significant downtime, since it locks exclusively, no? We want to
> avoid that.

Well, in the steps you mentioned upthread, the transaction starts by
doing LOCK TABLE some_table, so it will hold an exclusive lock on it
for the rest of the transaction.

If you can test how the ALTER TABLE... SET TYPE ... compares
to your procedure in terms of downtime, that would be interesting.
To me, it's not clear why the procedure in multiple steps would
be better overall than a single ALTER TABLE.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




RE: psql: FATAL: database "postgres" does not exist or ERROR: 23505: duplicate key value violates unique constraint "pg_namespace_nspname_index"

2020-07-15 Thread Daniel Verite
TALLURI Nareshkumar wrote:

>  [0]postgres@axmdevhkdb008$ [PHKGAXMD008] psql
> psql: FATAL:  database "postgres" does not exist

That's not necessarily a problem. The "postgres" database is generally
empty and some installations don't even have one.
Use "psql -d template1" instead, or start postgres in single-user mode.

> Since we are not able to connect to DB , we did a new initialization and
> copy the directories from base directory to new base directory(in new data
> location)

This can't work. If you have a backup from just before that point,
you should restore it and work from that without writing anything into
the data directory.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




RE: autovacuum failing on pg_largeobject and disk usage of the pg_largeobject growing unchecked

2020-06-24 Thread Daniel Verite
Jim Hurne wrote:

> "Daniel Verite"  wrote on 06/22/2020 06:00:37 PM:
> > If you can arrange a maintenance window, a faster way to rebuild
> > pg_largeobject when it contains mostly empty pages can be to:
> 
> Thanks Daniel. If we cannot arrange for a maintenance window, is this the 
> sort of thing that pg_repack[1] could be used to do?

Unfortunately it can't work with pg_largeobject:

https://github.com/reorg/pg_repack/issues/163


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




RE: autovacuum failing on pg_largeobject and disk usage of the pg_largeobject growing unchecked

2020-06-22 Thread Daniel Verite
Jim Hurne wrote:

> We are of course going to continue to try different things, but does 
> anyone have any other suggestions on what we should be looking at or what 
> settings we might want to adjust?

If you can arrange a maintenance window, a faster way to rebuild
pg_largeobject when it contains mostly empty pages can be to:

- export the contents into files:

 DO $$ declare id oid; begin
   for id in select oid from pg_largeobject_metadata
   loop
     perform lo_export(id, '/tmp-path/'||id::text);
   end loop;
 end $$;

- SET allow_system_table_mods to ON; (needs a restart)

- truncate table pg_largeobject, pg_largeobject_metadata;

- reimport the files with the same OIDs:

  DO $$ declare id text; begin
    for id in select pg_ls_dir('/tmp-path/')
    loop
      perform lo_import('/tmp-path/' || id, id::oid);
    end loop;
  end $$;

- remove the files in /tmp-path

- Set allow_system_table_mods back to OFF and restart again, unless
 you don't need that safety check and prefer to leave it permanently to ON
 to avoid the restarts.

With less than 60MB of actual contents, all this might take no more
than a few minutes, as these operations don't need to fully scan
pg_largeobject, which is what is problematic with vacuum.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: Concurrenctly running CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]

2020-06-08 Thread Daniel Verite
Thorsten Schöning wrote:

> > The caveat you mention about IF NOT EXISTS does not apply to
> > temporary tables, as they're not shared across sessions.[...]
> 
> That's what I understood as well, but I'm creating those concurrently
> WITHIN one and the same session and transaction. :-) 

But a SQL session on the server takes its statements from a FIFO queue
and processes them serially, so there's no intra-session concurrency.
In fact multi-threaded SQL clients *must* make sure that they don't
send concurrent queries to the same connection. The best they can
do in terms of throughput is to queue up a new query while
the server is busy executing a previous one, but that's pipelining,
not parallelism.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: Concurrenctly running CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]

2020-06-08 Thread Daniel Verite
Thorsten Schöning wrote:

> I'm using "CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]" and the
> associated queries can take a long time. So the following lists some
> questions about executing those concurrently, even thouzgh I've
> already read threads like the following:

> > The bottom line is that CREATE TABLE IF NOT EXISTS doesn't pretend
> > to handle concurrency issues any better than regular old CREATE
> > TABLE, which is to say not very well.[...]
> 
> https://www.postgresql.org/message-id/ca+tgmozadyvtwbfp1fl2smzbihcwt4uprzrlnnx1nb30ku3...@mail.gmail.com


The caveat you mention about IF NOT EXISTS does not apply to 
temporary tables, as they're not shared across sessions.
That is, if two concurrent transactions execute at the same time
CREATE TEMP TABLE IF NOT EXISTS foo(...)
it can't fail as described above because that creates two distinct tables,
each private to their session.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: psql show me the : and ask user input, when running one sql file

2020-04-05 Thread Daniel Verite
Adrian Klaver wrote:

> On 4/5/20 9:46 AM, Adrian Klaver wrote:
> > On 4/5/20 5:50 AM, arden liu wrote:
> 
> > 4) I don't see anything wrong the statements, so I am wondering if it is 
> > a shell issue?
> 
> Seems to be. I removed the RETURNING *_id from the INSERT statements and 
> the file ran without interruption:

Presumably these results are being displayed with a pager, and it's
the pager that is asking for keyboard input.
You may add -P pager=off to psql options to suppress this,
or remove permanently the RETURNING clauses that seem pointless
in that context.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Could postgres12 support millions of sequences? (like 10 million)

2020-03-23 Thread Daniel Verite
pabloa98 wrote:

> When I have a medium number of sequence I will report how it behaves. It
> will take some time though.

Be aware that creating the sequences on the fly has the kind of race
condition that you wanted to avoid in the first place.

For instance consider this execution in two concurrent sessions:

S1: BEGIN;

S1: CREATE SEQUENCE IF NOT EXISTS seq1;

S2:  BEGIN;

S2:  CREATE SEQUENCE IF NOT EXISTS seq1;
S2:  (now blocked waiting for S1)

S1: COMMIT;

S2:ERROR:  duplicate key value violates unique constraint
"pg_type_typname_nsp_index"
DETAIL : Key (typname, typnamespace)=(seq1, 36434) already exists.

The client could catch these errors and retry, but then it could also
do that with serializable transactions on serialization failures
(SQLSTATE 40001), and you'd get the guarantee of consecutive
numbering without creating all these sequences, not to mention
the protection against other potential concurrency anomalies.
See https://www.postgresql.org/docs/current/transaction-iso.html


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: SQL Query Syntax help

2020-01-22 Thread Daniel Verite
srikkanth wrote:

> Can you please help me in writing the syntax for the below mentioned

This looks like an UNPIVOT operation.

Here's a generic method that does this without having to specify the
columns individually, with the help of json functions:

SELECT ID, key, value FROM
  (SELECT ID, row_to_json(t.*) AS line FROM PivotTableName t) AS r
JOIN LATERAL json_each_text(r.line) on (key <> 'ID');


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Inexplicable duplicate rows with unique constraint

2020-01-16 Thread Daniel Verite
Richard van der Hoff wrote:

> synapse=> select count(*), room_id, type, state_key from 
> current_state_events group by 2,3,4 order by count(*) DESC LIMIT 2;
>  count |  room_id  | type  |   state_key
> ---+---+---+-
>  3 | !ueLfVrSWYGMnFnoCbT:darkfasel.net | m.room.member | 
> @irc_ebi_:darkfasel.net
>  3 | !HwocBmCtBcHQhILtYQ:matrix.org| m.room.member | 
> @freenode_AlmarShenwan_:matrix.org

Looking at these columns which are of type text but do not
contain words of any particular language, there's probably
no point in using a language-aware collation for them.

If you maintain the database schema, what you could do to avoid
the dependency on the OS collation and stay clear of the particular
upgrade difficulty of collations is to use COLLATE "C" for this kind of
field, as opposed to the default collation of the database.
As a bonus, operations with the "C" collations tend to be faster,
sometimes even much faster.
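
As a sketch, with the columns shown above (existing indexes on these
columns should get rebuilt as part of the operation):

ALTER TABLE current_state_events
  ALTER COLUMN room_id   TYPE text COLLATE "C",
  ALTER COLUMN "type"    TYPE text COLLATE "C",
  ALTER COLUMN state_key TYPE text COLLATE "C";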


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Inexplicable duplicate rows with unique constraint

2020-01-16 Thread Daniel Verite
Richard van der Hoff wrote:

> So, question: what could we be doing wrong to get ourselves into this 
> situation?

OS/libc upgrades without reindexing come to mind.
See https://wiki.postgresql.org/wiki/Collations

>  * At least one user reports that he has recently migrated his database 
> from one server to another via a `pg_dump -C` and later piping into psql.

This kind of migration recreates indexes (since it recreates everything)
so it's not subject to the same hazard as an OS/libc upgrade without
reindexing. In fact it would detect the problem if it existed,
as the creation of the unique constraint in the new db would fail if the
data in the dump did not satisfy it.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: replace single char for string using regexp_replace

2019-12-30 Thread Daniel Verite
PegoraroF10 wrote:

> I have a
> Replace(Replace(Replace(Replace($$Text,with;On'It"$$,',','chr(59)'),';','chr(44)'),'"','chr(34)'),,'chr(39)')
> It works but I would like to call just one Replace.

For performance, it might be the fastest method, despite the
lack of elegance and the multiple passes on the string.
In a recent discussion on the list [1] I've asked on how best to do
multiple string replacements. That's a generalized version of your
question and the answer might be heavier but anyway here's some
code on the wiki with plperl and plpgsql versions:

https://wiki.postgresql.org/wiki/Multi_Replace_Perl
https://wiki.postgresql.org/wiki/Multi_Replace_plpgsql

If you're okay with plperl see the first version, as it's
much more efficient, with Perl implementing the multiple
replacement natively in its regexp engine (plus implementing
the search with a trie since 5.10).

The plpgsql version tries do its best with regexp_replace, but it
scales much worse as the number of replacements grows.
But it does work in the situations where a stack of nested replace() calls
wouldn't work, like replace foo with bar and bar with foo.

In your case, as the strings to replace always consist only of one character,
you might also split the string by characters, replace them with a
CASE WHEN... construct, and reassemble the result with string_agg,
as in:

select string_agg(c, '' order by n) from 
(select n, case c
   when ',' then 'chr(59)'
   when ';' then 'chr(44)'
   -- ... other substitutions...
   else c
   end
 from
   regexp_split_to_table($$The Original String$$, '') with ordinality as s1(c, n)
) as s2(n, c);

If the strings are not too large and there are many occurrences of the
characters
to replace, I would expect  this to be more efficient than the more generic
plpgsql-based solution above. Against the Perl version I don't know.
There is a per-call overhead with plperl that can't be ignored if you're
focused on performance.


[1]
https://www.postgresql.org/message-id/306b726b-f185-4668-bffe-ac8e7f788...@manitou-mail.org

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Mixing greediness in regexp_matches

2019-12-23 Thread Daniel Verite
Tom Lane wrote:

> regression=# select regexp_split_to_array('junkfoolbarfoolishfoobarmore',
> 'foo|bar|foobar');
> regexp_split_to_array 
> ---
> {junk,l,"",lish,more}
> (1 row)
> 
> The idea would be to iterate over the array elements, tracking the
> corresponding position in the source string, and re-discovering at
> each break which of the original alternatives must've matched.
> 
> It's sort of annoying that we don't have a simple "regexp_location"
> function that would give you back the starting position of the
> first match.

It occurred to me too that regexp_split_to_table or array would make
this problem really easy if only it had a mode to capture and return the
matched parts too.

FWIW, in plperl, there's a simple solution:

 $string =~ s/(foobar|foo|...)/$replace{$1}/g

where %replace is a hash of the substitutions, e.g. (foo => 'baz', ...).
The strings in the alternation are tested in their order of
appearance, so you can choose to be greedy or not by just sorting
them by length.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Mixing greediness in regexp_matches

2019-12-23 Thread Daniel Verite
Tom Lane wrote:

> I'd try forcing the match to be the whole string, ie
> 
> ^(.*?)(foo|bar|foobar)(.*)$
> 
> which would also save some work for restarting the iteration,
> since you'd have already captured the all-the-rest substring.

In that case regexp_matches will return 0 or 1 row. In the
above-mentioned example, that would be:

=> select regexp_matches('the string has foo and foobar and bar and more',
 '^(.*?)(foo|foobar|bar)(.*)$', 'g');
                     regexp_matches                      
---------------------------------------------------------
 {"the string has ",foo," and foobar and bar and more"}

So the next iteration would consist of calling regexp_matches()
on result[3], and so on until no match is found.
I think it would work as desired, but probably much less efficiently on
large strings/large number of matches than if a single call of
regexp_matches() could return all matches.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Mixing greediness in regexp_matches

2019-12-23 Thread Daniel Verite
  Hi,

When looking into how to implement a global replace of multiple
substrings (each with their own replacement) in sql
or plpgsql, I'm wondering if/how an RE with an alternation
can be used.

The basic idea is to iterate on the rows produced by

   regexp_matches(string, '(.*?)(foo|bar|foobar)', 'g')

to break down the string into pairs of (non-matching segment,
matching segment) so that a final result can be assembled
from that (setting aside the last non-matching segment, that
can be retrieved in a final step).

The difficulty is that the longest strings in the alternation
should be prioritized, but the starting (.*?) makes the RE
non-greedy so "foo" is choosen over "foobar". 

The doc at [1] leaves me unoptimistic when it mentions that:

 "...when an RE contains both greedy and non-greedy
 subexpressions, the total match length is either as long as possible
 or as short as possible, according to the attribute assigned to the
 whole RE. The attributes assigned to the subexpressions only affect
 how much of that match they are allowed to “eat” relative to each
 other."

Also it gives this example of forcing the RE as a whole to
be greedy despite it having a non-greedy sub-RE:
 regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}')
but it doesn't seem to be able to produce the desired result
in the case of the RE in the middle being an alternation
with strings of different lengths.

The ideal RE with a (foo|foobar|bar) alternation, when applied
globally to a string like 'the string has foo and foobar and bar and
more' would produce something like:

 {"the string has ","foo"}
 {" and ","foobar"}
 {" and ","bar"}

Is that possible with regexp_matches?


[1] https://www.postgresql.org/docs/current/functions-matching.html




Re: Tuple concurrency issue in large objects

2019-12-18 Thread Daniel Verite
Shalini wrote:

> Could you also please state the reason why is it happening in case
> of large objects? Because concurrent transactions are very well
> handled for other data types, but the same is not happening for
> lobs. Is it because the fomer are stored in toast table and there is
> no support for concurrent txns in pg_largeobject table?

Keeping in mind that large objects are not a datatype, but rather a
functionality that is built on top of the bytea and oid datatypes plus
a set of functions, I wouldn't say that concurrent writes would be
better handled if you had a table: document(id serial, contents bytea)
with "contents" being indeed toastable.

To illustrate with a basic example: transactions Tx1 and Tx2
want to update the contents of the same document concurrently,
with this order of execution:

Tx1: begin 
Tx1: update document set contents=... where id=...
Tx2: begin
Tx2: update the same document (gets blocked)
Tx1: commit
Tx2: commit

If using the read committed isolation level, Tx2 will be put to wait
until Tx1 commits, and then the update by Tx1 will be overwritten by
Tx2. That's a well known anomaly known as a "lost update", and
probably not what you want.

If using a better isolation level (repeatable read or serializable),
the update by Tx2 will be rejected with a serialization failure,
which, to me, seems the moral equivalent of the "Tuple concurrently
updated" error you're reporting with large objects.
When this occurs, your application can fetch the latest value in a new
transaction and see how it can apply its change to the new value,
unless another conflict arises and so on.

In short, the best the database can do in case of conflicting writes
is to inform the application. It can't know which write should be
prioritized or if the changes should be merged before being written.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Tuple concurrency issue in large objects

2019-12-13 Thread Daniel Verite
Shalini wrote:

> > Is there a workaround to this concurrency issue without creating a
> > new   large object?

The transaction failing with the "Tuple concurrently updated"
error could be resubmitted by the client, as if it was a
serialization failure.
Or the failure could be prevented by using advisory locks:
https://www.postgresql.org/docs/current/explicit-locking.html
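
For instance, all writers of a given large object could agree to take a
transaction-scoped advisory lock keyed on its OID first (a sketch):

BEGIN;
SELECT pg_advisory_xact_lock(16401);  -- 16401 = the large object's OID (example value)
-- ... lo_open()/lowrite() calls on that object ...
COMMIT;   -- the advisory lock is released here automatically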


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Issue upgrading from V11 to V12 on Debian

2019-12-03 Thread Daniel Verite
stan wrote:

> BUT, I went to the directory where I have the exentsion's source, did a
> make clean ; make ; make install, and the files were still installed in the
> V11 tree. How can I instruct the system to put these in the V12 tree?

With the Debian packaging, /usr/bin/pg_config is a shell script that
has this comment on top:

# If postgresql-server-dev-* is installed, call pg_config from the latest
# available one. Otherwise fall back to libpq-dev's version.

According to this comment, invoking /usr/bin/pg_config refers to the v11
server probably because you didn't install postgresql-server-dev-12 yet.

When you'll have both postgresql-server-dev-11 and postgresql-server-dev-12
installed, independently of which clusters are actually running,
you can still force a specific target version when compiling an extension
with :
 $ make PG_CONFIG=/usr/lib/postgresql/11/bin/pg_config [install]
or
 $ make PG_CONFIG=/usr/lib/postgresql/12/bin/pg_config [install]


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: SQL Error [0A000]: ERROR: nondeterministic collations are not supported for LIKE

2019-10-24 Thread Daniel Verite
Jeff Lanzarotta wrote:

> I have a question about nondeterministic collations in PostgreSQL 12.  I
> have created a new collation that is nondeterministic and created several
> columns which use this collation.  Querying these columns works great until
> I use LIKE.  When I do, I get the following error:
> 
> SQL Error [0A000]: ERROR: nondeterministic collations are not supported for
> LIKE
> 
> Is there any plan to allow this functionality?

PostgreSQL development is conducted without a roadmap [1]. Maybe
someone will submit a patch to enable LIKE with nondeterministic
collations, but so far it did not happen according to the current set
of patches at https://commitfest.postgresql.org

Such matches can be weirder than you might think (not to
mention much slower).
Consider for instance a collation that ignores punctuation:

CREATE COLLATION "nd3alt" (
  provider = 'icu',
  locale='und@colAlternate=shifted',
  deterministic = false
);

In the icu_ext extension, icu_strpos [2] can match a substring with
a nondeterministic collation, which is one part of what LIKE
would need to do for such collations. The implementation uses
the string search facility of the ICU library.

With the above-defined collation, we can have for instance:

SELECT icu_strpos('abc.  ...de', 'c,d' COLLATE nd3alt);
 icu_strpos 
------------
          3

So even though 'c,d' is not a substring of 'abc.  ...de' in the common
sense, it is recognized as such by this collation, by design.

A LIKE operator for nondeterministic collations should be able to
recognize this too, but with an arbitrary number of substrings to
match in the pattern, plus it should handle the underscore wildcard
in a way that hopefully makes sense. 

With the example above,
   'abc.  ...de' LIKE '%c,d%' COLLATE nd3alt
should certainly be a match, but in the case of this variant:
  'abc.  ...de' LIKE '%c_d%' COLLATE nd3alt
it's not necessarily clear how (or even if) it should work.


[1] https://www.postgresql.org/developer/roadmap/
[2] https://github.com/dverite/icu_ext#icu_strpos


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Text search lexer's handling of hyphens and negatives

2019-10-16 Thread Daniel Verite
raylu wrote:

> to_tsvector('simple', 'UVW-789-XYZ') is
> 'uvw':1 '-789':2 'xyz':3
> because -789 is a negative integer. If we turn the query '789-XYZ'
> into the tsquery as before, we get to_tsquery('simple', '789 <-> xyz')
> which doesn't match it.
> 
> Are we missing something here? Is there either a way to
> 1. generate tsvectors without this special (negative) integer behavior or

As an ad-hoc solution, you could add a dictionary that turns a negative
integer into its positive counterpart. There's a dictionary in contrib that
can be used as a starting point:
https://www.postgresql.org/docs/current/dict-int.html

It's a matter of ~10 lines of C code to add an "abs" parameter to 
that dictionary that would, when true, produce "789" as a lexeme
when fed "-789" as input.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Perl DBI converts UTF-8 again to UTF-8 before sending it to the server

2019-10-12 Thread Daniel Verite
Christoph Moench-Tegeder wrote:

> And then it doesn't know that your terminal expects UTF-8 (perl
> just dumps the binary string here), because you didn't tell it:
> "binmode(STDOUT, ':encoding(utf8)')" would fix that.

Or use perl -C, so that it gets that from the environment.

 From https://perldoc.perl.org/perlrun.html :

 -C on its own (not followed by any number or option list), or the
 empty string "" for the PERL_UNICODE environment variable, has the
 same effect as -CSDL. In other words, the standard I/O handles and
 the default open() layer are UTF-8-fied but only if the locale
 environment variables indicate a UTF-8 locale.

Now for what the OP is doing, I'd suggest to use Dump() from the
Devel::Peek module instead of print.

To see the difference between a literal "ä" and "\xc3\xa4" from the
point of view of Perl:

   use Devel::Peek;
   use utf8;

   $str = "\xc3\xa4";
   Dump($str);

   $str = "ä";
   Dump($str);

Result:

SV = PV(0x55af63beeda0) at 0x55af63c185d0
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x55af63c3c230 "\303\244"\0
  CUR = 2
  LEN = 10
  COW_REFCNT = 1
SV = PV(0x55af63beeda0) at 0x55af63c185d0
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK,UTF8)
  PV = 0x55af63c58dc0 "\303\244"\0 [UTF8 "\x{e4}"]
  CUR = 2
  LEN = 10
  COW_REFCNT = 1



Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Case Insensitive Comparison with Postgres 12

2019-10-12 Thread Daniel Verite
Igal Sapir wrote:

> > Out of curiosity is there a eason not to use the citext type for th?
> >
> >
> Using the collation seems like a much cleaner approach, and I trust ICU to
> do a better job at comparing strings according to language rules etc.

One notable difference between citext and case-insensitive collations
by ICU is that the latter recognizes canonically equivalent sequences
of codepoints [1] as equal, while the former does not.

For instance:

=# CREATE COLLATION ci (locale='und@colStrength=secondary', 
  provider='icu', deterministic=false);

=# SELECT E'E\u0302TES'::citext = 'Êtes'::citext AS "citext-equal", 
  E'E\u0302TES' = 'Êtes' collate "ci" AS "ci-equal";
 citext-equal | ci-equal 
--------------+----------
 f            | t

Another significant difference is that building or rebuilding an index on a
text column with a CI collation appears to be way faster than with citext
(I've seen 10:1 ratios, but do your own tests).

On the minus side, substring matching with LIKE or other methods
is not possible with CI collations whereas it does work with citext.


[1] https://en.wikipedia.org/wiki/Unicode_equivalence


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: psql \copy hanging

2019-10-08 Thread Daniel Verite
Arnaud L. wrote:

> Anyway, it hung using this syntax during last night's run.
> I'll give it another try tonight  just to be sure.

When psql.exe is hanging, maybe you could use a tool like
Process Monitor [1] or Process Explorer [2] to get insights about
what it's stuck on or what it's doing exactly.

[1] https://docs.microsoft.com/en-us/sysinternals/downloads/procmon
[2] https://docs.microsoft.com/en-us/sysinternals/downloads/process-explorer


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: psql \copy hanging

2019-10-08 Thread Daniel Verite
Arnaud L. wrote:

> As a side note, COPY (...) TO STDOUT \g 'somefile' does not work in a 
> script file (i.e. it does not work if the command is passed in a file 
> via the -f argument).
> The command runs fine, no error is raised either by the client or the 
> server, but no file is written.

Testing this with 11.5, it works for me.
Make sure you're running the latest minor release (on the client
side in this case), because a related fix was issued last February.
For the 11 branch it was in version 11.2.
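
For reference, the kind of script being discussed would look like this
(file and table names are placeholders):

  -- contents of export.sql, run with: psql -f export.sql
  COPY (SELECT * FROM sometable) TO STDOUT WITH (FORMAT csv) \g 'somefile.csv'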


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Postgres 12: backend crashes when creating non-deterministic collation

2019-10-04 Thread Daniel Verite
Thomas Kellerer wrote:

> > To get a case insensitive collation you'd have to use something like
> >
> >  LOCALE = 'de-DE-u-ks-level2'
> 
> Creating works, but apparently on Windows ICU does not support this.

After installing v12 on windows with the EDB installer, I notice
that it ships with ICU 53, a relatively old version (2014).

Concerning the problem just above (not the crash), ICU 53 is too old
to support BCP47 tags as collation attributes, as mentioned
at https://www.postgresql.org/docs/12/collation.html :

  "The first example selects the ICU locale using a “language tag” per
  BCP 47. The second example uses the traditional ICU-specific locale
  syntax. The first style is preferred going forward, but it is not
  supported by older ICU versions."
 
With ICU 53 or older, instead of the locale above, we must use the
old-style syntax:

 locale = 'de-DE@colStrength=secondary'

If you use that in your example, the case insensitive lookups should
work.
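
For instance, the collation from your example can be declared with the
old-style syntax like this (the collation name is arbitrary):

  CREATE COLLATION german_ci (
    provider = icu,
    locale = 'de-DE@colStrength=secondary',
    deterministic = false
  );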

But it's unfortunate that the EDB build did not switch to a recent ICU
version for PostgreSQL 12.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Postgres 12: backend crashes when creating non-deterministic collation

2019-10-04 Thread Daniel Verite
Thomas Kellerer wrote:

> It also works on Windows when I specify "correct" locale names - the above
> seems to be an edge case.
> Is it worth the effort to report that through the bug reporting form?

Sure. Both the crash with 'de-x-icu' and the difference in behavior
between Linux and Windows on locale = 'de-DE-u-ks-level2' look like
"must-fix" bugs to me. Both are valid ICU locales and should work the
same in all operating systems.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: "Failed to connect to Postgres database"

2019-09-27 Thread Daniel Verite
Marco Ippolito wrote:

> (base) postgres@pc:~$ psql --cluster 11/fabmnet -h localhost
> Password for user postgres:
> psql: FATAL:  password authentication failed for user "postgres"
> FATAL:  password authentication failed for user "postgres"

Did you set a password for the postgres user in that newly created
cluster?
If not, try psql --cluster 11/fabmnet (without -h localhost),
it should connect you without a password,
then set a password with the \password command in psql,
then try again with -h localhost.
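
In other words, a sequence like this (a sketch):

  $ psql --cluster 11/fabmnet
  postgres=# \password postgres
  postgres=# \q
  $ psql --cluster 11/fabmnet -h localhost -U postgres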


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Issues with inconsistent COLLATION installation

2019-09-09 Thread Daniel Verite
Cory Nemelka wrote:

> ERROR:  22023: could not create locale "fr_FR.utf8": No such file or
> directory
> DETAIL:  The operating system could not find any locale data for the locale
> name "fr_FR.utf8".
> LOCATION:  report_newlocale_failure, pg_locale.c:1312

I can reproduce this by creating a new locale *after* starting
PostgreSQL and trying to use it before a restart.

Example with Ubuntu 18.04 and Postgres 11.5:

# locale-gen es_ES.utf8
Generating locales (this might take a while)...
  es_ES.UTF-8... done
Generation complete.

Immediately refering to the locale will fail:

$ sudo -u postgres psql -d test -U postgres

test=# create collation es (provider=libc, locale='es_ES.utf8');
ERROR:  could not create locale "es_ES.utf8": No such file or directory
DÉTAIL : The operating system could not find any locale data for the locale
name "es_ES.utf8".


Now restart postgresql

$ sudo /etc/init.d/postgresql restart
[ ok ] Restarting postgresql (via systemctl): postgresql.service.

And now it works:

$ sudo -u postgres psql -d test -U postgres

test=# create collation es (provider=libc, locale='es_ES.utf8');
CREATE COLLATION

test=# select * from pg_collation where collname='es' \gx
-[ RECORD 1 ]-+-----------
collname      | es
collnamespace | 2200
collowner     | 10
collprovider  | c
collencoding  | 6
collcollate   | es_ES.utf8
collctype     | es_ES.utf8
collversion   | 


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: database "cdf_100_1313" does not exist

2019-09-09 Thread Daniel Verite
nikhil raj wrote:

> /usr/lib/postgresql/11/bin/psql -p 5432 -U postgres -d $DBLIST -c "GRANT
> CONNECT ON DATABASE \"$DBLIST\" TO cpupdate"
> ERROR:  database " CDF_100_1313" does not exist

That's because you don't use the unaligned format to get the
list of results. You should add -A to psql options.

Or better yet, use a simpler method that does not use
shell variables at all and works with a single psql call:

$ psql -U postgres -d postgres <<EOF
select format('GRANT CONNECT ON DATABASE %I TO cpupdate', datname)
  from pg_database;
\gexec
EOF

That's possible because you don't need to be connected to 
a database to grant the right to connect to that database.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Permissions on postgresql.conf, psql and patroni

2019-09-09 Thread Daniel Verite
Peter J. Holzer wrote:

> 2) Why does psql need to read postgresql.conf, and more specifically,
>   why does it care about the location of the data directory? It
>   shouldn't access files directly, just talk to the server via the
>   socket.

It's not psql itself, it's pg_wrapper.

$ ls -l /usr/bin/psql
lrwxrwxrwx 1 root root 37 Aug  8 12:48 /usr/bin/psql ->
../share/postgresql-common/pg_wrapper

See http://manpages.ubuntu.com/manpages/bionic/man1/pg_wrapper.1.html


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Why must AUTOCOMMIT be ON to do txn control in plpgsql procedure?

2019-08-08 Thread Daniel Verite
Bryn Llewellyn wrote:

> B.t.w., I’m guessing that the “begin” SQL command that you see in the log
> that I mentioned is actually issued by (some) clients—at least psql and
> Python-on-psycopg2—as an explicit call from the client. In other words, it
> isn’t the server that generates this. Does anyone know for sure how this
> works?

Well, that's the point of Autocommit, and moreover it does nothing
else. Maybe you're still confused about this.

* Autocommit off = the client automatically adds a "BEGIN" when
it thinks a transaction must be started on behalf of the user.

* Autocommit on = the client does nothing.

The fact that "off" implies doing something and "on" implies not
interfering is counterintuitive, but that's how it is.  Autocommit is
for compatibility with other databases. If it was only for Postgres, I
guess either it wouldn't exist in the first place or it should be
called "AutoStartTransactionBlock" or something like that, because
that's just what it really does.

Anyway, the server only knows whether a BEGIN has been issued.
It never knows or cares whether it has been added implicitly or explicitly,
which is why it can be quite confusing to reason about server-side
differences in terms of Autocommit, as you do in some of your previous
messages.
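
A minimal psql illustration, assuming some table t to write into:

  \set AUTOCOMMIT off
  insert into t values (1);  -- psql silently sends a BEGIN before this statement
  commit;                    -- and the transaction block must now be closed explicitly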

It should be stressed that Autocommit is not a parameter of the
session between Postgres and the SQL client, but rather it's a
parameter of the session between the user and their SQL client.
So when you're hypothesizing that a plpgsql block in a procedure
would look at this parameter or change it temporarily (your
points #2 and #5 in your analysis of p2's execution), you should
see that it's impossible, because on the server-side, this parameter
just does not exist.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Inserting into the blob

2019-06-10 Thread Daniel Verite
Igor Korot wrote:

> It is not a problem in general, but just curious - is there a more
> generic solution (to get the file from the client)?

With psql:

 \lo_import /path/to/file

It creates a large object with the contents of the file from the client
file system and returns its unique ID, in the output and into the
:LASTOID variable in psql.

There's no equivalent for bytea contents. Bytea contents need
to be either injected into the query as text, or passed separately
as parameters, but psql does not provide helper methods for this,
and it also lacks binary support for variables.
So it's easier to implement "upload bytea to server" in a script language
than in psql.

Alternatively, if you don't care about the contents being written twice, a
file can be imported as a large object, copied as bytea into a row of the
target table, and the large object purged immediately.

In psql, a sequence like this should work:

 \lo_import /path/to/file
 \set tmp_oid :LASTOID

insert into tablename ( bytea_col, [other columns] )
 values ( lo_get(:tmp_oid), [other values] )

\lo_unlink :tmp_oid


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: delimeters psql /CSV

2019-06-07 Thread Daniel Verite
Paul Malm wrote:

> C:\Users\an\bin>C:\Tmp\psql -h 10.211.43.22 -p 5432 -U postgres -d aatest -w
> -c  "\copy public.""Bayern"" FROM 'C:\Users\an\test\Bayern.csv' WITH
> DELIMITER ';' CSV"
> Error:  extra data after expected last column
> CONTEXT:  COPY Bayern, row 1:
> "48.456;11.8008333;Anglberg/Amper;Industrial
> plant;722;220;220;2133;0;0;Undefined..."
> 
> I have a table (Bayern) with all columns in the right format. There is a
> column-value which has a '/' in the string (Anglberg/Amper). I think that
> '/' is taken for a delimeter, since when I replace / with 'white space' it
> works.

There's no reason for '/' to be taken as a delimiter.
Can you share the table definition and the offending line as an
attachment, so that someone can try to reproduce this with the
exact same data?
Also mention your PostgreSQL version and the shell the command
is invoked from (cmd.exe, powershell, something else?) just in case
it matters.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: pldbgapi error

2019-05-20 Thread Daniel Verite
Prakash Ramakrishnan wrote:

> which code
> will be support for postgresql 11 can you please share me the link or file.

It appears indeed from the error messages that you're trying
to compile an old version.

The latest version here:
git://git.postgresql.org/git/pldebugger.git
does compile successfully for me with PG11 (Debian 9).
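
For reference, an out-of-tree build is typically done like this,
assuming the Makefile is used through PGXS and pg_config points to the
target version (paths as in the Debian/Ubuntu packaging):

  $ git clone git://git.postgresql.org/git/pldebugger.git
  $ cd pldebugger
  $ make USE_PGXS=1 PG_CONFIG=/usr/lib/postgresql/11/bin/pg_config
  $ sudo make USE_PGXS=1 PG_CONFIG=/usr/lib/postgresql/11/bin/pg_config install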


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Upgrading locale issues

2019-05-03 Thread Daniel Verite
rihad wrote:

> On 05/03/2019 05:35 PM, Daniel Verite wrote:
> > For non-English text, I would recommend C.UTF-8 over "C" because of
> 
> BTW, there's no C.UTF-8 inside pg_collation, and running select 
> pg_import_system_collations('pg_catalog') doesn't bring it in, at least 
> not on FreeBSD 11.2.

Yes, aside from "C", locales are quite system-dependent, unfortunately.

It looks like FreeBSD 13 does provide C.UTF-8:

https://unix.stackexchange.com/questions/485073/how-to-backport-freebsd-13-current-c-utf-8-locale-to-11-2-release/485077#485077


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Upgrading locale issues

2019-05-03 Thread Daniel Verite
rihad wrote:

> Thanks, I'm a bit confused here. AFAIK indexes are used for at least two 
> things: for speed and for skipping the ORDER BY step (since btree 
> indexes are already sorted). Will such an "upgrade-immune" C.UTF-8 index 
> still work correctly for table lookups?

If the lookup is based on an equality test or a pattern match with LIKE
or a regexp, it makes no difference.  But the locale makes a
difference with inequality tests, such as < > or BETWEEN.

Around version 9.1 and in the pre-ICU days, Robert Haas wrote
this post that explained it pretty well, I think:
http://rhaas.blogspot.com/2012/03/perils-of-collation-aware-comparisons.html

Quote:

  If you happen to need the particular sorting behavior that
  collation-aware sorting and comparisons provide, then you may find
  this price worth paying, but I suspect there are a lot of people out
  there who are paying it more or less accidentally and don't really
  care very much about the underlying sorting behavior.  If, for
  example, all of your queries are based on equality, and you don't
  use greater-than or less-than tests, then it doesn't matter what
  collation is in use.  You might as well use "C" instead of whatever
  your local default may be, because it's faster.

For non-English text, I would recommend C.UTF-8 over "C" because of
its better handling of Unicode characters. For instance:

=# select upper('été' collate "C"), upper('été' collate "C.UTF-8");
 upper | upper 
-------+-------
 éTé   | ÉTÉ

The "price" of linguistic comparisons that Robert mentioned was about
performance, but the troubles we have with the lack of long-term
immutability of these locales are worth being added to that.

> And can the existing en_US.UTF-8 fields' definition be altered in
> place, without a dump+restore?

Changing the COLLATE clause of a column with ALTER TABLE does
implicitly rebuild an index on this column if there is one.
A dump+restore is not needed, nor an explicit REINDEX.
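
For instance (table and column names are hypothetical):

  ALTER TABLE mytable
    ALTER COLUMN name TYPE text COLLATE "C.UTF-8";
  -- any index on "name" is rebuilt implicitly by this command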

The dump+restore is needed in another scenario, where you would
decide to change the LC_COLLATE and LC_CTYPE of the database,
instead of doing it only for some columns.
This scenario makes perfect sense if the locale of the database
has been set implicitly and it uses linguistic sorts without
really needing them ("accidentally" as said in the post).


> en_US.UTF-8 is the default encoding+locale+collation, it 
> isn't set explicitly for any of our string columns. I assume there's 
> some "catch-all" ordering taking place even for the C locale, so there 
> won't be any bizarre things like b coming before a, or generally for any 
> language, the second letter of its alphabet coming before the first?

'b' < 'a' is never true for any locale AFAIK, it is 'B' < 'a' that is
true for some locales such as C or C.UTF-8.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Upgrading locale issues

2019-05-02 Thread Daniel Verite
rihad wrote:

> Thanks for the reply. Do you know what would a "decent" ICU collation be 
> to bind to a field's schema definition so it would mimic a UTF-8 
> encoding for a multilingual column? Maybe und-x-icu? We aren't as much 
> concerned about their sortability in most cases, we just want indexes to 
> better handle future PG/ICU upgrades. But what does und(efined) even 
> mean with respect to collations?

"undefined" in this context means unspecified language and
unspecified country or region. It implies that no language-specific
nor regional rule will be applied to compare strings.

Using C.UTF-8 as the collation for text fields to index may be the
best trade-off in your case. It should be immune to libc and ICU
upgrades.

With C.UTF-8, a string like 'BC' will sort before 'ab', and punctuation
and accents will also sort differently than with a linguistic-aware
collation.
If your applications care about that, it can be fixed by simply
adding COLLATE "default" to the ORDER BY clause of the queries that
are meant to present data to users.
COLLATE "default" means the collation of the database, which
presumably would be something like "language_REGION.UTF-8" in your
case. If you never specified it explicitly, it came from initdb which
itself got it from the environment of the server.
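
For example, with hypothetical names, a column kept in C.UTF-8 can
still be presented in linguistic order on demand:

  SELECT name FROM customers
    ORDER BY name COLLATE "default";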


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




CVE-2019-9193 about COPY FROM/TO PROGRAM

2019-03-30 Thread Daniel Verite
  Hi,

I've noticed this post being currently shared on social media:

https://www.trustwave.com/en-us/resources/blogs/spiderlabs-blog/cve-2019-9193-authenticated-arbitrary-command-execution-on-postgresql-9-3/

The claim that COPY FROM PROGRAM warrants a CVE seems groundless
because you need to be superuser in the first place to do that.

Apparently these guys have not figured out that a superuser can
also inject arbitrary code with CREATE EXTENSION or even CREATE
FUNCTION since forever, or maybe that will be for a future post?

The CVE itself has not been published, in the sense that it's not
on https://cve.mitre.org, but the ID is reserved.

I don't know if there are precedents of people claiming
CVE entries on Postgres without seemingly reaching out to the
community first. Should something be done proactively about
that particular claim?


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




RE: Oracke BLOB to Postgres BYTEA using ora2pg

2019-01-15 Thread Daniel Verite
Mihalidesová Jana wrote:

> nipjd=> select distinct encode(serializable_value, 'escape') from
> alf_node_properties_zaloha where serializable_value is not null; 
>   
>   encode
> 
> decode(E'aced00057e72002c6f72672e616c66726573636f2e736572766963652e636d722e76657273696f6e2e56657273696f6e54797065127872000e6a6176612e6c616e672e456e756d1278707400054d494e4f52',
> 'hex') 

These contents are code, in the form of function calls, instead of data.
Possibly if you ran these function calls they would output the
original data, but that's a pretty weird form of storage.  If the
insert was done by Ora2Pg, you may want to report this as a bug.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: Creating 2D arrays for pg_copy_from, reading tab-delimted text file that contains comma and double quotes

2018-12-17 Thread Daniel Verite
   wrote:

> When I save that Excel as a tab delimited text file, I get this:
> rec_no	item1	item2	item3	item4	item5
> 1	Denny's	orange juice	"1,500 yen"	"""Dear John"""	"32"" TV"
> (As seen when I opened that file with Notepad)

This looks good. Fields are properly enclosed and double quotes
in contents are doubled, as expected in CSV.

> 5.while (($line = fgetcsv($fileRead, 0, "\t")) !== FALSE) { //0 means I can
> read row whatever its length
> 6.if($row == 1){ $row++; continue; } //skip header
> 7.$line = implode(" ",$line). "\n";
> 8.$twoDarray[] = $line;
> ...
> 14.if (pg_copy_from($con, $tableName, $twoDarray) !== FALSE) {

It goes wrong at line 7. pg_copy_from() expects lines in the 
COPY "text format" documented at
https://www.postgresql.org/docs/current/sql-copy.html

It implies that:
- since your call to pg_copy_from() doesn't specify a delimiter
it uses tab, not a space, so implode() must be passed a tab,
not a space.
- if there are backslashes in the contents they must be quoted
by doubling them.
- if there are newline or carriage return characters in the contents
they must be replaced by \n and \r respectively, so as to
not be confused with an end of record.
- if there are tabs in the contents they must be replaced by \t.

These replacements can all be done by a single strtr() call in PHP.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



RE: pg_dump out of memory for large table with LOB

2018-11-15 Thread Daniel Verite
Jean-Marc Lessard wrote:

> Another area where LOB hurts is the storage. LOB are broken and stored in 2K
> pieces.
> Due to the block header, only three 2k pieces fit in an 8k block wasting 25%
> of space (in fact pgstattuple reports ~ 20%).

Yes. bytea stored as TOAST is sliced into pieces of 2000 bytes, versus
2048 bytes for large objects. And that makes a significant difference
when packing these slices because 2000*4+page overhead+
4*(row overhead) is just under the default size of 8192 bytes per page,
whereas 2048*4+(page overhead)+4*(row overhead)
is obviously a bit over 8192, since 2048*4=8192.

If the data is compressible, the difference may be less obvious because
the slices in pg_largeobject are compressed individually
(as opposed to bytea that gets compressed as a whole),
so more than 3 slices can fit in a page inside pg_largeobject
The post-compression size can be known with pg_column_size(),
versus octet_length() that gives the pre-compression size.
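
For instance, to compare both sizes on a bytea column (names are
hypothetical):

  SELECT octet_length(datacol)   AS uncompressed_bytes,
         pg_column_size(datacol) AS stored_bytes
  FROM mytable
  ORDER BY 1 DESC LIMIT 10;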
 
> Would you recommend bytea over LOB considering that the max LOB size is well
> bellow 1GB?
> Are bytea preferable in terms of support by the community, performance,
> feature, etc?

For the storage and pg_dump issues, bytea seems clearly preferable
in your case.
As for the performance aspect, large objects are excellent because their
API never requires a binary<->text conversion.
This may be different with bytea. The C API provided by libpq allows to
retrieve and send bytea in binary format, for instance through
PQexecParams(), but most drivers implemented on top of libpq use only 
the text representation for all datatypes, because it's simpler for them.
So you may want to check the difference in sending and retrieving
your biggest binary objects with your particular app/language/framework
stored in a bytea column versus large objects.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: Trouble Upgrading Postgres

2018-11-06 Thread Daniel Verite
Adrian Klaver wrote:

> > So there's no way it can deal with the contents over 500MB, and the
> > ones just under that limit may also be problematic.
> 
> To me that looks like a bug, putting data into a record you cannot get out.

Strictly speaking, it could probably get out with COPY in binary format,
but pg_dump doesn't use that.

It's undoubtedly very annoying that a database can end up with
non-pg_dump'able contents, but it's not an easy problem to solve.
Some time ago, work was done to extend the 1GB limit
but eventually it got scratched. The thread in [1] discusses
many details of the problem and why the proposed solution
were mostly a band aid. Basically, the specs of COPY
and other internal aspects of Postgres are from the 32-bit era when 
putting the size of an entire CDROM in a single row/column was not
anticipated as a valid use case.
It's still a narrow use case today and applications that need to store
big pieces of data like that should slice them in chunks, a bit like in
pg_largeobject, except in much larger chunks, like 1MB.

[1] pg_dump / copy bugs with "big lines" ?
https://www.postgresql.org/message-id/1836813.YmyOrS99PX%40ronan.dunklau.fr

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: Trouble Upgrading Postgres

2018-11-06 Thread Daniel Verite
Charles Martin wrote:

>  but the second one returned this:
> 
> 0 "623140"
> 1 "53"
> 2 "12"
> 3 "10"
> 4 "1"
> 5 "1"
> 7 "1"
> [null] "162"
> 
> Not quite sure what that means, but if there is just a small number of
> overly-large records, I might be able to delete them. If I can find them.

The query was:

  SELECT octet_length(docfilecontents)/(1024*1024*100),
   count(*)
   FROM docfile
   GROUP BY octet_length(docfilecontents)/(1024*1024*100);

The results above show that there is one document weighing over 700 MB
(the first column being the multiple of 100MB), one between 500 and
600 MB, one between 400 MB and 500 MB, 10 between 300 and 400 MB, and
so on.

The hex expansion performed by COPY must allocate twice that size,
plus the rest of the row, and if that resulting size is above 1GB, it
will error out with the message you mentioned upthread:
ERROR: invalid memory alloc request size .
So there's no way it can deal with the contents over 500MB, and the
ones just under that limit may also be problematic.

A quick and dirty way of getting rid of these contents would be to
nullify them. For instance, nullify anything over 400MB:

UPDATE docfile SET docfilecontents=NULL
  WHERE octet_length(docfilecontents) > 1024*1024*400;

Or a cleaner solution would be to delete them with the application if
that's possible. You may turn the above query into a SELECT that
retrieves the fields of interest (avoid SELECT * because of the huge
column).
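
For instance, something like this, reusing column names that appear
elsewhere in this thread (adjust to taste):

  SELECT docfile_pkey, docfileoriginalname,
         octet_length(docfilecontents) AS size_bytes
  FROM docfile
  WHERE octet_length(docfilecontents) > 1024*1024*400
  ORDER BY 3 DESC;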


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: Trouble Upgrading Postgres

2018-11-05 Thread Daniel Verite
Charles Martin wrote:

> SELECT max(length(docfilecontents::text)) FROM docfile;
> and after a very long time, got:
> ERROR: invalid memory alloc request size 1636085512 SQL state: XX000

It would mean that at least one row has a "docfilecontents"
close to 0.5GB in size. Or that the size fields in certain rows
are corrupted, although that's less plausible if you have
no reason to suspect hardware errors.

Does the following query work:

 SELECT max(octet_length(docfilecontents)) FROM docfile;

or maybe a histogram by size in hundred of megabytes:

 SELECT octet_length(docfilecontents)/(1024*1024*100),
  count(*)
  FROM docfile
  GROUP BY octet_length(docfilecontents)/(1024*1024*100);

Note that the error message above does not say that there's not enough
free memory, it says that it won't even try to allocate that much, because
1636085512 is over the "varlena limit" of 1GB.
AFAICS I'm afraid that this table as it is now cannot be exported
by pg_dump, even if you had enough free memory, because any individual
row in COPY cannot exceed 1GB in text format.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: Trouble Upgrading Postgres

2018-11-05 Thread Daniel Verite
Charles Martin wrote:

> >So where is the server located relative to the pg_dump client?
> >On the same machine?
> >If so is it a virtual machine e.g AWS?
> >Across a local or remote network?
> 
> 
> I gave the command in a terminal session after SSHing to the server from
> the same network. It is not a virtual machine.

That means that pg_dump runs on the same machine as the server.

It's plausible that, with only 4GB of RAM, the table that fails
to dump has some very large rows that can't be allocated, especially
since both the backend and pg_dump need to have it simultaneously
in memory.

> pg_dump: The command was: COPY public.docfile (docfile_pkey, 
> docfileoriginalname, ordernumber, versionnum, docfilecontents, 
> docfilepath, docfileextension, enddatetime, endby, editnum, insby, 
> insdatetime, modby, moddatetime, active, doc_fkey) TO stdout;

The "docfilecontents" column suggests that it might contain
large contents. If its type is bytea, it's going to be expanded
to twice its size to build the hex representation.

You may get a sense on how big is the biggest row expressed
as text with this query:
 SELECT max(length(docfile::text)) FROM public.docfile;

If it's big enough that it might cause the OOM issue,
try to run pg_dump remotely through an SSH tunnel [1], which you
can already do in terms of network permissions since you log in with
SSH, so pg_dump itself does not use any memory on the server.
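
A minimal sketch of that setup, with placeholder host and database
names:

  # on a remote workstation: forward local port 5433 to the server's 5432
  ssh -L 5433:localhost:5432 youruser@dbserver

  # in another terminal, run pg_dump locally through the tunnel
  pg_dump -h localhost -p 5433 -U postgres -d yourdb -Fc -f yourdb.dump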

Also, if the machine doesn't have swap space, it might be
that just adding a few GBs of swap would make the operation
succeed.


[1] https://www.postgresql.org/docs/current/static/ssh-tunnels.html

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: PostgreSQl, PHP and IIS

2018-09-22 Thread Daniel Verite
Mark Williams wrote:

> Can anyone please shed any light on what more I need to do?

Overall, what you did looks good.
I've never used IIS, but when faced with similar problems with Apache,
ProcessExplorer [1] has been quite effective to figure out which
dependent DLLs were missing, where PHP was searching for them,
and the precise errors it encountered.

[1] https://docs.microsoft.com/en-us/sysinternals/downloads/process-explorer

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: pg_sample

2018-08-29 Thread Daniel Verite
Naveen Dabas wrote:

> Creating sample schema sampledb1
> DBD::Pg::db do failed: ERROR:  cannot execute CREATE SCHEMA in a read-only
> transaction at ./pg_sample line 296.

Maybe you ran this on a hot standby? It can't work because
this program needs to write data into the database,
which is not allowed outside of the primary node.

Or you may have default_transaction_read_only set to true
for the particular db/user of the connection.
https://www.postgresql.org/docs/current/static/runtime-config-client.html#GUC-DEFAULT-TRANSACTION-READ-ONLY
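
You can check and, if needed, change that setting (role and database
names are placeholders):

  SHOW default_transaction_read_only;

  ALTER ROLE someuser IN DATABASE somedb
    SET default_transaction_read_only = off;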

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: Linker errors while creating a PostgreSQL C extension function.

2018-08-20 Thread Daniel Verite
TalGloz wrote:

> Do I have to replace my -shared in the link command with -fPIC?

No, but -fPIC should go into your CXXFLAGS.
The pgxs makefile handles CFLAGS, but as it doesn't do C++,
you're on your own for CXXFLAGS.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: Linker errors while creating a PostgreSQL C extension function.

2018-08-19 Thread Daniel Verite
TalGloz wrote:

> If yes then the -lseal is added with the $(LDFLAGS) at the end of the
> command.

But it doesn't happen because LDFLAGS is overriden by the
makefile included just after you set it. The relevant part
copy-pasted from your mail:

   LDFLAGS = -L$(INCLUDE_SEAL_LIB) -llibseal.a -lpthread
   include $(PGXS)
   seal_diff_cpp.so: seal_diff_cpp.o
	$(CXX) -Wl,--no-undefined -shared -o seal_diff_cpp.so seal_diff_cpp.o $(LDFLAGS)

As as solution, you could leave LDFLAGS alone and use a different
variable, or include $(PGXS) first and *append* your options to
LDFLAGS if your prefer. 
For instance (replacing -llibseal.a with -lseal while at it):

   SEAL_LDFLAGS = -L$(INCLUDE_SEAL_LIB) -lseal -lpthread
   include $(PGXS)
   seal_diff_cpp.so: seal_diff_cpp.o
	$(CXX) -Wl,--no-undefined -shared -o seal_diff_cpp.so seal_diff_cpp.o $(LDFLAGS) $(SEAL_LDFLAGS)


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: Postgres - search for value throughout many tables?

2018-08-10 Thread Daniel Verite
Adrien NAYRAT wrote:

> On 08/08/2018 04:09 PM, czezz wrote:
> > Hi everyone,
> > I want to aks if anyone knows is there a way to search for specific 
> > "value" throughout list of tables OR all tables in databse?
> > 
> > Cheers,
> > czezz
> 
> Hello,
> 
> Maybe this article by Daniel could help you : 
> https://blog-postgresql.verite.pro/2017/06/06/global-search.html
> 
> Unfortunately it is not translated, you have to play with a translate tool.

You may also get the code and README from here:
https://github.com/dverite/postgresql-functions/tree/master/global_search


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: pg_upgrade and wraparound

2018-06-27 Thread Daniel Verite
Alexander Shutyaev wrote:

> Is there any hope the issue with pg_upgrade can be resolved? If not,
> could you give me some hints as to how can I decrease time needed
> for pg_dumpall | psql?

Not sure about the larger problem, but for the part about having more
than 1 million large objects that get committed individually, setting
fsync=off for the reload phase is likely to help a lot.

Just don't forget to turn it on again when it's done. See
https://blog.2ndquadrant.com/postgresql-fsync-off-warning-in-config-file/
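
For instance, a sketch of toggling it around the reload; fsync only
needs a configuration reload, not a restart:

  ALTER SYSTEM SET fsync = off;
  SELECT pg_reload_conf();

  -- ... run the pg_dumpall | psql reload here ...

  ALTER SYSTEM RESET fsync;
  SELECT pg_reload_conf();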


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: pg_upgrade and wraparound

2018-06-12 Thread Daniel Verite
Andres Freund wrote:

> I'm not entirely clear why pg_restore appears to use a separate
> transaction for each large object, surely exascerbating the problem.

To make sure that per-object locks don't fill up the shared
lock table?
There might be hundreds of thousands of large objects.
If it had to restore N objects per transaction, would it know
how to compute N that is large enough to be effective
and small enough not to exhaust the shared table?

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: Problem compiling PostgreSQL.

2018-05-17 Thread Daniel Verite
Paul Linehan wrote:

> Now, I've installed the icu libraries using the classic ./configure, make
> and sudo make install.

So it got installed under /usr/local

> collationcmds.c:(.text+0xe36): undefined reference to `uloc_getAvailable_61'
> collationcmds.c:(.text+0xe5b): undefined reference to
> `uloc_toLanguageTag_61'

61 is ICU's major version number.
You have two ICU versions installed, one from Fedora, with
its icu*.pc config files located in usr/lib64/pkgconfig, the other
(presumably v61, as it's the latest) in /usr/local/lib/pkgconfig/
Without specifying which one should be used, it looks like
Postgres gets them mixed between compilation and link.

To use self-compiled ICU, I've been having success
configuring postgres with:

PKG_CONFIG_PATH=/path/to/icu/lib/pkgconfig \
./configure --with-icu [other flags]

I tend to install ICUs versions into their own directories rather
than /usr/local, and use Debian rather than Fedora, but you
might try PKG_CONFIG_PATH=/usr/local/lib/pkgconfig


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: Adding AVG to a JOIN

2018-04-23 Thread Daniel Verite
Alexander Farber wrote:

> Here is such a query for the best player
> 
> # SELECT AVG(score) FROM words_moves WHERE uid = 1201;
> avg
> -
> 18.4803525523319868
> 
> However I am not sure, how to "marry" the 2 queries?
> 
> I have tried to add words_moves through another JOIN, but that does not
> work:

You may use a correlated subquery in the SELECT clause,
like this:

 SELECT
   u.elo,
   u.uid,
   (SELECT AVG(score) FROM words_moves WHERE uid=u.uid),
   s.given,
   s.photo


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: Conflict between JSON_AGG and COPY

2018-04-09 Thread Daniel Verite
   Đỗ Ngọc Trí Cường wrote:

> I want to export it to a file in JSON format so I run the query as below: 
> COPY (SELECT...) TO '/home/postgres/test1.json'

COPY TO applies its own format rules on top of the data, which include,
among other things, quoting the newline characters. What you seem
to expect is to copy a single-row, single-column piece of data verbatim
to the output file, but COPY does not offer that.
Such an option has already been discussed among developers, for
instance in this thread as a 'RAW' format:
https://www.postgresql.org/message-id/CAFj8pRAfer%2Bip3JCMTnvzra2QK7W9hk0hw1YqE8%2B%2BPZA1OqSJw%40mail.gmail.com
but eventually it was not implemented.

The unformatted output can be obtained client-side without COPY:
psql -Atc "SELECT row_to_json(...)"  > /path/to/file

If you really need it server-side, a function that can write a result to
a file is required. I don't think it exists as such out of the box, but
you may either:

-  write your own function in any untrusted language to
do just that (open file, write the piece of text into it, close it).

- or see pg_file_write() from the adminpack contrib module
https://www.postgresql.org/docs/current/static/adminpack.html
With that, you may call, as a superuser:
select pg_file_write('path/to/file.json', 
  (select row_to_json(...))::text,
  false);
But note that the path must be relative to the PGDATA directory.

- or export the data as an intermediary large object at the cost of some
data churn in the large objects. And in that case, the path is not
constrained to the postgres data directory.

do $$
declare
 id oid;
 j json;
begin
 j := (select row_to_json(t) from tablename t);  -- "tablename" stands for your table or query
 id := lo_from_bytea(0, convert_to(j::text, 'UTF-8'));
 perform lo_export(id, '/path/to/file.json');
 perform lo_unlink(id);
end;
$$ language plpgsql;

For all these server-side methods, you need to be superuser, just like for
COPY TO file anyway.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: Must re-connect to see tables

2018-03-27 Thread Daniel Verite
Blake McBride wrote:

> data.sql is an untouched dump of a database from PostgreSQL 9.5.12.
> 
> The import seems to have worked.  I just need another \c after the \i.

An effect of reconnecting is to reset the search_path to what
it was before including your dump file.
You're most certainly affected by the search_path-related
changes in pg_dump done in the latest versions following
CVE-2018-1058
See https://www.postgresql.org/support/security/
or
https://www.postgresql.org/about/news/1834/

Manually issuing
  RESET search_path;
after \i data.sql should get "public" back in the search_path
without reconnecting and after that \dt would work as usual.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: Dynamic PL/pgSQL select query: value association propblem

2018-02-16 Thread Daniel Verite
Thiemo Kellner, NHC Barhufpflege wrote:

> > Why you don't create query like
> >
> > EXECUTE 'SELECT xxx FROM TAB WHERE A = $1.x AND B = $1.y'  USING NEW;
> 
> I shall try. This would be the direct way, but I doubt the placeholder  
> $1 can be a record.

It could be written without refering to any individual column:

 IF EXISTS (select 1 from tablename 
 where tablename.* is not distinct from NEW)
 THEN
   -- do something
 END IF;

But since the select cannot see not-yet-committed changes from other
sessions, such a trigger cannot reliably detect duplicates, unless
you make sure that there are no concurrent writes to the table.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: execute block like Firebird does

2018-02-12 Thread Daniel Verite
PegoraroF10 wrote:

> Another approach to solve my problem would be a function that receives a
> dynamic SQL, runs it and returns a XML or JSON and on client side I convert
> that XML back to a recordset. Is that possible ?

Yet another tool that can be handy to transfer polymorphic
results is a cursor through the plpgsql REFCURSORs:

https://www.postgresql.org/docs/10/static/plpgsql-cursors.html

Here's an actual example with a DO block :

  BEGIN;

  DO $$
   DECLARE
 c1 refcursor := 'cursor1';
 c2 refcursor := 'cursor2';
   BEGIN
 OPEN c1 FOR select 'result #1 column 1', 'result #1 column 2';
 OPEN c2 FOR select 'result #2 column 1', 'result #2 column 2';
   END;
  $$ LANGUAGE plpgsql;

  FETCH cursor1;
  FETCH cursor2;

  COMMIT;

The interface is a bit weird because the value of the refcursor variable
is the name of the underlying SQL cursor object. The name can
also be auto-generated by postgres; the above code uses fixed
names instead. Anyway that cursor, once instantiated in the
DO block, can be fetched from with FETCH statements initiated
client-side or by other server-side code. 

The above code will retrieve two independent resultsets:

postgres=# FETCH cursor1;
      ?column?      |      ?column?      
--------------------+--------------------
 result #1 column 1 | result #1 column 2
(1 row)

postgres=# FETCH cursor2;
      ?column?      |      ?column?      
--------------------+--------------------
 result #2 column 1 | result #2 column 2
(1 row)

These cursors disappear at transaction end, or they can be explicitly
closed with CLOSE statements.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: To all who wish to unsubscribe

2017-11-20 Thread Daniel Verite
Magnus Hagander wrote:

> So do you have any suggestions for actually fixing that? Given that we have
> more lists to migrate, if you can figure out a way to make those changes
> without peoples filters not matching, we'd be happy to hear it..

I guess it's a bit late at this point, but in theory keeping the same
List-Id headers as before may help limit how many people have to
change their filters.
Using lists.postgresql.org as the new domain was explained
in the announce, but it's not clear why List-Id had to follow
the other List-* fields.
Unless it was the reason mentioned in the last sentence below.

Quoting RFC-2919:

4. Persistence of List Identifiers

   Although the list identifier MAY be changed by the mailing list
   administrator this is not desirable.  (Note that there is no
   disadvantage to changing the description portion of the List-Id
   header.)  A MUA may not recognize the change to the list identifier
   because the MUA SHOULD treat a different list identifier as a
   different list.  As such the mailing list administrator SHOULD avoid
   changing the list identifier even when the host serving the list
   changes.  On the other hand, transitioning from an informal
   unmanaged-list-id-namespace to a domain namespace is an
   acceptable reason to change the list identifier.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite