Re: Please recommend postgresql.conf improvements for osm2pgsql loading Europe

2024-03-31 Thread Alexander Farber
Thank you for the insights


Re: Feature request: pg_get_tabledef(text)

2024-03-31 Thread Hans Schou
On Wed, Nov 22, 2023 at 5:09 PM Laurenz Albe wrote:

>
> One of the problems is what should be included.
> Indexes?  Policies?  Constraints?
>

An upper limit could be all objects except data: all the objects which
would be deleted by a 'DROP TABLE'.

Maybe including 'CASCADE'?

> No unsurmountable questions, but someone would have to come up with a
> clear design and implement it.
>

I gave it a try.
I'm not that skilled in plpgsql so there is probably room for improvement.

https://github.com/chlordk/pg_get_tabledef

For your convenience here is a copy/paste of the function.

CREATE OR REPLACE FUNCTION pg_get_tabledef(TEXT)
RETURNS TEXT
LANGUAGE plpgsql
AS $_$
-- pg_get_tabledef ( text ) → text
-- Reconstructs the underlying CREATE command for a table and objects related to a table.
-- (This is a decompiled reconstruction, not the original text of the command.)
DECLARE
    R TEXT;          -- Return result
    R_c TEXT;        -- Comments result, shown after the table definition
    rec RECORD;
    tmp_text TEXT;
    v_oid OID;       -- Table object id
    v_schema TEXT;   -- Schema
    v_table TEXT;    -- Table name
    rxrelname TEXT;
BEGIN
    rxrelname := '^(' || $1 || ')$';
    -- Get oid, schema and table name
    SELECT c.oid, n.nspname, c.relname
    INTO v_oid, v_schema, v_table
    FROM pg_catalog.pg_class c
    LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relname OPERATOR(pg_catalog.~) rxrelname COLLATE pg_catalog.default
      AND pg_catalog.pg_table_is_visible(c.oid);
    -- If the table is not found, exit
    IF NOT FOUND THEN
        -- RAISE EXCEPTION 'Table % not found', $1;
        RETURN '-- Table not found: ''' || $1 || '''';
    END IF;
    -- Table comment first, column comments second; initialize variable R_c
    SELECT obj_description(v_oid) INTO tmp_text;
    IF LENGTH(tmp_text) > 0 THEN
        R_c := 'COMMENT ON TABLE ' || v_schema || '."' || v_table || '" IS ''' || tmp_text || ''';' || E'\n';
    ELSE
        R_c := '';
    END IF;
    R := 'CREATE TABLE ' || v_schema || '."' || v_table || '" (';
    -- Get columns
    FOR rec IN
        SELECT
            a.attname,
            pg_catalog.format_type(a.atttypid, a.atttypmod),
            (SELECT pg_catalog.pg_get_expr(d.adbin, d.adrelid, true)
               FROM pg_catalog.pg_attrdef d
              WHERE d.adrelid = a.attrelid AND d.adnum = a.attnum AND a.atthasdef),
            a.attnotnull,
            (SELECT c.collname
               FROM pg_catalog.pg_collation c, pg_catalog.pg_type t
              WHERE c.oid = a.attcollation AND t.oid = a.atttypid
                AND a.attcollation <> t.typcollation) AS attcollation,
            a.attidentity,
            a.attgenerated,
            a.attnum
        FROM pg_catalog.pg_attribute a
        WHERE a.attrelid = v_oid AND a.attnum > 0 AND NOT a.attisdropped
        ORDER BY a.attnum
    LOOP
        -- RAISE NOTICE '% % %', rec.attnum, rec.attname, rec.format_type;
        IF rec.attnum > 1 THEN
            R := R || ','; -- no comma after the last column definition
        END IF;
        R := R || E'\n' || '"' || rec.attname || '" ' || rec.format_type;
        IF rec.attnotnull THEN
            R := R || ' NOT NULL';
        END IF;
        -- Comment on column
        SELECT col_description(v_oid, rec.attnum) INTO tmp_text;
        IF LENGTH(tmp_text) > 0 THEN
            R_c := R_c || 'COMMENT ON COLUMN ' || v_schema || '."' || v_table || '"."' || rec.attname || '" IS ''' || tmp_text || ''';' || E'\n';
        END IF;
    END LOOP; -- Columns
    -- Finalize table
    R := R || E'\n' || ');' || E'\n';
    -- Add COMMENTs
    IF LENGTH(R_c) > 0 THEN
        R := R || R_c;
    END IF;
    -- Indexes
    FOR rec IN
        SELECT pg_catalog.pg_get_indexdef(i.indexrelid, 0, true) AS indexdef
        FROM pg_catalog.pg_class c, pg_catalog.pg_class c2, pg_catalog.pg_index i
        LEFT JOIN pg_catalog.pg_constraint con ON (conrelid = i.indrelid
            AND conindid = i.indexrelid AND contype IN ('p','u','x'))
        WHERE c.oid = v_oid AND c.oid = i.indrelid AND i.indexrelid = c2.oid
        ORDER BY i.indisprimary DESC, c2.relname
    LOOP
        R := R || rec.indexdef || ';' || E'\n';
    END LOOP; -- Indexes
    RETURN R;
END;
$_$;
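
For a quick sanity check, something like this (table and index names are
just made up for the example; the argument is matched as a regular
expression against visible tables, as in psql's \d) shows the kind of text
it reconstructs:

CREATE TABLE towns (
    id   integer NOT NULL,
    name text
);
COMMENT ON TABLE towns IS 'Demo table';
CREATE INDEX towns_name_idx ON towns (name);

-- Returns one text value containing the reconstructed CREATE TABLE,
-- the COMMENT ON statement and the index definition.
SELECT pg_get_tabledef('towns');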



-- 
핳햆햓햘 핾했햍햔햚
☏ ➁➁ ➅➃ ➇⓪ ➁⓪


Re: how to check if the license is expired.

2024-03-31 Thread Christophe Pettus



> On Mar 31, 2024, at 09:59, Peter J. Holzer  wrote:
> Is this an acceptable performance penalty per API call? If not, is it
> really necessary to check this on every call? Maybe it can be done just
> once per session or once per hour.

It's probably not required to check it every API call.  Two places come to mind:

_PG_init -- Although I don't know the possibility or wisdom of reading from a 
file there.
shmem_startup_hook -- It's definitely OK to read from a file there.

Remember that you have the full ability to crash PostgreSQL in an extension, so 
it really needs to be bulletproof.  You don't want the shared library to fail 
to load if the license isn't valid.  Instead:

-- If the functionality is exposed as functions, return an error when one of 
those functions is used, if the extension is not licensed.
-- If the functionality modifies PostgreSQL behavior, disable that modification.
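
A rough SQL-level sketch of the first of those two patterns, just to show
the shape (all names are made up; a real compiled extension would raise the
error from C with ereport() instead):

-- The extension always loads; the license check only runs when one of
-- its user-facing functions is actually called.
CREATE OR REPLACE FUNCTION myext_do_work(arg text) RETURNS text
LANGUAGE plpgsql AS $$
DECLARE
    licensed boolean := false;   -- placeholder for the real license check
BEGIN
    IF NOT licensed THEN
        RAISE EXCEPTION 'myext: license missing or expired'
              USING ERRCODE = 'insufficient_privilege';
    END IF;
    RETURN upper(arg);           -- placeholder for the real functionality
END;
$$;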

If you are validating the license via a network call... I would counsel against 
having a network call as part of PostgreSQL's startup process.  It might work 
to make the call on-demand the first time the extension is used, and the result 
stored locally, although I would limit that to function calls rather than, say, 
a hook into the planner.

You also need to decide exactly how you want to distribute this extension.  
Most PostgreSQL extensions are supplied as source and built against PostgreSQL.



Re: how to check if the license is expired.

2024-03-31 Thread Peter J. Holzer
On 2024-03-30 09:23:15 -0400, Ron Johnson wrote:
> On Sat, Mar 30, 2024 at 9:15 AM 黄宁  wrote:
> 
> I want to develop a postgresql paid extension, then there is a local
> license file, how do I check if the license file is expired, check it once
> at each api execution, will that affect the performance of the api, is
> there any other way?
> 
> 
> What you're really asking is "how do I read a file from an extension?".
>  

We often chide users for falling into the "XY problem"[1] trap, so I think
it's nice that 黄宁 asks about the bigger picture.

I can't help with the extension (never wrote one), but a few thoughts:

Basically I see three ways to get at the license information:

* from a file (as mentioned)
* from a database table
* over the network (e.g. from a license server)

On my (not very fast) laptop I can open and read a small text file in
about 25 µs. Selecting one row from a small database table takes about 100
µs, which is quite a bit slower but I tested that from an external
process. A stored procedure would be faster than that and possibly even
faster than the file access. A query over the network is unlikely to be
faster.

Plus of course you need to check the contents, which likely involves
some cryptographic operation. Checking a 2048 bit RSA signature takes
about 30 µs on my laptop, most other algorithms are slower (unless you
go with a simple HMAC which wouldn't be secure).

So realistically we are somewhere in the 50 to 200 µs range.

Is this an acceptable performance penalty per API call? If not, is it
really necessary to check this on every call? Maybe it can be done just
once per session or once per hour.
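
One cheap way to get the once-per-session behaviour is to cache the verdict
in a custom setting, so only the first call pays for the file/table read and
the signature check. A sketch (all names are hypothetical, and the actual
verification is left as a placeholder):

CREATE OR REPLACE FUNCTION myext_license_valid() RETURNS boolean
LANGUAGE plpgsql AS $$
DECLARE
    cached text := current_setting('myext.license_ok', true);  -- NULL if never set
    ok boolean;
BEGIN
    IF cached IS NOT NULL AND cached <> '' THEN
        RETURN cached = 'on';   -- cheap path: already checked in this session
    END IF;
    -- Expensive path, reached once per session: read the license from a
    -- file/table/server and verify its signature here. Placeholder:
    ok := true;
    PERFORM set_config('myext.license_ok',
                       CASE WHEN ok THEN 'on' ELSE 'off' END,
                       false);  -- false = keep it for the rest of the session
    RETURN ok;
END;
$$;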

hp


[1] You have problem X and think that Y is part of the solution. So you
ask how to achieve Y. However, Z would be better than Y for solving
X, but nobody can tell you because they don't know about X.

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"




Re: Please recommend postgresql.conf improvements for osm2pgsql loading Europe

2024-03-31 Thread hector vass
On Sat, 30 Mar 2024, 10:04 Alexander Farber wrote:

> Thank you, Justin -
>
> On Sat, Mar 30, 2024 at 4:33 AM Justin Clift wrote:
>
>> On 2024-03-30 05:53, Alexander Farber wrote:
>> > I use the following postgresql.conf in my Dockerfile
>> > ( the full version at https://stackoverflow.com/a/78243530/165071 ),
>> > when loading a 28 GByte large europe-latest.osm.pbf
>>
>> Not specific conf file improvements, but for an initial data load
>> have you done things like turning off fsync(), deferring index
>> creating until after the data load finishes, and that kind of thing?
>>
>
> I will try the following commands in my Dockerfile then
> and later report back on any improvements:
>
> RUN set -eux && \
> pg_ctl init && \
> echo "shared_buffers = 1GB"                >> $PGDATA/postgresql.conf && \
> echo "work_mem = 50MB"                     >> $PGDATA/postgresql.conf && \
> echo "maintenance_work_mem = 10GB"         >> $PGDATA/postgresql.conf && \
> echo "autovacuum_work_mem = 2GB"           >> $PGDATA/postgresql.conf && \
> echo "wal_level = minimal"                 >> $PGDATA/postgresql.conf && \
> echo "checkpoint_timeout = 60min"          >> $PGDATA/postgresql.conf && \
> echo "max_wal_size = 10GB"                 >> $PGDATA/postgresql.conf && \
> echo "checkpoint_completion_target = 0.9"  >> $PGDATA/postgresql.conf && \
> echo "max_wal_senders = 0"                 >> $PGDATA/postgresql.conf && \
> echo "random_page_cost = 1.0"              >> $PGDATA/postgresql.conf && \
> echo "password_encryption = scram-sha-256" >> $PGDATA/postgresql.conf && \
> echo "fsync = off"                         >> $PGDATA/postgresql.conf && \
> pg_ctl start && \
> createuser --username=postgres $PGUSER && \
> createdb --username=postgres --encoding=UTF8 --owner=$PGUSER $PGDATABASE && \
> psql --username=postgres $PGDATABASE --command="ALTER USER $PGUSER WITH PASSWORD '$PGPASSWORD';" && \
> psql --username=postgres $PGDATABASE --command='CREATE EXTENSION IF NOT EXISTS postgis;' && \
> psql --username=postgres $PGDATABASE --command='CREATE EXTENSION IF NOT EXISTS hstore;' && \
> osm2pgsql --username=$PGUSER --database=$PGDATABASE --create --cache=6 --hstore --latlong /data/map.osm.pbf && \
> rm -f /data/map.osm.pbf && \
> pg_ctl stop && \
> echo "fsync = on"                          >> $PGDATA/postgresql.conf && \
> echo '# TYPE DATABASE USER ADDRESS METHOD' >  $PGDATA/pg_hba.conf && \
> echo "local all postgres peer"             >> $PGDATA/pg_hba.conf && \
> echo "local $PGDATABASE $PGUSER scram-sha-256" >> $PGDATA/pg_hba.conf && \
> echo "host  $PGDATABASE $PGUSER 0.0.0.0/0 scram-sha-256" >> $PGDATA/pg_hba.conf
>
> The later fsync = on will override the former, right?
>
> Best regards
> Alex
>
>
>
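
On the duplicated fsync entries: yes, when the same parameter appears more
than once in postgresql.conf only the last entry counts, so the later
fsync = on wins once the server is started again. You can confirm what is
actually in effect with something like:

SHOW fsync;

SELECT name, setting, sourcefile, sourceline
FROM pg_settings
WHERE name = 'fsync';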

2 hrs sounds reasonable for Europe; it's a big place in terms of OSM data,
and osm2pgsql is doing processing to convert to geometry objects prior to
doing anything on the PostgreSQL side.
If you examine the --log-sql output for a small test country you can see
what it does on the SQL side.
osm2pgsql gives options to trim the output to only what you need (so if you
don't want waterways, traffic features, parking places or places of worship
etc., why load them?).
Hopefully you have found the excellent Geofabrik source for OSM data,
https://download.geofabrik.de/.
Rather than loading this data afresh each update cycle you would be better
off simply loading the changes, i.e. the .osc files; Osmosis will create
the equivalent of a diff file for you.
It looks like you are already using osm2pgsql's recommended postgresql.conf
settings, so I'd be surprised if this was way off. Getting as close to tin
as possible, rather than virtual machines and containers, will also help;
there is a lot of I/O going on here.
If you are only interested in the geography you might consider Geofabrik's
shapefiles, available for many countries; they have already done some of
the work for you.

Apologies if you are already a long way down this route and just asking
about the final stage of loading the osm2pgsql output into PostgreSQL, but
however well you do here I would only expect small marginal gains.


Re: Please recommend postgresql.conf improvements for osm2pgsql loading Europe

2024-03-31 Thread Justin Clift

On 2024-03-31 04:07, Alexander Farber wrote:
> Turning fsync = off has resulted in no noticeable build time reduction for
> my Dockerfile with OSM Europe data, but still thank you for the suggestion!


No worries. :)

With this import you're doing, is it something that will be repeated a lot
with the exact same data set, or is this a once-off thing?

If it's something that'll be repeated a lot (maybe part of some automated
process?), then it might be worth making a backup / snapshot / something
of the database after the import has completed.

With a backup or snapshot in place (depends on the storage you're using),
you could potentially load things from that backup / snapshot (etc) instead
of having to do the import all over again each time.
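
One SQL-only variant of that idea, assuming the imported database is called
gis here (any other name works the same) and that nothing is connected to it
while it is being copied: keep the freshly imported database untouched and
clone it whenever you need a working copy, because CREATE DATABASE with a
TEMPLATE does a fast file-level copy instead of replaying the import.

-- Clone the imported database instead of re-running osm2pgsql.
CREATE DATABASE gis_work TEMPLATE gis;

-- ... work with gis_work; if it gets messed up, throw it away and re-clone:
DROP DATABASE gis_work;
CREATE DATABASE gis_work TEMPLATE gis;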

Regards and best wishes,

Justin Clift