Re: CLUSTER, VACUUM, and TABLESPACEs (oh my)
> On Jan 24, 2021, at 9:03 PM, Guillaume Lelarge wrote:
>
>> VACUUM FULL unclusters the table??
>
> It will rebuild the table without sorting the data according to the index
> used with CLUSTER (without any sorting actually).

Thank you for the clarification; that’s very helpful. For the case above: if I CLUSTER a table, add/delete no rows, then VACUUM, will the existing order remain or be changed?

Demitri
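[Editorial aside, not from the original thread: one rough way to spot-check whether a table still looks physically ordered on the clustered column is the planner statistics. A minimal sketch, assuming a hypothetical table mytable clustered on a column created_at; a correlation near 1.0 or -1.0 suggests the heap order still follows that column.]

ANALYZE mytable;  -- refresh statistics first

SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'mytable'
  AND attname = 'created_at';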
CLUSTER, VACUUM, and TABLESPACEs (oh my)
Hi,

I would like to request a little clarification on the CLUSTER and VACUUM commands. My use case here (partially) is when my disk runs out of space and I want to move a table to a newly created tablespace. These questions somewhat overlap.

Let’s say I am starting with a table that is not CLUSTERed on a given index, but I want it to be.

* If I run “CLUSTER table USING idx” on a table, is VACUUM FULL required/useful afterwards, or should I assume that the cluster operation did the equivalent of a VACUUM FULL?
* If I have previously run a CLUSTER command on a table, will future VACUUM FULL commands rewrite the table in the order specified in the previous CLUSTER?
* If I want to move a table to a new tablespace, is it possible to CLUSTER+VACUUM in the same step, since the whole table will be rewritten anyway? This would be very useful in low-disk-space scenarios.

I did find this answer, but it’s dated several years ago and I was hoping for something better supported.

https://dba.stackexchange.com/a/87457/121020

The documentation is somewhat silent on these details, so I thought I’d ask here. Right now I move a table to a new tablespace, cluster on an index, and then do a full vacuum (see the sketch below), which results in three full copies of the table being written. That seems less than optimal when only one should be needed, as far as I understand things.

Cheers,
Demitri
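For concreteness, the sequence described above looks roughly like this (the table, index, and tablespace names are made up); each statement rewrites the whole table:

ALTER TABLE big_table SET TABLESPACE data2;   -- copy 1: move to the new tablespace
CLUSTER big_table USING big_table_idx;        -- copy 2: rewrite in index order
VACUUM FULL big_table;                        -- copy 3: rewrite again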
Re: Getting "could not read block" error when creating an index on a function.
Thank you for the responses! I was going to go with a materialized view, but then realized that since the dataset is static it’s really no different from just creating a new table and indexing that. The suggestions provide useful advice for the future though.

Cheers,
Demitri

> On Dec 30, 2020, at 3:14 PM, Tom Lane wrote:
>
> Karsten Hilbert writes:
>> Am Wed, Dec 30, 2020 at 02:37:59PM -0500 schrieb Demitri Muna:
>>> I want to index the results of these repeated, unchanging calculations to
>>> speed up other queries. Which mechanism would be best to do this? Create
>>> additional columns? Create another table?
>
>> A materialized view ?
>
> Yeah, or you might be able to do something with a before-insert-or-update
> trigger that computes whatever desired value you want and fills it into a
> derived column. Indexing that column then gives the same results as
> indexing the derived expression; but it sidesteps the semantic problems
> because the time of computation of the expression is well-defined, even
> if it's not immutable.
>
> You might try to avoid a handwritten trigger by defining a generated
> column instead, but we insist that generation expressions be immutable
> so it won't really work. (Of course, you could still lie about the
> mutability of the expression, but I can't recommend that. Per Henry
> Spencer's well-known dictum, "If you lie to the compiler, it will get its
> revenge". He was speaking of C compilers, I suppose, but the principle
> applies to database query optimizers too.)
>
> regards, tom lane
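[Editorial aside: a minimal sketch of the trigger-maintained derived column Tom describes above. All names are hypothetical; first_event_timestamp() is the function from earlier in the thread.]

ALTER TABLE event ADD COLUMN first_ts_cached timestamp;

CREATE OR REPLACE FUNCTION fill_first_ts() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- cache the expensive expression so a plain column index can be used
    NEW.first_ts_cached := first_event_timestamp(NEW.patient_id);
    RETURN NEW;
END
$$;

CREATE TRIGGER event_fill_first_ts
    BEFORE INSERT OR UPDATE ON event
    FOR EACH ROW EXECUTE FUNCTION fill_first_ts();

CREATE INDEX event_first_ts_idx ON event (first_ts_cached);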
Re: Getting "could not read block" error when creating an index on a function.
Hi Tom,

> On Dec 30, 2020, at 11:50 AM, Tom Lane wrote:
>
> I would call this a bug if it were a supported case, but really you are
> doing something you are not allowed to. Functions in indexed expressions
> are required to be immutable, and a function that looks at the contents of
> a table --- particularly the very table that the index is on --- is simply
> not going to be that. Marking such a function immutable to try to end-run
> around the restriction is unsafe.

Thank you, that makes perfect sense. In my mind it was immutable since the database is read-only, but I can see that to PG it’s not.

Can you suggest an alternative for what I’m trying to do? Given this schema (a “person” has a number of “events”):

CREATE TABLE person (
    id SERIAL,
    ...
);

CREATE TABLE event (
    id SERIAL,
    patient_id INTEGER,
    event_timestamp TIMESTAMP,
    …
);

I have a function (the one I was trying to index) that returns the earliest event for a person. I’m scanning another table with ~10B rows several times using a few of these “constant” values:

* first_event_timestamp(person_id) + ‘1 month’
* first_event_timestamp(person_id) + ‘13 months’
* etc.

I want to index the results of these repeated, unchanging calculations to speed up other queries. Which mechanism would be best to do this? Create additional columns? Create another table?

Thanks again,
Demitri
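[Editorial aside, not from the original message: one way to express the “create another table” option, given the read-only assumption, is to precompute the earliest event per person once and index that. Names are illustrative.]

CREATE TABLE person_first_event AS
SELECT patient_id,
       MIN(event_timestamp) AS first_event_timestamp
FROM event
GROUP BY patient_id;

CREATE UNIQUE INDEX person_first_event_pkey
    ON person_first_event (patient_id);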
Getting "could not read block" error when creating an index on a function.
Hello,

I’m getting stuck on a problem I was hoping I could get some help with. I’m trying to create an index for the results of a function that touches two tables like this, and I get the following error:

CREATE INDEX my_idx ON mytable (first_time(id));

ERROR: could not read block 0 in file "base/16386/42868": read only 0 of 8192 bytes
CONTEXT: SQL function "first_time" during startup

Every time I run this, the last number in the block path increments by one, e.g.

ERROR: could not read block 0 in file "base/16386/42869": read only 0 of 8192 bytes
ERROR: could not read block 0 in file "base/16386/42870": read only 0 of 8192 bytes
ERROR: could not read block 0 in file "base/16386/42871": read only 0 of 8192 bytes

The database is sitting on two tablespaces (pg_default and ‘data2’). When I try to use the second, I get:

CREATE INDEX my_idx ON mytable (first_time(id)) TABLESPACE data2;

ERROR: could not read block 0 in file "pg_tblspc/17007/PG_13_202007201/16386/42870": read only 0 of 8192 bytes
CONTEXT: SQL function "first_time" during startup

with the last number similarly incrementing upon repeated attempts.

Relevant details:

* PostgreSQL version 13.1 running on Ubuntu 20.04.1 on an AWS instance using 2 x 8TB EBS storage.
* The database is ~15TB in size.
* I am not worried about data loss; the database can be considered read-only and I have all of the files needed to recreate any table.
* I am hoping not to recreate the whole database from scratch, since doing so and creating the required indices will take more than a week.
* I used these settings while importing the files to improve the import speed, since I was not worried about data loss (all were turned back on after import):

    autovacuum = off
    synchronous_commit = off
    fsync = off
    full_page_writes = off

* I will not do the above again. :)
* The postmaster server crashed at least twice during the process due to running out of disk space.
* Creating any number of new indices on bare columns is no problem.
* I DROPped and recreated the functions with no change.
* This statement returns no results (but maybe I am looking at the wrong thing):

    SELECT n.nspname AS schema, c.relname AS relation
    FROM pg_class c
    INNER JOIN pg_namespace n ON (c.relnamespace = n.oid)
    WHERE c.relfilenode = '16386';

From what I’ve read about this error (missing block files), it suggests I have some database corruption, which is fine as I can easily delete anything problematic and recreate it. I’ve deleted the indices related to the function and recreated them, but the same error remains. Accessing the related tables seems OK and I don’t get any errors, but with that much data I can’t guarantee that.

Any help would be appreciated!

Cheers,
Demitri
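[Editorial aside: one way to check what, if anything, the file number in the error maps to is pg_filenode_relation(). The 42870 below is taken from the error messages above; a NULL result means no relation currently uses that filenode.]

-- 0 = the database's default tablespace; 42870 = the filenode from the error path
SELECT pg_filenode_relation(0, 42870);

-- or search pg_class directly
SELECT n.nspname, c.relname, c.relkind
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relfilenode = 42870;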
Re: ddl_command_end not firing DROP events
Hi,

On Jul 8, 2018, at 2:19 PM, Alvaro Herrera wrote:

> As I recall, handling of DROP events in the ddl_command_end event is not
> completely consistent. You may even find that some DROP events are not
> returned by pg_event_trigger_ddl_commands. I suggest you stick to
> pg_event_trigger_dropped_objects() instead (for which you need
> sql_drop).

That's what I'm seeing as well, and I have a workaround in place that does just that (a sketch is below). But tracking schema changes then requires two triggers and two functions. Which is fine, it works, but I'm wondering if it's a documentation issue or an outright bug.

Cheers,
Demitri
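Roughly, the workaround looks like this (a sketch modelled on the test case quoted later in the thread; the names here are arbitrary):

CREATE FUNCTION log_dropped_objects() RETURNS event_trigger
LANGUAGE plpgsql AS $$
DECLARE
    obj record;
BEGIN
    FOR obj IN SELECT * FROM pg_event_trigger_dropped_objects()
    LOOP
        RAISE NOTICE 'caught DROP of % ''%''', obj.object_type, obj.object_identity;
    END LOOP;
END
$$;

CREATE EVENT TRIGGER log_dropped_objects
    ON sql_drop
    EXECUTE PROCEDURE log_dropped_objects();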
ddl_command_end not firing DROP events
Hi,

I’ve created a trigger where I want to capture schema-changing events. I’m finding that DROP events are not being triggered when using “ddl_command_end". The test case below demonstrates this. I am running PostgreSQL 10.4. The trigger is fired for the CREATE events, but not for DROP TYPE or DROP TABLE.

I came across the same question on the mailing list, but the solution was to use the “sql_drop” trigger instead.

https://www.postgresql.org/message-id/CAHE3wgjX-N%3DX9mccp4Bs-y-%3DHE%3DqayHEpXpCs1jMT%3DW0ZsYc5A%40mail.gmail.com

The documentation suggests that one should be able to use “ddl_command_end” instead of creating two triggers (i.e. one for DROPs, another for everything else).

Test:

CREATE FUNCTION test_event_trigger_for_drops()
RETURNS event_trigger
LANGUAGE plpgsql AS $$
DECLARE
    obj record;
BEGIN
    FOR obj IN SELECT * FROM pg_event_trigger_ddl_commands()
    LOOP
        RAISE NOTICE 'caught % event on ''%''', obj.command_tag, obj.object_identity;
    END LOOP;
END
$$;

CREATE EVENT TRIGGER test_event_trigger_for_drops
    ON ddl_command_end
    EXECUTE PROCEDURE test_event_trigger_for_drops();

CREATE TYPE foo AS (f1 int, f2 text);
DROP TYPE foo;

CREATE TABLE drop_test ( pk int ) WITH ( OIDS = FALSE );
DROP TABLE drop_test;

-- for ease of copy/paste to delete the above
DROP EVENT TRIGGER test_event_trigger_for_drops;
DROP FUNCTION test_event_trigger_for_drops;

Cheers,
Demitri
Re: Location to specify C compiler option in C extension
Hi Tom,

> PG_CPPFLAGS ought to work. One point is that you need to set that before
> including pgxs.mk; maybe it's an ordering problem?

On closer inspection, it was working, but the pg Makefile was specifically including “-Wdeclaration-after-statement”, which I think was overriding the ‘-std=c99’ part. The line below fixed the problem.

PG_CPPFLAGS += -std=c99 -Wno-declaration-after-statement

Thanks, sorry for the noise.
Demitri
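[Editorial aside: a sketch of how that line fits into a PGXS Makefile. The extension name and file names are placeholders; the important detail, per the quoted reply, is that PG_CPPFLAGS is set before the pgxs makefile is included.]

EXTENSION   = myext
MODULE_big  = myext
OBJS        = file1.o file2.o
DATA        = myext--1.0.sql

# must come before the include below so pgxs picks it up
PG_CPPFLAGS += -std=c99 -Wno-declaration-after-statement

PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)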
Location to specify C compiler option in C extension
Hi,

I’m writing a C extension and have successfully based my Makefile on existing templates. This conveniently hides the details of the full Makefile provided by "pg_config --pgxs”. Which variable would be the appropriate one to specify the ‘-std=c99’ compiler flag? I’ve tried many and still get warnings that this flag should suppress (specifically, "ISO C90 forbids mixed declarations and code”).

These are the variables I currently define:

EXTENSION
DATA
PGFILEDESC
MODULE_big
PG_CPPFLAGS (this one seemed like the right place)
SHLIB_LINK
OBJS
PG_CONFIG
PGXS

A workaround has been:

OBJS = file1.o file2.o
CPPFLAGS += -Wdeclaration-after-statement

but it’s not exactly the same thing.

Thanks,
Demitri
Re: Getting a primitive numeric value from "DatumGetNumeric"?
Hi,

On Feb 22, 2018, at 9:31 PM, Michael Paquier wrote:

> PG_GETARG_NUMERIC(), no?

That macro returns an object of datatype “Numeric”, which still requires some (not immediately obvious) conversion to a double (or whatever primitive C type).

> When working on implementing your own data
> types or when hacking out functions which manipulate arguments of an
> existing datatype, looking at the input and output functions help a
> lot. In your case, numeric_in and numeric_out in
> src/backend/utils/adt/numeric.c is full of hints.

I spent an hour diving into the code out of curiosity. I found useful functions like this:

double numeric_to_double_no_overflow(Numeric n)

They’re available in the PostgreSQL main source code, but not exposed in the public headers. (Maybe I was missing something.) There was enough there that I could see a way to copy/paste or otherwise link to those methods, but as Tom pointed out, implicit coercion handles what I need, so I’ll stick with that.

Cheers,
Demitri
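[Editorial aside: the conversion can also be done without copying internal code by calling the SQL-callable numeric-to-float8 function through the fmgr interface. A minimal sketch; the helper name is made up, and the header location assumes PostgreSQL 10 or later.]

#include "postgres.h"
#include "fmgr.h"
#include "utils/numeric.h"
#include "utils/fmgrprotos.h"   /* declares numeric_float8_no_overflow() */

/* Hypothetical helper: convert a Numeric to a C double by invoking the
 * built-in numeric -> float8 conversion via DirectFunctionCall1, using
 * only public headers. */
static double
numeric_to_c_double(Numeric n)
{
    Datum d = DirectFunctionCall1(numeric_float8_no_overflow,
                                  NumericGetDatum(n));

    return DatumGetFloat8(d);
}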
Re: Getting a primitive numeric value from "DatumGetNumeric"?
Hi Tom,

On Feb 20, 2018, at 10:54 PM, Tom Lane wrote:

> Well, the implicit coercions work in your favor in this particular case.

Ah, I wasn’t aware of implicit coercion. Yes, that solves the problem perfectly, thanks.

Out of curiosity, how does one read a numeric type from within a C extension (i.e. get a number value out of the Datum type)? I ask as I was unable to find an example, and there are a few open questions on Stack Overflow (e.g. https://stackoverflow.com/questions/12588554/postgres-c-function-passing-returning-numerics).

Thanks,
Demitri
Getting a primitive numeric value from "DatumGetNumeric"?
Hi,

I’m writing a C extension for PostgreSQL. One possible input datatype for my function is a numeric array, e.g. ARRAY[[1.5,2.5],[3.5,4.5]]. I can use “DatumGetNumeric” to extract a “Numeric” data type from the data, but at some point I need to convert this to a number (e.g. a double) so that I can do mathy things with it. How does one convert a “Numeric” to, say, a double?

I have a workaround in that I can pass this to my function:

ARRAY[[1.5,2.5],[3.5,4.5]]::float8[]

but I’d rather have the code do that instead of bothering the user to remember that.

Thanks,
Demitri