[GENERAL] synchronize DTAP
Hi,

We have an environment that has a central repository for lookups, which is replicated to several databases, each for different applications. This has been arranged in a DTAP manner. Sometimes it is necessary to synchronize the lookups of one of the DTAP branches with another. But I can't just overwrite one database with a dump from another branch, as the consumer databases will not follow. What I think I need is a way to compute the differences between two databases that have the same schema, and generate insert/update/delete statements from that. Since this seems like a pretty generic problem, I thought I should ask around before I start writing my own scripts. Does anyone know of a script or application that does this?

Cheers
--
Willy-Bas Loos
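For what it's worth, this kind of diff can be scripted per lookup table in plain SQL, assuming a single-column primary key and that both branches are visible from one session (e.g. the source branch restored into a scratch schema, or attached via dblink/postgres_fdw). This is only a sketch; the schema and column names (src, tgt, lookup_foo, id, label) are made up:

    -- rows only in the source branch -> INSERTs for the target
    SELECT format('INSERT INTO lookup_foo (id, label) VALUES (%L, %L);', s.id, s.label)
    FROM src.lookup_foo s
    LEFT JOIN tgt.lookup_foo t USING (id)
    WHERE t.id IS NULL;

    -- rows in both branches but with different data -> UPDATEs
    SELECT format('UPDATE lookup_foo SET label = %L WHERE id = %L;', s.label, s.id)
    FROM src.lookup_foo s
    JOIN tgt.lookup_foo t USING (id)
    WHERE s.label IS DISTINCT FROM t.label;

    -- rows only in the target branch -> DELETEs
    SELECT format('DELETE FROM lookup_foo WHERE id = %L;', t.id)
    FROM tgt.lookup_foo t
    LEFT JOIN src.lookup_foo s USING (id)
    WHERE s.id IS NULL;

The generated statements can then be reviewed and replayed against the target branch, so the consumer databases follow via the normal replication path.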
Re: [GENERAL] BDR Global Sequences
Hi,

On 2014-09-29 13:52:52 -0700, p...@cmicdo.com wrote:
> I have a question about BDR Global Sequences. I've been playing with BDR on PG 9.4beta2, built from source from the 2nd Quadrant GIT page (git://git.postgresql.org/git/2ndquadrant_bdr.git). When trying a 100-row \copy in, letting PG choose the global sequence values, I get ERROR: could not find free sequence value for global sequence public.myseq, as documented... no surprise there. However, the number of rows I can load before the error varies wildly with each trial.

Yea, it depends on how quickly the refilling starts and how quickly it can keep up. The next version hopefully will start to fill up a bit quicker.

> Is there a way to increase a global sequence's reservation block for each node, so that I can tell the nodes "I'm going to load 100M rows now, so you should get ready for that"?

Not yet, but we're planning to add that.

Greetings,

Andres Freund

--
Andres Freund   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [GENERAL] table versioning approach (not auditing)
Hey, yes I'm adding an additional key to each of my tables. First I wanted to use the primary key as one column in my audit_log table, but in some of my tables the PK consists of more than one column. Plus it's nice to have one key that is called the same in all tables.

To get a former state for one row at date x I need to join the latest delta BEFORE date x with each delta AFTER date x. If I logged complete rows, this joining part would not be necessary, but as I usually work with spatial databases that have complex geometries and also image files, that strategy consumes too much disk space.

If there are more users following a similar approach, I wonder why we don't throw all the good ideas together, to have one solution that is tested, maintained and improved by more developers. This would be great.

Felix

Sent: Monday, 29 September 2014 at 23:25
From: Abelard Hoffman abelardhoff...@gmail.com
To: Felix Kunde felix-ku...@gmx.de
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] table versioning approach (not auditing)

Thank you Felix, Gavin, and Jonathan for your responses.

Felix & Jonathan: both of you mention just storing deltas. But if you do that, how do you associate the delta record with the original row? Where's the PK stored, if it wasn't part of the delta?

Felix, thank you very much for the example code. I took a look at your table schemas. I need to study it more, but it looks like the way you're handling the PK is by adding a separate synthetic key (audit_id) to each table that's being versioned, and then storing that key along with the delta. So then to find all the versions of a given row, you just need to join the audit row with the schema_name.table_name.audit_id column. Is that right? The only potential drawback there is that there's no referential integrity between audit_log.audit_id and the actual table. I do like that approach very much though, in that it eliminates the need to interrogate the JSON data in order to perform most queries.

AH

On Mon, Sep 29, 2014 at 12:26 AM, Felix Kunde felix-ku...@gmx.de wrote:

Hey, I've also tried to implement database versioning using JSON to log changes in tables. Here it is: https://github.com/fxku/audit

I've got two versioning tables, one storing information about all transactions that happened and one where I put the JSON logs of row changes of each table. I'm only logging old values and not complete rows. Then I've got a function that recreates a database state at a given time in a separate schema - either as VIEWs, MVIEWs or TABLEs. This database state could then be indexed in order to work with it. You can also reset the production state to the recreated past state.

Unfortunately I've got no time to work on it further at the moment + I have not done tests with many changes in the database, so I can't say if the recreation process scales well. One downside I've realised is that using the json_agg function has limits when I've got binary data - it gets too long. So I'm really looking forward to using JSONB.

There are more plans in my mind. By having a transaction_log table it should be possible to revert only certain transactions. I'm also thinking of parallel versioning, e.g. different users all working with their version of the database and committing their changes to the production state. As I've got a unique history ID for each table and each row, I should be able to map the affected records.

Have a look and tell me what you think of it.

Cheers
Felix

Sent: Monday, 29 September 2014 at 04:00
From: Abelard Hoffman abelardhoff...@gmail.com
To: pgsql-general@postgresql.org
Subject: [GENERAL] table versioning approach (not auditing)

Hi. I need to maintain a record of all changes to certain tables to assist in viewing history and reverting changes when necessary (customer service makes an incorrect edit, etc.).

I have studied these two audit trigger examples:
https://wiki.postgresql.org/wiki/Audit_trigger
https://wiki.postgresql.org/wiki/Audit_trigger_91plus

I've also read about two other approaches to versioning:
1. maintain all versions in one table, with a flag to indicate which is the current version
2. have a separate versions table for each real table, and insert into the associated version table whenever an update or insert is done

My current implementation is based on the wiki trigger examples, using a single table and a json column to record the row changes (rather than hstore). What I like about that, in particular, is that I can have a global, chronological view of all versioned changes very easily.

But there are two types of queries I need to run:
1. Find all changes made by a specific user
2. Find all changes related to a specific record

#1 is simple to do. The versioning table has a user_id
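For readers following along, here is a rough sketch of the general pattern Felix describes: a synthetic audit_id on every versioned table, plus a trigger that logs only the old values of the columns that changed, as JSON. This is not the code from the github repo, just an illustration under assumptions - audit_log, my_table and log_delta are made-up names, and it leans on the hstore extension to compute the delta (hstore(OLD) - hstore(NEW) keeps exactly the pairs whose value changed):

    CREATE EXTENSION IF NOT EXISTS hstore;

    CREATE TABLE audit_log (
        audit_id    bigint      NOT NULL,   -- synthetic key shared with the versioned table
        table_name  text        NOT NULL,
        stmt_time   timestamptz NOT NULL DEFAULT now(),
        changes     json        NOT NULL    -- old values of the columns that changed
    );

    -- every versioned table carries the same synthetic key
    ALTER TABLE my_table ADD COLUMN audit_id bigserial;

    CREATE OR REPLACE FUNCTION log_delta() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'UPDATE' THEN
            INSERT INTO audit_log (audit_id, table_name, changes)
            VALUES (OLD.audit_id, TG_TABLE_NAME,
                    hstore_to_json(hstore(OLD) - hstore(NEW)));  -- delta only
            RETURN NEW;
        ELSE  -- DELETE: keep the whole old row
            INSERT INTO audit_log (audit_id, table_name, changes)
            VALUES (OLD.audit_id, TG_TABLE_NAME, row_to_json(OLD));
            RETURN OLD;
        END IF;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER my_table_versioning
        AFTER UPDATE OR DELETE ON my_table
        FOR EACH ROW EXECUTE PROCEDURE log_delta();

Reconstructing a row as of date x then means starting from the current row and applying, in reverse order, every delta logged after x - which is the join Felix mentions above.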
Re: [GENERAL] BDR Global Sequences
Hi Andres,

On 2014-09-29 13:52:52 -0700, p...@cmicdo.com wrote:
>> I have a question about BDR Global Sequences. [deleted]
>> Is there a way to increase a global sequence's reservation block for each node, so that I can tell the nodes "I'm going to load 100M rows now, so you should get ready for that"?
>
> Not yet, but we're planning to add that.

Good to hear. In the meantime, is there something I can hack to force the nodes to make a sequence allocation of my choosing (even if just hardwired)?

I was playing with start_elections_sql where it says:

    generate_series(
        current_max,
        -- 1000 is the chunk size, -1 is to get < instead of <= out of generate_series
        current_max + 1000 * (5 - open_seq_chunks) - 1,
        1000) chunk_start

and was hoping that bumping up the 1000 would help, but apparently not.

PJ

> Greetings, Andres Freund
Re: [GENERAL] PANIC: could not create file pg_xlog/xlogtemp.7884: No space left on device
Roopeshakumar Narayansa Shalgar (rshalgar) wrote:
> Hi, I am using version 9.3.1 and see the "no space left on device" error even though there is enough space (99% free) on my disk.

Just to be sure, check the output of both 'df -h' (for disk blocks) and 'df -hi' (for inodes). You might have run out of inodes...

Eric
Re: [GENERAL] [SQL] pg_multixact issues
On Fri, Sep 26, 2014 at 1:36 PM, Dev Kumkar devdas.kum...@gmail.com wrote:
> Received the database with a huge pg_multixact directory of size 21G; there are ~82,000 files in pg_multixact/members and 202 files in pg_multixact/offsets.
> Did run VACUUM FULL on this database and it was successful. However I am now not sure about the pg_multixact directory. Truncating this directory (except one file) results in database start-up issues - of course that is not a correct way of truncating:
>     FATAL: could not access status of transaction 13224692
> Stumped! Please provide some comments on how to truncate pg_multixact files and whether these files have any impact on database performance.

Facing this issue on a couple more machines where pg_multixact is huge and not being cleaned up. Any suggestions / troubleshooting tips?

Regards...
Re: [GENERAL] [SQL] pg_multixact issues
Dev Kumkar wrote:
> Received the database with a huge pg_multixact directory of size 21G; there are ~82,000 files in pg_multixact/members and 202 files in pg_multixact/offsets.
> Did run VACUUM FULL on this database and it was successful. However I am now not sure about the pg_multixact directory. Truncating this directory (except one file) results in database start-up issues - of course that is not a correct way of truncating:
>     FATAL: could not access status of transaction 13224692
> Stumped! Please provide some comments on how to truncate pg_multixact files and whether these files have any impact on database performance.
> Facing this issue on a couple more machines where pg_multixact is huge and not being cleaned up. Any suggestions / troubleshooting tips?

Did you try decreasing the autovacuum_multixact_freeze_min_age and autovacuum_multixact_freeze_table_age parameters?

What exact server version are you running?

--
Álvaro Herrera   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [GENERAL] synchronize DTAP
Gidday,

There was an interesting presentation at the Portland Postgres Users Group meeting in early September, from a guy who demo'd a Postgres database mounted as a FUSE filesystem. Not production ready, but with tables manifesting as directories, databases could be synched using filesystem tools like rsync - which offers intriguing backup/replication possibilities.

http://vimeo.com/105493143 - the demo of the FUSE functionality starts 39 minutes into the presentation.

Brent Wood
Principal Technician - GIS and Spatial Data Management
Programme Leader - Environmental Information Delivery
NIWA, +64-4-386-0529 | 301 Evans Bay Parade, Greta Point, Wellington | www.niwa.co.nz

From: pgsql-general-ow...@postgresql.org on behalf of Willy-Bas Loos willy...@gmail.com
Sent: Tuesday, September 30, 2014 8:58 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] synchronize DTAP

> Hi, We have an environment that has a central repository for lookups, which is replicated to several databases, each for different applications. This has been arranged in a DTAP manner. Sometimes it is necessary to synchronize the lookups of one of the DTAP branches with another. But I can't just overwrite one database with a dump from another branch, as the consumer databases will not follow. What I think I need is a way to compute the differences between two databases that have the same schema, and generate insert/update/delete statements from that. Since this seems like a pretty generic problem, I thought I should ask around before I start writing my own scripts. Does anyone know of a script or application that does this?
>
> Cheers
> --
> Willy-Bas Loos
Re: [GENERAL] table versioning approach (not auditing)
Felix, I'd love to see a single, well-maintained project. For example, I just found yours, and gave it a shot today after seeing this post. I found a bug: when an update command is issued but the old and new values are all the same, the trigger will blow up. I've got a fix for that, but if we had one project that more than a handful of people used, stuff like that would be quashed very quickly. I love the design of it, by the way. Any idea what it will take to move to JSONB for 9.4?

On Tue, Sep 30, 2014 at 7:22 AM, Felix Kunde felix-ku...@gmx.de wrote:
> Hey, yes I'm adding an additional key to each of my tables. First I wanted to use the primary key as one column in my audit_log table, but in some of my tables the PK consists of more than one column. Plus it's nice to have one key that is called the same in all tables.
>
> To get a former state for one row at date x I need to join the latest delta BEFORE date x with each delta AFTER date x. If I logged complete rows, this joining part would not be necessary, but as I usually work with spatial databases that have complex geometries and also image files, that strategy consumes too much disk space.
>
> If there are more users following a similar approach, I wonder why we don't throw all the good ideas together, to have one solution that is tested, maintained and improved by more developers. This would be great.
>
> Felix
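On the blow-up when old and new values are identical: one cheap guard (not necessarily how the project itself fixes it) is to keep the trigger from firing at all for no-op updates, using a WHEN clause. A minimal sketch, reusing the hypothetical log_delta() trigger function from the earlier sketch:

    CREATE TRIGGER my_table_versioning
        AFTER UPDATE ON my_table
        FOR EACH ROW
        WHEN (OLD.* IS DISTINCT FROM NEW.*)   -- skip updates that change nothing
        EXECUTE PROCEDURE log_delta();

With that condition in place, an UPDATE that sets every column to its existing value never reaches the logging code, so there is no empty delta to choke on.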
Re: [GENERAL] [SQL] pg_multixact issues
On Tue, Sep 30, 2014 at 8:50 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
> Did you try decreasing the autovacuum_multixact_freeze_min_age and autovacuum_multixact_freeze_table_age parameters?

As per the docs, vacuum_multixact_freeze_min_age can be set anywhere from zero to 1 billion, and vacuum_multixact_freeze_table_age from zero to 2 billion. I modified these to 10 and 15 respectively. Not sure if that's the correct way of setting these parameters?

> What exact server version are you running?

I am using PostgreSQL 9.3.4 (linux-64-bit).

Regards...
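For anyone searching the archives: on 9.3 the relevant GUCs are vacuum_multixact_freeze_min_age and vacuum_multixact_freeze_table_age, and the general approach is to lower them and let a database-wide vacuum advance the multixact cutoffs, after which old pg_multixact segments can be removed (around checkpoint time). The following is only a sketch under those assumptions, with deliberately aggressive values, not a tuning recommendation:

    -- in a session connected to the affected database
    SET vacuum_multixact_freeze_min_age = 0;     -- freeze multixacts as soon as vacuum sees them
    SET vacuum_multixact_freeze_table_age = 0;   -- force whole-table scans so freezing actually happens
    VACUUM VERBOSE;                              -- database-wide vacuum advances the cutoffs

    -- afterwards, check the per-database multixact cutoff
    SELECT datname, datminmxid FROM pg_database;

Manually deleting files under pg_multixact is never safe; the "could not access status of transaction" error above is exactly what happens when a needed segment is gone.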
[GENERAL] ability to return number of rows inserted into child partition tables request
Hello. I was trying to get Postgres to return the correct number of rows inserted for batch inserts into a partitioned table [using the triggers as suggested here http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html results in it always returning 0 by default]. What I ideally wanted was to be able to insert into just the child partitions and still get back the number of rows inserted.

It seems the state of the art is either to return the NEW row from the insert trigger [which causes it to also be saved to the parent master table], then define an extra trigger to remove the row from the parent table - so 2 inserts and 1 delete per insert [1]. Or you can use an unconditional rule and it will return the number of rows inserted [however, in this case, since we're using partitioning, I think we need multiple rules, one for each child table]. It is also possible for a view to use a trigger and still return the number of rows affected, which provides another workaround (see bottom of [1]).

Is there some more elegant way here? It seems odd that partitioned tables basically cannot, without a *lot* of massaging, return the number of rows inserted - am I missing something, or do I understand correctly? [Today this forces people into lots of workarounds, like *not* checking the number of rows returned for batch inserts, etc. - potentially dangerous as well.]

Is there, for instance, some workaround such as a way to manually increment the count of rows affected by the command? Or could conditional rules be made to return the output string with the number of rows affected (feature request)?

I guess this has come up before, FWIW: http://grokbase.com/t/postgresql/pgsql-general/0863bjzths/insert-into-master-table-0-rows-affected-hibernate-problems

Thanks!
-roger-

[1] http://stackoverflow.com/questions/83093/hibernate-insert-batch-with-partitioned-postgresql

PS if no response I'll forward this on to pgsql-hackers.
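For concreteness, the first workaround mentioned above (return NEW so the command tag counts the row, then delete the duplicate from the parent) looks roughly like this. It is only a sketch: measurement, measurement_y2014 and id are made-up names, and it assumes id is unique:

    -- BEFORE trigger: route the row to the child, but still RETURN NEW so the
    -- row count is reported - which also inserts a copy into the parent
    CREATE OR REPLACE FUNCTION measurement_insert_trigger() RETURNS trigger AS $$
    BEGIN
        INSERT INTO measurement_y2014 VALUES (NEW.*);
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER measurement_insert
        BEFORE INSERT ON measurement
        FOR EACH ROW EXECUTE PROCEDURE measurement_insert_trigger();

    -- AFTER trigger: remove the duplicate that landed in the parent table
    CREATE OR REPLACE FUNCTION measurement_trim_parent() RETURNS trigger AS $$
    BEGIN
        DELETE FROM ONLY measurement WHERE id = NEW.id;   -- ONLY = parent, not children
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER measurement_trim_parent
        AFTER INSERT ON measurement
        FOR EACH ROW EXECUTE PROCEDURE measurement_trim_parent();

As described in the post, every logical insert then costs two inserts and a delete, which is exactly the inelegance being complained about.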
[GENERAL] improving speed of query that uses a multi-column filter ?
I'm trying to improve the speed of a suite of queries that go across a few million rows. They use two main filters across a variety of columns:

WHERE (col_1 IS NULL) AND (col_2 IS NULL) AND ((col_3 IS NULL) OR (col_3 = col_1))

WHERE (col_1 IS True) AND (col_2 IS True) AND (col_3 IS True) OR (col_4 IS NULL)

I created a dedicated multi-column index for each query to speed them up. That was great. I still don't have the performance where I want it to be - the size of the index seems to be an issue. If the index were on one column, instead of 4, I think the scans would complete in time.

I looked online and in the archives, and couldn't find much information on good strategies to deal with this. It looks like my best option is to somehow index on the interpretation of these criteria, and not the criteria themselves. The two ways that come to mind are:

1. alter the table: add a boolean column for each filter-test to the table, index that, then query on that field
2. leave the table as-is: write a custom function for each filter, and then use a function index

Has anyone else encountered a need like this? Are there any tips / tricks / things I should look out for? Are there better ways to handle this?
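A rough sketch of option 2 above, using the first filter verbatim (my_table and filter_one are made-up names): wrap the filter in an IMMUTABLE SQL function, index the expression, and make the queries reuse the exact same expression so the planner can match it against the index:

    CREATE OR REPLACE FUNCTION filter_one(col_1 boolean, col_2 boolean, col_3 boolean)
    RETURNS boolean AS $$
        SELECT col_1 IS NULL AND col_2 IS NULL AND (col_3 IS NULL OR col_3 = col_1);
    $$ LANGUAGE sql IMMUTABLE;

    CREATE INDEX my_table_filter_one_idx
        ON my_table (filter_one(col_1, col_2, col_3));

    -- the query must use the indexed expression for the index to be considered
    SELECT * FROM my_table WHERE filter_one(col_1, col_2, col_3);

The index then stores a single boolean per row rather than four columns, which is the size reduction being asked about.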
Re: [GENERAL] improving speed of query that uses a multi-column filter ?
On 9/30/2014 4:50 PM, Jonathan Vanasco wrote:
> WHERE (col_1 IS NULL) AND (col_2 IS NULL) AND ((col_3 IS NULL) OR (col_3 = col_1))

If col_1 IS NULL, then that OR condition doesn't make much sense. Just saying...

These 4 columns are all nullable booleans, so they can be TRUE, FALSE, or NULL? With 4 columns, there are 3^4 = 81 possible combinations of these values... you might get better speeds encoding this as a single SHORT INTEGER, enumerating those 81 states, and then just doing equals or IN (set of values) conditions... of course, this might make a lot of OTHER code more complicated.

It might be easier to make each col_X two bits of this integer, such that one bit indicates the value was NULL, and the other bit is the true/false state if that first bit isn't set; this would make testing individual bits somewhat better.

--
john r pierce                                      37N 122W
somewhere on the middle of the left coast
Re: [GENERAL] improving speed of query that uses a multi-column filter ?
On Sep 30, 2014, at 8:04 PM, John R Pierce pie...@hogranch.com wrote:
> if col_1 IS NULL, then that OR condition doesn't make much sense. just saying...

I was just making a quick example. There are two commonly used filter sets; each is mostly on bool columns that allow NULL - but one checks to see whether the row references itself in a particular column.

> these 4 columns are all nullable booleans, so they can be TRUE, FALSE, or NULL?

Most of them, yes.

> with 4 columns, there's 3^4 = 81 possible combinations of these values... you might get better speeds encoding this as a single SHORT INTEGER, and enumerating those 81 states, then just do equals or IN (set of values) conditions... of course, this might make a lot of OTHER code more complicated. It might be easier to make each col_X 2 bits of this integer, such that one bit indicates the value was 'NULL', and the other bit is the true/false state if that first bit isn't set; this would make testing individual bits somewhat better.

That's interesting. I never thought about how Postgres processes the data. For legacy reasons, I can't change the data types - but I can add additional columns. So I could use a trigger/function that manages a filter_test column that is an int, give each filter a bit value, and then just scan on that. It wouldn't be much more work to test that against dedicated bool columns for each filter.
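A sketch of that trigger-maintained extra column (filter_test and set_filter_test are made-up names; the two CASEs mirror the two filters from the original post, one bit each):

    ALTER TABLE my_table ADD COLUMN filter_test integer;

    CREATE OR REPLACE FUNCTION set_filter_test() RETURNS trigger AS $$
    BEGIN
        NEW.filter_test :=
              CASE WHEN NEW.col_1 IS NULL AND NEW.col_2 IS NULL
                        AND (NEW.col_3 IS NULL OR NEW.col_3 = NEW.col_1)
                   THEN 1 ELSE 0 END                                   -- bit 1: first filter
            + CASE WHEN (NEW.col_1 IS TRUE AND NEW.col_2 IS TRUE AND NEW.col_3 IS TRUE)
                        OR NEW.col_4 IS NULL
                   THEN 2 ELSE 0 END;                                  -- bit 2: second filter
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER my_table_filter_test
        BEFORE INSERT OR UPDATE ON my_table
        FOR EACH ROW EXECUTE PROCEDURE set_filter_test();

    CREATE INDEX my_table_filter_test_idx ON my_table (filter_test);

    -- bit 1 set (values 1 and 3) -> first filter matches; a plain btree can serve this
    SELECT * FROM my_table WHERE filter_test IN (1, 3);

Existing rows would need a one-off UPDATE to populate the column before the index is useful.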
Re: [GENERAL] improving speed of query that uses a multi-column filter ?
On Wednesday, October 1, 2014, Jonathan Vanasco postg...@2xlp.com wrote:
> I'm trying to improve the speed of a suite of queries that go across a few million rows. They use two main filters across a variety of columns:
> WHERE (col_1 IS NULL) AND (col_2 IS NULL) AND ((col_3 IS NULL) OR (col_3 = col_1))
> WHERE (col_1 IS True) AND (col_2 IS True) AND (col_3 IS True) OR (col_4 IS NULL)
> I created a dedicated multi-column index for each query to speed them up. That was great. I still don't have the performance where I want it to be - the size of the index seems to be an issue. If the index were on one column, instead of 4, I think the scans would complete in time.
> Has anyone else encountered a need like this? Are there any tips / tricks / things I should look out for? Are there better ways to handle this?

Have you considered maybe partial indexes? http://www.postgresql.org/docs/9.3/static/indexes-partial.html

I.e. idx1 on the PK column of the table, with the WHERE inside the index exactly the same as your first WHERE; idx2 on the PK column, with the WHERE inside the index the same as your second WHERE.

Cheers,
Misa
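That is, roughly the following, with id standing in for whatever the table's PK column actually is:

    -- each partial index only stores the rows that satisfy its predicate, so it stays small
    CREATE INDEX my_table_filter1_idx ON my_table (id)
        WHERE col_1 IS NULL AND col_2 IS NULL AND (col_3 IS NULL OR col_3 = col_1);

    CREATE INDEX my_table_filter2_idx ON my_table (id)
        WHERE (col_1 IS TRUE AND col_2 IS TRUE AND col_3 IS TRUE) OR (col_4 IS NULL);

    -- a query whose WHERE clause matches the index predicate can use the small index
    SELECT id FROM my_table
    WHERE col_1 IS NULL AND col_2 IS NULL AND (col_3 IS NULL OR col_3 = col_1);

The trade-off is that the query's WHERE clause has to imply the index predicate for the planner to pick the partial index, so the filters need to be written consistently across the query suite.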