[GENERAL] synchronize DTAP

2014-09-30 Thread Willy-Bas Loos
Hi,

We have an environment that has a central repository for lookups, which is
replicated to several databases, each for a different application.
This has been arranged in a DTAP manner.

Sometimes it is necessary to synchronize the lookups of one of the DTAP
branches with another. But I can't just overwrite one database with a dump
from another branch, as the consumer databases will not follow.
What I think I need is a way to compute the differences between two
databases that have the same schema, and generate insert/update/delete
statements from that.
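For illustration, this is roughly the kind of diff I mean. It's only a sketch:
it assumes a lookup table lookup(id, value) and that the other branch's copy is
reachable as other.lookup (e.g. a foreign table or a dump restored into a side
schema); the real tables and key columns would differ.

-- rows present here but missing in the other branch: candidates for INSERT there
SELECT a.*
FROM lookup a
LEFT JOIN other.lookup b ON b.id = a.id
WHERE b.id IS NULL;

-- rows present in both but with different values: candidates for UPDATE there
SELECT a.*
FROM lookup a
JOIN other.lookup b ON b.id = a.id
WHERE a.value IS DISTINCT FROM b.value;

-- rows only in the other branch: candidates for DELETE there
SELECT b.*
FROM other.lookup b
LEFT JOIN lookup a ON a.id = b.id
WHERE a.id IS NULL;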

Since this seems like a pretty generic problem, I thought I should ask
around before I start writing my own scripts. Does anyone know of a script or
application that does this?

Cheers
-- 
Willy-Bas Loos


Re: [GENERAL] BDR Global Sequences

2014-09-30 Thread Andres Freund
Hi,

On 2014-09-29 13:52:52 -0700, p...@cmicdo.com wrote:
 I have a question about BDR Global Sequences.
 
 I've been playing with BDR on PG 9.4beta2, built from source from the
 2nd Quadrant GIT page (git://git.postgresql.org/git/2ndquadrant_bdr.git).
 
 When trying a 100 row \copy-in, letting PG choose the global sequence
 values, I get ERROR:  could not find free sequence value for global
 sequence public.myseq, as documented...no surprise there.  However, the
 number of rows I can load before the error varies wildly with each trial.

Yea, it depends on how quickly the refilling starts and how quickly it can
keep up. The next version will hopefully start to fill up a bit quicker.

 Is there a way to increase a global sequence's reservation block for each
 node so that I can tell the nodes, "I'm going to load 100M rows now so
 you should get ready for that"?

Not yet, but we're planning to add that.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] table versioning approach (not auditing)

2014-09-30 Thread Felix Kunde
Hey
 
Yes, I'm adding an additional key to each of my tables. First I wanted to use
the primary key as one column in my audit_log table, but in some of my tables
the PK consists of more than one column. Plus it's nice to have one key that
has the same name across all tables.
 
To get a former state for one row at date x I need to join the latest delta
BEFORE date x with each delta AFTER date x. If I logged complete rows, this
joining part would not be necessary, but as I usually work with spatial
databases that have complex geometries and also image files, that strategy
consumes too much disk space.
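Roughly, the query side of that looks like this (a simplified sketch; it assumes
a log table table_log(audit_id, table_name, stmt_time, changes) - my real schema
has more columns and different names):

-- all deltas for the row with audit_id 42 logged after date x, oldest first;
-- each delta stores the OLD values, so the first logged value per column
-- after date x is the value the row had at date x
SELECT l.stmt_time, l.changes
FROM table_log l
WHERE l.table_name = 'my_table'
  AND l.audit_id = 42
  AND l.stmt_time > '2014-06-01'::timestamp
ORDER BY l.stmt_time;

Merging those JSON fragments over the current row (first occurrence per key wins)
then yields the former state.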
 
If there are more users following a similar approach, I wonder why we don't
throw all the good ideas together to have one solution that is tested,
maintained and improved by more developers. This would be great.
 
Felix
 

Gesendet: Montag, 29. September 2014 um 23:25 Uhr
Von: Abelard Hoffman abelardhoff...@gmail.com
An: Felix Kunde felix-ku...@gmx.de
Cc: pgsql-general@postgresql.org pgsql-general@postgresql.org
Betreff: Re: [GENERAL] table versioning approach (not auditing)

Thank you Felix, Gavin, and Jonathan for your responses.
 
Felix & Jonathan: both of you mention just storing deltas. But if you do that,
how do you associate the delta record with the original row? Where's the PK
stored, if it wasn't part of the delta?
 
Felix, thank you very much for the example code. I took a look at your table
schemas. I need to study it more, but it looks like the way you're handling the
PK is by adding a separate synthetic key (audit_id) to each table that's
being versioned, and then storing that key along with the delta.
 
So then to find all the versions of a given row, you just need to join the 
audit row with the schema_name.table_name.audit_id column. Is that right? The 
only potential drawback there is there's no referential integrity between the 
audit_log.audit_id and the actual table.
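In other words, something like this (made-up table and column names, just to
check my understanding):

-- all audit entries for one row of my_schema.my_table, identified by its real PK
SELECT a.*
FROM audit_log a
JOIN my_schema.my_table t ON t.audit_id = a.audit_id
WHERE t.id = 123
ORDER BY a.stmt_time;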
 
I do like that approach very much though, in that it eliminates the need to 
interrogate the json data in order to perform most queries.
 
AH
 
 
 
On Mon, Sep 29, 2014 at 12:26 AM, Felix Kunde felix-ku...@gmx.de wrote:
Hey
 
I've also tried to implement database versioning using JSON to log changes in
tables. Here it is: https://github.com/fxku/audit
I've got two versioning tables, one storing information about all transactions
that happened and one where I put the JSON logs of row changes of each table.
I'm only logging old values and not complete rows.
 
Then I've got a function that recreates a database state at a given time into a
separate schema - either as VIEWs, MVIEWs or TABLEs. This database state could
then be indexed in order to work with it. You can also reset the production
state to the recreated past state.
 
Unfortunately I've got no time to work on it further at the moment, and I have
not done tests with many changes in the database, so I can't say if the
recreation process scales well. One downside I've realised is that the json_agg
function has limits when I've got binary data. It gets too long. So I'm really
looking forward to using JSONB.

There are more plans in my mind. By having a Transaction_Log table it should be 
possible to revert only certain transactions. I'm also thinking of parallel 
versioning, e.g. different users are all working with their version of the 
database and commit their changes to the production state. As I've got a unique 
history ID for each table and each row, I should be able to map the affected 
records.

Have a look and tell me what you think of it.

Cheers
Felix
 

Gesendet: Montag, 29. September 2014 um 04:00 Uhr
Von: Abelard Hoffman abelardhoff...@gmail.com
An: pgsql-general@postgresql.org pgsql-general@postgresql.org
Betreff: [GENERAL] table versioning approach (not auditing)

Hi. I need to maintain a record of all changes to certain tables, to assist in
viewing history and reverting changes when necessary (customer service makes an
incorrect edit, etc.).
 
I have studied these two audit trigger examples:
https://wiki.postgresql.org/wiki/Audit_trigger
https://wiki.postgresql.org/wiki/Audit_trigger_91plus
 
I've also read about two other approaches to versioning:
1. maintain all versions in one table, with a flag to indicate which is the 
current version
2. have a separate versions table for each real table, and insert into the 
associated version table whenever an update or insert is done.
 
My current implementation is based on the wiki trigger examples, using a single
table, and a json column to record the row changes (rather than hstore). What I
like about that, in particular, is that I can have a global, chronological view
of all versioned changes very easily.
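Roughly, the shape of it is this (column names here are illustrative, not my
exact schema):

CREATE TABLE versioning_log (
    event_id    bigserial PRIMARY KEY,
    changed_at  timestamptz NOT NULL DEFAULT now(),
    user_id     integer NOT NULL,
    table_name  text NOT NULL,
    action      char(1) NOT NULL,   -- 'I', 'U' or 'D'
    row_data    json                -- changed fields, as in the wiki trigger
);

-- the global, chronological view of all versioned changes:
SELECT * FROM versioning_log ORDER BY changed_at, event_id;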
 
But there are two types of queries I need to run.
1. Find all changes made by a specific user
2. Find all changes related to a specific record
 
#1 is simple to do. The versioning table has a user_id 

Re: [GENERAL] BDR Global Sequences

2014-09-30 Thread pbj
Hi Andres,

  Hi,
 
  On 2014-09-29 13:52:52 -0700, p...@cmicdo.com wrote:
  I have a question about BDR Global Sequences.
 
[deleted]
  Is there a way to increase a global sequence's reservation block for each
  node so that I can tell the nodes, "I'm going to load 100M rows now so
  you should get ready for that"?
 
 
  Not yet, but we're planning to add that.

Good to hear.  In the meantime, is there something I can hack to force
the nodes to make a sequence allocation of my choosing (even if just
hardwired)?  I was playing with start_elections_sql where it says:

generate_series(
    current_max,
    -- 1000 is the chunk size, -1 is to get < instead of <= out of generate_series
    current_max + 1000 * (5 - open_seq_chunks) - 1,
    1000) chunk_start

and hoping that bumping up the 1000 would help, but apparently not.

PJ
   
  
  Greetings,
  
  Andres Freund
  
  --
  Andres Freund   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
 



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PANIC: could not create file pg_xlog/xlogtemp.7884: No space left on device

2014-09-30 Thread Eric Veldhuyzen
Roopeshakumar Narayansa Shalgar (rshalgar) wrote:
 Hi,
 
  
 
 I am using version 9.3.1 and see the “no space device error” even though there
 is enough space (99% free) on my disk.

Just to be sure, check the output of both 'df -h' (for disk blocks) and 'df -hi'
(for inodes). You might have run out of inodes...

Eric





Re: [GENERAL] [SQL] pg_multixact issues

2014-09-30 Thread Dev Kumkar
On Fri, Sep 26, 2014 at 1:36 PM, Dev Kumkar devdas.kum...@gmail.com wrote:

 Received the database with huge pg_multixact directory of size 21G and
 there are ~82,000 files in pg_multixact/members and 202 files in
 pg_multixact/offsets directory.

 Did run vacuum full on this database and it was successful. However now I
 am not sure about the pg_multixact directory. Truncating this directory except
 the  file results in database start-up issues; of course this is not the
 correct way of truncating.
  FATAL:  could not access status of transaction 13224692

 Stumped! Please provide some comments on how to truncate pg_multixact
 files, and whether these files have any impact on database
 performance.


Facing this issue on a couple more machines where pg_multixact is huge and
not being cleaned up. Any suggestions / troubleshooting tips?

Regards...


Re: [GENERAL] [SQL] pg_multixact issues

2014-09-30 Thread Alvaro Herrera
Dev Kumkar wrote:
 On Fri, Sep 26, 2014 at 1:36 PM, Dev Kumkar devdas.kum...@gmail.com wrote:
 
  Received the database with huge pg_multixact directory of size 21G and
  there are ~82,000 files in pg_multixact/members and 202 files in
  pg_multixact/offsets directory.
 
  Did run vacuum full on this database and it was successful. However now
  am not sure about pg_multixact directory. truncating this directory except
   file results into database start up issues, of course this is not
  correct way of truncating.
   FATAL:  could not access status of transaction 13224692
 
  Stumped ! Please provide some comments on how to truncate pg_multixact
  files and if there is any impact because of these files on database
  performance.
 
 
 Facing this issue on couple more machines where pg_multixact is huge and
 not being cleaned up. Any suggestions / troubleshooting tips?

Did you try decreasing the autovacuum_multixact_freeze_min_age and
autovacuum_multixact_freeze_table_age parameters?

What exact server version are you running?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] synchronize DTAP

2014-09-30 Thread Brent Wood
Gidday,


There was an interesting presentation at the Portland Postgres Users Group
meeting in early Sept, from a guy who demo'd a Postgres database mounted as a
FUSE filesystem. Not production ready, but with tables manifesting as
directories, databases could be synced using filesystem tools like rsync -
which offers intriguing backup & replication possibilities.


 http://vimeo.com/105493143


The demo of the FUSE functionality starts at 39 minutes into the presentation.


Brent Wood

Programme leader: Environmental Information Delivery
NIWA
DDI:  +64 (4) 3860529

Brent Wood
Principal Technician - GIS and Spatial Data Management
Programme Leader - Environmental Information Delivery
+64-4-386-0529 | 301 Evans Bay Parade, Greta Point, Wellington | 
www.niwa.co.nz

From: pgsql-general-ow...@postgresql.org pgsql-general-ow...@postgresql.org 
on behalf of Willy-Bas Loos willy...@gmail.com
Sent: Tuesday, September 30, 2014 8:58 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] synchronize DTAP


Hi,

We have an environment that has a central repository for lookups, which is 
replicated to several databases, ech for different applications.
This has been arranged in a DTAP manner.

Sometimes it is necessary to synchronize the lookups of one of the DTAP 
branches with another. But i can't just overwrite one database with a dump from 
another branch, as the consumer databases will not follow.
What i think i need is a way to compute the differences between two databases 
that have the same schema, and generate insert/update/delete statements from 
that.

Since this seems as a pretty generic problem, i thought that i should ask 
around before i start writing my own scripts. Does anyone know of script or 
application that does this?

Cheers
--
Willy-Bas Loos





Re: [GENERAL] table versioning approach (not auditing)

2014-09-30 Thread Adam Brusselback
Felix, I'd love to see a single, well-maintained project. For example, I
just found yours and gave it a shot today after seeing this post.  I found
a bug: when an update command is issued but the old and new values are all
the same, the trigger will blow up.  I've got a fix for that, but if we
had one project that more than a handful of people used, stuff like that
would be quashed very quickly.

I love the design of it by the way. Any idea what it will take to move to
JSONB for 9.4?
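For the stored column itself, I imagine it's mostly a cast, something along
these lines (assuming the json column is called changes in the log table; the
actual names in your schema may differ):

ALTER TABLE table_log
    ALTER COLUMN changes TYPE jsonb
    USING changes::jsonb;

The trigger code would also need a look: as far as I know, 9.4 still only has
json variants of some functions (e.g. json_agg), so their results may need an
explicit ::jsonb cast.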


On Tue, Sep 30, 2014 at 7:22 AM, Felix Kunde felix-ku...@gmx.de wrote:

 Hey

 yes i'm adding an additional key to each of my tables. First i wanted to
 use the primary key as one column in my audit_log table, but in some of my
 tables the PK consists of more than one column. Plus it's nice to have one
 key that is called the same over all tables.

 To get a former state for one row at date x I need to join the latest
 delta BEFORE date x with each delta AFTER date x. If I would log complete
 rows, this joining part would not be neccessary, but as I usually work with
 spatial databases that have complex geometries and also image files, this
 strategy is too harddisk consuming.

 If there are more users following a similar approach, I wonder why we not
 throw all the good ideas together, to have one solution that is tested,
 maintained and improved by more developpers. This would be great.

 Felix


 Gesendet: Montag, 29. September 2014 um 23:25 Uhr
 Von: Abelard Hoffman abelardhoff...@gmail.com
 An: Felix Kunde felix-ku...@gmx.de
 Cc: pgsql-general@postgresql.org pgsql-general@postgresql.org
 Betreff: Re: [GENERAL] table versioning approach (not auditing)

 Thank you Felix, Gavin, and Jonathan for your responses.

 Felix & Jonathan: both of you mention just storing deltas. But if you do
 that, how do you associate the delta record with the original row? Where's
 the PK stored, if it wasn't part of the delta?

 Felix, thank you very much for the example code. I took a look at your
 table schemas. I need to study it more, but it looks like the way you're
 handling the PK, is you're adding a separate synthethic key (audit_id) to
 each table that's being versioned. And then storing that key along with the
 delta.

 So then to find all the versions of a given row, you just need to join the
 audit row with the schema_name.table_name.audit_id column. Is that right?
 The only potential drawback there is there's no referential integrity
 between the audit_log.audit_id and the actual table.

 I do like that approach very much though, in that it eliminates the need
 to interrogate the json data in order to perform most queries.

 AH



 On Mon, Sep 29, 2014 at 12:26 AM, Felix Kunde felix-ku...@gmx.de
 wrote:
 Hey

 i've also tried to implement a database versioning using JSON to log
 changes in tables. Here it is:
 https://github.com/fxku/audit
 I've got two versioning tables, one storing information about all
 transactions that happened and one where i put the JSON logs of row changes
 of each table. I'm only logging old values and not complete rows.

 Then I got a function that recreates a database state at a given time into
 a separate schema - either to VIEWs, MVIEWs or TABLES. This database state
 could then be indexed in order to work with it. You can also reset the
 production state to the recreated past state.

 Unfortunately I've got no time to further work on it at the moment + I
 have not done tests with many changes in the database so I can't say if the
 recreation process scales well. On downside I've realised is that using the
 json_agg function has limits when I've got binary data. It gets too long.
 So I'm really looking forward using JSONB.

 There are more plans in my mind. By having a Transaction_Log table it
 should be possible to revert only certain transactions. I'm also thinking
 of parallel versioning, e.g. different users are all working with their
 version of the database and commit their changes to the production state.
 As I've got a unique history ID for each table and each row, I should be
 able to map the affected records.

 Have a look and tell me what you think of it.

 Cheers
 Felix


 Gesendet: Montag, 29. September 2014 um 04:00 Uhr
 Von: Abelard Hoffman abelardhoff...@gmail.com
 An: pgsql-general@postgresql.org pgsql-general@postgresql.org
 Betreff: [GENERAL] table versioning approach (not auditing)

 Hi. I need to maintain a record of all changes to certain tables so assist
 in viewing history and reverting changes when necessary (customer service
 makes an incorrect edit, etc.).

 I have studied these two audit trigger examples:

 https://wiki.postgresql.org/wiki/Audit_trigger
 https://wiki.postgresql.org/wiki/Audit_trigger_91plus

 I've also read about two other approaches to versioning:
 1. maintain all versions in one table, with a flag to indicate which is
 the current version
 2. have a separate versions table for each real table, and insert into the
 associated version table whenever an update or insert is done.

Re: [GENERAL] [SQL] pg_multixact issues

2014-09-30 Thread Dev Kumkar
On Tue, Sep 30, 2014 at 8:50 PM, Alvaro Herrera alvhe...@2ndquadrant.com
wrote:

 Did you try decreasing the autovacuum_multixact_freeze_min_age and
 autovacuum_multixact_freeze_table_age parameters?

As per the docs, vacuum_multixact_freeze_min_age can be set anywhere from zero
to one billion, and vacuum_multixact_freeze_table_age from zero to two billion.

I modified them to the values 10 and 15 respectively. Not sure if that's the
correct way of setting these parameters?
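For reference, is something like this the intended way of doing it? (Values are
just the ones from this experiment, not recommendations.)

# postgresql.conf
vacuum_multixact_freeze_min_age   = 10
vacuum_multixact_freeze_table_age = 15

-- then, after a reload:
SELECT pg_reload_conf();
-- and a manual vacuum per database, so the multixact cutoffs can advance:
VACUUM;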

 What exact server version are you running?

I am using PostgreSQL 9.3.4 (Linux, 64-bit).

Regards...


[GENERAL] ability to return number of rows inserted into child partition tables request

2014-09-30 Thread Roger Pack
Hello.

I was trying to get postgres to return the correct number of rows
inserted for batch inserts to a partitioned table [using the triggers as
suggested here
http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html results in
it always returning 0 by default].
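For context, the docs-style redirect trigger looks roughly like this (simplified
from the manual's example, child table name made up); because it returns NULL,
the top-level INSERT reports 0 rows:

CREATE OR REPLACE FUNCTION measurement_insert_trigger()
RETURNS trigger AS $$
BEGIN
    -- route the row to the appropriate child partition
    INSERT INTO measurement_y2014m09 VALUES (NEW.*);
    -- returning NULL cancels the insert into the parent,
    -- so the command tag ends up saying INSERT 0 0
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER insert_measurement_trigger
    BEFORE INSERT ON measurement
    FOR EACH ROW EXECUTE PROCEDURE measurement_insert_trigger();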

What I ideally wanted it to do is to be able to insert into just the child
partitions, and return the number of rows inserted.

It seems the state of the art is either to return the NEW row from the
insert trigger [which causes it to also be saved to the parent master
table], then define an extra trigger to remove the row from the parent table.
So 2 inserts and 1 delete for each insert. [1]

Or you can use an unconditional rule, and it will return the number of rows
updated [however, in this case, since we're using partitioning, I think we
need multiple rules, one for each child table].

It is possible for a view to use a trigger and still return the number of
rows updated, which provides another workaround. (See bottom of [1].)

Is there some more elegant way here?  It seems odd that partitioned tables
basically cannot, without a *lot* of massaging, return the number of rows
affected - am I missing something, or do I understand this correctly? [Today
this requires people to put in lots of workarounds, like *not* checking the
number of rows returned for batch inserts, etc. - potentially dangerous as well.]

Is there, for instance, some workaround, like a way to manually increment the
count of rows affected by the command?  Or could conditional rules be allowed
to return the output string with the number of rows affected (feature request)?

I guess this has come up before, FWIW.
http://grokbase.com/t/postgresql/pgsql-general/0863bjzths/insert-into-master-table-0-rows-affected-hibernate-problems

Thanks!
-roger-


[1]
http://stackoverflow.com/questions/83093/hibernate-insert-batch-with-partitioned-postgresql
PS if no response I'll forward this on to pghackers.


[GENERAL] improving speed of query that uses a multi-column filter ?

2014-09-30 Thread Jonathan Vanasco

I'm trying to improve the speed of a suite of queries that go across a few
million rows.

They use 2 main filters across a variety of columns:

WHERE (col_1 IS NULL ) AND (col_2 IS NULL) AND ((col_3 IS NULL) OR 
(col_3 = col_1))
WHERE (col_1 IS True ) AND (col_2 IS True) AND (col_3 IS True) OR 
(col_4 IS NULL)

I created a dedicated multi-column index for each query to speed them up.  That 
was great.

I still don't have the performance where I want it to be - the size of the 
index seems to be an issue.  If the index were on one column, instead of 4, I 
think the scans would complete in time.

I looked online and in the archives, and couldn't find much information on good
strategies for dealing with this.

It looks like my best option is to somehow index the interpretation of
these criteria, and not the criteria themselves.

the two ways that come to mind are:

1. alter the table: adding a boolean column for each filter-test to the 
table, index that, then query for that field
2. leave the table as-is: write a custom function for each filter, and 
then use a function index 
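For option 2, something like this is what I have in mind (names are made up,
and the expression just mirrors the first filter above):

CREATE FUNCTION matches_filter_a(col_1 boolean, col_2 boolean, col_3 boolean)
RETURNS boolean AS $$
BEGIN
    RETURN (col_1 IS NULL) AND (col_2 IS NULL)
           AND ((col_3 IS NULL) OR (col_3 = col_1));
END;
$$ LANGUAGE plpgsql IMMUTABLE;

CREATE INDEX idx_filter_a ON my_table (matches_filter_a(col_1, col_2, col_3));

-- queries would then filter on the indexed expression:
SELECT * FROM my_table WHERE matches_filter_a(col_1, col_2, col_3);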

Has anyone else encountered a need like this?

Are there any tips / tricks / things I should look out for?  Are there better
ways to handle this?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] improving speed of query that uses a multi-column filter ?

2014-09-30 Thread John R Pierce

On 9/30/2014 4:50 PM, Jonathan Vanasco wrote:

WHERE (col_1 IS NULL ) AND (col_2 IS NULL) AND ((col_3 IS NULL) OR 
(col_3 = col_1))


If col_1 IS NULL, then that OR condition doesn't make much sense.
Just saying...


These 4 columns are all nullable booleans, so they can be TRUE, FALSE,
or NULL?  With 4 columns, there are 3^4 = 81 possible combinations of
these values... You might get better speed encoding this as a single
SMALLINT, enumerating those 81 states, and then just doing equals or
IN (set of values) conditions... Of course, this might make a lot of
OTHER code more complicated.  It might be easier to make each col_X 2
bits of this integer, such that one bit indicates the value was NULL,
and the other bit is the true/false state if that first bit isn't set;
this would make testing individual bits somewhat better.
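A rough sketch of that 2-bits-per-column encoding, using the column names from
the earlier mail (purely illustrative):

-- per column: 2 = "was NULL", 1 = true, 0 = false; shift each pair by 2 bits
SELECT (CASE WHEN col_1 IS NULL THEN 2 WHEN col_1 THEN 1 ELSE 0 END)
     + (CASE WHEN col_2 IS NULL THEN 2 WHEN col_2 THEN 1 ELSE 0 END) * 4
     + (CASE WHEN col_3 IS NULL THEN 2 WHEN col_3 THEN 1 ELSE 0 END) * 16
     + (CASE WHEN col_4 IS NULL THEN 2 WHEN col_4 THEN 1 ELSE 0 END) * 64
       AS col_state     -- fits comfortably in a smallint
FROM my_table;

That expression could back an expression index, or a stored column maintained
by a trigger.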


--
john r pierce  37N 122W
somewhere on the middle of the left coast



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] improving speed of query that uses a multi-column filter ?

2014-09-30 Thread Jonathan Vanasco
On Sep 30, 2014, at 8:04 PM, John R Pierce pie...@hogranch.com wrote:
 if col_1 IS NULL,   then that OR condition doesn't make much sense. just 
 saying...

I was just making a quick example.  There are two commonly used filter sets,
each mostly on boolean columns that allow NULL -- but one checks to see if the
row references itself in a particular column.

 these 4 columns are all nullable booleans, so they can be TRUE, FALSE, or 
 NULL ?  

Most of them, yes.

 with 4 columns, there's 3^4 = 81 possible combinations of these values...
 you might get better speeds encoding this as a single SHORT INTEGER, and 
 enumerating those 81 states, then just do equals or IN (set of values) 
 conditions...   of course, this might make a lot of OTHER code more 
 complicated.   It might be easier to make each col_X 2 bits of this integer, 
 such that one bit indicates the value was 'NULL', and the other bit is the 
 true/false state if that first bit isn't set, this would make testing 
 individual bits somewhat better.

That's interesting.  I never thought of how Postgres processes the data.

For legacy reasons, I can't change the data types -- but I can add additional
columns.  So I could use a trigger/function that manages a filter_test column
that is an int, give each filter a bit value, and then just scan on that.
It wouldn't be much more work to test that against dedicated boolean columns
for each filter.
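Roughly what I'm picturing (names made up, filters as in my earlier mail):

ALTER TABLE my_table ADD COLUMN filter_test integer NOT NULL DEFAULT 0;

CREATE OR REPLACE FUNCTION my_table_set_filter_test() RETURNS trigger AS $$
BEGIN
    NEW.filter_test :=
          CASE WHEN (NEW.col_1 IS NULL) AND (NEW.col_2 IS NULL)
                    AND ((NEW.col_3 IS NULL) OR (NEW.col_3 = NEW.col_1))
               THEN 1 ELSE 0 END            -- bit 1: first filter set
        + CASE WHEN (NEW.col_1 IS TRUE) AND (NEW.col_2 IS TRUE)
                    AND (NEW.col_3 IS TRUE) OR (NEW.col_4 IS NULL)
               THEN 2 ELSE 0 END;           -- bit 2: second filter set
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_my_table_filter_test
    BEFORE INSERT OR UPDATE ON my_table
    FOR EACH ROW EXECUTE PROCEDURE my_table_set_filter_test();

CREATE INDEX idx_my_table_filter_test ON my_table (filter_test);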

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] improving speed of query that uses a multi-column filter ?

2014-09-30 Thread Misa Simic
On Wednesday, October 1, 2014, Jonathan Vanasco postg...@2xlp.com wrote:


 I'm trying to improve the speed of suite of queries that go across a few
 million rows.

 They use 2 main filters across a variety of columns:

 WHERE (col_1 IS NULL ) AND (col_2 IS NULL) AND ((col_3 IS NULL) OR
 (col_3 = col_1))
 WHERE (col_1 IS True ) AND (col_2 IS True) AND (col_3 IS True) OR
 (col_4 IS NULL)

 I created a dedicated multi-column index for each query to speed them up.
 That was great.

 I still don't have the performance where I want it to be - the size of the
 index seems to be an issue.  If the index were on one column, instead of 4,
 I think the scans would complete in time.

 i looked online and the archives, and couldn't find much information on
 good strategies to deal with this.

 It looks like my best option is to somehow index on the interpretation
 of this criteria, and not the criteria itself.

 the two ways that come to mind are:

 1. alter the table: adding a boolean column for each filter-test
 to the table, index that, then query for that field
 2. leave the table as-is: write a custom function for each filter,
 and then use a function index

 has anyone else encountered a need like this?

 are there any tips / tricks / things I should look out for.  are there
 better ways to handle this?

 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general


Have you considered maybe partial indexes?

http://www.postgresql.org/docs/9.3/static/indexes-partial.html

I.e. idx1 on the table's PK column, with the WHERE clause inside the index
exactly the same as your first WHERE, and idx2 on the PK column with your
second WHERE as the index predicate.
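
Something along these lines (table and PK names are just placeholders):

CREATE INDEX idx1 ON my_table (id)
    WHERE (col_1 IS NULL) AND (col_2 IS NULL)
          AND ((col_3 IS NULL) OR (col_3 = col_1));

CREATE INDEX idx2 ON my_table (id)
    WHERE (col_1 IS TRUE) AND (col_2 IS TRUE) AND (col_3 IS TRUE)
          OR (col_4 IS NULL);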


Cheers,

Misa