[GENERAL] synchronize DTAP
Hi,

We have an environment that has a central repository for lookups, which is replicated to several databases, each for different applications. This has been arranged in a DTAP manner. Sometimes it is necessary to synchronize the lookups of one of the DTAP branches with another. But I can't just overwrite one database with a dump from another branch, as the consumer databases will not follow. What I think I need is a way to compute the differences between two databases that have the same schema, and generate insert/update/delete statements from that. Since this seems like a pretty generic problem, I thought I should ask around before I start writing my own scripts. Does anyone know of a script or application that does this?

Cheers
--
Willy-Bas Loos
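For what it's worth, this kind of diff can be scripted per lookup table in plain SQL, assuming a single-column primary key and that both branches are visible from one session (e.g. the source branch restored into a scratch schema, or attached via dblink/postgres_fdw). This is only a sketch; the schema and column names (src, tgt, lookup_foo, id, label) are made up:

    -- rows only in the source branch -> INSERTs for the target
    SELECT format('INSERT INTO lookup_foo (id, label) VALUES (%L, %L);', s.id, s.label)
    FROM src.lookup_foo s
    LEFT JOIN tgt.lookup_foo t USING (id)
    WHERE t.id IS NULL;

    -- rows in both branches but with different data -> UPDATEs
    SELECT format('UPDATE lookup_foo SET label = %L WHERE id = %L;', s.label, s.id)
    FROM src.lookup_foo s
    JOIN tgt.lookup_foo t USING (id)
    WHERE s.label IS DISTINCT FROM t.label;

    -- rows only in the target branch -> DELETEs
    SELECT format('DELETE FROM lookup_foo WHERE id = %L;', t.id)
    FROM tgt.lookup_foo t
    LEFT JOIN src.lookup_foo s USING (id)
    WHERE s.id IS NULL;

The generated statements can then be reviewed and replayed against the target branch, so the consumer databases follow via the normal replication path.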
Re: [GENERAL] BDR Global Sequences
Hi,

On 2014-09-29 13:52:52 -0700, p...@cmicdo.com wrote:
> I have a question about BDR Global Sequences. I've been playing with BDR on PG 9.4beta2, built from source from the 2nd Quadrant GIT page (git://git.postgresql.org/git/2ndquadrant_bdr.git). When trying a 100-row \copy in, letting PG choose the global sequence values, I get ERROR: could not find free sequence value for global sequence public.myseq, as documented... no surprise there. However, the number of rows I can load before the error varies wildly with each trial.

Yea, it depends on how quickly the refilling starts and how quickly it can keep up. The next version hopefully will start to fill up a bit quicker.

> Is there a way to increase a global sequence's reservation block for each node, so that I can tell the nodes "I'm going to load 100M rows now, so you should get ready for that"?

Not yet, but we're planning to add that.

Greetings,

Andres Freund

--
Andres Freund   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [GENERAL] table versioning approach (not auditing)
Hey, yes I'm adding an additional key to each of my tables. First I wanted to use the primary key as one column in my audit_log table, but in some of my tables the PK consists of more than one column. Plus it's nice to have one key that is called the same in all tables.

To get a former state for one row at date x I need to join the latest delta BEFORE date x with each delta AFTER date x. If I logged complete rows, this joining part would not be necessary, but as I usually work with spatial databases that have complex geometries and also image files, that strategy consumes too much disk space.

If there are more users following a similar approach, I wonder why we don't throw all the good ideas together, to have one solution that is tested, maintained and improved by more developers. This would be great.

Felix

Sent: Monday, 29 September 2014 at 23:25
From: Abelard Hoffman abelardhoff...@gmail.com
To: Felix Kunde felix-ku...@gmx.de
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] table versioning approach (not auditing)

Thank you Felix, Gavin, and Jonathan for your responses.

Felix & Jonathan: both of you mention just storing deltas. But if you do that, how do you associate the delta record with the original row? Where's the PK stored, if it wasn't part of the delta?

Felix, thank you very much for the example code. I took a look at your table schemas. I need to study it more, but it looks like the way you're handling the PK is by adding a separate synthetic key (audit_id) to each table that's being versioned, and then storing that key along with the delta. So then to find all the versions of a given row, you just need to join the audit row with the schema_name.table_name.audit_id column. Is that right? The only potential drawback there is that there's no referential integrity between audit_log.audit_id and the actual table. I do like that approach very much though, in that it eliminates the need to interrogate the JSON data in order to perform most queries.

AH

On Mon, Sep 29, 2014 at 12:26 AM, Felix Kunde felix-ku...@gmx.de wrote:

Hey, I've also tried to implement database versioning using JSON to log changes in tables. Here it is: https://github.com/fxku/audit

I've got two versioning tables, one storing information about all transactions that happened and one where I put the JSON logs of row changes of each table. I'm only logging old values and not complete rows. Then I've got a function that recreates a database state at a given time in a separate schema - either as VIEWs, MVIEWs or TABLEs. This database state could then be indexed in order to work with it. You can also reset the production state to the recreated past state.

Unfortunately I've got no time to work on it further at the moment + I have not done tests with many changes in the database, so I can't say if the recreation process scales well. One downside I've realised is that using the json_agg function has limits when I've got binary data - it gets too long. So I'm really looking forward to using JSONB.

There are more plans in my mind. By having a transaction_log table it should be possible to revert only certain transactions. I'm also thinking of parallel versioning, e.g. different users all working with their version of the database and committing their changes to the production state. As I've got a unique history ID for each table and each row, I should be able to map the affected records.

Have a look and tell me what you think of it.

Cheers
Felix

Sent: Monday, 29 September 2014 at 04:00
From: Abelard Hoffman abelardhoff...@gmail.com
To: pgsql-general@postgresql.org
Subject: [GENERAL] table versioning approach (not auditing)

Hi. I need to maintain a record of all changes to certain tables to assist in viewing history and reverting changes when necessary (customer service makes an incorrect edit, etc.).

I have studied these two audit trigger examples:
https://wiki.postgresql.org/wiki/Audit_trigger
https://wiki.postgresql.org/wiki/Audit_trigger_91plus

I've also read about two other approaches to versioning:
1. maintain all versions in one table, with a flag to indicate which is the current version
2. have a separate versions table for each real table, and insert into the associated version table whenever an update or insert is done

My current implementation is based on the wiki trigger examples, using a single table and a json column to record the row changes (rather than hstore). What I like about that, in particular, is that I can have a global, chronological view of all versioned changes very easily.

But there are two types of queries I need to run:
1. Find all changes made by a specific user
2. Find all changes related to a specific record

#1 is simple to do. The versioning table has a user_id
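For readers following along, here is a rough sketch of the general pattern Felix describes: a synthetic audit_id on every versioned table, plus a trigger that logs only the old values of the columns that changed, as JSON. This is not the code from the github repo, just an illustration under assumptions - audit_log, my_table and log_delta are made-up names, and it leans on the hstore extension to compute the delta (hstore(OLD) - hstore(NEW) keeps exactly the pairs whose value changed):

    CREATE EXTENSION IF NOT EXISTS hstore;

    CREATE TABLE audit_log (
        audit_id    bigint      NOT NULL,   -- synthetic key shared with the versioned table
        table_name  text        NOT NULL,
        stmt_time   timestamptz NOT NULL DEFAULT now(),
        changes     json        NOT NULL    -- old values of the columns that changed
    );

    -- every versioned table carries the same synthetic key
    ALTER TABLE my_table ADD COLUMN audit_id bigserial;

    CREATE OR REPLACE FUNCTION log_delta() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'UPDATE' THEN
            INSERT INTO audit_log (audit_id, table_name, changes)
            VALUES (OLD.audit_id, TG_TABLE_NAME,
                    hstore_to_json(hstore(OLD) - hstore(NEW)));  -- delta only
            RETURN NEW;
        ELSE  -- DELETE: keep the whole old row
            INSERT INTO audit_log (audit_id, table_name, changes)
            VALUES (OLD.audit_id, TG_TABLE_NAME, row_to_json(OLD));
            RETURN OLD;
        END IF;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER my_table_versioning
        AFTER UPDATE OR DELETE ON my_table
        FOR EACH ROW EXECUTE PROCEDURE log_delta();

Reconstructing a row as of date x then means starting from the current row and applying, in reverse order, every delta logged after x - which is the join Felix mentions above.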
Re: [GENERAL] BDR Global Sequences
Hi Andres,

On 2014-09-29 13:52:52 -0700, p...@cmicdo.com wrote:
>> I have a question about BDR Global Sequences. [deleted]
>> Is there a way to increase a global sequence's reservation block for each node, so that I can tell the nodes "I'm going to load 100M rows now, so you should get ready for that"?
>
> Not yet, but we're planning to add that.

Good to hear. In the meantime, is there something I can hack to force the nodes to make a sequence allocation of my choosing (even if just hardwired)?

I was playing with start_elections_sql where it says:

    generate_series(
        current_max,
        -- 1000 is the chunk size, -1 is to get < instead of <= out of generate_series
        current_max + 1000 * (5 - open_seq_chunks) - 1,
        1000) chunk_start

and was hoping that bumping up the 1000 would help, but apparently not.

PJ

> Greetings, Andres Freund
Re: [GENERAL] PANIC: could not create file pg_xlog/xlogtemp.7884: No space left on device
Roopeshakumar Narayansa Shalgar (rshalgar) wrote:
> Hi, I am using version 9.3.1 and see the "no space left on device" error even though there is enough space (99% free) on my disk.

Just to be sure, check the output of both 'df -h' (for disk blocks) and 'df -hi' (for inodes). You might have run out of inodes...

Eric
Re: [GENERAL] [SQL] pg_multixact issues
On Fri, Sep 26, 2014 at 1:36 PM, Dev Kumkar devdas.kum...@gmail.com wrote:
> Received the database with a huge pg_multixact directory of size 21G; there are ~82,000 files in pg_multixact/members and 202 files in pg_multixact/offsets.
> Did run VACUUM FULL on this database and it was successful. However I am now not sure about the pg_multixact directory. Truncating this directory (except one file) results in database start-up issues - of course that is not a correct way of truncating:
>     FATAL: could not access status of transaction 13224692
> Stumped! Please provide some comments on how to truncate pg_multixact files and whether these files have any impact on database performance.

Facing this issue on a couple more machines where pg_multixact is huge and not being cleaned up. Any suggestions / troubleshooting tips?

Regards...
Re: [GENERAL] [SQL] pg_multixact issues
Dev Kumkar wrote:
> Received the database with a huge pg_multixact directory of size 21G; there are ~82,000 files in pg_multixact/members and 202 files in pg_multixact/offsets.
> Did run VACUUM FULL on this database and it was successful. However I am now not sure about the pg_multixact directory. Truncating this directory (except one file) results in database start-up issues - of course that is not a correct way of truncating:
>     FATAL: could not access status of transaction 13224692
> Stumped! Please provide some comments on how to truncate pg_multixact files and whether these files have any impact on database performance.
> Facing this issue on a couple more machines where pg_multixact is huge and not being cleaned up. Any suggestions / troubleshooting tips?

Did you try decreasing the autovacuum_multixact_freeze_min_age and autovacuum_multixact_freeze_table_age parameters?

What exact server version are you running?

--
Álvaro Herrera   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [GENERAL] synchronize DTAP
Gidday,

There was an interesting presentation at the Portland Postgres Users Group meeting in early September, from a guy who demo'd a Postgres database mounted as a FUSE filesystem. Not production ready, but with tables manifesting as directories, databases could be synched using filesystem tools like rsync - which offers intriguing backup/replication possibilities.

http://vimeo.com/105493143 - the demo of the FUSE functionality starts 39 minutes into the presentation.

Brent Wood
Principal Technician - GIS and Spatial Data Management
Programme Leader - Environmental Information Delivery
NIWA, +64-4-386-0529 | 301 Evans Bay Parade, Greta Point, Wellington | www.niwa.co.nz

From: pgsql-general-ow...@postgresql.org on behalf of Willy-Bas Loos willy...@gmail.com
Sent: Tuesday, September 30, 2014 8:58 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] synchronize DTAP

> Hi, We have an environment that has a central repository for lookups, which is replicated to several databases, each for different applications. This has been arranged in a DTAP manner. Sometimes it is necessary to synchronize the lookups of one of the DTAP branches with another. But I can't just overwrite one database with a dump from another branch, as the consumer databases will not follow. What I think I need is a way to compute the differences between two databases that have the same schema, and generate insert/update/delete statements from that. Since this seems like a pretty generic problem, I thought I should ask around before I start writing my own scripts. Does anyone know of a script or application that does this?
>
> Cheers
> --
> Willy-Bas Loos
Re: [GENERAL] table versioning approach (not auditing)
Felix, I'd love to see a single, well-maintained project. For example, I just found yours, and gave it a shot today after seeing this post. I found a bug: when an update command is issued but the old and new values are all the same, the trigger will blow up. I've got a fix for that, but if we had one project that more than a handful of people used, stuff like that would be quashed very quickly. I love the design of it, by the way. Any idea what it will take to move to JSONB for 9.4?

On Tue, Sep 30, 2014 at 7:22 AM, Felix Kunde felix-ku...@gmx.de wrote:
> Hey, yes I'm adding an additional key to each of my tables. First I wanted to use the primary key as one column in my audit_log table, but in some of my tables the PK consists of more than one column. Plus it's nice to have one key that is called the same in all tables.
>
> To get a former state for one row at date x I need to join the latest delta BEFORE date x with each delta AFTER date x. If I logged complete rows, this joining part would not be necessary, but as I usually work with spatial databases that have complex geometries and also image files, that strategy consumes too much disk space.
>
> If there are more users following a similar approach, I wonder why we don't throw all the good ideas together, to have one solution that is tested, maintained and improved by more developers. This would be great.
>
> Felix
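On the blow-up when old and new values are identical: one cheap guard (not necessarily how the project itself fixes it) is to keep the trigger from firing at all for no-op updates, using a WHEN clause. A minimal sketch, reusing the hypothetical log_delta() trigger function from the earlier sketch:

    CREATE TRIGGER my_table_versioning
        AFTER UPDATE ON my_table
        FOR EACH ROW
        WHEN (OLD.* IS DISTINCT FROM NEW.*)   -- skip updates that change nothing
        EXECUTE PROCEDURE log_delta();

With that condition in place, an UPDATE that sets every column to its existing value never reaches the logging code, so there is no empty delta to choke on.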
Re: [GENERAL] [SQL] pg_multixact issues
On Tue, Sep 30, 2014 at 8:50 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
> Did you try decreasing the autovacuum_multixact_freeze_min_age and autovacuum_multixact_freeze_table_age parameters?

As per the docs, vacuum_multixact_freeze_min_age can be set anywhere from zero to 1 billion, and vacuum_multixact_freeze_table_age from zero to 2 billion. I modified these to 10 and 15 respectively. Not sure if that's the correct way of setting these parameters?

> What exact server version are you running?

I am using PostgreSQL 9.3.4 (linux-64-bit).

Regards...
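For anyone searching the archives: on 9.3 the relevant GUCs are vacuum_multixact_freeze_min_age and vacuum_multixact_freeze_table_age, and the general approach is to lower them and let a database-wide vacuum advance the multixact cutoffs, after which old pg_multixact segments can be removed (around checkpoint time). The following is only a sketch under those assumptions, with deliberately aggressive values, not a tuning recommendation:

    -- in a session connected to the affected database
    SET vacuum_multixact_freeze_min_age = 0;     -- freeze multixacts as soon as vacuum sees them
    SET vacuum_multixact_freeze_table_age = 0;   -- force whole-table scans so freezing actually happens
    VACUUM VERBOSE;                              -- database-wide vacuum advances the cutoffs

    -- afterwards, check the per-database multixact cutoff
    SELECT datname, datminmxid FROM pg_database;

Manually deleting files under pg_multixact is never safe; the "could not access status of transaction" error above is exactly what happens when a needed segment is gone.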
[GENERAL] ability to return number of rows inserted into child partition tables request
Hello. I was trying to get Postgres to return the correct number of rows inserted for batch inserts into a partitioned table [using the triggers as suggested here http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html results in it always returning 0 by default]. What I ideally wanted was to be able to insert into just the child partitions and still get back the number of rows inserted.

It seems the state of the art is either to return the NEW row from the insert trigger [which causes it to also be saved to the parent master table], then define an extra trigger to remove the row from the parent table - so 2 inserts and 1 delete per insert [1]. Or you can use an unconditional rule and it will return the number of rows inserted [however, in this case, since we're using partitioning, I think we need multiple rules, one for each child table]. It is also possible for a view to use a trigger and still return the number of rows affected, which provides another workaround (see bottom of [1]).

Is there some more elegant way here? It seems odd that partitioned tables basically cannot, without a *lot* of massaging, return the number of rows inserted - am I missing something, or do I understand correctly? [Today this forces people into lots of workarounds, like *not* checking the number of rows returned for batch inserts, etc. - potentially dangerous as well.]

Is there, for instance, some workaround such as a way to manually increment the count of rows affected by the command? Or could conditional rules be made to return the output string with the number of rows affected (feature request)?

I guess this has come up before, FWIW: http://grokbase.com/t/postgresql/pgsql-general/0863bjzths/insert-into-master-table-0-rows-affected-hibernate-problems

Thanks!
-roger-

[1] http://stackoverflow.com/questions/83093/hibernate-insert-batch-with-partitioned-postgresql

PS if no response I'll forward this on to pgsql-hackers.
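For concreteness, the first workaround mentioned above (return NEW so the command tag counts the row, then delete the duplicate from the parent) looks roughly like this. It is only a sketch: measurement, measurement_y2014 and id are made-up names, and it assumes id is unique:

    -- BEFORE trigger: route the row to the child, but still RETURN NEW so the
    -- row count is reported - which also inserts a copy into the parent
    CREATE OR REPLACE FUNCTION measurement_insert_trigger() RETURNS trigger AS $$
    BEGIN
        INSERT INTO measurement_y2014 VALUES (NEW.*);
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER measurement_insert
        BEFORE INSERT ON measurement
        FOR EACH ROW EXECUTE PROCEDURE measurement_insert_trigger();

    -- AFTER trigger: remove the duplicate that landed in the parent table
    CREATE OR REPLACE FUNCTION measurement_trim_parent() RETURNS trigger AS $$
    BEGIN
        DELETE FROM ONLY measurement WHERE id = NEW.id;   -- ONLY = parent, not children
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER measurement_trim_parent
        AFTER INSERT ON measurement
        FOR EACH ROW EXECUTE PROCEDURE measurement_trim_parent();

As described in the post, every logical insert then costs two inserts and a delete, which is exactly the inelegance being complained about.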
[GENERAL] improving speed of query that uses a multi-column filter ?
I'm trying to improve the speed of a suite of queries that go across a few million rows. They use two main filters across a variety of columns:

WHERE (col_1 IS NULL) AND (col_2 IS NULL) AND ((col_3 IS NULL) OR (col_3 = col_1))

WHERE (col_1 IS True) AND (col_2 IS True) AND (col_3 IS True) OR (col_4 IS NULL)

I created a dedicated multi-column index for each query to speed them up. That was great. I still don't have the performance where I want it to be - the size of the index seems to be an issue. If the index were on one column, instead of 4, I think the scans would complete in time.

I looked online and in the archives, and couldn't find much information on good strategies to deal with this. It looks like my best option is to somehow index on the interpretation of these criteria, and not the criteria themselves. The two ways that come to mind are:

1. alter the table: add a boolean column for each filter-test to the table, index that, then query on that field
2. leave the table as-is: write a custom function for each filter, and then use a function index

Has anyone else encountered a need like this? Are there any tips / tricks / things I should look out for? Are there better ways to handle this?
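A rough sketch of option 2 above, using the first filter verbatim (my_table and filter_one are made-up names): wrap the filter in an IMMUTABLE SQL function, index the expression, and make the queries reuse the exact same expression so the planner can match it against the index:

    CREATE OR REPLACE FUNCTION filter_one(col_1 boolean, col_2 boolean, col_3 boolean)
    RETURNS boolean AS $$
        SELECT col_1 IS NULL AND col_2 IS NULL AND (col_3 IS NULL OR col_3 = col_1);
    $$ LANGUAGE sql IMMUTABLE;

    CREATE INDEX my_table_filter_one_idx
        ON my_table (filter_one(col_1, col_2, col_3));

    -- the query must use the indexed expression for the index to be considered
    SELECT * FROM my_table WHERE filter_one(col_1, col_2, col_3);

The index then stores a single boolean per row rather than four columns, which is the size reduction being asked about.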
Re: [GENERAL] improving speed of query that uses a multi-column filter ?
On 9/30/2014 4:50 PM, Jonathan Vanasco wrote:
> WHERE (col_1 IS NULL) AND (col_2 IS NULL) AND ((col_3 IS NULL) OR (col_3 = col_1))

If col_1 IS NULL, then that OR condition doesn't make much sense. Just saying...

These 4 columns are all nullable booleans, so they can be TRUE, FALSE, or NULL? With 4 columns, there are 3^4 = 81 possible combinations of these values... you might get better speeds encoding this as a single SHORT INTEGER, enumerating those 81 states, and then just doing equals or IN (set of values) conditions... of course, this might make a lot of OTHER code more complicated.

It might be easier to make each col_X two bits of this integer, such that one bit indicates the value was NULL, and the other bit is the true/false state if that first bit isn't set; this would make testing individual bits somewhat better.

--
john r pierce                                      37N 122W
somewhere on the middle of the left coast
Re: [GENERAL] improving speed of query that uses a multi-column filter ?
On Sep 30, 2014, at 8:04 PM, John R Pierce pie...@hogranch.com wrote:
> if col_1 IS NULL, then that OR condition doesn't make much sense. just saying...

I was just making a quick example. There are two commonly used filter sets; each is mostly on bool columns that allow NULL - but one checks to see whether the row references itself in a particular column.

> these 4 columns are all nullable booleans, so they can be TRUE, FALSE, or NULL?

Most of them, yes.

> with 4 columns, there's 3^4 = 81 possible combinations of these values... you might get better speeds encoding this as a single SHORT INTEGER, and enumerating those 81 states, then just do equals or IN (set of values) conditions... of course, this might make a lot of OTHER code more complicated. It might be easier to make each col_X 2 bits of this integer, such that one bit indicates the value was 'NULL', and the other bit is the true/false state if that first bit isn't set; this would make testing individual bits somewhat better.

That's interesting. I never thought about how Postgres processes the data. For legacy reasons, I can't change the data types - but I can add additional columns. So I could use a trigger/function that manages a filter_test column that is an int, give each filter a bit value, and then just scan on that. It wouldn't be much more work to test that against dedicated bool columns for each filter.
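A sketch of that trigger-maintained extra column (filter_test and set_filter_test are made-up names; the two CASEs mirror the two filters from the original post, one bit each):

    ALTER TABLE my_table ADD COLUMN filter_test integer;

    CREATE OR REPLACE FUNCTION set_filter_test() RETURNS trigger AS $$
    BEGIN
        NEW.filter_test :=
              CASE WHEN NEW.col_1 IS NULL AND NEW.col_2 IS NULL
                        AND (NEW.col_3 IS NULL OR NEW.col_3 = NEW.col_1)
                   THEN 1 ELSE 0 END                                   -- bit 1: first filter
            + CASE WHEN (NEW.col_1 IS TRUE AND NEW.col_2 IS TRUE AND NEW.col_3 IS TRUE)
                        OR NEW.col_4 IS NULL
                   THEN 2 ELSE 0 END;                                  -- bit 2: second filter
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER my_table_filter_test
        BEFORE INSERT OR UPDATE ON my_table
        FOR EACH ROW EXECUTE PROCEDURE set_filter_test();

    CREATE INDEX my_table_filter_test_idx ON my_table (filter_test);

    -- bit 1 set (values 1 and 3) -> first filter matches; a plain btree can serve this
    SELECT * FROM my_table WHERE filter_test IN (1, 3);

Existing rows would need a one-off UPDATE to populate the column before the index is useful.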
Re: [GENERAL] improving speed of query that uses a multi-column filter ?
On Wednesday, October 1, 2014, Jonathan Vanasco postg...@2xlp.com wrote:
> I'm trying to improve the speed of a suite of queries that go across a few million rows. They use two main filters across a variety of columns:
> WHERE (col_1 IS NULL) AND (col_2 IS NULL) AND ((col_3 IS NULL) OR (col_3 = col_1))
> WHERE (col_1 IS True) AND (col_2 IS True) AND (col_3 IS True) OR (col_4 IS NULL)
> I created a dedicated multi-column index for each query to speed them up. That was great. I still don't have the performance where I want it to be - the size of the index seems to be an issue. If the index were on one column, instead of 4, I think the scans would complete in time.
> Has anyone else encountered a need like this? Are there any tips / tricks / things I should look out for? Are there better ways to handle this?

Have you considered maybe partial indexes? http://www.postgresql.org/docs/9.3/static/indexes-partial.html

I.e. idx1 on the PK column of the table, with the WHERE inside the index exactly the same as your first WHERE; idx2 on the PK column, with the WHERE inside the index the same as your second WHERE.

Cheers,
Misa
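That is, roughly the following, with id standing in for whatever the table's PK column actually is:

    -- each partial index only stores the rows that satisfy its predicate, so it stays small
    CREATE INDEX my_table_filter1_idx ON my_table (id)
        WHERE col_1 IS NULL AND col_2 IS NULL AND (col_3 IS NULL OR col_3 = col_1);

    CREATE INDEX my_table_filter2_idx ON my_table (id)
        WHERE (col_1 IS TRUE AND col_2 IS TRUE AND col_3 IS TRUE) OR (col_4 IS NULL);

    -- a query whose WHERE clause matches the index predicate can use the small index
    SELECT id FROM my_table
    WHERE col_1 IS NULL AND col_2 IS NULL AND (col_3 IS NULL OR col_3 = col_1);

The trade-off is that the query's WHERE clause has to imply the index predicate for the planner to pick the partial index, so the filters need to be written consistently across the query suite.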