Re: [GENERAL] table versioning approach (not auditing)

2014-10-01 Thread Felix Kunde
Hey there. Thank you very much for that fix! That's why I'd like to have joint 
development and joint testing. It's way more convincing for users to go for a 
solution that is tested by some experts than just by a random developer :) 

I'm open to creating a new project and pushing the code there. I don't care about the 
name. Then we might figure out which parts are already good, which parts could 
be improved and where to go next. I think switching to JSONB, for example, will 
be easy, as it offers the same functions as JSON, afaik.
 

Sent: Tuesday, 30 September 2014, 21:16
From: Adam Brusselback adambrusselb...@gmail.com
To: Felix Kunde felix-ku...@gmx.de
Cc: pgsql-general@postgresql.org pgsql-general@postgresql.org
Subject: Re: [GENERAL] table versioning approach (not auditing)

Felix, I'd love to see a single, well maintained project. For example, I just 
found yours and gave it a shot today after seeing this post.  I found a bug: 
when an update command is issued but the old and new values are all the same, 
the trigger will blow up.  I've got a fix for that, but if we had one project 
that more than a handful of people used, stuff like that would be quashed very 
quickly.
 
I love the design of it by the way. Any idea what it will take to move to JSONB 
for 9.4? 
 
 
On Tue, Sep 30, 2014 at 7:22 AM, Felix Kunde felix-ku...@gmx.de wrote:

Hey
 
Yes, I'm adding an additional key to each of my tables. First I wanted to use 
the primary key as one column in my audit_log table, but in some of my tables 
the PK consists of more than one column. Plus it's nice to have one key that 
has the same name across all tables.
 
To get a former state for one row at date x I need to join the latest delta 
BEFORE date x with each delta AFTER date x. If I logged complete rows, this 
joining part would not be necessary, but as I usually work with spatial 
databases that have complex geometries and also image files, that strategy 
consumes too much disk space.
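
Roughly, the reconstruction query looks like this (only a sketch - the table
and column names are placeholders, and it assumes the old values sit in a
jsonb column):

-- value of one column of one row as it was at date x:
-- it is the old value from the earliest delta AFTER x that touched the
-- column; if no later delta touched it, the live row still holds it
SELECT COALESCE(
         (SELECT a.old_values ->> 'price'
            FROM audit_log a
           WHERE a.audit_id = t.audit_id
             AND a.changed_at > '2014-09-01'   -- date x
             AND a.old_values ? 'price'
           ORDER BY a.changed_at
           LIMIT 1),
         t.price::text) AS price_at_x
  FROM my_table t
 WHERE t.audit_id = 42;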
 
If there are more users following a similar approach, I wonder why we don't 
throw all the good ideas together to have one solution that is tested, maintained 
and improved by more developers. This would be great.
 
Felix
 

Sent: Monday, 29 September 2014, 23:25
From: Abelard Hoffman abelardhoff...@gmail.com
To: Felix Kunde felix-ku...@gmx.de
Cc: pgsql-general@postgresql.org pgsql-general@postgresql.org
Subject: Re: [GENERAL] table versioning approach (not auditing)

Thank you Felix, Gavin, and Jonathan for your responses.
 
Felix & Jonathan: both of you mention just storing deltas. But if you do that, 
how do you associate the delta record with the original row? Where's the PK 
stored, if it wasn't part of the delta?
 
Felix, thank you very much for the example code. I took a look at your table 
schemas. I need to study it more, but it looks like the way you're handling the 
PK is by adding a separate synthetic key (audit_id) to each table that's 
being versioned, and then storing that key along with the delta.
 
So then to find all the versions of a given row, you just need to join the 
audit row with the schema_name.table_name.audit_id column. Is that right? The 
only potential drawback there is there's no referential integrity between the 
audit_log.audit_id and the actual table.
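
By that join I mean something along these lines (the audit_log column names
are guessed from the description, not taken from the actual schema):

-- all logged old versions of one particular row
SELECT a.changed_at, a.old_values
  FROM schema_name.table_name t
  JOIN audit_log a ON a.audit_id = t.audit_id
 WHERE t.id = 123        -- the row of interest, by its own PK
 ORDER BY a.changed_at;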
 
I do like that approach very much though, in that it eliminates the need to 
interrogate the json data in order to perform most queries.
 
AH
 
 
 
On Mon, Sep 29, 2014 at 12:26 AM, Felix Kunde felix-ku...@gmx.de wrote:

Hey
 
I've also tried to implement database versioning using JSON to log changes in 
tables. Here it is: https://github.com/fxku/audit
I've got two versioning tables, one storing information about all transactions 
that happened and one where I put the JSON logs of row changes of each table. 
I'm only logging old values and not complete rows.
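
In a very simplified form the mechanism is something like the following (not
the actual code from the project - it assumes the hstore extension, a generic
audit_log table, and it only handles UPDATEs):

CREATE EXTENSION IF NOT EXISTS hstore;

CREATE OR REPLACE FUNCTION log_old_values() RETURNS trigger AS $$
BEGIN
  -- hstore(OLD.*) - hstore(NEW.*) keeps only the columns whose value
  -- changed, together with their OLD values
  INSERT INTO audit_log (audit_id, table_name, changed_at, old_values)
  VALUES (OLD.audit_id, TG_TABLE_NAME, now(),
          hstore_to_json(hstore(OLD.*) - hstore(NEW.*)));
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER my_table_versioning
  AFTER UPDATE ON my_table
  FOR EACH ROW EXECUTE PROCEDURE log_old_values();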
 
Then I have a function that recreates a database state at a given time into a 
separate schema - either to VIEWs, MVIEWs or TABLES. This database state could 
then be indexed in order to work with it. You can also reset the production 
state to the recreated past state.
 
Unfortunately I've got no time to work further on it at the moment, and I have not 
done tests with many changes in the database, so I can't say if the recreation 
process scales well. One downside I've realised is that using the json_agg 
function has limits when I've got binary data - it gets too long. So I'm really 
looking forward to using JSONB.

There are more plans in my mind. By having a Transaction_Log table it should be 
possible to revert only certain transactions. I'm also thinking of parallel 
versioning, e.g. different users are all working with their version of the 
database and commit their changes to the 

[GENERAL] Extract especific text from a sql statement

2014-10-01 Thread Emanuel Araújo
Hi,

I need help extracting the fields and tables from an SQL statement.

Example:

SELECT pc.cod, pc.val, pi.qtd,
COALESCE(pc.name, 'empty') AS name, lower(coalesce(pc.email, 'empty')) as
email,
status
FROM pc
INNER JOIN pi on (pc.cod = pi.cod)
WHERE pc.startdate  CURRENT_DATE
order by 1 desc
;

I need to return that:

pc.cod
pc.val
pi.qtd
pc.name
pc.email
status
pc
pi
pc.startdate



Can you help me?


-- 
Regards,
Emanuel Araújo


[GENERAL] Best practices for Large import in empty database?

2014-10-01 Thread Daniel Begin
I am about to feed an empty database with a large import of data (the size
of csv files is about 500 GB). Tables are created but I haven't added any
constraints or indexes yet. I wonder whether the best practice is to add
them before or after the import. Are there other good practices that would
ease the import process?

 

Cheers,

Daniel



Re: [GENERAL] Best practices for Large import in empty database?

2014-10-01 Thread Tom Lane
Daniel Begin jfd...@hotmail.com writes:
 I am about to feed an empty database with a large import of data (the size
 of csv files is about 500 GB). Tables are created but I haven't added any
 constraints or indexes yet. I wonder whether the best practice is to add
 them before or after the import. Are there other good practices that would
 ease the import process?

http://www.postgresql.org/docs/9.3/static/populate.html

regards, tom lane




Re: [GENERAL] Best practices for Large import in empty database?

2014-10-01 Thread Andy Colson

On 10/1/2014 9:13 AM, Daniel Begin wrote:

I am about to feed an empty database with a large import of data (the
size of csv files is about 500 GB). Tables are created but I haven't
added any constraints or indexes yet. I wonder whether the best practice
is to add them before or after the import. Are there other good
practices that would ease the import process?

Cheers,

Daniel



Create indexes after the load, as long as you don't need them for uniqueness checking.

then:

begin;
truncate table junk;
copy ...  with freeze;
commit;

Even though the table is empty, the begin/truncate/copy sequence lets PG skip a 
bunch of extra WAL.
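
Spelled out a bit more (file path, table and column are placeholders; FREEZE
only works because the table is truncated in the same transaction):

begin;
truncate table junk;
-- rows are written already frozen, so vacuum doesn't have to rewrite
-- them later
copy junk from '/path/to/junk.csv' with (format csv, freeze);
commit;

-- build indexes (and add constraints) only after the data is loaded
create index junk_some_col_idx on junk (some_col);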


-Andy






[GENERAL] Postgres tcp_keepalive_xxxx parameters.

2014-10-01 Thread jlrando
Hi.

We have an issue with postgres clients that are being disconnected from the
database server after some time. The client is a web application which creates a
pool of connections. Since client and server are on different VLANs, I think
the problem is that the FW which is routing traffic is dropping idle
connections after some time.

The OS is CentOS 5.10 and the kernel config is as follows:

root@itk-iv-mcs ~ # sysctl -a | grep keepalive
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 7200

The tcp_keepalives_* parameters in postgresql.conf were all set to 0. Then we tried to set:

tcp_keepalives_idle = 60 # TCP_KEEPIDLE, in seconds;
tcp_keepalives_interval = 60 # TCP_KEEPINTVL, in seconds;
#tcp_keepalives_count = 0 # TCP_KEEPCNT;

So the effective values are (I guess) 60, 60 and 9 (system default). Even with
this configuration the clients are being disconnected as before.

At this point I am not sure whether I have understood properly how keepalive
works, because if PG sends a packet after 60 seconds of
inactivity and the client responds OK, then it will not retry and will wait for
tcp_keepalives_idle seconds again... and so on. The interval and
count are used only in case the client does not respond. If this is true
and I am not wrong, then we can rule out the FW dropping idle
connections as the cause... Is this the behaviour of the tcp_keepalives_xxx parameters?

Is there any more I have to setup or check to make keepalive work?
Do you know how can we sort out this issue?

Thanks in advance.






Re: [GENERAL] Best practices for Large import in empty database?

2014-10-01 Thread Daniel Begin
Oops, right to the point! Thanks...

-Original Message-
From: pgsql-general-ow...@postgresql.org
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Tom Lane
Sent: October-01-14 10:38
To: Daniel Begin
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Best practices for Large import in empty database?

Daniel Begin jfd...@hotmail.com writes:
 I am about to feed an empty database with a large import of data (the 
 size of csv files is about 500 GB). Tables are created but I haven't 
 added any constraints or indexes yet. I wonder whether the best 
 practice is to add them before or after the import. Are there other 
 good practices that would ease the import process?

http://www.postgresql.org/docs/9.3/static/populate.html

regards, tom lane




Re: [GENERAL] Postgres tcp_keepalive_xxxx parameters.

2014-10-01 Thread Tom Lane
jlrando jose.luis.rando.ca...@ericsson.com writes:
 We have an issue with postgres clients that are being disconnected from the
 database server after some time. The client is a web application which creates a
 pool of connections. Since client and server are on different VLANs, I think
 the problem is that the FW which is routing traffic is dropping idle
 connections after some time.

Probably.  It might be asymmetric; have you tried enabling keepalives
from the client end, rather than the server?  If using libpq, you can
set keepalive parameters in the connection string:
http://www.postgresql.org/docs/9.3/static/libpq-connect.html#LIBPQ-PARAMKEYWORDS
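
(For reference, the conninfo keywords are keepalives, keepalives_idle,
keepalives_interval and keepalives_count, e.g. something like
"host=dbhost dbname=mydb keepalives=1 keepalives_idle=60 keepalives_interval=10 keepalives_count=3".
The keyword names come from the libpq docs; the host and the values here are
only examples.)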

If that doesn't fix it, you might want to get out Wireshark or a similar
tool and verify that keepalive packets are actually getting sent.

regards, tom lane




Re: [GENERAL] table versioning approach (not auditing)

2014-10-01 Thread Adam Brusselback
I know we're kinda hijacking this thread, so sorry for that.  If you'd like
to do that, I'd be more than happy to use it and push any fixes / changes
upstream.  I don't have much of a preference on the name either, as long as
it's something that makes sense.

I would consider myself far from an expert though! Either way, more people
using a single solution is a good thing.

As a side note, I did some benchmarking this morning and wanted to share
the results:
pgbench -i -s 140 -U postgres pgbench

pgbench -c 4 -j 4 -T 600 -U postgres pgbench
no auditing tps: 2854
NOTE: Accounts are audited
auditing tps: 1278

pgbench -c 2 -j 2 -N -T 300 -U postgres pgbench
no auditing tps: 2504
NOTE: Accounts are audited
auditing tps: 822

pgbench -c 2 -j 2 -T 300 -U postgres pgbench
no auditing tps: 1836
NOTE: branches and tellers are audited, accounts are not
auditing tps: 505

I'd love to see if there are some easy wins to boost the performance.

On Wed, Oct 1, 2014 at 5:19 AM, Felix Kunde felix-ku...@gmx.de wrote:

 Hey there. Thank you very much for that fix! That's why I'd like to have
 joint development and joint testing. It's way more convincing for users to
 go for a solution that is tested by some experts than just by a random
 developer :)

 I'm open to creating a new project and pushing the code there. I don't care
 about the name. Then we might figure out which parts are already good, which
 parts could be improved and where to go next. I think switching to JSONB,
 for example, will be easy, as it offers the same functions as JSON, afaik.


 Sent: Tuesday, 30 September 2014, 21:16
 From: Adam Brusselback adambrusselb...@gmail.com
 To: Felix Kunde felix-ku...@gmx.de
 Cc: pgsql-general@postgresql.org pgsql-general@postgresql.org
 Subject: Re: [GENERAL] table versioning approach (not auditing)

 Felix, I'd love to see a single, well maintained project. For example, I
 just found yours and gave it a shot today after seeing this post.  I found
 a bug: when an update command is issued but the old and new values are all
 the same, the trigger will blow up.  I've got a fix for that, but if we
 had one project that more than a handful of people used, stuff like that
 would be quashed very quickly.

 I love the design of it by the way. Any idea what it will take to move to
 JSONB for 9.4?


 On Tue, Sep 30, 2014 at 7:22 AM, Felix Kunde felix-ku...@gmx.de wrote:

 Hey

 Yes, I'm adding an additional key to each of my tables. First I wanted to
 use the primary key as one column in my audit_log table, but in some of my
 tables the PK consists of more than one column. Plus it's nice to have one
 key that has the same name across all tables.

 To get a former state for one row at date x I need to join the latest
 delta BEFORE date x with each delta AFTER date x. If I logged complete
 rows, this joining part would not be necessary, but as I usually work with
 spatial databases that have complex geometries and also image files, that
 strategy consumes too much disk space.

 If there are more users following a similar approach, I wonder why we don't
 throw all the good ideas together to have one solution that is tested,
 maintained and improved by more developers. This would be great.

 Felix


 Sent: Monday, 29 September 2014, 23:25
 From: Abelard Hoffman abelardhoff...@gmail.com
 To: Felix Kunde felix-ku...@gmx.de
 Cc: pgsql-general@postgresql.org pgsql-general@postgresql.org
 Subject: Re: [GENERAL] table versioning approach (not auditing)

 Thank you Felix, Gavin, and Jonathan for your responses.

 Felix & Jonathan: both of you mention just storing deltas. But if you do
 that, how do you associate the delta record with the original row? Where's
 the PK stored, if it wasn't part of the delta?

 Felix, thank you very much for the example code. I took a look at your
 table schemas. I need to study it more, but it looks like the way you're
 handling the PK is by adding a separate synthetic key (audit_id) to each
 table that's being versioned, and then storing that key along with the
 delta.

 So then to find all the versions of a given row, you just need to join the
 audit row with the schema_name.table_name.audit_id column. Is that right?
 The only potential drawback there is there's no referential integrity
 between the audit_log.audit_id and the actual table.

 I do like that approach very much though, in that it eliminates the need
 to interrogate the json data in order to perform most queries.

 AH



 On Mon, Sep 29, 2014 at 12:26 AM, Felix Kunde felix-ku...@gmx.de wrote:

 Hey

 I've also tried to implement database versioning using JSON to log
 changes in tables. Here it is:
 https://github.com/fxku/audit
 I've got two versioning tables, one storing information about all
 

Re: [GENERAL] Extract especific text from a sql statement

2014-10-01 Thread David G Johnston
Emanuel Araújo wrote
 Hi,
 
 I need help to extract fields and tables from a sql statement.
 
 Example:
 
 SELECT pc.cod, pc.val, pi.qtd,
 COALESCE(pc.name, 'empty') AS name, lower(coalesce(pc.email, 'empty')) as
 email,
 status
 FROM pc
 INNER JOIN pi on (pc.cod = pi.cod)
 WHERE pc.startdate  CURRENT_DATE
 order by 1 desc
 ;
 
 I need to return that:
 
 pc.cod
 pc.val
 pi.qtd
 pc.name
 pc.email
 status
 pc
 pi
 pc.startdate
 
 
 
 can you help me ?

Me, probably not.  I would suggest you tell us what goal/use-case you have
in mind that makes you think you need to do the above.

The project is open source so you are welcome to look at the parser code and
see how it goes about pulling out identifiers and determining what they are.

A big question is how do you want to deal with aliases and views present in
the query?

Do you have a particular language you are restricting your search to?

David J.






Re: [GENERAL] improving speed of query that uses a multi-column filter ?

2014-10-01 Thread Jonathan Vanasco

On Oct 1, 2014, at 12:34 AM, Misa Simic wrote:

 Have you considered maybe partial indexes?
 
 http://www.postgresql.org/docs/9.3/static/indexes-partial.html
 
 I.e idx1 on pk column of the table with where inside index exactly the same 
 as your first where
 
 Idx2 on pk column with where inside index as second where

That was actually my first attempt, and I was hoping it would work.

Unfortunately, there is always something in the queries that keeps Postgres 
trying to use other (slower) indexes or jumping to a sequential scan.  

I haven't been able to trick the planner into using the partial index, and most 
online resources suggested it wasn't possible.
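
For reference, the kind of partial index suggested above would look roughly
like this (table, columns and predicate are made up for illustration):

CREATE INDEX my_table_active_idx ON my_table (id)
 WHERE status = 'active' AND deleted_at IS NULL;

The planner will only consider it when the query's WHERE clause provably
implies the index predicate, which is exactly why a slightly different filter
falls back to other indexes or a sequential scan.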



[GENERAL] Is there a way to get both the plan and the results?

2014-10-01 Thread François Beausoleil
Hi all!

Building a small tool, and it would be super useful if I could get both the 
query results and the plan at the same time. At the moment, the tool accepts a 
SQL query, drops it into a file and calls psql --file=whatever.sql. I can change 
to use a lower-level API if necessary.

Ideally, I’d like to do something like EXPLAIN (ANALYZE, RESULTS) SELECT ...

I’m talking about multi-minute queries, so it’s less than ideal if I have to 
run the query twice: once for EXPLAIN ANALYZE, once for the actual results.

Looking around, 
https://duckduckgo.com/?q=postgresql+get+query+results+and+query+plan+in+single+call
 doesn’t return anything useful. Google neither, for that matter. The PG 
mailing list archives seem to not have any results on that.

I’m specifically using 9.1. Not opposed to upgrade if necessary.

Thanks!
François Beausoleil





Re: [GENERAL] Is there a way to get both the plan and the results?

2014-10-01 Thread David G Johnston


François Beausoleil wrote
 Hi all!
 
 Building a small tool, and it would be super useful if I could get both
 the query results and the plan at the same time. At the moment, the tool
 accepts a SQL query, drops it into a file and call psql
 --file=whatever.sql. I can change to use a lower-level API if necessary.
 
 Ideally, I’d like to do something like EXPLAIN (ANALYZE, RESULTS) SELECT
 ...
 
 I’m talking about multi-minute queries, so it’s less than ideal if I have
 to run the query twice: once for EXPLAIN ANALYZE, once for the actual
 results.
 
 Looking around,
 https://duckduckgo.com/?q=postgresql+get+query+results+and+query+plan+in+single+call
 doesn’t return anything useful. Google neither, for that matter. The PG
 mailing list archives seem to not have any results on that.
 
 I’m specifically using 9.1. Not opposed to upgrade if necessary.

See:

http://www.postgresql.org/docs/9.1/static/auto-explain.html

There is no way to do so from a client interface.
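
A rough sketch of a per-session setup (needs superuser when loaded this way;
the parameter values are only examples):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;   -- log the plan of every statement
SET auto_explain.log_analyze = true;     -- include actual timings/row counts

-- run the long query once: the rows come back to the client as usual,
-- and the EXPLAIN ANALYZE output goes to the server log
SELECT ...;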

David J.






Re: [GENERAL] Extract especific text from a sql statement

2014-10-01 Thread Melvin Davidson
You already have most of the result columns, so the following should do it.

 SELECT pc.cod,
pc.val,
pi.qtd,
COALESCE(pc.name, 'empty') AS name,
lower(coalesce(pc.email, 'empty')) as email,
status,
c1.relname,
c2.relname,
pc.startdate
 FROM pc
 INNER JOIN pi on (pc.cod = pi.cod)
  JOIN pg_class c1 ON (c1.relname = 'pc' AND c1.relkind = 'r')
  JOIN pg_class c2 ON (c2.relname = 'pi' AND c2.relkind = 'r')
 WHERE pc.startdate  CURRENT_DATE
 order by 1 desc;

Learn the catalogs and you will learn to be a good dba.

Melvin Davidson


Re: [GENERAL] Extract especific text from a sql statement

2014-10-01 Thread David G Johnston
Melvin Davidson-5 wrote
 You already have most of the result columns, so the following should do
 it.
 
  SELECT pc.cod,
 pc.val,
 pi.qtd,
 COALESCE(pc.name, 'empty') AS name,
 lower(coalesce(pc.email, 'empty')) as email,
 status,
 c1.relname,
 c2.relname,
 pc.startdate
  FROM pc
  INNER JOIN pi on (pc.cod = pi.cod)
   JOIN pg_class c1 ON (c1.relname = 'pc' AND c1.relkind = 'r')
   JOIN pg_class c2 ON (c2.relname = 'pi' AND c2.relkind = 'r')
  WHERE pc.startdate  CURRENT_DATE
  order by 1 desc;
 
 Learn the catalogs and you will learn to be a good dba.
 
 Melvin Davidson

I read the example answer as being a single column (or some other
println(...) output) that generates a single row for each of the string
literal identifiers extracted from a parse of the raw query text - possibly
after capturing raw identifiers and performing catalog lookups.

Your particular answer also seems pointless in that the JOIN pg_class is
unnecessary since the ON clause sets a constant for relname and then re-uses
that in the select-list.  You'd be better off just writing:  SELECT ...,
'pc' AS relname, 'pi' AS relname FROM ... and providing disambiguating
aliases.

David J.



