Re: [GENERAL] WHERE IN (subselect) versus WHERE IN (1,2,3,)
I wrote: > You've still got a nasty join-size estimation error: >> -> Nested Loop (cost=6.18..1939.43 rows=411736 width=8) (actual >> time=0.203..3.487 rows=35 loops=1) > It's not apparent why that's so far off ... What PG version is this, anyway? It strikes me that this estimation error might have something to do with the eqjoinsel bugs that we repaired in 9.0.5. I'm not having any luck reproducing such a bogus estimate with current code, either, though that may just mean you've omitted some critical info about how the tables are set up. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Anonymized database dumps
On Mar 19, 2012, at 5:55 PM, Bill Moran wrote: > >> Sensitive data should be stored encrypted to begin. For test databases you >> or your developers can invoke a process that replaces the real encrypted >> data with fake encrypted data (for which everybody has the key/password.) >> Or if the overhead is too much (ie billions of rows), you can have different >> decrypt() routines on your test databases that return fake data without >> touching the real encrypted columns. > > The thing is, this process has the same potential data spillage > issues as sanitizing the data. Not really, in the modality I describe the sensitive data is always encrypted in the database and "useless" because nobody will have the private key or know the password that protects it other than the ops subsystems that require access. So even if you take an ops dump, load it to a test box, and walk away, you are good. If your developers/testers want to play with the data they will be forced to over-write and "stage" test encrypted data they can decrypt, or call a "fake" decrypt() that gives them test data (eg: joins to a test data table.) Kiriakos -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
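For illustration, a minimal sketch of the "fake decrypt()" idea described above might look like this on a test box (the table, column, and function names are hypothetical, and the production decrypt() would wrap pgcrypto or an external key service instead):

    -- staged fake data that developers are allowed to see
    create table test_email_data (
        user_id    integer primary key,
        fake_email text not null
    );

    -- test-only stand-in: ignores the real ciphertext entirely
    create or replace function decrypt_email(integer, bytea) returns text
    language sql stable as $$
        select coalesce(
            (select t.fake_email from test_email_data t where t.user_id = $1),
            'user' || $1 || '@example.org')
    $$;

    -- application code keeps calling the same thing it calls in production, e.g.
    -- select u.user_id, decrypt_email(u.user_id, u.email_enc) from users u;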
Re: [GENERAL] nice'ing the postgres COPY backend process to make pg_dumps run more "softly"
On Mon, Mar 19, 2012 at 3:51 PM, Guillaume Lelarge wrote: > On Sun, 2012-03-18 at 21:06 -0700, Aleksey Tsalolikhin wrote: >> Hi. When pg_dump runs, our application becomes inoperative (too >> slow). I was going to ask if nice'ing the postgres backend process >> that handles the COPY would help but I just realized probably the >> pg_dump takes out locks when it runs and nice'ing it would just make >> it run longer... >> >> However the man page says "pg_dump does not block other users >> accessing the database (readers or writers)." But if we run a >> pg_dump, the phone starts ringing, users are complaining that the web >> app is not working. >> >> Would appreciate some pointer to help me reconcile these two >> apparently contradictory facts. >> > > Depends on what your app is doing. It doesn't block any usual use of the > database: DML are all accepted. But you cannot drop a table that pg_dump > must save, you cannot change its definition. So there are some DDL > commands you cannot use during a dump. > > Other than this, your users shouldn't be blocked. If it happens again, > you should look at pg_locks and pg_stat_activity to understand what's > going on. This. pg_dump essentially runs a read-only, database-wide transaction that touches all objects. This will allow all updates, selects, deletes, etc., but will block, say, a CREATE INDEX or ALTER TABLE. Let's just say some other transaction fires off one of these and has in turn already touched a customer record... bam. Classic priority inversion -- slowing down pg_dump will only make the problem worse. So the very first thing to determine is whether you are seeing generalized load issues (we don't have the info for that) or locking issues, because the solution will be entirely different depending on what you're seeing. Most likely case is you're having iowait issues and moving the dump to another machine will fix the problem. Next most likely case is cpu (courtesy zlib) -- also easily fixed. But you have to know before changing things. So: what's the general load (from top) during pg_dump "bad" times? pg_locks etc., as Guillaume asked; iowait, etc. merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
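To make the "look at pg_locks and pg_stat_activity" step concrete, a rough starting point on 9.0/9.1 (where pg_stat_activity still has procpid and current_query) could be something like:

    -- sessions that are waiting on a lock, plus every lock they hold or want
    SELECT a.procpid,
           a.waiting,
           a.current_query,
           l.relation::regclass,
           l.mode,
           l.granted
    FROM pg_stat_activity a
    JOIN pg_locks l ON l.pid = a.procpid
    WHERE a.waiting OR NOT l.granted
    ORDER BY a.query_start;

If the web app's queries show up here waiting on a lock held by the pg_dump transaction (or by DDL queued behind it), it is a locking problem; if nothing is waiting, look at iowait and CPU instead.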
Re: [GENERAL] pg_upgrade + streaming replication ?
On Mon, 2012-03-19 at 15:30 -0400, Bruce Momjian wrote: > On Thu, Mar 01, 2012 at 02:01:31PM -0800, Lonni J Friedman wrote: > > I've got a 3 node cluster (1 master/2 slaves) running 9.0.x with > > streaming replication. I'm in the planning stages of upgrading to > > 9.1.x, and am looking into the most efficient way to do the upgrade > > with the goal of minimizing downtime & risk. After googling, the only > > discussion that I've found of using pg_upgrade with a streaming > > replication setup seems to be this (nearly) year old thread: > > http://web.archiveorange.com/archive/v/9FNVlDWGQtpyWVL54jlK > > > > In summary, there is no way to use both pg_upgrade and streaming > > replication simultaneously. I'd have to either use pg_upgrade and > > then effectively rebuild/redeploy the slaves, or not use pg_upgrade, > > and reimport all of the data. Is that still the latest status, or are > > there other options? > > You can shut down all three servers, run pg_upgrade on all of them, then > restart them as 9.1 servers. After running pg_upgrade on each server individually, they will have different system IDs, and potentially different on-disk representation of the catalogs, right? So how can you resume streaming without rebuilding the slaves? Regards, Jeff Davis -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Anonymized database dumps
In response to Kiriakos Georgiou : > The data anonymizer process is flawed because you are one misstep away from > data spillage. In our case, it's only one layer. Other layers that exist: * The systems where this test data is instantiated can't send email * The systems where this exists have limited access (i.e., not all developers can access it, and it's not used for typical testing -- only for specific testing that requires production-like data) You are correct, however, in that there's always the danger of spillage if new sensitive data is added and the sanitization script is not properly updated. It's part of the ongoing overhead of maintaining such a system. > Sensitive data should be stored encrypted to begin. For test databases you > or your developers can invoke a process that replaces the real encrypted data > with fake encrypted data (for which everybody has the key/password.) Or if > the overhead is too much (ie billions of rows), you can have different > decrypt() routines on your test databases that return fake data without > touching the real encrypted columns. The thing is, this process has the same potential data spillage issues as sanitizing the data. I find it intriguing, however, and I'm going to see if there are places where this approach might have advantages over our current one. Since much of our sensitive data is already de-identified, it provides an additional level of protection there as well. -- Bill Moran http://www.potentialtech.com http://people.collaborativefusion.com/~wmoran/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Anonymized database dumps
The data anonymizer process is flawed because you are one misstep away from data spillage. Sensitive data should be stored encrypted to begin. For test databases you or your developers can invoke a process that replaces the real encrypted data with fake encrypted data (for which everybody has the key/password.) Or if the overhead is too much (ie billions of rows), you can have different decrypt() routines on your test databases that return fake data without touching the real encrypted columns. Kiriakos On Mar 19, 2012, at 8:22 AM, Bill Moran wrote: > In response to Janning Vygen : >> >> I am working on postgresql 9.1 and loving it! >> >> Sometimes we need a full database dump to test some performance issues >> with real data. >> >> Of course we don't like to have sensible data like bunches of e-mail >> addresses on our development machines as they are of no interest for >> developers and should be kept secure. >> >> So we need an anonymized database dump. I thought about a few ways to >> achieve this. >> >> 1. Best solution would be a special db user and some rules which fire on >> reading some tables and replace privacy data with some random data. Now >> doing a dump as this special user doesn't even copy the sensible data at >> all. The user just has a different view on this database even when he >> calls pg_dump. >> >> But as rules are not fired on COPY it can't work, right? >> >> 2. The other solution I can think of is something like >> >> pg_dump | sed > pgdump_anon >> >> where 'sed' does a lot of magical replace operations on the content of >> the dump. I don't think this is going to work reliable. >> >> 3. More reliable would be to dump the database, restore it on a >> different server, run some sql script which randomize some data, and >> dump it again. hmm, seems to be the only reliable way so far. But it is >> no fun when dumping and restoring takes an hour. >> >> Does anybody has a better idea how to achieve an anonymized database dump? > > I highly recommend #3. It's how we do it where I work. > > At first it seems like a big, slow, complicated monster, but once you've > built the tools and have it running reliably it's very nice. Our system > does the dumps overnight via cron (we have over 100 production databases) > then changes the sensitive data, as well changing all the passwords to > "password" so developers can easily log in as any account. During the > day, the developers have access to all the sanitized dump files and can > use them to make as many testing databases as they need. Yes, the data > gets up to 24 hours out of date, but it's never been a problem for us. > > -- > Bill Moran > http://www.potentialtech.com > http://people.collaborativefusion.com/~wmoran/ > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] nice'ing the postgres COPY backend process to make pg_dumps run more "softly"
On 03/19/2012 01:51 PM, Guillaume Lelarge wrote: > On Sun, 2012-03-18 at 21:06 -0700, Aleksey Tsalolikhin wrote: >> Hi. When pg_dump runs, our application becomes inoperative (too slow) > Depends on what your app is doing. It doesn't block any usual use of the database: DML are all accepted. But you cannot drop a table that pg_dump must save, you cannot change its definition. So there are some DDL commands you cannot use during a dump. Dumping may not technically block access but it *does*, of course, consume resources. Most obvious is that it requires reading all table data in entirety. This will cause competition for disk access and may cause active data to be temporarily pushed out of cache. You also have to write the data somewhere. If it is on the same drive as your database, you will have write competition. If it is on another machine it will use network resources. If you are compressing the data either externally or using a compressed dump format, you will need more CPU to handle the compression on whatever machine is doing the actual compression. To assist, we need more info. Tell us the database size, some details about your dump process (same or different machine, compression, etc.), how long your dumps take to run, how many backends are typically running and how many you reach during a dump, whether or not any web processes alter tables and other info you think may be of use. Cheers, Steve -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
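As a concrete example of taking that load off the database host, the dump can be run from a different machine so the compression CPU and the output I/O land there instead (host, user, and paths below are made up):

    pg_dump -h db.example.com -U backup -Fc -Z 6 -f /backups/appdb.dump appdb

-Fc produces the compressed custom format and -Z sets the compression level; the database server then only pays for the reads and the network transfer.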
Re: [GENERAL] Conditionnal validation for transaction
On 19/03/2012 16:57, Simon Riggs wrote: > On Mon, Mar 19, 2012 at 3:28 PM, Florent THOMAS wrote: >> Hi all of you, 1 - Is there a way to have conditions for committing transactions like in Oracle: http://www.scribd.com/doc/42831667/47/Validation-conditionnelle-de-transaction-62 2 - Is there a way to declare local variables for a SQL statement without being in a function? > PostgreSQL follows the SQL standard which does not allow anything like that. Later versions do allow anonymous blocks, also known as DO statements that allow you to execute some code to allow decision making like that. So the Oracle example is very similar code in PostgreSQL, except that you can't issue ROLLBACK and COMMIT. Thanks. Could you clarify the sentence below? > But then you don't need to because you *can do a conditional error or drop through to a commit*. How do you do that? Regards
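A minimal sketch of what Simon describes, i.e. raising a conditional error inside the transaction so that it either aborts or drops through to the commit (the accounts table and the business rule are invented for the example):

    BEGIN;

    UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;

    DO $$
    BEGIN
        -- conditional "validation": abort the whole transaction if the rule fails
        IF EXISTS (SELECT 1 FROM accounts WHERE balance < 0) THEN
            RAISE EXCEPTION 'negative balance, refusing to commit';
        END IF;
    END
    $$;

    COMMIT;   -- effective only if the DO block did not raise

If the RAISE fires, the transaction is already aborted, so the final COMMIT ends it as a rollback; there is no explicit COMMIT or ROLLBACK inside the DO block itself.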
Re: [GENERAL] nice'ing the postgres COPY backend process to make pg_dumps run more "softly"
On Sun, 2012-03-18 at 21:06 -0700, Aleksey Tsalolikhin wrote: > Hi. When pg_dump runs, our application becomes inoperative (too > slow). I was going to ask if nice'ing the postgres backend process > that handles the COPY would help but I just realized probably the > pg_dump takes out locks when it runs and nice'ing it would just make > it run longer... > > However the man page says "pg_dump does not block other users > accessing the database (readers or writers)." But if we run a > pg_dump, the phone starts ringing, users are complaining that the web > app is not working. > > Would appreciate some pointer to help me reconcile these two > apparently contradictory facts. > Depends on what your app is doing. It doesn't block any usual use of the database: DML are all accepted. But you cannot drop a table that pg_dump must save, you cannot change its definition. So there are some DDL commands you cannot use during a dump. Other than this, your users shouldn't be blocked. If it happens again, you should look at pg_locks and pg_stat_activity to understand what's going on. -- Guillaume http://blog.guillaume.lelarge.info http://www.dalibo.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] current thinking on Amazon EC2?
> On Mon 3/19/2012 4:30 PM Mike Christensen writes: > >>I've been running my site on RackSpace CloudServers (similar to EC2) >>and have been getting pretty good performance, though I don't have >>huge amounts of database load. > >>One advantage, though, is RackSpace allows for hybrid solutions so I >>could potentially lease a dedicated server and continue to host my web >>frontend servers on the cloud. > > that's good to know, although for the project i'm working on, EC2 is > what we have to work with, good parts and bad parts and all. I know Heroku is built on EC2 and runs Postgres, so I would assume they've got it set up to get pretty good performance.. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] current thinking on Amazon EC2?
On Mon 3/19/2012 4:30 PM Mike Christensen writes: >I've been running my site on RackSpace CloudServers (similar to EC2) >and have been getting pretty good performance, though I don't have >huge amounts of database load. >One advantage, though, is RackSpace allows for hybrid solutions so I >could potentially lease a dedicated server and continue to host my web >frontend servers on the cloud. that's good to know, although for the project i'm working on, EC2 is what we have to work with, good parts and bad parts and all. richard
Re: [GENERAL] usage of pg_get_functiondef() -- SQL state 42809
writes: > -- This blows up. -- SQL state: 42809 -- ERROR: "array_agg" is an aggregate > function > select > TRG.tgname, TFX.proname, pg_get_functiondef(TFX.oid) as fdef > from > pg_trigger TRG > inner join pg_proc TFX on TFX.oid = TRG.tgfoid > where > TRG.tgisinternal = true > and > pg_get_functiondef(TFX.oid) = 'whatever' > Can you help me understand why this blows up ? The second part of the WHERE clause can be evaluated against pg_proc rows for which pg_get_functiondef() will fail. An easy workaround would be to use TRG.tgfoid instead, so that the WHERE clause gets pushed down to the other table. There probably shouldn't be any entries in pg_trigger for which pg_get_functiondef() will fail. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
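A sketch of the rewrite Tom suggests, applying pg_get_functiondef() to the trigger's own function OID so the comparison is only ever evaluated against rows coming from pg_trigger:

    select TRG.tgname, TFX.proname, pg_get_functiondef(TFX.oid) as fdef
    from pg_trigger TRG
    inner join pg_proc TFX on TFX.oid = TRG.tgfoid
    where TRG.tgisinternal = true
      and pg_get_functiondef(TRG.tgfoid) = 'whatever';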
Re: [GENERAL] current thinking on Amazon EC2?
On Mon, Mar 19, 2012 at 11:16 AM, Ben Chobot wrote: > On Mar 19, 2012, at 10:59 AM, Welty, Richard wrote: > > i just finished this thread from May of last year, and am wondering if this > still represents consensus thinking about postgresql deployments in the EC2 > cloud: > > http://postgresql.1045698.n5.nabble.com/amazon-ec2-td4368036.html > > > Yes, I believe that still sums up the situation pretty well. I've been running my site on RackSpace CloudServers (similar to EC2) and have been getting pretty good performance, though I don't have huge amounts of database load. One advantage, though, is RackSpace allows for hybrid solutions so I could potentially lease a dedicated server and continue to host my web frontend servers on the cloud. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] usage of pg_get_functiondef() -- SQL state 42809
-- This works. select TRG.tgname, TFX.proname, pg_get_functiondef(TFX.oid) as fdef from pg_trigger TRG inner join pg_proc TFX on TFX.oid = TRG.tgfoid where TRG.tgisinternal = true -- This blows up. -- SQL state: 42809 -- ERROR: "array_agg" is an aggregate function select TRG.tgname, TFX.proname, pg_get_functiondef(TFX.oid) as fdef from pg_trigger TRG inner join pg_proc TFX on TFX.oid = TRG.tgfoid where TRG.tgisinternal = true and pg_get_functiondef(TFX.oid) = 'whatever' Can you help me understand why this blows up ? I am running "PostgreSQL 9.1.2, compiled by Visual C++ build 1500, 64-bit" -dvs- -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] pg_upgrade + streaming replication ?
On Mon, Mar 19, 2012 at 12:30 PM, Bruce Momjian wrote: > On Thu, Mar 01, 2012 at 02:01:31PM -0800, Lonni J Friedman wrote: >> I've got a 3 node cluster (1 master/2 slaves) running 9.0.x with >> streaming replication. I'm in the planning stages of upgrading to >> 9.1.x, and am looking into the most efficient way to do the upgrade >> with the goal of minimizing downtime & risk. After googling, the only >> discussion that I've found of using pg_upgrade with a streaming >> replication setup seems to be this (nearly) year old thread: >> http://web.archiveorange.com/archive/v/9FNVlDWGQtpyWVL54jlK >> >> In summary, there is no way to use both pg_upgrade and streaming >> replication simultaneously. I'd have to either use pg_upgrade and >> then effectively rebuild/redeploy the slaves, or not use pg_upgrade, >> and reimport all of the data. Is that still the latest status, or are >> there other options? > > You can shut down all three servers, run pg_upgrade on all of them, then > restart them as 9.1 servers. > Thanks for your reply. This is very good news. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] pg_upgrade + streaming replication ?
On Thu, Mar 01, 2012 at 02:01:31PM -0800, Lonni J Friedman wrote: > I've got a 3 node cluster (1 master/2 slaves) running 9.0.x with > streaming replication. I'm in the planning stages of upgrading to > 9.1.x, and am looking into the most efficient way to do the upgrade > with the goal of minimizing downtime & risk. After googling, the only > discussion that I've found of using pg_upgrade with a streaming > replication setup seems to be this (nearly) year old thread: > http://web.archiveorange.com/archive/v/9FNVlDWGQtpyWVL54jlK > > In summary, there is no way to use both pg_upgrade and streaming > replication simultaneously. I'd have to either use pg_upgrade and > then effectively rebuild/redeploy the slaves, or not use pg_upgrade, > and reimport all of the data. Is that still the latest status, or are > there other options? You can shut down all three servers, run pg_upgrade on all of them, then restart them as 9.1 servers. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
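For reference, the basic per-node invocation Bruce is referring to looks roughly like this with both sets of binaries installed (the directories are examples and vary by packaging):

    pg_upgrade -b /usr/pgsql-9.0/bin -B /usr/pgsql-9.1/bin \
               -d /var/lib/pgsql/9.0/data -D /var/lib/pgsql/9.1/data

Whether the slaves can keep streaming afterwards is exactly the question Jeff Davis raises in his follow-up, so treat this as the mechanical upgrade step only.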
Re: [GENERAL] current thinking on Amazon EC2?
On Mar 19, 2012, at 10:59 AM, Welty, Richard wrote: > i just finished this thread from May of last year, and am wondering if this > still represents consensus thinking about postgresql deployments in the EC2 > cloud: > > http://postgresql.1045698.n5.nabble.com/amazon-ec2-td4368036.html > Yes, I believe that still sums up the situation pretty well.
[GENERAL] Slow information_schema.views
Hello, I'm doing some SELECTs from information_schema.views to find views with dependencies on other views, i.e. SELECT table_name FROM information_schema.views WHERE view_definition ILIKE '%myviewname%'; and each is taking about 1/2 a second, which is getting a bit slow for my use. There are 1213 views listed in information_schema.views Doing an explain analyze, it looks like the issue is likely to be the pg_get_viewdef function or one of the privilege check functions. I'm not worried about privilege checks and I don't need a nicely formatted definition. Is there a way of finding out how pg_get_viewdef works so I can perhaps do a lower level query? I've previously used pg_catalog.pg_views which performs similarly. Or is there a better way of finding view dependencies? I see there's a pg_catalog entry for tables that a view depends on but that's not what I'm after. Regards Oliver Kohll www.agilebase.co.uk -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
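For the dependency question specifically, the information is in pg_depend: each view's internal _RETURN rule (pg_rewrite) depends on the relations it references, so a sketch along these lines should list views that reference a given view without parsing any view definitions (schema-qualify the name as needed):

    SELECT DISTINCT dependent.relname AS dependent_view
    FROM pg_depend d
    JOIN pg_rewrite r       ON r.oid = d.objid
    JOIN pg_class dependent ON dependent.oid = r.ev_class
    WHERE d.classid    = 'pg_rewrite'::regclass
      AND d.refclassid = 'pg_class'::regclass
      AND d.refobjid   = 'myviewname'::regclass
      AND dependent.oid <> 'myviewname'::regclass;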
Re: [GENERAL] Conditionnal validation for transaction
On Mon, Mar 19, 2012 at 3:28 PM, Florent THOMAS wrote: > Hy all of you, > > 1 - Is there a way to have conditions for committing transactions like in > oracle : > http://www.scribd.com/doc/42831667/47/Validation-conditionnelle-de-transaction-62 > > 2 - Is there a way to declare local variables for a SQL statement without > beiing in a function? PostgreSQL follows the SQL standard which does not allow anything like that. Later versions do allow anonymous blocks, also known as DO statements that allow you to execute some code to allow decision making like that. So the Oracle example is very similar code in PostgreSQL, except that you can't issue ROLLBACK and COMMIT. But then you don't need to because you can do a conditional error or drop through to a commit. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Conditionnal validation for transaction
2. The short answer is No. I've got the same issue. I come from a different SQL database that had a CREATE VARIABLE which was good for the session. With Postgres, I've created a sessionsettings table and a bunch of functions to fetch a variable by name and use its value. My perceived downside is that this causes a lot of calls to be made to get the data instead of setting them one time (for most items). I've been told that the table will probably be cached so it will cost very little. Michael Gould Intermodal Software Solutions, LLC 904-226-0978 Original Message Subject: [GENERAL] Conditionnal validation for transaction From: Florent THOMAS Date: Mon, March 19, 2012 8:28 am To: pgsql-general@postgresql.org Hi all of you, 1 - Is there a way to have conditions for committing transactions like in Oracle: http://www.scribd.com/doc/42831667/47/Validation-conditionnelle-de-transaction-62 2 - Is there a way to declare local variables for a SQL statement without being in a function? regards -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
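A rough sketch of the sessionsettings approach described above (names are invented; truly per-session scoping, e.g. a temp table or a session key column, is left out for brevity):

    CREATE TABLE sessionsettings (
        name  text PRIMARY KEY,
        value text NOT NULL
    );

    CREATE OR REPLACE FUNCTION get_setting(text) RETURNS text
    LANGUAGE sql STABLE AS $$
        SELECT value FROM sessionsettings WHERE name = $1
    $$;

    -- usage: SELECT * FROM orders WHERE branch_id = get_setting('current_branch')::int;

There is also the custom GUC route (set_config()/current_setting()), though on 9.1 that requires listing the prefix in custom_variable_classes.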
[GENERAL] Conditionnal validation for transaction
Hi all of you, 1 - Is there a way to have conditions for committing transactions like in Oracle: http://www.scribd.com/doc/42831667/47/Validation-conditionnelle-de-transaction-62 2 - Is there a way to declare local variables for a SQL statement without being in a function? regards
Re: [GENERAL] Anonymized database dumps
Am 19.03.2012 um 13:22 schrieb Bill Moran : > In response to Janning Vygen : >> >> I am working on postgresql 9.1 and loving it! >> >> Sometimes we need a full database dump to test some performance issues >> with real data. >> >> Of course we don't like to have sensible data like bunches of e-mail >> addresses on our development machines as they are of no interest for >> developers and should be kept secure. >> >> So we need an anonymized database dump. I thought about a few ways to >> achieve this. >> >> 1. Best solution would be a special db user and some rules which fire on >> reading some tables and replace privacy data with some random data. Now >> doing a dump as this special user doesn't even copy the sensible data at >> all. The user just has a different view on this database even when he >> calls pg_dump. >> >> But as rules are not fired on COPY it can't work, right? >> >> 2. The other solution I can think of is something like >> >> pg_dump | sed > pgdump_anon >> >> where 'sed' does a lot of magical replace operations on the content of >> the dump. I don't think this is going to work reliable. >> >> 3. More reliable would be to dump the database, restore it on a >> different server, run some sql script which randomize some data, and >> dump it again. hmm, seems to be the only reliable way so far. But it is >> no fun when dumping and restoring takes an hour. >> >> Does anybody has a better idea how to achieve an anonymized database dump? > > I highly recommend #3. It's how we do it where I work. > > At first it seems like a big, slow, complicated monster, but once you've > built the tools and have it running reliably it's very nice. Our system > does the dumps overnight via cron (we have over 100 production databases) > then changes the sensitive data, as well changing all the passwords to > "password" so developers can easily log in as any account. During the > day, the developers have access to all the sanitized dump files and can > use them to make as many testing databases as they need. Yes, the data > gets up to 24 hours out of date, but it's never been a problem for us. Thanks for your response and your insights to your process. Sounds reasonable. Regards Janning -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] WHERE IN (subselect) versus WHERE IN (1,2,3,)
Kevin Goess writes: > On Mon, Mar 19, 2012 at 9:24 AM, Albe Laurenz wrote: >> That means that your statistics are not accurate. > Aha, thanks, that explains why my test table with one row was so bad. But > even with all freshly ANALYZE'd tables, I still see the query reverting to > a sequential scan on that big contexts table once the number of rows in the > subselect goes over 199. Here's a simplified version that demonstrates the > problem. You've still got a nasty join-size estimation error: > -> Nested Loop (cost=6.18..1939.43 rows=411736 width=8) (actual > time=0.203..3.487 rows=35 loops=1) It's not apparent why that's so far off ... regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Anonymized database dumps
In response to Janning Vygen : > > I am working on postgresql 9.1 and loving it! > > Sometimes we need a full database dump to test some performance issues > with real data. > > Of course we don't like to have sensible data like bunches of e-mail > addresses on our development machines as they are of no interest for > developers and should be kept secure. > > So we need an anonymized database dump. I thought about a few ways to > achieve this. > > 1. Best solution would be a special db user and some rules which fire on > reading some tables and replace privacy data with some random data. Now > doing a dump as this special user doesn't even copy the sensible data at > all. The user just has a different view on this database even when he > calls pg_dump. > > But as rules are not fired on COPY it can't work, right? > > 2. The other solution I can think of is something like > > pg_dump | sed > pgdump_anon > > where 'sed' does a lot of magical replace operations on the content of > the dump. I don't think this is going to work reliable. > > 3. More reliable would be to dump the database, restore it on a > different server, run some sql script which randomize some data, and > dump it again. hmm, seems to be the only reliable way so far. But it is > no fun when dumping and restoring takes an hour. > > Does anybody has a better idea how to achieve an anonymized database dump? I highly recommend #3. It's how we do it where I work. At first it seems like a big, slow, complicated monster, but once you've built the tools and have it running reliably it's very nice. Our system does the dumps overnight via cron (we have over 100 production databases) then changes the sensitive data, as well changing all the passwords to "password" so developers can easily log in as any account. During the day, the developers have access to all the sanitized dump files and can use them to make as many testing databases as they need. Yes, the data gets up to 24 hours out of date, but it's never been a problem for us. -- Bill Moran http://www.potentialtech.com http://people.collaborativefusion.com/~wmoran/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
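For what it's worth, the "change the sensitive data" step in option #3 usually boils down to a short SQL script run against the restored copy, along these lines (table and column names are invented, and the password-hashing call depends on how the application stores passwords):

    BEGIN;

    -- replace real addresses with unique but meaningless ones
    UPDATE users
       SET email = 'user' || user_id || '@example.org';

    -- let developers log in to any account with the same known password
    UPDATE users
       SET password_hash = md5('password');

    COMMIT;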
Re: [GENERAL] nice'ing the postgres COPY backend process to make pg_dumps run more "softly"
Try this: http://klicman.org/throttle/ Kiriakos On Mar 19, 2012, at 12:06 AM, Aleksey Tsalolikhin wrote: > Hi. When pg_dump runs, our application becomes inoperative (too > slow). I was going to ask if nice'ing the postgres backend process > that handles the COPY would help but I just realized probably the > pg_dump takes out locks when it runs and nice'ing it would just make > it run longer... > > However the man page says "pg_dump does not block other users > accessing the database (readers or writers)." But if we run a > pg_dump, the phone starts ringing, users are complaining that the web > app is not working. > > Would appreciate some pointer to help me reconcile these two > apparently contradictory facts. > > Best, > -at > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] WHERE IN (subselect) versus WHERE IN (1,2,3,)
On Mon, Mar 19, 2012 at 9:24 AM, Albe Laurenz wrote: > That means that your statistics are not accurate. > > As a first measure, you should ANALYZE the tables involved and see if > the problem persists. If yes, post the new plans. > Aha, thanks, that explains why my test table with one row was so bad. But even with all freshly ANALYZE'd tables, I still see the query reverting to a sequential scan on that big contexts table once the number of rows in the subselect goes over 199. Here's a simplified version that demonstrates the problem. production=> explain (analyze, buffers) SELECT contexts.context_key FROM contexts JOIN articles ON (articles.context_key=contexts.context_key) WHERE contexts.context_key IN (SELECT context_key FROM virtual_ancestors limit 200) AND articles.indexed; QUERY PLAN - Hash Join (cost=7086.13..219322.15 rows=411736 width=4) (actual time=50.118..1213.046 rows=35 loops=1) Hash Cond: (contexts.context_key = articles.context_key) Buffers: shared hit=72539 read=100104 -> Seq Scan on contexts (cost=0.00..190285.83 rows=1783283 width=4) (actual time=0.040..769.891 rows=1786074 loops=1) Buffers: shared hit=72399 read=100054 -> Hash (cost=1939.43..1939.43 rows=411736 width=8) (actual time=3.510..3.510 rows=35 loops=1) Buckets: 65536 Batches: 1 Memory Usage: 2kB Buffers: shared hit=140 read=50 -> Nested Loop (cost=6.18..1939.43 rows=411736 width=8) (actual time=0.203..3.487 rows=35 loops=1) Buffers: shared hit=140 read=50 -> HashAggregate (cost=6.18..8.18 rows=200 width=4) (actual time=0.174..0.198 rows=48 loops=1) Buffers: shared read=2 -> Limit (cost=0.00..3.68 rows=200 width=4) (actual time=0.015..0.108 rows=200 loops=1) Buffers: shared read=2 -> Seq Scan on virtual_ancestors (cost=0.00..87676.17 rows=4759617 width=4) (actual time=0.015..0.075 rows=200 loops=1) Buffers: shared read=2 -> Index Scan using articles_pkey on articles (cost=0.00..9.64 rows=1 width=4) (actual time=0.015..0.068 rows=1 loops=48) Index Cond: (articles.context_key = virtual_ancestors.context_key) Filter: articles.indexed Buffers: shared hit=140 read=48 Total runtime: 1213.138 ms (21 rows) But if I write the keys in the subquery inline, I get a very nice execution plan, all the way up to a tested maximum of about 50,000 keys: production=> explain (analyze, buffers) SELECT contexts.context_key FROM contexts JOIN articles ON (articles.context_key=contexts.context_key) WHERE contexts.context_key IN (2482612,2482612,...) 
AND articles.indexed; QUERY PLAN - Nested Loop (cost=758.71..3418.40 rows=200 width=4) (actual time=0.621..1.089 rows=35 loops=1) Buffers: shared hit=826 read=1 -> Bitmap Heap Scan on contexts (cost=752.58..1487.55 rows=200 width=4) (actual time=0.604..0.699 rows=48 loops=1) Recheck Cond: (context_key = ANY ('{2482612,2482612,...}'::integer[])) Buffers: shared hit=639 -> Bitmap Index Scan on contexts_pkey (cost=0.00..752.53 rows=200 width=0) (actual time=0.591..0.591 rows=200 loops=1) Index Cond: (context_key = ANY ('{2482612,2482612,...}'::integer[])) Buffers: shared hit=600 -> Bitmap Heap Scan on articles (cost=6.13..9.64 rows=1 width=4) (actual time=0.007..0.007 rows=1 loops=48) Recheck Cond: (articles.context_key = contexts.context_key) Filter: articles.indexed Buffers: shared hit=187 read=1 -> Bitmap Index Scan on articles_pkey (cost=0.00..6.13 rows=1 width=0) (actual time=0.005..0.005 rows=1 loops=48) Index Cond: (articles.context_key = contexts.context_key) Buffers: shared hit=148 Total runtime: 1.147 ms Is this expected behavior, that writing the ids inline does much better than the subquery? I've been told that it's not, but this isn't the first time I've seen this, so I feel like I'm not understanding something.
Re: [GENERAL] Multi server query
Many thanks to all of you. Regards On 19/03/2012 09:58, Sergey Konoplev wrote: > Hi, On Mon, Mar 19, 2012 at 12:12 AM, Florent THOMAS wrote: >> How do you query multi servers and multi databases on postgresql? > Look at this http://wiki.postgresql.org/wiki/PL/Proxy
Re: [GENERAL] Anonymized database dumps
On Mon, Mar 19, 2012 at 10:12:01AM +0100, hari.fu...@gmail.com wrote: > Janning Vygen writes: > > pgcrypto does not work for this scenario as far as I know. > > > > pgcrypto enables me to encrypt my data and let only a user with the > > right password (or key or whatever) decrypt it, right? So if I run it > > in a test environment without this password the application is broken. > > > > I still want to use these table columns in my test environment but > > instead of real email addresses I want addresses like > > random_num...@example.org. > > > > You might be right that it is a good idea to additionally encrypt this data. > > Maybe you could change your application so that it doesn't access the > critical tables directly and instead define views for them which, based > on current_user, either do decryption or return random strings. Encryption is the wrong tool for "anonymization". The right tool is hmac(), which gives you a one-way hash that is protected by a key, which means the other side can't even calculate the hashes unless they have the same key. You can calculate it with pgcrypto when dumping, or later when post-processing the dumps. But it produces random-looking values; if you need something realistic-looking you need custom mapping logic. -- marko -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
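As a sketch of what that looks like with pgcrypto (the key and the address format are just examples), the keyed hash replaces the real value while staying deterministic, so identical inputs still map to identical outputs and joins and uniqueness survive:

    CREATE EXTENSION pgcrypto;

    -- pseudonymize addresses with a keyed one-way hash
    UPDATE users
       SET email = 'u' || encode(hmac(email, 'dump-secret-key', 'sha256'), 'hex')
                       || '@example.org';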
Re: [GENERAL] Multi server query
On 03/18/12 1:12 PM, Florent THOMAS wrote: > How do you query _multi servers_ and multi databases on postgresql? other than plproxy, there's dblink, in the contrib collection. this lets you query another database or server from within a SQL query -- john r pierce N 37, W 122 santa cruz ca mid-left coast -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
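A minimal dblink example (the connection parameters are placeholders); the column definition list is required because dblink() returns a generic record set:

    CREATE EXTENSION dblink;

    SELECT *
    FROM dblink('host=otherhost dbname=otherdb user=report password=secret',
                'SELECT id, name FROM customers')
         AS remote(id integer, name text);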
Re: [GENERAL] Anonymized database dumps
Janning Vygen writes: > pgcrypto does not work for this scenario as far as I know. > > pgcrypto enables me to encrypt my data and let only a user with the > right password (or key or whatever) decrypt it, right? So if I run it > in a test environment without this password the application is broken. > > I still want to use these table columns in my test environment but > instead of real email addresses I want addresses like > random_num...@example.org. > > You might be right that it is a good idea to additionally encrypt this data. Maybe you could change your application so that it doesn't access the critical tables directly and instead define views for them which, based on current_user, either do decryption or return random strings. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
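A rough sketch of that view idea (names are invented, and the key handling is deliberately glossed over; hardcoding a key in a view definition would of course defeat the purpose):

    CREATE VIEW users_v AS
    SELECT u.user_id,
           CASE WHEN current_user = 'app_prod'
                THEN pgp_sym_decrypt(u.email_enc, 'prod-key')   -- real decryption via pgcrypto
                ELSE 'user' || u.user_id || '@example.org'      -- fake value for everyone else
           END AS email
    FROM users u;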
Re: [GENERAL] Multi server query
Hi, On Mon, Mar 19, 2012 at 12:12 AM, Florent THOMAS wrote: > How do you query multi servers and multi databases on postgresql? Look at this http://wiki.postgresql.org/wiki/PL/Proxy -- Sergey Konoplev Blog: http://gray-hemp.blogspot.com LinkedIn: http://ru.linkedin.com/in/grayhemp JID/GTalk: gray...@gmail.com Skype: gray-hemp -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] WHERE IN (subselect) versus WHERE IN (1,2,3,)
Kevin Goess wrote: > We have a table "contexts" with 1.6 million rows, and a table "articles" with 1.4 million rows, where > an "article" is a particular kind of "context". We want to select from a join on those two tables > like this > > SELECT COUNT(*) > FROM contexts > JOIN articles ON (articles.context_key=contexts.context_key) > WHERE contexts.context_key IN (...); > /* and some combination of columns from articles and contexts */ > > If "IN(...)" is a query, then this guy does a seq scan on the contexts table, even if the subquery is > "select col_a from kgtest" where kgtest has one row. If however I read the ids beforehand and write > them into the query, a la "IN (111,222,333...)", then the everything is happy, up to at least 20,000 > values written into the sql, at which point smaller machines will take 2-5 minutes to parse the query. > > I can certainly write the ids inline into the SQL, but when I do that I get the distinct impression > that I'm Doing It Wrong. Is this expected behavior? It seems surprising to me. > > > To demonstrate: > > /* nothing up my sleeve */ > # select * from kgtest; > cola > - > 1652729 > (1 row) [...] > /* subselect, query plan does seq scan on contexts */ [...] > -> Seq Scan on kgtest (cost=0.00..34.00 rows=2400 width=4) (actual time=0.048..0.050 rows [...] There is something missing in this line, but according to what you wrote it must be "actual [...] rows=1", And yet the planner assumes that the scan will return 2400 rows. That means that your statistics are not accurate. As a first measure, you should ANALYZE the tables involved and see if the problem persists. If yes, post the new plans. Yours, Laurenz Albe -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Anonymized database dumps
pgcrypto does not work for this scenario as far as I know. pgcrypto enables me to encrypt my data and let only a user with the right password (or key or whatever) decrypt it, right? So if I run it in a test environment without this password the application is broken. I still want to use these table columns in my test environment but instead of real email addresses I want addresses like random_num...@example.org. You might be right that it is a good idea to additionally encrypt this data. regards Janning On 19.03.2012 at 06:24, Kiriakos Georgiou wrote: I would store sensitive data encrypted in the database. Check the pgcrypto module. Kiriakos On Mar 18, 2012, at 1:00 PM, Janning Vygen wrote: Hi, I am working on postgresql 9.1 and loving it! Sometimes we need a full database dump to test some performance issues with real data. Of course we don't like to have sensible data like bunches of e-mail addresses on our development machines as they are of no interest for developers and should be kept secure. So we need an anonymized database dump. I thought about a few ways to achieve this. 1. Best solution would be a special db user and some rules which fire on reading some tables and replace privacy data with some random data. Now doing a dump as this special user doesn't even copy the sensible data at all. The user just has a different view on this database even when he calls pg_dump. But as rules are not fired on COPY it can't work, right? 2. The other solution I can think of is something like pg_dump | sed > pgdump_anon where 'sed' does a lot of magical replace operations on the content of the dump. I don't think this is going to work reliable. 3. More reliable would be to dump the database, restore it on a different server, run some sql script which randomize some data, and dump it again. hmm, seems to be the only reliable way so far. But it is no fun when dumping and restoring takes an hour. Does anybody has a better idea how to achieve an anonymized database dump? regards Janning -- Kicktipp GmbH Venloer Straße 8, 40477 Düsseldorf Sitz der Gesellschaft: Düsseldorf Geschäftsführung: Janning Vygen Handelsregister Düsseldorf: HRB 55639 http://www.kicktipp.de/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Kicktipp GmbH Venloer Straße 8, 40477 Düsseldorf Sitz der Gesellschaft: Düsseldorf Geschäftsführung: Janning Vygen Handelsregister Düsseldorf: HRB 55639 http://www.kicktipp.de/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general