Re: [GENERAL] config file question between versions 7.4 - 9.1

2012-05-10 Thread Albe Laurenz
Randy Johnson wrote:
 in the config file for 7.4 we have an entry:
 
 shared_buffers = 1000 # min 16, at least max_connections*2, 8KB each
 
 in 9.1 the default is:
 
 shared_buffers = 32MB
 
 
 max connections is the default 100
 
 Do I need to make any adjustments or can I leave it at the default?
 
 The machine is dedicated to Postgres and has 8GB of memory, with a default
 install of 9.1; the 7.4 install doesn't appear to have any custom configuration.

Read the documentation at
http://www.postgresql.org/docs/current/static/runtime-config-resource.html#GUC-SHARED-BUFFERS

I'd set it to something between 500MB and 2GB.
You can use the pg_buffercache contrib module to check how
the buffer cache is used and adjust accordingly.
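
For example, a query along these lines (a rough sketch; run it in the
database you want to inspect, after CREATE EXTENSION pg_buffercache) shows
which relations currently occupy the most shared buffers:

SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c
  ON b.relfilenode = pg_relation_filenode(c.oid)
 AND b.reldatabase IN (0, (SELECT oid FROM pg_database
                           WHERE datname = current_database()))
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;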

Yours,
Laurenz Albe

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Prepared statements performance

2012-05-10 Thread Daniel McGreal
Hi!

My reading to date suggests that prepared statements should be faster to
execute than issuing the same statement multiple times. However, issuing
100'000 INSERTs turned out to be more than ten times faster than executing
the same prepared statement 100'000 times when executed via pgAdmin. The
table was:

CREATE TABLE test
(
  one date,
  two boolean,
  three character varying,
  four integer,
  five numeric(18,5),
  id serial NOT NULL --note the index here
)

The prepared statement test lasting ~160 seconds was:

TRUNCATE test;

BEGIN;
PREPARE foo(date, boolean, varchar, integer, numeric(18,5)) AS
INSERT INTO test (one, two, three, four, five) VALUES ($1, $2, $3, $4,
$5);

EXECUTE foo('2011-01-01', true, 'three', 4, 5.5);
-- 99'999 more executes...
END;

The insertion test lasting ~12 seconds was:

TRUNCATE test;

BEGIN;
INSERT INTO test (one, two, three, four, five) VALUES ('2011-01-01', true,
'three', 4, 5.5);
-- 99'999 more inserts...
END;

I'm assuming, then, that I've done something wrong.

Many thanks,
Dan.


Re: [GENERAL] Prepared statements performance

2012-05-10 Thread Daniel McGreal
Hi again,

I did a follow-up test using a 'multi-value' INSERT, which is three times
faster than the separate INSERTs:

TRUNCATE test;
BEGIN;
INSERT INTO test (one, two, three, four, five) VALUES ('2011-01-01', true,
'three', 4, 5.5)
,('2011-01-01', true, 'three', 4, 5.5)
-- 99'998 more , ('2011-01-01', true, 'three', 4, 5.5) ...;
END;

This is the kind of speed increase I was hoping for when using prepared
statements (which makes sense because in this multi-value insert the query
is only being planned once?).

Thanks,
Dan.
P.S. Mac OS X 10.7.3 using PostgreSQL 9.1.2.

On Thu, May 10, 2012 at 9:25 AM, Daniel McGreal
daniel.mcgr...@redbite.com wrote:

 Hi!

 My reading to date suggests that prepared statements should be faster to
 execute than issuing the same statement multiple times. However, issuing
 100'000 INSERTs turned out to be more than ten times faster than
 executing the same prepared statement 100'000 times when executed via
 pgAdmin. The table was:

 CREATE TABLE test
 (
   one date,
   two boolean,
   three character varying,
   four integer,
   five numeric(18,5),
   id serial NOT NULL --note the index here
 )

 The prepared statement test lasting ~160 seconds was:

 TRUNCATE test;

 BEGIN;
 PREPARE foo(date, boolean, varchar, integer, numeric(18,5)) AS
 INSERT INTO test (one, two, three, four, five) VALUES ($1, $2, $3,
 $4, $5);

 EXECUTE foo('2011-01-01', true, 'three', 4, 5.5);
 -- 99'999 more executes...
 END;

 The insertion test lasting ~12 seconds was:

 TRUNCATE test;

 BEGIN;
 INSERT INTO test (one, two, three, four, five) VALUES ('2011-01-01',
 true, 'three', 4, 5.5);
 -- 99'999 more inserts...
 END;

 I'm assuming then that I've done something mistakenly.

 Many thanks,
 Dan.





Re: [GENERAL] Prepared statements performance

2012-05-10 Thread Pavel Stehule
Hello

2012/5/10 Daniel McGreal daniel.mcgr...@redbite.com:
 Hi again,

 I did a follow up test using 'multi-value' inserts which is three times
 faster than multiple inserts thusly:


if you need speed, use a COPY statement - it should be 10x faster than INSERTS
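
A rough sketch against the test table from this thread (the file path and
the CSV format are only assumptions about how the data is stored):

COPY test (one, two, three, four, five)
FROM '/tmp/test_data.csv' WITH (FORMAT csv);

-- or, without server-side file access, psql's client-side variant:
-- \copy test (one, two, three, four, five) FROM 'test_data.csv' CSV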

Pavel


 TRUNCATE test;
 BEGIN;
 INSERT INTO test (one, two, three, four, five) VALUES ('2011-01-01', true,
 'three', 4, 5.5)

 ,('2011-01-01', true, 'three', 4, 5.5)
 -- 99'998 more , ('2011-01-01', true, 'three', 4, 5.5) ...;
 END;

 This is the kind of speed increase I was hoping for when using prepared
 statements (which makes sense because in this multi-value insert the query
 is only being planned once?).

 Thanks,
 Dan.
 P.S. Mac OS X 10.7.3 using PostgreSQL 9.1.2.


 On Thu, May 10, 2012 at 9:25 AM, Daniel McGreal
 daniel.mcgr...@redbite.com wrote:

 Hi!

 My reading to date suggests that prepared statements should be faster to
 execute than issuing the same statement multiple times. However, issuing
 100'000 INSERTs turned out to be more than ten times faster than executing
 the same prepared statement 100'000 times when executed via pgAdmin. The
 table was:

 CREATE TABLE test
 (
   one date,
   two boolean,
   three character varying,
   four integer,
   five numeric(18,5),
   id serial NOT NULL --note the index here
 )

 The prepared statement test lasting ~160 seconds was:

 TRUNCATE test;

 BEGIN;
 PREPARE foo(date, boolean, varchar, integer, numeric(18,5)) AS
     INSERT INTO test (one, two, three, four, five) VALUES ($1, $2, $3,
 $4, $5);

 EXECUTE foo('2011-01-01', true, 'three', 4, 5.5);
 -- 99'999 more executes...
 END;

 The insertion test lasting ~12 seconds was:

 TRUNCATE test;

 BEGIN;
 INSERT INTO test (one, two, three, four, five) VALUES ('2011-01-01',
 true, 'three', 4, 5.5);
 -- 99'999 more inserts...
 END;

 I'm assuming then that I've done something mistakenly.

 Many thanks,
 Dan.




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Prepared statements performance

2012-05-10 Thread Daniel McGreal
Hi,

Unfortunately these are experimental conditions. The conditions surrounding
the intended application are such that my two options are prepared
statements or many inserts. I put the multi-value inserts in as I was
curious as to why prepared statements would be slower given they only plan
the query once (as also does the multi-value insert, I assume).

It turns out though that the results are skewed by using pgAdmin. Executing
my scripts from the command line gives much more appropriate results.

Thanks,
Dan.

On Thu, May 10, 2012 at 10:16 AM, Pavel Stehule pavel.steh...@gmail.com wrote:

 Hello

 2012/5/10 Daniel McGreal daniel.mcgr...@redbite.com:
  Hi again,
 
  I did a follow up test using 'multi-value' inserts which is three times
  faster than multiple inserts thusly:
 

 if you need speed, use a COPY statement - it should be 10x faster than
 INSERTS

 Pavel

 
  TRUNCATE test;
  BEGIN;
  INSERT INTO test (one, two, three, four, five) VALUES ('2011-01-01',
 true,
  'three', 4, 5.5)
 
  ,('2011-01-01', true, 'three', 4, 5.5)
  -- 99'998 more , ('2011-01-01', true, 'three', 4, 5.5) ...;
  END;
 
  This is the kind of speed increase I was hoping for when using prepared
  statements (which makes sense because in this multi-value insert the
 query
  is only being planned once?).
 
  Thanks,
  Dan.
  P.S. Mac OS X 10.7.3 using PostgreSQL 9.1.2.
 
 
  On Thu, May 10, 2012 at 9:25 AM, Daniel McGreal
  daniel.mcgr...@redbite.com wrote:
 
  Hi!
 
  My reading to date suggests that prepared statements should be faster
 to
  execute than issuing the same statement multiple times. However,
 issuing
  100'000 INSERTs turned out to be more than ten times faster than
 executing
  the same prepared statement 100'000 times when executed via pgAdmin.
 The
  table was:
 
  CREATE TABLE test
  (
one date,
two boolean,
three character varying,
four integer,
five numeric(18,5),
id serial NOT NULL --note the index here
  )
 
  The prepared statement test lasting ~160 seconds was:
 
  TRUNCATE test;
 
  BEGIN;
  PREPARE foo(date, boolean, varchar, integer, numeric(18,5)) AS
  INSERT INTO test (one, two, three, four, five) VALUES ($1, $2, $3,
  $4, $5);
 
  EXECUTE foo('2011-01-01', true, 'three', 4, 5.5);
  -- 99'999 more executes...
  END;
 
  The insertion test lasting ~12 seconds was:
 
  TRUNCATE test;
 
  BEGIN;
  INSERT INTO test (one, two, three, four, five) VALUES ('2011-01-01',
  true, 'three', 4, 5.5);
  -- 99'999 more inserts...
  END;
 
  I'm assuming then that I've done something mistakenly.
 
  Many thanks,
  Dan.
 
 
 



[GENERAL] vacuum, vacuum full and problems releasing disk space

2012-05-10 Thread Horaci Macias

Hi everybody,

I'm running postgres 9.1 and having disk space problems.
My application captures information 24x7 and stores it into the 
database. This includes several bytea and can be ~5M entries a day, so 
the size can be an issue after several days.
My application also cleans up entries older than 10 days; it does this 
every night and the delete operations are happening successfully. I 
cannot truncate the tables as they contain both stale and active data.
The database is able to store all the entries for ~15 days without 
problems, but for some reason the deletion of old entries is not freeing 
up the space (or the insertion of new entries is not reusing the space 
used by old entries) because after running the application for ~20days I 
run out of space on disk.
I've been reading on this forum and the postgres documentation; vacuum 
full is not recommended and apparently vacuum should be all I need. I'm 
using autovacuum but this doesn't seem to be solving the problem 
(perhaps because while vacuum is running the application keeps inserting 
entries 24x7?)


Just to clarify, I don't really care if the disk space is returned to 
the OS; what I need though is to be sure that I can keep a window of 10 
days of records (assuming of course my HD is big enough for those 10 
days, which seems to be the case).


Some questions:
* Although not being generally recommended, I've read that vacuum full 
is sometimes the only choice when large deletions are in place in order 
to maintain the database. Is this the case here?
* Should I try to have a maintenance window and stop all 
inserts/writes while vacuum is running? If so, is there any way to 
configure at what time vacuum will be executed by autovacuum or should I 
rely on cron-type jobs for this? and is there any way to prevent 
external connections at certain times of day to make sure inserts/writes 
don't happen while vacuum is going, or again I should use cron-type jobs 
for this?
* Any other suggestions/ideas to troubleshoot this or any pointers to 
further documentation?


thank you,

Horaci

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PostgreSQL Magazine #01 is out !

2012-05-10 Thread Enrico Pirozzi
Great Job!!! ;)

2012/5/9 Simon Riggs si...@2ndquadrant.com:
 On 9 May 2012 13:02, PostgreSQL Magazine cont...@pgmag.org wrote:
 Dear PostgreSQL users,

 I am very pleased to announce the release of the first issue of
 PostgreSQL Magazine.

 This issue #01 is brought to you thanks to the collective work of
 dozens of people: writers, editors, reviewers. Kudos to them all!

 Here's a quick view of the ToC :

  - PostgreSQL 9.1 : 10 awesome new features
  - NoSQL : The Key Value store everyone ignored
  - Interview : Stefan Kaltenbrunner
  - Opinion : Funding PostgreSQL Features
  - Waiting for 9.2 : Cascading Streaming Replication
  - Tips & Tricks : PostgreSQL in Mac OS X Lion

 The magazine is available online and on paper. You can either :

  * Read it Online: http://pgmag.org/01/read
  * Buy the Print Edition: http://pgmag.org/01/buy
  * or Download the PDF: http://pgmag.org/01/download

 The magazine is currently available only in US Letter and A4 format.

 Finally, I would like to thank our benefactors… Fotolia.com has offered
 us a free subscription plan to access their stock photo database. We
 also received funding from PostgreSQL Europe (PGEU) and Software in the
 Public Interest (SPI). Thanks a lot to them!

 Well done. This is very good.

 --
  Simon Riggs   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



-- 
That's one small step for man; one giant leap for mankind

Enrico Pirozzi
Tel.  +39 0861 1855771
Mob.+39 328 4164437
Fax  +39 0861 1850310
www.enricopirozzi.info
i...@enricopirozzi.info
Skype sscotty71
Gtalk sscott...@gmail.com

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Prepared statements performance

2012-05-10 Thread Alban Hertroys
On 10 May 2012 11:30, Daniel McGreal daniel.mcgr...@redbite.com wrote:
 I put the multi-value inserts in as I was
 curious as to why prepared statements would be slower given they only plan
 the query once (as also does the multi-value insert, I assume).

That's a common misconception.

The reason that prepared statements are often slower, is exactly
_because_ they only plan the query once. Because the query-plan is
stored when the query gets prepared, the same plan gets used for every
combination of query parameters, so it has to be a fairly generic
query plan.

OTOH, the multi-value insert knows exactly what combinations of
parameters will be used in the query and the query planner can
optimise the query for those parameters. It wouldn't surprise me if it
would re-evaluate plan branch choices based on which row of values is
currently being inserted.

I think it's safe to say that prepared statements are only efficient
when you're dealing with repeated complicated queries, where preparing
the query plan takes a significant amount of time. It'll also shave
some time off queries that are inefficient regardless of how you
execute them (for example, because the query always needs to perform a
sequential scan).
They'll also be faster on database servers with a slower query planner
than the one in Postgres.

In most (all?) other cases, executing the query directly is probably faster.

Of course there are other benefits to prepared statements, such as a
natural immunity to SQL injection.
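
If you want to see the plan a prepared statement will keep reusing, EXPLAIN
EXECUTE shows it; a minimal sketch with a hypothetical table:

PREPARE find_orders(integer) AS
    SELECT * FROM orders WHERE customer_id = $1;
EXPLAIN EXECUTE find_orders(42);  -- the stored plan, independent of the parameter value on 9.1
DEALLOCATE find_orders;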
-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Prepared statements performance

2012-05-10 Thread Radosław Smogura

On Thu, 10 May 2012 13:52:29 +0200, Alban Hertroys wrote:
On 10 May 2012 11:30, Daniel McGreal daniel.mcgr...@redbite.com wrote:

 I put the multi-value inserts in as I was curious as to why prepared
 statements would be slower given they only plan the query once (as also
 does the multi-value insert, I assume).

That's a common misconception.

The reason that prepared statements are often slower, is exactly
_because_ they only plan the query once. Because the query-plan is
stored when the query gets prepared, the same plan gets used for every
combination of query parameters, so it has to be a fairly generic
query plan.

OTOH, the multi-value insert knows exactly what combinations of
parameters will be used in the query and the query planner can
optimise the query for those parameters. It wouldn't surprise me if it
would re-evaluate plan branch choices based on which row of values is
currently being inserted.

I think it's safe to say that prepared statements are only efficient
when you're dealing with repeated complicated queries, where preparing
the query plan takes a significant amount of time. It'll also shave
some time off queries that are inefficient regardless of how you
execute them (for example, because the query always needs to perform a
sequential scan).
They'll also be faster on database servers with a slower query planner
than the one in Postgres.

In most (all?) other cases, executing the query directly is probably faster.

Of course there are other benefits to prepared statements, such as a
natural immunity to SQL injection.
--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

May I ask what kind of planning may occur during insert?
Regards,
Radek

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Prepared statements performance

2012-05-10 Thread Merlin Moncure
On Thu, May 10, 2012 at 6:52 AM, Alban Hertroys haram...@gmail.com wrote:
 On 10 May 2012 11:30, Daniel McGreal daniel.mcgr...@redbite.com wrote:
 I put the multi-value inserts in as I was
 curious as to why prepared statements would be slower given they only plan
 the query once (as also does the multi-value insert, I assume).

 That's a common misconception.

 The reason that prepared statements are often slower, is exactly
 _because_ they only plan the query once. Because the query-plan is
 stored when the query gets prepared, the same plan gets used for every
 combination of query parameters, so it has to be a fairly generic
 query plan.

 OTOH, the multi-value insert knows exactly what combinations of
 parameters will be used in the query and the query planner can
 optimise the query for those parameters. It wouldn't surprise me if it
 would re-evaluate plan branch choices based on which row of values is
 currently being inserted.

 I think it's safe to say that prepared statements are only efficient
 when you're dealing with repeated complicated queries, where preparing
 the query plan takes a significant amount of time. It'll also shave
 some time off queries that are inefficient regardless of how you
 execute them (for example, because the query always needs to perform a
 sequential scan).
 They'll also be faster on database servers with a slower query planner
 than the one in Postgres.

 In most (all?) other cases, executing the query directly is probably faster.

 Of course there are other benefits to prepared statements, such as a
 natural immunity to SQL injection.

That can often be true, but for simple inserts there is no plan to get
wrong.  Prepared statements can knock about 30-50% off statement latency in
such cases if you're not I/O bound.

Prepared statements are definitely a headache, though, and I rarely use them.
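
If you want to measure that difference yourself, pgbench can switch between
the simple and prepared protocols; a sketch (the database name, scale and
duration are arbitrary):

createdb bench
pgbench -i -s 10 bench                  # create and populate pgbench's test tables
pgbench -M simple   -c 4 -T 30 bench    # one-off statements
pgbench -M prepared -c 4 -T 30 bench    # server-side prepared statements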

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Prepared statements performance

2012-05-10 Thread Alban Hertroys
On 10 May 2012 15:05, Radosław Smogura rsmog...@softperience.eu wrote:
 May I ask what kind of planning may occur during insert?

Well, for example, if there's a unique constraint on the table then
the database will have to check that the newly inserted values don't
conflict with values that are already in the table. It needs to plan
an efficient strategy for that, which depends on the values being
inserted.

-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Prepared statements performance

2012-05-10 Thread Tom Lane
Alban Hertroys haram...@gmail.com writes:
 On 10 May 2012 15:05, Radosław Smogura rsmog...@softperience.eu wrote:
 May I ask what kind of planning may occur during insert?

 Well, for example, if there's a unique constraint on the table then
 the database will have to check that the newly inserted values don't
 conflict with values that are already in the table. It needs to plan
 an efficient strategy for that, which depends on the values being
 inserted.

There is no planning associated with checking unique constraints; that's
just a matter for the index mechanisms.

I think the real point here is that a simple INSERT/VALUES has such a
trivial plan that there is hardly any gain to be had by avoiding the
planning stage.  Then the other overhead of a prepared statement
(looking up the saved plan, checking it's not stale, etc) outweighs
that.  Or at least it could.  3x slower seems a bit fishy; I wonder
whether there's some client-side inefficiency involved in that.
Doing performance measurements with pgAdmin seems pretty questionable
in the first place ...

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Prepared statements performance

2012-05-10 Thread Daniel McGreal
Doing the same tests from psql gives:

   1. ~2.5 seconds for INSERT/VALUES
   2. ~10 seconds for prepared statement executes
   3. ~15 seconds for multiple INSERTs
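
For reference, a sketch of how such timings can be taken in psql (the
script file names are made up):

\timing
-- run each test script and note the reported elapsed time
\i insert_values_test.sql
\i prepared_test.sql
\i multiple_inserts_test.sql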

Dan.

On Thu, May 10, 2012 at 3:42 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Alban Hertroys haram...@gmail.com writes:
  On 10 May 2012 15:05, Radosław Smogura rsmog...@softperience.eu wrote:
  May I ask what kind of planning may occur during insert?

  Well, for example, if there's a unique constraint on the table then
  the database will have to check that the newly inserted values don't
  conflict with values that are already in the table. It needs to plan
  an efficient strategy for that, which depends on the values being
  inserted.

 There is no planning associated with checking unique constraints; that's
 just a matter for the index mechanisms.

 I think the real point here is that a simple INSERT/VALUES has such a
 trivial plan that there is hardly any gain to be had by avoiding the
 planning stage.  Then the other overhead of a prepared statement
 (looking up the saved plan, checking it's not stale, etc) outweighs
 that.  Or at least it could.  3x slower seems a bit fishy; I wonder
 whether there's some client-side inefficiency involved in that.
 Doing performance measurements with pgAdmin seems pretty questionable
 in the first place ...

regards, tom lane



Re: [GENERAL] vacuum, vacuum full and problems releasing disk space

2012-05-10 Thread Albe Laurenz
Horaci Macias wrote:
 I'm running postgres 9.1 and having disk space problems.
 My application captures information 24x7 and stores it into the
 database. This includes several bytea and can be ~5M entries a day, so
 the size can be an issue after several days.
 My application also cleans up entries older than 10 days; it does this
 every night and the delete operations are happening successfully. I
 cannot truncate the tables as they contain both stale and active data.
 The database is able to store all the entries for ~15 days without
 problems, but for some reason the deletion of old entries is not freeing
 up the space (or the insertion of new entries is not reusing the space
 used by old entries) because after running the application for ~20days I
 run out of space on disk.
 I've been reading on this forum and the postgres documentation; vacuum
 full is not recommended and apparently vacuum should be all I need. I'm
 using autovacuum but this doesn't seem to be solving the problem
 (perhaps because while vacuum is running the application keeps inserting
 entries 24x7?)

That is possible.

You can check the last_autovacuum field in pg_stat_all_tables to see
when the table was last vacuumed.
Do you have any long running transactions? Either long statements or
sessions that are idle in transaction. Those can also block vacuum.
Do you use two phase commit?

I would try to make autovacuum more aggressive (see the documentation)
and see if that helps.
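
A sketch of per-table settings that make autovacuum trigger earlier and run
without throttling (the table name and the numbers are only a starting
point to tune from):

ALTER TABLE big_table SET (
    autovacuum_vacuum_scale_factor = 0.02,  -- vacuum after ~2% dead rows instead of the default 20%
    autovacuum_vacuum_cost_delay = 0        -- no cost-based sleeping for this table
);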

 Just to clarify, I don't really care if the disk space is returned to
 the OS; what I need though is to be sure that I can keep a window of 10
 days of records (assuming of course my HD is big enough for those 10
 days, which seems to be the case).

 Some questions:
 * Although not being generally recommended, I've read that vacuum full
 is sometimes the only choice when large deletions are in place in order
 to maintain the database. Is this the case here?

You need VACUUM FULL once the bloat of the table is unacceptable,
i.e. if you don't want to leave the empty space in the tables but
want to return it to the operating system.

 * Should I try to have a maintenance window and stop all
 inserts/writes while vacuum is running? If so, is there any way to
 configure at what time vacuum will be executed by autovacuum or should I
 rely on cron-type jobs for this? and is there any way to prevent
 external connections at certain times of day to make sure inserts/writes
 don't happen while vacuum is going, or again I should use cron-type jobs
 for this?

If you cannot keep up using autovacuum, that will be the other option.

If you want to run VACUUM, say, once daily, cron is a good way to do it.
If it turns out to be necessary, you can block new connections with
pg_hba.conf or by revoking connect permissions on the database.
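
A sketch of the permissions route (database and role names are placeholders;
note this only blocks new connections, existing sessions keep running):

REVOKE CONNECT ON DATABASE mydb FROM PUBLIC, app_role;
-- run VACUUM / other maintenance here
GRANT CONNECT ON DATABASE mydb TO app_role;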

Yours,
Laurenz Albe

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] vacuum, vacuum full and problems releasing disk space

2012-05-10 Thread Horaci Macias
Thanks Laurenz, I'll keep an eye on pg_stat_all_tables. I checked just
now and apparently autovacuum has been run, but this is after my recent
upgrade to 9.1 from 8.3 (I upgraded hoping this problem would go away,
and so far the application hasn't been running for long enough for me to
confirm whether it's gone or not).
I don't see any clients idle in transaction, and I don't think I use long
running transactions. I may be using a lot of short-lived transactions 24x7.


If pg_stat_all_tables has a date in last_autovacuum, does this mean all
deleted tuples should be marked for reuse, or is there any scenario where
autovacuum runs but some deleted tuples are not marked so that the space
can be reused at some point? In other words, if last_autovacuum has a
recent date, can I forget about checking for idle clients or long/short
running transactions, or can this still be a problem even if
last_autovacuum shows autovacuum ran?


thanks,

H


On 10/05/12 16:50, Albe Laurenz wrote:

Horaci Macias wrote:

 I'm running postgres 9.1 and having disk space problems.
 My application captures information 24x7 and stores it into the
 database. This includes several bytea and can be ~5M entries a day, so
 the size can be an issue after several days.
 My application also cleans up entries older than 10 days; it does this
 every night and the delete operations are happening successfully. I
 cannot truncate the tables as they contain both stale and active data.
 The database is able to store all the entries for ~15 days without
 problems, but for some reason the deletion of old entries is not freeing
 up the space (or the insertion of new entries is not reusing the space
 used by old entries) because after running the application for ~20days I
 run out of space on disk.
 I've been reading on this forum and the postgres documentation; vacuum
 full is not recommended and apparently vacuum should be all I need. I'm
 using autovacuum but this doesn't seem to be solving the problem
 (perhaps because while vacuum is running the application keeps inserting
 entries 24x7?)

That is possible.

You can check the last_autovacuum field in pg_stat_all_tables to see
when the table was last vacuumed.
Do you have any long running transactions? Either long statements or
sessions that are idle in transaction. Those can also block vacuum.
Do you use two phase commit?

I would try to make autovacuum more aggressive (see the documentation)
and see if that helps.

 Just to clarify, I don't really care if the disk space is returned to
 the OS; what I need though is to be sure that I can keep a window of 10
 days of records (assuming of course my HD is big enough for those 10
 days, which seems to be the case).

 Some questions:
 * Although not being generally recommended, I've read that vacuum full
 is sometimes the only choice when large deletions are in place in order
 to maintain the database. Is this the case here?

You need VACUUM FULL once the bloat of the table is unacceptable,
i.e. if you don't want to leave the empty space in the tables but
want to return it to the operating system.

 * Should I try to have a maintenance window and stop all
 inserts/writes while vacuum is running? If so, is there any way to
 configure at what time vacuum will be executed by autovacuum or should I
 rely on cron-type jobs for this? and is there any way to prevent
 external connections at certain times of day to make sure inserts/writes
 don't happen while vacuum is going, or again I should use cron-type jobs
 for this?

If you cannot keep up using autovacuum, that will be the other option.

If you want to run VACUUM, say, once daily, cron is a good way to do it.
If it turns out to be necessary, you can block new connections with
pg_hba.conf or by revoking connect permissions on the database.

Yours,
Laurenz Albe


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] vacuum, vacuum full and problems releasing disk space

2012-05-10 Thread Horaci Macias
thanks Guy. I had thought about using per-day tables (although I didn't 
know about child tables) but my problem is that some entries are related 
and they can span several minutes, so my worry is that I end up not 
finding all the right entries when I search for entries that happen 
close to the end of day / start of day time.

Anyway, worth a thought for sure so thanks.

H

On 10/05/12 16:42, Guy Helmer wrote:

On May 10, 2012, at 4:31 AM, Horaci Macias wrote:


Hi everybody,

I'm running postgres 9.1 and having disk space problems.
My application captures information 24x7 and stores it into the database. This 
includes several bytea and can be ~5M entries a day, so the size can be an 
issue after several days.
My application also cleans up entries older than 10 days; it does this every 
night and the delete operations are happening successfully. I cannot truncate 
the tables as they contain both stale and active data.
The database is able to store all the entries for ~15 days without problems, 
but for some reason the deletion of old entries is not freeing up the space (or 
the insertion of new entries is not reusing the space used by old entries) 
because after running the application for ~20days I run out of space on disk.
I've been reading on this forum and the postgres documentation; vacuum full is 
not recommended and apparently vacuum should be all I need. I'm using 
autovacuum but this doesn't seem to be solving the problem (perhaps because 
while vacuum is running the application keeps inserting entries 24x7?)

Just to clarify, I don't really care if the disk space is returned to the OS; 
what I need though is to be sure that I can keep a window of 10 days of records 
(assuming of course my HD is big enough for those 10 days, which seems to be 
the case).

Some questions:
* Although not being generally recommended, I've read that vacuum full is 
sometimes the only choice when large deletions are in place in order to 
maintain the database. Is this the case here?
* Should I try to have a maintenance window and stop all inserts/writes while 
vacuum is running? If so, is there any way to configure at what time vacuum will be 
executed by autovacuum or should I rely on cron-type jobs for this? and is there any way 
to prevent external connections at certain times of day to make sure inserts/writes don't 
happen while vacuum is going, or again I should use cron-type jobs for this?
* Any other suggestions/ideas to troubleshoot this or any pointers to further 
documentation?

I would expect a plain VACUUM to make unused space available for re-use -- not 
sure why it would not be helping.

Since Postgresql can have tables that are children of tables, a neat trick in 
this situation is to create per-day child tables and insert the new data 
directly into the appropriate per-day table; with this approach, deleting old 
data is accomplished by simply dropping outdated tables and thereby avoiding 
VACUUM completely. With constraints on the child tables, Postgresql can 
optimize a query on the parent table by knowing what child table has data from 
what day and will only check child tables that would have data for a given 
query.

For example, I have a table called data_tbl, and child per-day tables like 
data_tbl_ts_20120509. The child table data_tbl_ts_20120509 has constraint:
 data_tbl_20120509_ts_check CHECK (ts >= '2012-05-08 19:00:00-05'::timestamp with
time zone AND ts < '2012-05-09 19:00:00-05'::timestamp with time zone)
(each child table has data from 00:00:00 GMT to 23:59:59 GMT for that day)

Each day on my systems, a cron job creates the child table, constraints on the 
child table, and index(es) for the child table to hold the next day's data, and 
another cron job drops any outdated child tables. I believe the command to 
create the example child table above would have been:

CREATE TABLE data_tbl_ts_20120509 (CHECK (ts >= '2012-05-08 19:00:00-05'::timestamp
with time zone AND ts < '2012-05-09 19:00:00-05'::timestamp with time zone))
INHERITS (data_tbl)

(followed by any necessary GRANT commands to provide access to the new child 
table)
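
For the planner to actually skip the irrelevant children it relies on
constraint exclusion, which is worth double-checking; a small sketch:

SHOW constraint_exclusion;   -- 'partition' (the default since 8.4) covers inheritance children
EXPLAIN SELECT * FROM data_tbl
WHERE ts >= '2012-05-09 00:00:00-05' AND ts < '2012-05-10 00:00:00-05';
-- the plan should only scan the child table(s) whose CHECK constraints match the range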

Hope this helps,
Guy






--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Hot standby fails to restart after pg_subtrans corruption (?)

2012-05-10 Thread Paul Guyot
Hello,

After a hardware reboot, a hot standby server fails to start.
The log mentions pg_subtrans files. Although the machine got a hard reboot,
the underlying filesystem is ZFS, which did not detect or report any corruption.

For the first restart after the reboot, the server (then 9.1.0) complained that 
a file did not exist.
May 10 14:16:49 citrinus postgres[1914]: [6-1] LOG:  file pg_subtrans/B7E5 
doesn't exist, reading as zeroes

Considering this may be related to this thread :
http://archives.postgresql.org/pgsql-general/2011-12/msg00291.php

postgres was cleanly restarted during the recovery :
May 10 14:22:30 citrinus postgres[1489]: [1-1] LOG:  received fast shutdown 
request
May 10 14:22:30 citrinus postgres[1916]: [1-1] LOG:  shutting down
May 10 14:22:31 citrinus postgres[1916]: [2-1] LOG:  database system is shut 
down

and then updated to 9.1.3.

Now, the server just fails later on the same file.


May 10 15:48:05 citrinus postgres[39203]: [1-1] LOG:  database system was 
interrupted while in recovery at log time 2012-04-25 07:53:33 UTC
May 10 15:48:05 citrinus postgres[39203]: [1-2] HINT:  If this has occurred 
more than once some data might be corrupted and you might need to choose an 
earlier recovery target.
May 10 15:48:05 citrinus postgres[39203]: [2-1] LOG:  entering standby mode
May 10 15:48:07 citrinus postgres[39203]: [3-1] LOG:  restored log file 
000106D60017 from archive
May 10 15:48:10 citrinus postgres[39203]: [4-1] LOG:  restored log file 
000106D50068 from archive
May 10 15:48:10 citrinus postgres[39203]: [5-1] LOG:  redo starts at 
6D5/684B13E8
May 10 15:49:47 citrinus postgres[39203]: [6-1] LOG:  restored log file 
000106D50069 from archive
May 10 15:50:37 citrinus postgres[39203]: [7-1] LOG:  restored log file 
000106D5006A from archive
May 10 15:51:19 citrinus postgres[39203]: [8-1] LOG:  restored log file 
000106D5006B from archive
May 10 15:52:05 citrinus postgres[39203]: [9-1] LOG:  restored log file 
000106D5006C from archive
May 10 15:53:06 citrinus postgres[39203]: [10-1] LOG:  restored log file 
000106D5006D from archive
May 10 15:53:24 citrinus postgres[39203]: [11-1] FATAL:  could not access 
status of transaction 3085299721
May 10 15:53:24 citrinus postgres[39203]: [11-2] DETAIL:  Could not read from 
file pg_subtrans/B7E5 at offset 245760: No error: 0.
May 10 15:53:24 citrinus postgres[39203]: [11-3] CONTEXT:  xlog redo xid 
assignment xtop 3085293107: subxacts: 3085299449 3085299451 3085299462 
3085299465 3085299469 3085299473 3085299478 3085299485 3085299487 3085299489 
3085299491 3085299494 3085299497 3085299498 3085299543 3085299545 3085299547 
3085299549 3085299551 3085299553 3085299555 3085299557 3085299558 3085299559 
3085299560 3085299561 3085299562 3085299563 3085299564 3085299567 3085299568 
3085299569 3085299570 3085299571 3085299573 3085299574 3085299576 3085299578 
3085299580 3085299582 3085299594 3085299596 3085299607 3085299608 3085299611 
3085299635 3085299646 3085299648 3085299650 3085299652 3085299654 3085299656 
3085299658 3085299660 3085299663 3085299665 3085299667 3085299669 3085299671 
3085299673 3085299681 3085299683 3085299687 3085299721
May 10 15:53:24 citrinus postgres[39202]: [1-1] LOG:  startup process (PID 
39203) exited with exit code 1
May 10 15:53:24 citrinus postgres[39202]: [2-1] LOG:  terminating any other 
active server processes


File B7E5 has exactly 245760 bytes.

If I remove the B7E5 file as well as file B7F6 which was created during 
recovery, and retry, it fails later on like this:


May 10 16:15:53 citrinus postgres[41142]: [75-1] LOG:  file pg_subtrans/B7E6 
doesn't exist, reading as zeroes
May 10 16:15:53 citrinus postgres[41142]: [75-2] CONTEXT:  xlog redo xid 
assignment xtop 3085346439: subxacts: 3085364307 3085364352 3085364355 
3085364358 3085364363 3085364404 3085364407 3085364409 3085364496 3085364518 
3085364532 3085364568 3085364571 3085364611 3085364625 3085364660 3085364663 
3085364668 3085364671 3085364715 3085364719 3085364727 3085364738 3085364740 
3085364768 3085364770 3085364773 3085364792 3085364800 3085364803 3085364806 
3085364817 3085364819 3085364829 3085364830 3085364832 3085364854 3085364874 
3085364877 3085364885 3085364886 3085364889 3085364890 3085364891 3085364911 
3085364914 3085364918 3085364920 3085364964 3085364988 3085364998 3085365001 
3085365013 3085365014 3085365015 3085365016 3085365068 3085365069 3085365071 
3085365106 3085365183 3085365284 3085365287 3085365288
May 10 16:16:10 citrinus postgres[41142]: [76-1] LOG:  restored log file 
000106D50077 from archive
May 10 16:16:13 citrinus postgres[41142]: [77-1] FATAL:  could not access 
status of transaction 3085367299
May 10 16:16:13 citrinus postgres[41142]: [77-2] DETAIL:  Could not read from 
file pg_subtrans/B7E6 at offset 253952: No error: 0.
May 10 16:16:13 citrinus postgres[41142]: [77-3] CONTEXT:  xlog redo xid 
assignment 

[GENERAL] Custom Domain; migration from 8.4 to 9.1 and COLLATE

2012-05-10 Thread Evan Carroll
This is a cross post from: http://dba.stackexchange.com/q/17609/2639

Just recently I upgraded from Postgresql 8.4.11 to 9.1.3. I used
pg_dump in the process. Now I'm getting an error:

 ERROR:  no collation was derived for column vin with collatable type citext
 HINT:  Use the COLLATE clause to set the collation explicitly.

This is a continuation of the [problem that I had
earlier](http://dba.stackexchange.com/q/17604/2639). It seems the fix
suggested in the Release Notes did not catch [custom
domains](http://www.postgresql.org/docs/9.1/interactive/sql-createdomain.html).
It seems as if the [CREATE
DOMAIN](http://www.postgresql.org/docs/8.3/interactive/sql-createdomain.html)
statement in 8.4 didn't even support the `COLLATE` clause.

This is how I created the `vin` type,

CREATE DOMAIN inventory.valid_vin AS citext
  CHECK ( inventory.valid_vin( VALUE ) );

How do I best resolve this error?

-- 
Evan Carroll - m...@evancarroll.com
System Lord of the Internets
web: http://www.evancarroll.com
ph: 281.901.0011

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Multiple COPY statements

2012-05-10 Thread Lee Hachadoorian
Does anyone have experience or advice on how to efficiently issue a
large number of COPY statements? The data (US Census) comes in > 100
segments (each will be copied to its own database tables) for each
state (51), for a total of > 5000 text files. I can generate the COPY
statements with a script.

The two specific question I can think of (but I'm sure there's more
that I'm not thinking of) are:

1) COPY is fastest when used within the same transaction as an
earlier CREATE TABLE or TRUNCATE command. In such cases no WAL needs
to be written, because in case of an error, the files containing the
newly loaded data will be removed anyway. Would I be able to take
advantage of this if I:

BEGIN;
TRUNCATE import_table;
COPY import_table FROM 'file1';
COPY import_table FROM 'file2';
...
COPY import_table FROM 'file51';
END;

2) Is there a performance hit to doing a COPY to more than one table
in the same transaction?

Any other advice will be appreciated.

Regards,
--Lee

-- 
Lee Hachadoorian
PhD, Earth  Environmental Sciences (Geography)
Research Associate, CUNY Center for Urban Research
http://freecity.commons.gc.cuny.edu/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] vacuum, vacuum full and problems releasing disk space

2012-05-10 Thread Guy Helmer
BTW, it's not a problem to query data across multiple days as long as you query 
from the parent table -- Postgresql will use the child table constraints to 
search all the child tables that could contain data.

Guy

On May 10, 2012, at 11:01 AM, Horaci Macias wrote:

 thanks Guy. I had thought about using per-day tables (although I didn't know 
 about child tables) but my problem is that some entries are related and they 
 can span several minutes, so my worry is that I end up not finding all the 
 right entries when I search for entries that happen close to the end of day / 
 start of day time.
 Anyway, worth a thought for sure so thanks.
 
 H
 
 On 10/05/12 16:42, Guy Helmer wrote:
 On May 10, 2012, at 4:31 AM, Horaci Macias wrote:
 
 Hi everybody,
 
 I'm running postgres 9.1 and having disk space problems.
 My application captures information 24x7 and stores it into the database. 
 This includes several bytea and can be ~5M entries a day, so the size can 
 be an issue after several days.
 My application also cleans up entries older than 10 days; it does this 
 every night and the delete operations are happening successfully. I cannot 
 truncate the tables as they contain both stale and active data.
 The database is able to store all the entries for ~15 days without 
 problems, but for some reason the deletion of old entries is not freeing up 
 the space (or the insertion of new entries is not reusing the space used by 
 old entries) because after running the application for ~20days I run out of 
 space on disk.
 I've been reading on this forum and the postgres documentation; vacuum full 
 is not recommended and apparently vacuum should be all I need. I'm using 
 autovacuum but this doesn't seem to be solving the problem (perhaps because 
 while vacuum is running the application keeps inserting entries 24x7?)
 
 Just to clarify, I don't really care if the disk space is returned to the 
 OS; what I need though is to be sure that I can keep a window of 10 days of 
 records (assuming of course my HD is big enough for those 10 days, which 
 seems to be the case).
 
 Some questions:
 * Although not being generally recommended, I've read that vacuum full is 
 sometimes the only choice when large deletions are in place in order to 
 maintain the database. Is this the case here?
 * Should I try to have a maintenance window and stop all inserts/writes 
 while vacuum is running? If so, is there any way to configure at what time 
 vacuum will be executed by autovacuum or should I rely on cron-type jobs 
 for this? and is there any way to prevent external connections at certain 
 times of day to make sure inserts/writes don't happen while vacuum is 
 going, or again I should use cron-type jobs for this?
 * Any other suggestions/ideas to troubleshoot this or any pointers to 
 further documentation?
 I would expect a plain VACUUM to make unused space available for re-use -- 
 not sure why it would not be helping.
 
 Since Postgresql can have tables that are children of tables, a neat trick 
 in this situation is to create per-day child tables and insert the new data 
 directly into the appropriate per-day table; with this approach, deleting 
 old data is accomplished by simply dropping outdated tables and thereby 
 avoiding VACUUM completely. With constraints on the child tables, Postgresql 
 can optimize a query on the parent table by knowing what child table has 
 data from what day and will only check child tables that would have data for 
 a given query.
 
 For example, I have a table called data_tbl, and child per-day tables like 
 data_tbl_ts_20120509. The child table data_tbl_ts_20120509 has constraint:
 data_tbl_20120509_ts_check CHECK (ts >= '2012-05-08
  19:00:00-05'::timestamp with time zone AND ts < '2012-05-09
  19:00:00-05'::timestamp with time zone)
 (each child table has data from 00:00:00 GMT to 23:59:59 GMT for that day)
 
 Each day on my systems, a cron job creates the child table, constraints on 
 the child table, and index(es) for the child table to hold the next day's 
 data, and another cron job drops any outdated child tables. I believe the 
 command to create the example child table above would have been:
 
 CREATE TABLE data_tbl_ts_20120509 (CHECK (ts >= '2012-05-08
  19:00:00-05'::timestamp with time zone AND ts < '2012-05-09
  19:00:00-05'::timestamp with time zone)) INHERITS (data_tbl)
 
 (followed by any necessary GRANT commands to provide access to the new child 
 table)
 
 Hope this helps,
 Guy
 
 




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] vacuum, vacuum full and problems releasing disk space

2012-05-10 Thread Guy Helmer
On May 10, 2012, at 4:31 AM, Horaci Macias wrote:

 Hi everybody,
 
 I'm running postgres 9.1 and having disk space problems.
 My application captures information 24x7 and stores it into the database. 
 This includes several bytea and can be ~5M entries a day, so the size can be 
 an issue after several days.
 My application also cleans up entries older than 10 days; it does this every 
 night and the delete operations are happening successfully. I cannot truncate 
 the tables as they contain both stale and active data.
 The database is able to store all the entries for ~15 days without problems, 
 but for some reason the deletion of old entries is not freeing up the space 
 (or the insertion of new entries is not reusing the space used by old 
 entries) because after running the application for ~20days I run out of space 
 on disk.
 I've been reading on this forum and the postgres documentation; vacuum full 
 is not recommended and apparently vacuum should be all I need. I'm using 
 autovacuum but this doesn't seem to be solving the problem (perhaps because 
 while vacuum is running the application keeps inserting entries 24x7?)
 
 Just to clarify, I don't really care if the disk space is returned to the OS; 
 what I need though is to be sure that I can keep a window of 10 days of 
 records (assuming of course my HD is big enough for those 10 days, which 
 seems to be the case).
 
 Some questions:
 * Although not being generally recommended, I've read that vacuum full is 
 sometimes the only choice when large deletions are in place in order to 
 maintain the database. Is this the case here?
 * Should I try to have a maintenance window and stop all inserts/writes 
 while vacuum is running? If so, is there any way to configure at what time 
 vacuum will be executed by autovacuum or should I rely on cron-type jobs for 
 this? and is there any way to prevent external connections at certain times 
 of day to make sure inserts/writes don't happen while vacuum is going, or 
 again I should use cron-type jobs for this?
 * Any other suggestions/ideas to troubleshoot this or any pointers to further 
 documentation?

I would expect a plain VACUUM to make unused space available for re-use -- not 
sure why it would not be helping.

Since Postgresql can have tables that are children of tables, a neat trick in 
this situation is to create per-day child tables and insert the new data 
directly into the appropriate per-day table; with this approach, deleting old 
data is accomplished by simply dropping outdated tables and thereby avoiding 
VACUUM completely. With constraints on the child tables, Postgresql can 
optimize a query on the parent table by knowing what child table has data from 
what day and will only check child tables that would have data for a given 
query.

For example, I have a table called data_tbl, and child per-day tables like 
data_tbl_ts_20120509. The child table data_tbl_ts_20120509 has constraint:
data_tbl_20120509_ts_check CHECK (ts >= '2012-05-08
19:00:00-05'::timestamp with time zone AND ts < '2012-05-09
19:00:00-05'::timestamp with time zone)
(each child table has data from 00:00:00 GMT to 23:59:59 GMT for that day)

Each day on my systems, a cron job creates the child table, constraints on the 
child table, and index(es) for the child table to hold the next day's data, and 
another cron job drops any outdated child tables. I believe the command to 
create the example child table above would have been:

CREATE TABLE data_tbl_ts_20120509 (CHECK (ts >= '2012-05-08
19:00:00-05'::timestamp with time zone AND ts < '2012-05-09
19:00:00-05'::timestamp with time zone)) INHERITS (data_tbl)

(followed by any necessary GRANT commands to provide access to the new child 
table)

Hope this helps,
Guy





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Multiple COPY statements

2012-05-10 Thread Andy Colson

On 5/10/2012 1:10 PM, Lee Hachadoorian wrote:

Does anyone have experience or advice on how to efficiently issue a
large number of COPY statements? The data (US Census) comes in > 100
segments (each will be copied to its own database tables) for each
state (51), for a total of > 5000 text files. I can generate the COPY
statements with a script.

The two specific question I can think of (but I'm sure there's more
that I'm not thinking of) are:

1) COPY is fastest when used within the same transaction as an
earlier CREATE TABLE or TRUNCATE command. In such cases no WAL needs
to be written, because in case of an error, the files containing the
newly loaded data will be removed anyway. Would I be able to take
advantage of this if I:

BEGIN;
TRUNCATE import_table;
COPY import_table FROM 'file1';
COPY import_table FROM 'file2';
...
COPY import_table FROM 'file51';
END;


Yes, I believe so.



2) Is there a performance hit to doing a COPY to more than one table
in the same transaction?


No, I don't think so.  I assume you are the only user hitting the 
import_table, so holding one big transaction won't hurt anything.




Any other advice will be appreciated.


To really speed it up, you'd need to run multiple concurrent connections 
each doing COPYs.  Maybe up to the number of cores you have.  (Of course 
you don't want each connection to fire off truncates, but concurrent 
loading should trump skipping WAL in terms of speed.)


If import_table is just a temporary holding spot you can look into temp 
and/or unlogged tables.
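
A sketch of the concurrent approach from the shell (the database, table and
file names, and the degree of parallelism, are assumptions):

# four psql sessions at a time, each loading one file with client-side \copy
ls seg001_*.txt | xargs -P 4 -I{} \
    psql -d census -c "\copy import_table1 FROM '{}'"

Each psql -c call runs in its own transaction and connection.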



-Andy

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Multiple COPY statements

2012-05-10 Thread Lee Hachadoorian
On Thu, May 10, 2012 at 2:42 PM, Andy Colson a...@squeakycode.net wrote:
 On 5/10/2012 1:10 PM, Lee Hachadoorian wrote:

 2) Is there a performance hit to doing a COPY to more than one table
 in the same transaction?


 No, I don't think so.  I assume you are the only user hitting the
 import_table, so holding one big transaction wont hurt anything.

Actually what I mean is that there are multiple import tables,
import_table1 ... import_table100. But it is true that I would be the
only user hitting the import tables.

 Any other advice will be appreciated.


 To really speed it up, you'd need to run multiple concurrent connections
 each doing COPYs.  Maybe up to the number of cores you have.  (Of course
 you don't want each connection to fire off truncates, but concurrent should
 trump skipping WAL in terms of speed.)

 If import_table is just a temporary holding spot you can look into temp and/or
 unlogged tables.

Yes, it is a staging table; the data needs to be manipulated before
shunting it to its desired destination. I think unlogged tables will be
helpful, and if I understand correctly then I wouldn't need to use the
BEGIN; TRUNCATE; COPY...; END; trick. And would unlogged + concurrent
connections work together?

--Lee

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Custom Domain; migration from 8.4 to 9.1 and COLLATE

2012-05-10 Thread Tom Lane
Evan Carroll m...@evancarroll.com writes:
 This is a cross post from: http://dba.stackexchange.com/q/17609/2639
 Just recently I upgraded from Postgresql 8.4.11 to 9.1.3. I used
 pg_dump in the process. Now I'm getting an error:

 ERROR:  no collation was derived for column vin with collatable type citext
 HINT:  Use the COLLATE clause to set the collation explicitly.

Could we see the complete context for this?

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Sequence scan if OR Condition in where statement

2012-05-10 Thread Prashant Bharucha
Hello All

Could you please help me? An index is not used if I have an OR condition in
the WHERE clause.

I always get a sequential scan.

Thanks
Prashant


Re: [GENERAL] Multiple COPY statements

2012-05-10 Thread Andy Colson

On 5/10/2012 2:00 PM, Lee Hachadoorian wrote:

On Thu, May 10, 2012 at 2:42 PM, Andy Colson a...@squeakycode.net wrote:

On 5/10/2012 1:10 PM, Lee Hachadoorian wrote:


2) Is there a performance hit to doing a COPY to more than one table
in the same transaction?



No, I don't think so.  I assume you are the only user hitting the
import_table, so holding one big transaction won't hurt anything.


Actually what I mean is that there are multiple import tables,
import_table1 ... import_table100. But it is true that I would be the
only user hitting the import tables.


Any other advice will be appreciated.



To really speed it up, you'd need to run multiple concurrent connections
each doing COPYs.  Maybe up to the number of cores you have.  (Of course
you don't want each connection to fire off truncates, but concurrent should
trump skipping WAL in terms of speed.)

If import_table is just a temporary holding spot you can look into temp and/or
unlogged tables.


Yes, it is a staging table, data needs to be manipulated before
shunting to its desired destination. I think unlogged tables will be
helpful, and if I understand correctly then I wouldn't need to use the
BEGIN; TRUNCATE; COPY...; END; trick. And would unlogged + concurrent
connections work together?

--Lee




Oh yes.  concurrent + unlogged would be the best of all worlds.
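
A minimal sketch, assuming the staging data can simply be reloaded from the
source files after a crash (unlogged tables are emptied during crash
recovery; table and path names here are made up):

CREATE UNLOGGED TABLE import_table1 (LIKE final_table1 INCLUDING DEFAULTS);
COPY import_table1 FROM '/path/to/seg001_file.txt';
-- no WAL is written for unlogged tables, so the BEGIN/TRUNCATE/COPY trick isn't needed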

-Andy

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Custom Domain; migration from 8.4 to 9.1 and COLLATE

2012-05-10 Thread Evan Carroll
 Could we see the complete context for this?

Sure.
dealermade=# CREATE OR REPLACE TEMP VIEW chrome_vinmatch_best_match AS
dealermade-# SELECT DISTINCT ON (v.vin) v.vin, vd.*
dealermade-# FROM inventory.view_in_stock_vehicles AS v
dealermade-# JOIN chrome_vinmatch.view_vin_decode AS vd
dealermade-# ON substring(v.vin FROM 0 FOR 9) =
substring(vd.pattern FROM 0 FOR 9)
dealermade-# AND v.vin LIKE vd.pattern
dealermade-# ORDER BY vin, length(pattern) DESC
dealermade-# ;
ERROR:  no collation was derived for column vin with collatable type citext
HINT:  Use the COLLATE clause to set the collation explicitly.

v.vin is the column with the custom DOMAIN.

-- 
Evan Carroll - m...@evancarroll.com
System Lord of the Internets
web: http://www.evancarroll.com
ph: 281.901.0011



Re: [GENERAL] Custom Domain; migration from 8.4 to 9.1 and COLLATE

2012-05-10 Thread Tom Lane
Evan Carroll m...@evancarroll.com writes:
 Could we see the complete context for this?
 Sure.
 dealermade=# CREATE OR REPLACE TEMP VIEW chrome_vinmatch_best_match AS
 dealermade-# SELECT DISTINCT ON (v.vin) v.vin, vd.*
 dealermade-# FROM inventory.view_in_stock_vehicles AS v
 dealermade-# JOIN chrome_vinmatch.view_vin_decode AS vd
 dealermade-# ON substring(v.vin FROM 0 FOR 9) =
 substring(vd.pattern FROM 0 FOR 9)
 dealermade-# AND v.vin LIKE vd.pattern
 dealermade-# ORDER BY vin, length(pattern) DESC
 dealermade-# ;
 ERROR:  no collation was derived for column vin with collatable type citext
 HINT:  Use the COLLATE clause to set the collation explicitly.

 v.vin is the column with the custom DOMAIN.

Hm, this example works fine for me in 9.1 branch tip, and I see no
relevant-looking patches in the commit logs since 9.1.3.  What I suspect
is that you are being bit by the failure of 9.1.0 or 9.1.1 to set
pg_type.typcollation for the citext data type, as per this item in the
9.1.2 release notes:

  Make contrib/citext's upgrade script fix collations of citext columns
  and indexes (Tom Lane)

  Existing citext columns and indexes aren't correctly marked as being
  of a collatable data type during pg_upgrade from a pre-9.1
  server. That leads to operations on them failing with errors such as
  "could not determine which collation to use for string
  comparison". This change allows them to be fixed by the same script
  that upgrades the citext module into a proper 9.1 extension during
  CREATE EXTENSION citext FROM unpackaged.

  If you have a previously-upgraded database that is suffering from this
  problem, and you already ran the CREATE EXTENSION command, you can
  manually run (as superuser) the UPDATE commands found at the end of
  SHAREDIR/extension/citext--unpackaged--1.0.sql. (Run pg_config
  --sharedir if you're uncertain where SHAREDIR is.)


regards, tom lane



Re: [GENERAL] vacuum, vacuum full and problems releasing disk space

2012-05-10 Thread Horaci Macias
Oh, I see what you mean; I definitely didn't know that, and it makes this
approach much more interesting now :)


thanks,

H

On 10/05/12 19:28, Guy Helmer wrote:

BTW, it's not a problem to query data across multiple days as long as you query 
from the parent table -- Postgresql will use the child table constraints to 
search all the child tables that could contain data.
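
For example, a sketch using the data_tbl / ts names that appear later in this
thread:

-- constraint_exclusion defaults to 'partition', so a filter on ts lets the
-- planner skip child tables whose CHECK constraints cannot match:
EXPLAIN SELECT count(*)
  FROM data_tbl
 WHERE ts >= '2012-05-09 00:00:00+00'
   AND ts <  '2012-05-09 06:00:00+00';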

Guy

On May 10, 2012, at 11:01 AM, Horaci Macias wrote:


Thanks, Guy. I had thought about using per-day tables (although I didn't know
about child tables), but my problem is that some entries are related and can
span several minutes, so my worry is that I would end up not finding all the
right entries when I search for entries that happen close to the end-of-day /
start-of-day boundary.
Anyway, worth a thought for sure, so thanks.

H

On 10/05/12 16:42, Guy Helmer wrote:

On May 10, 2012, at 4:31 AM, Horaci Macias wrote:


Hi everybody,

I'm running postgres 9.1 and having disk space problems.
My application captures information 24x7 and stores it in the database. This
includes several bytea columns and can amount to ~5M entries a day, so the size
can be an issue after several days.
My application also cleans up entries older than 10 days; it does this every
night and the delete operations are happening successfully. I cannot truncate
the tables as they contain both stale and active data.
The database is able to store all the entries for ~15 days without problems,
but for some reason the deletion of old entries is not freeing up the space (or
the insertion of new entries is not reusing the space used by old entries),
because after running the application for ~20 days I run out of space on disk.
I've been reading on this forum and the postgres documentation; vacuum full is 
not recommended and apparently vacuum should be all I need. I'm using 
autovacuum but this doesn't seem to be solving the problem (perhaps because 
while vacuum is running the application keeps inserting entries 24x7?)

Just to clarify, I don't really care if the disk space is returned to the OS; 
what I need though is to be sure that I can keep a window of 10 days of records 
(assuming of course my HD is big enough for those 10 days, which seems to be 
the case).

Some questions:
* Although not being generally recommended, I've read that vacuum full is 
sometimes the only choice when large deletions are in place in order to 
maintain the database. Is this the case here?
* Should I try to have a maintenance window and stop all inserts/writes while
vacuum is running? If so, is there any way to configure at what time autovacuum
will run, or should I rely on cron-type jobs for this? And is there any way to
prevent external connections at certain times of day to make sure
inserts/writes don't happen while vacuum is running, or should I again use
cron-type jobs for this?
* Any other suggestions/ideas to troubleshoot this or any pointers to further 
documentation?

I would expect a plain VACUUM to make unused space available for re-use -- not 
sure why it would not be helping.
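
One quick way to check whether dead rows are actually being reclaimed is the
statistics views (a sketch; the table names will of course differ):

SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
  FROM pg_stat_user_tables
 ORDER BY n_dead_tup DESC
 LIMIT 10;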

Since Postgresql can have tables that are children of tables, a neat trick in 
this situation is to create per-day child tables and insert the new data 
directly into the appropriate per-day table; with this approach, deleting old 
data is accomplished by simply dropping outdated tables and thereby avoiding 
VACUUM completely. With constraints on the child tables, Postgresql can 
optimize a query on the parent table by knowing what child table has data from 
what day and will only check child tables that would have data for a given 
query.

For example, I have a table called data_tbl, and child per-day tables like 
data_tbl_ts_20120509. The child table data_tbl_ts_20120509 has constraint:
 data_tbl_20120509_ts_check CHECK (ts >= '2012-05-08 19:00:00-05'::timestamp with
time zone AND ts < '2012-05-09 19:00:00-05'::timestamp with time zone)
(each child table has data from 00:00:00 GMT to 23:59:59 GMT for that day)

Each day on my systems, a cron job creates the child table, constraints on the 
child table, and index(es) for the child table to hold the next day's data, and 
another cron job drops any outdated child tables. I believe the command to 
create the example child table above would have been:

CREATE TABLE data_tbl_ts_20120509 (CHECK (ts >= '2012-05-08 19:00:00-05'::timestamp
with time zone AND ts < '2012-05-09 19:00:00-05'::timestamp with time zone))
INHERITS (data_tbl)

(followed by any necessary GRANT commands to provide access to the new child 
table)
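
The cleanup side is then just a matter of dropping whatever child tables have
aged out, for example (a sketch following the naming above):

-- Dropping an outdated child removes its data immediately, no VACUUM needed:
DROP TABLE IF EXISTS data_tbl_ts_20120429;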

Hope this helps,
Guy









Re: [GENERAL] Custom Domain; migration from 8.4 to 9.1 and COLLATE

2012-05-10 Thread Evan Carroll
I did that, and I had to do it to get the error I pasted. I was hit by that
bug, and I get this error after fixing that one. Here is my post about the
issue you just mentioned:

* http://dba.stackexchange.com/q/17604/2639

BTW, The database version is 9.1.3. I'll try and work on a test that
generates this same error, not exactly sure why it is getting
generated though.

-- 
Evan Carroll - m...@evancarroll.com
System Lord of the Internets
web: http://www.evancarroll.com
ph: 281.901.0011



Re: [GENERAL] Custom Domain; migration from 8.4 to 9.1 and COLLATE

2012-05-10 Thread Tom Lane
Evan Carroll m...@evancarroll.com writes:
 BTW, The database version is 9.1.3. I'll try and work on a test that
 generates this same error, not exactly sure why it is getting
 generated though.

Also see whether you can reproduce the error in a fresh database.
I continue to think the problem is an incorrect collation value in
some system catalog entry; if that's it, nobody will be able to
reproduce it.  You might try checking to see that there are no
un-updated rows matching those fixup queries.

regards, tom lane



Re: [GENERAL] Custom Domain; migration from 8.4 to 9.1 and COLLATE

2012-05-10 Thread Evan Carroll
 Also see whether you can reproduce the error in a fresh database.
 I continue to think the problem is an incorrect collation value in
 some system catalog entry; if that's it, nobody will be able to
 reproduce it.  You might try checking to see that there are no
 un-updated rows matching those fixup queries.

I've been able to reproduce it in a fresh database. This is a psql
script; steps 2a and 2b will fail.

\echo CREATING DOMAIN footype

CREATE DOMAIN footype AS citext;

\echo [1a] CREATING TABLE tablefoo_before (contains columns bar, type footype)

CREATE TABLE tablefoo_before ( bar footype );

\echo [1b] CREATING TEMP TABLE trash AS SELECT * FROM tablefoo_before

CREATE TEMP TABLE trash AS SELECT * FROM tablefoo_before ;

\echo RUNNING PATCH TO UPDATE citext

UPDATE pg_catalog.pg_type SET typcollation = 100
WHERE oid = 'citext'::pg_catalog.regtype;

UPDATE pg_catalog.pg_attribute SET attcollation = 100
WHERE atttypid = 'citext'::pg_catalog.regtype;

\echo [2a] CREATING TABLE tablefoo_after (contains columns bar, type footype)

CREATE TABLE tablefoo_after ( bar footype );

\echo [2b] CREATING TEMP TABLE trash2 AS SELECT * FROM tablefoo_before

CREATE TEMP TABLE trash2 AS SELECT * FROM tablefoo_before ;


-- 
Evan Carroll - m...@evancarroll.com
System Lord of the Internets
web: http://www.evancarroll.com
ph: 281.901.0011



Re: [GENERAL] Custom Domain; migration from 8.4 to 9.1 and COLLATE

2012-05-10 Thread Evan Carroll
I think I can best get around this if I issue a

 CREATE EXTENSION citext;

and then load the database from the pg_dump output. It seems to be
working, but some citext-related statements from the dump fail because
those objects are already there in the DB once you issue the
CREATE EXTENSION.

-- 
Evan Carroll - m...@evancarroll.com
System Lord of the Internets
web: http://www.evancarroll.com
ph: 281.901.0011



Re: [GENERAL] Custom Domain; migration from 8.4 to 9.1 and COLLATE

2012-05-10 Thread Tom Lane
Evan Carroll m...@evancarroll.com writes:
 I've been able to reproduce it in a fresh database. This is a PSQL
 script 2a and 2b will fail.

Doesn't reproduce for me.  I guess one question is how you are loading
citext into the fresh database --- maybe you are inheriting a bum copy
from template1?

But anyway, looking at this example makes me realize that there is an
oversight in the recommended update script: it does not consider the
possibility that it needs to fix domains over citext.  Try doing
the updates with target type name equal to each such domain you have.
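
A sketch of what that would look like for the domain used in the test script
(substitute your own domain names):

UPDATE pg_catalog.pg_type SET typcollation = 100
WHERE oid = 'footype'::pg_catalog.regtype;

UPDATE pg_catalog.pg_attribute SET attcollation = 100
WHERE atttypid = 'footype'::pg_catalog.regtype;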

regards, tom lane



Re: [GENERAL] Custom Domain; migration from 8.4 to 9.1 and COLLATE

2012-05-10 Thread Evan Carroll
 Doesn't reproduce for me.  I guess one question is how you are loading
 citext into the fresh database --- maybe you are inheriting a bum copy
 from template1?

That doesn't seem likely but it's possible. How can I tell?

Also, here is a copy of the complete script -- including the citext
creation statements from the dump, with the patch, with the bugged
statements.

https://gist.github.com/2656537

I'll reload the database the otherway and try to update the domain
with the same update statements.

-- 
Evan Carroll - m...@evancarroll.com
System Lord of the Internets
web: http://www.evancarroll.com
ph: 281.901.0011



Re: [GENERAL] Custom Domain; migration from 8.4 to 9.1 and COLLATE

2012-05-10 Thread Tom Lane
Evan Carroll m...@evancarroll.com writes:
 Also, here is a copy of the complete script -- including the citext
 creation statements from the dump, with the patch, with the bugged
 statements.

Well, if that's how you're creating citext, then yeah it's broken.
As of 9.1 the citext type needs to be created with the attribute
COLLATABLE = true.  The suggested UPDATE statements are a means
of correcting a failure to do that after-the-fact, but they don't
cover any domains that have already been created on top of citext.
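
A quick way to see whether a type or domain is marked collatable in a given
database (a sketch; typcollation 0 means not collatable, 100 is the default
collation):

SELECT typname, typcollation
  FROM pg_catalog.pg_type
 WHERE typname IN ('citext', 'footype');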

regards, tom lane



[GENERAL] Question about schema-level permissions

2012-05-10 Thread Mike Christensen
I've created a schema called Indexer and a user called Indexer.  I
then grant Indexer ALL on said schema:

GRANT ALL ON SCHEMA Indexer TO Indexer;

Next, I attempt to INSERT into Indexer.ParseErrors, I get a permission
denied error message.  However, if I specifically grant Indexer INSERT
permissions:

GRANT INSERT ON Indexer.ParseErrors TO Indexer;

Then everything works.

Am I missing something?  Doesn't GRANT ALL mean that user can do
anything they want with objects in that schema, including inserts?
Thanks!

Mike



Re: [GENERAL] Question about schema-level permissions

2012-05-10 Thread Christophe Pettus

On May 10, 2012, at 9:16 PM, Mike Christensen wrote:
 Am I missing something?  Doesn't GRANT ALL mean that user can do
 anything they want with objects in that schema, including inserts?

No, it means that the user has all privileges on the *schema itself*; the
objects inside of it have their own permissions.  What you are looking for is:

GRANT ALL ON ALL TABLES IN SCHEMA Indexer TO Indexer;
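
That covers tables that already exist in the schema; for tables created later,
something along these lines may also be wanted (a sketch, not from the original
reply):

ALTER DEFAULT PRIVILEGES IN SCHEMA Indexer
    GRANT ALL ON TABLES TO Indexer;
-- (applies to tables subsequently created by the role running this command)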

--
-- Christophe Pettus
   x...@thebuild.com




Re: [GENERAL] Question about schema-level permissions

2012-05-10 Thread Mike Christensen
Excellent, thanks so much!

Mike

On Thu, May 10, 2012 at 9:38 PM, Christophe Pettus x...@thebuild.com wrote:

 On May 10, 2012, at 9:16 PM, Mike Christensen wrote:
 Am I missing something?  Doesn't GRANT ALL mean that user can do
 anything they want with objects in that schema, including inserts?

 No, it means that the user has all privileges on the *schema itself*; the
 objects inside of it have their own permissions.  What you are looking for is:

        GRANT ALL ON ALL TABLES IN SCHEMA Indexer TO Indexer;

 --
 -- Christophe Pettus
   x...@thebuild.com

