[PERFORM] Pros and Cons of 8.3.1

2008-05-02 Thread Gauri Kanekar
Hi,

Can anyone who has started using 8.3.1 list the pros and cons?

Thanx in advance

~ Gauri


Re: [PERFORM] Pros and Cons of 8.3.1

2008-05-02 Thread Claus Guttesen
 Can anyone who has started using 8.3.1 list the pros and cons?

I upgraded to 8.3.1 yesterday from 8.3.0. I've used 8.3.0 since it was
released and it's working fine. I upgraded from 7.4 (dump/restore) and
it worked out of the box. We have somewhat simple SQL queries, so there
was no need to change/alter these. The largest table has approx.
85 million records (image-table).

One thing I had never seen before was that duplicate rows were inserted
into our order-table, but I don't know whether this is due to changes
in the web-app or to 8.3.0. Now that I have upgraded to 8.3.1 I will
wait a few weeks and see if I get the same error before I alter the
column and add a unique constraint.
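
(For reference, adding such a constraint would look something like this --
the table and column names here are placeholders, not our actual schema:

ALTER TABLE orders ADD CONSTRAINT orders_order_no_key UNIQUE (order_no);

It will fail if duplicate rows already exist, so those have to be cleaned
up first.)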

So from a 7.4-perspective and fairly simple queries I don't see any issues.

-- 
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare



Re: [PERFORM] Vacuum statistics

2008-05-02 Thread chirag . dave
What version of Postgres are you running?

If you are using 8.3, you can use pg_stat_all_tables.  If not, you can use
http://www.postgresql.org/docs/current/static/pgstattuple.html
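
For example, on 8.3 something along these lines (a sketch; adjust the
schema filter to taste) shows dead-tuple counts and when (auto)vacuum
last ran per table:

SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_all_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY n_dead_tup DESC;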

Chirag

On Tue, Apr 29, 2008 at 8:14 AM, Francisco Reyes [EMAIL PROTECTED]
wrote:

 I recall reading posts in the past where one could query the stat tables
 and see how well autovacuum was performing. Not finding the posts.


 I found this query:
 SELECT relname, relkind, reltuples, relpages FROM pg_class where relkind =
 'r';

 From the output how can I tell the number of dead tuples? Or how effective
 autovacuum is on a particular table?

 Recently inherited several large PostgreSQL DBs (tables in the hundreds
 of millions and some tables over a billion rows) and I am just starting to
 go over them and see how autovacuum has been performing.





Re: [PERFORM] Please ignore ...

2008-05-02 Thread Alvaro Herrera
Marc G. Fournier wrote:

 Someone on this list has one of those 'confirm your email' filters on their 
 mailbox, which is bouncing back messages ... this is an attempt to try and 
 narrow down the address that is causing this ...

Did you find out?

-- 
Alvaro Herrera                       http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [PERFORM] Pros and Cons of 8.3.1

2008-05-02 Thread Justin



Gauri Kanekar wrote:


Hi,

Can anyone who has started using 8.3.1 list the pros and cons?

Thanx in advance

~ Gauri


I don't know for sure how much of this is Windows vs. Linux, but we moved
from 8.2, which was installed on Windows, to 8.3.1 on Ubuntu, using the
compiled version from Ubuntu.


We had a minor but annoying problem: implicit data conversion no longer
happens.


We had several pl/pgsql functions calling each other where a programmer had
gotten lazy about making sure the variable types matched the parameter
types, so we got an error yelling at us that it can't find the function due
to a data type mismatch.  It was very easy to fix.
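
A minimal sketch of the kind of breakage this causes (a hypothetical
function, not one of our actual ones):

CREATE FUNCTION add_one(integer) RETURNS integer
    LANGUAGE sql AS $$ SELECT $1 + 1 $$;

SELECT add_one('5'::text);            -- 8.2: worked via implicit text-to-int cast
                                      -- 8.3: ERROR: function add_one(text) does not exist
SELECT add_one(('5'::text)::integer); -- an explicit cast fixes it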


There are also a lot of changes to how text search and its indexes work.




Re: [PERFORM] Pros and Cons of 8.3.1

2008-05-02 Thread Kevin Grittner
 On Fri, May 2, 2008 at 2:31 AM, in message [EMAIL PROTECTED],
 Gauri Kanekar [EMAIL PROTECTED] wrote:
 
 Can anyone who has started using 8.3.1 list the pros and cons?
 
There are bugs in the 8.3.1 release which bit us when we started using
it; however, these are fixed in the 8.3 stable branch of cvs.  We are
running successfully with that.  These fixes will be in 8.3.2 when it is
released.
 
http://archives.postgresql.org/pgsql-bugs/2008-04/msg00168.php
 
It's generally a good idea to test with a new release before putting it
into production, especially a major release.
 
Since you asked on the performance list -- we have found performance to
be significantly better under 8.3 than earlier releases.  Also, the data
takes less space on the disk, and checkpoint disk activity spikes are
reduced in 8.3.
 
-Kevin
 


Re: [PERFORM] Pros and Cons of 8.3.1

2008-05-02 Thread Joshua D. Drake
On Fri, 02 May 2008 12:11:34 -0500
Justin [EMAIL PROTECTED] wrote:

 
 I don't know for sure how much of this is Windows vs. Linux, but we
 moved from 8.2, which was installed on Windows, to 8.3.1 on Ubuntu,
 using the compiled version from Ubuntu.

 We had a minor but annoying problem: implicit data conversion no longer
 happens.


It is 8.3.x and the change was documented in the release notes.

Joshua D. Drake


-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate




signature.asc
Description: PGP signature


[PERFORM] two memory-consuming postgres processes

2008-05-02 Thread Alexy Khrabrov
Greetings -- I have an UPDATE query updating a 100 million row table, and
I allocated enough memory via shared_buffers=1500MB.  However, I see two
processes in top: the UPDATE process eating about 850 MB and the writer
process eating about 750 MB.  The box starts paging.  Why is the writer
taking almost as much space as the UPDATE, and how can I shrink it?


Cheers,
Alexy



Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread Scott Marlowe
On Fri, May 2, 2008 at 1:24 PM, Alexy Khrabrov [EMAIL PROTECTED] wrote:
 Greetings -- I have an UPDATE query updating a 100 million row table, and
 I allocated enough memory via shared_buffers=1500MB.  However, I see two
 processes in top: the UPDATE process eating about 850 MB and the writer
 process eating about 750 MB.  The box starts paging.  Why is the writer
 taking almost as much space as the UPDATE, and how can I shrink it?

Shared_buffers is NOT the main memory pool for all operations in pgsql;
it is simply the buffer pool used to hold data being operated on.

Things like sorts etc. use other memory and can exhaust your machine.
However, I'd like to see the output of vmstat 1 or top while this is
happening.

How much memory does this machine have?



Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread Alexy Khrabrov


On May 2, 2008, at 12:30 PM, Scott Marlowe wrote:

On Fri, May 2, 2008 at 1:24 PM, Alexy Khrabrov [EMAIL PROTECTED] wrote:
Greetings -- I have an UPDATE query updating a 100 million row table, and
I allocated enough memory via shared_buffers=1500MB.  However, I see two
processes in top: the UPDATE process eating about 850 MB and the writer
process eating about 750 MB.  The box starts paging.  Why is the writer
taking almost as much space as the UPDATE, and how can I shrink it?


Shared_buffers is NOT the main memory pool for all operations in
pgsql, it is simply the buffer pool used to hold data being operated
on.

Things like sorts etc. use other memory and can exhaust your machine.
However, I'd like to see the output of vmstat 1 or top while this is
happening.

How much memory does this machine have?


It's a 2GB RAM MacBook.  Here's the top output for postgres:

Processes:  117 total, 2 running, 6 stuck, 109 sleeping... 459 threads   12:34:27
Load Avg:  0.27,  0.24,  0.32    CPU usage:  8.41% user, 11.06% sys, 80.53% idle
SharedLibs: num =   15, resident =   40M code, 2172K data, 3172K linkedit.
MemRegions: num = 20719, resident =  265M +   12M private, 1054M shared.
PhysMem:  354M wired, 1117M active,  551M inactive, 2022M used,   19M free.
VM: 26G + 373M   1176145(160) pageins, 1446482(2) pageouts

  PID COMMAND  %CPU   TIME   #TH #PRTS #MREGS RPRVT  RSHRD  RSIZE  VSIZE
51775 postgres 6.8%  2:40.16   1 9 39 1504K   896M   859M+ 1562M
51767 postgres 0.0%  0:39.74   1 8 28  752K   896M   752M  1560M


The first is the UPDATE, the second is the writer.

The query is very simple:

netflix= create index movs_mid_idx on movs(mid);
CREATE INDEX
netflix= update ratings set offset1=avg-rating from movs where mid=movie_id;

where the table ratings has about 100 million rows and movs has about 20,000.


I randomly increased values in postgresql.conf to

shared_buffers = 1500MB
max_fsm_pages = 200
max_fsm_relations = 1

Should I set the background writer parameters somehow to decrease the  
RAM consumed by the writer?


Cheers,
Alexy



Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread Scott Marlowe
On Fri, May 2, 2008 at 1:38 PM, Alexy Khrabrov [EMAIL PROTECTED] wrote:


  On May 2, 2008, at 12:30 PM, Scott Marlowe wrote:


  On Fri, May 2, 2008 at 1:24 PM, Alexy Khrabrov [EMAIL PROTECTED]
 wrote:
 
   Greetings -- I have an UPDATE query updating a 100 million row table,
 and
   allocate enough memory via shared_buffers=1500MB.  However, I see two
   processes in top, the UPDATE process eating about 850 MB and the writer
   process eating about 750 MB.  The box starts paging.   Why is there the
   writer taking almost as much space as the UPDATE, and how can I shrink
 it?
  
 
  Shared_buffers is NOT the main memory pool for all operations in
  pgsql, it is simply the buffer pool used to hold data being operated
  on.
 
  Things like sorts etc. use other memory and can exhaust your machine.
  However, I'd like to see the output of vmstat 1 or top while this is
  happening.
 
  How much memory does this machine have?
 

  It's a 2GB RAM MacBook.  Here's the top for postgres

  Processes:  117 total, 2 running, 6 stuck, 109 sleeping... 459 threads
 12:34:27
  Load Avg:  0.27,  0.24,  0.32CPU usage:  8.41% user, 11.06% sys, 80.53%
 idle
  SharedLibs: num =   15, resident =   40M code, 2172K data, 3172K linkedit.
  MemRegions: num = 20719, resident =  265M +   12M private, 1054M shared.
  PhysMem:  354M wired, 1117M active,  551M inactive, 2022M used,   19M free.
  VM: 26G + 373M   1176145(160) pageins, 1446482(2) pageouts

   PID COMMAND  %CPU   TIME   #TH #PRTS #MREGS RPRVT  RSHRD  RSIZE  VSIZE
  51775 postgres 6.8%  2:40.16   1 9 39 1504K   896M   859M+
 1562M
  51767 postgres 0.0%  0:39.74   1 8 28  752K   896M   752M
 1560M

SOME snipping here.

  I randomly increased values in postgresql.conf to

  shared_buffers = 1500MB
  max_fsm_pages = 200
  max_fsm_relations = 1

On a laptop with 2G ram, 1.5Gig shared buffers is probably WAY too high.

  Should I set the background writer parameters somehow to decrease the RAM
 consumed by the writer?

No, the background writer reads through the shared buffers for dirty
ones and writes them out.  So it's not really using MORE memory; it's
just showing that it's attached to the ginormous shared_buffer pool
you've set up.

Lower your shared_buffers to about 512M or so and see how it works.



Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread Tom Lane
Scott Marlowe [EMAIL PROTECTED] writes:
 On Fri, May 2, 2008 at 1:38 PM, Alexy Khrabrov [EMAIL PROTECTED] wrote:
 I randomly increased values in postgresql.conf to
 
 shared_buffers = 1500MB
 max_fsm_pages = 200
 max_fsm_relations = 1

 On a laptop with 2G ram, 1.5Gig shared buffers is probably WAY too high.

s/probably/definitely/, especially seeing that OS X is a bit of a memory
hog itself.  I don't think you should figure on more than 1GB being
usefully available to Postgres, and you can't give all or even most of
that space to shared_buffers.

 No, the background writer reads through the shared buffers for dirty
 ones and writes them out.  so, it's not really using MORE memory, it's
 just showing that it's attached to the ginormous shared_buffer pool
 you've set up.

Yeah.  You have to be aware of top's quirky behavior for shared memory:
on most platforms it will count the shared memory against *each*
process, but only as much of the shared memory as that process has
touched so far.  So over time the reported size of any PG process will
tend to climb to something over the shared memory size, but most of that
isn't real.

I haven't directly checked whether OS X's top behaves that way, but
given your report I think it does.

regards, tom lane



Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread Greg Smith

On Fri, 2 May 2008, Alexy Khrabrov wrote:

I have an UPDATE query updating a 100 million row table, and 
allocate enough memory via shared_buffers=1500MB.


In addition to reducing that as you've been advised, you'll probably need 
to increase checkpoint_segments significantly from the default (3) in 
order to get good performance on an update that large.  Something like 30 
would be a reasonable starting point.
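
In postgresql.conf terms, the combined advice so far might look like this
(illustrative values, not a tuned recommendation):

shared_buffers = 512MB        # down from 1500MB, as suggested above
checkpoint_segments = 30      # up from the default of 3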


I'd suggest doing those two things, seeing how things go, and reporting 
back if you still think performance is unacceptable.  We'd need to know 
your PostgreSQL version in order to really target future suggestions.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread Alexy Khrabrov


On May 2, 2008, at 1:13 PM, Tom Lane wrote:

I don't think you should figure on more than 1GB being
usefully available to Postgres, and you can't give all or even most of
that space to shared_buffers.



So how should I divide, say, 512 MB between shared_buffers and, um, what
else?  (new to pg tuning :)


I naively thought that if I have a 100,000,000 row table of the form
(integer,integer,smallint,date) and add a real column to it, it will
scroll through the memory reasonably fast.  Yet when I had
shared_buffers=128MB, it was hanging there 8 hours before I killed it,
and now with 1500MB it is paging again for several hours with no end in
sight.  Why can't it just add a column to a row at a time and be done
with it soon enough? :)  It takes inordinately long compared to a FORTRAN
or even Python program, and there's no index usage for this table, just a
sequential scan -- why all the paging?


Cheers,
Alexy



Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread Alexy Khrabrov
Interestingly, after shutting down the server with shared_buffers=1500MB
in the middle of that UPDATE, I see this:


bash-3.2$ /opt/bin/pg_ctl -D /data/pgsql/ stop
waiting for server to shut down...LOG:  received smart shutdown request
LOG:  autovacuum launcher shutting down
... failed
pg_ctl: server does not shut down
bash-3.2$ /opt/bin/pg_ctl -D /data/pgsql/ stop
waiting for server to shut down......LOG:  shutting down
LOG:  database system is shut down
 done
server stopped

-- had to do it twice, the box was paging for a minute or two.

Should I do something about the autovacuum, e.g. turn it off completely?
I thought it wasn't on, as all of it was still commented out in
postgresql.conf as shipped; I only tweaked the few numbers reported before.


Cheers,
Alexy




Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread Scott Marlowe
On Fri, May 2, 2008 at 2:26 PM, Alexy Khrabrov [EMAIL PROTECTED] wrote:

  So how should I divide say a 512 MB between shared_buffers and, um, what
 else?  (new to pg tuning :)

Don't worry so much about the rest of the settings.  Maybe increase
sort_mem (aka work_mem) to something like 16M or so.  That's about it.
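
For example (a sketch; either per session or in postgresql.conf):

SET work_mem = '16MB';        -- or: work_mem = 16MB in postgresql.conf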

  I naively thought that if I have a 100,000,000 row table, of the form
 (integer,integer,smallint,date), and add a real coumn to it, it will scroll
 through the memory reasonably fast.

This is a database.  It makes changes on disk in such a way that they
won't be lost should power be cut off.  If you're just gonna be batch
processing data that it's ok to lose halfway through, then python /
perl / php etc might be a better choice.

  Yet when I had shared_buffers=128 MB,
 it was hanging there 8 hours before I killed it, and now with 1500MB is
 paging again for several hours with no end in sight.

You went from kinda small to WAY too big.  512M should be a happy medium.

  Why can't it just add
 a column to a row at a time and be done with it soon enough? :)

Adding a column is instantaneous.  Populating it is not.

 It takes
 inordinately long compared to a FORTRAN or even python program and there's
 no index usage for this table, a sequential scan, why all the paging?

Again, a database protects your data from getting scrambled should the
program updating it quit halfway through etc...

Have you been vacuuming between these update attempts?  Each one has
created millions of dead rows and bloated your data store.  vacuum
full / cluster / reindex may be needed.



Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread Alexy Khrabrov


On May 2, 2008, at 1:40 PM, Scott Marlowe wrote:

Again, a database protects your data from getting scrambled should the
program updating it quit halfway through etc...


Right -- but this is data mining work: I add a derived column to a row,
and it's computed from that very row and a small second table which should
fit in RAM.



Have you been vacuuming between these update attempts?  Each one has
created millions of dead rows and bloated your data store.  vacuum
full / cluster / reindex may be needed.


I've read postgresql.conf more carefully and see that autovacuum = on is
commented out, so it's on by default.  That explains why shutting down was
taking so long -- it had to shut autovacuum down too.


Basically, the derived data is not critical at all -- so (1) can I turn
off transactional behavior for an UPDATE, and (2) should I care about
vacuuming being done on the fly to save RAM, or need I defer it / manage
it manually?


I wonder what MySQL would do here on MyISAM tables without transactional
behavior -- perhaps this case is more suitable for them?


Cheers,
Alexy



Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread Craig James

On Fri, May 2, 2008 at 2:26 PM, Alexy Khrabrov [EMAIL PROTECTED] wrote:

 I naively thought that if I have a 100,000,000 row table, of the form
(integer,integer,smallint,date), and add a real coumn to it, it will scroll
through the memory reasonably fast.


In Postgres, an update is the same as a delete/insert.  That means that 
changing the data in one column rewrites ALL of the columns for that row, and 
you end up with a table that's 50% dead space, which you then have to vacuum.

Sometimes if you have a volatile column that goes with several static
columns, you're far better off to create a second table for the volatile
data, duplicating the primary key in both tables.  In your case, it would
mean the difference between 10^8 inserts of (int, float), very fast,
compared to what you're doing now, which is 10^8 inserts and 10^8 deletes
of (int, int, smallint, date, float), followed by a big vacuum/analyze
(also slow).

The down side of this design is that later on, it requires a join to fetch all 
the data for each key.
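
A sketch of that split, using table and column names that are only assumed
here (substitute whatever the real key of ratings is):

CREATE TABLE ratings_offset (
    rating_id integer PRIMARY KEY,   -- duplicates the main table's key
    offset1   real
);
INSERT INTO ratings_offset
    SELECT r.rating_id, m.avg - r.rating
    FROM ratings r JOIN movs m ON m.mid = r.movie_id;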

You do have a primary key on your data, right?  Or some sort of index?

Craig



Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread Alexy Khrabrov


On May 2, 2008, at 2:02 PM, Craig James wrote:

On Fri, May 2, 2008 at 2:26 PM, Alexy Khrabrov  
[EMAIL PROTECTED] wrote:

I naively thought that if I have a 100,000,000 row table, of the form
(integer,integer,smallint,date), and add a real coumn to it, it  
will scroll

through the memory reasonably fast.


In Postgres, an update is the same as a delete/insert.  That means  
that changing the data in one column rewrites ALL of the columns for  
that row, and you end up with a table that's 50% dead space, which  
you then have to vacuum.


Sometimes if you have a volatile column that goes with several  
static columns, you're far better off to create a second table for  
the volatile data, duplicating the primary key in both tables.  In  
your case, it would mean the difference between 10^8 inserts of  
(int, float), very fast, compared to what you're doing now, which is  
10^8 insert and 10^8 deletes of (int, int, smallint, date, float),  
followed by a big vacuum/analyze (also slow).


The down side of this design is that later on, it requires a join to  
fetch all the data for each key.


You do have a primary key on your data, right?  Or some sort of index?


I created several indices for the primary table, yes.  Sure, I can do a
table for a volatile column, but then I'll have to create a new such table
for each derived column -- that's why I tried to add a column to the
existing table.  Yet seeing this is really slow, and I need to do many
derived analyses like this -- which are later scanned in other
computations, so they should persist -- I indeed see no other way but to
create derived tables with the same key, one column per each...


Cheers,
Alexy



Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread Greg Smith

On Fri, 2 May 2008, Alexy Khrabrov wrote:


I created several indices for the primary table, yes.


That may be part of your problem.  All of the indexes all are being 
updated along with the main data in the row each time you touch a record. 
There's some optimization there in 8.3 but it doesn't make index overhead 
go away completely.  As mentioned already, the optimal solution to 
problems in this area is to adjust table normalization as much as feasible 
to limit what you're updating.


Basically, the derived data is not critical at all, -- can I turn (1) 
off transactional behavior for an UPDATE,


What you can do is defer transaction commits to only happen periodically
rather than all the time, by turning off synchronous_commit and increasing
wal_writer_delay; see
http://www.postgresql.com.cn/docs/8.3/static/wal-async-commit.html
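
A sketch of what that looks like in postgresql.conf (values illustrative):

synchronous_commit = off      # commits return before WAL is flushed to disk
wal_writer_delay = 1000ms     # WAL writer flushes about once per second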


(2) should I care about vacuuming being done on the fly when saving RAM, 
or need I defer it/manage it manually?


It's hard to speculate from here about what optimal vacuum behavior will 
be.  You might find it more efficient to turn autovacuum off when doing 
these large updates.  The flip side is that you'll be guaranteed to end up 
with more dead rows in the table and that has its own impact later.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread PFC


I created several indices for the primary table, yes.  Sure, I can do a
table for a volatile column, but then I'll have to create a new such
table for each derived column -- that's why I tried to add a column to
the existing table.  Yet seeing this is really slow, and I need to do
many derived analyses like this -- which are later scanned in other
computations, so they should persist -- I indeed see no other way but to
create derived tables with the same key, one column per each...


	OK, so in that case, you could do all of your derived column
calculations in one query, like this:


CREATE TABLE derived AS SELECT ... FROM ... (perform all your derived
calculations here)

or:

BEGIN;  -- this is important to avoid writing xlog
CREATE TABLE derived (...);  -- create the table empty first
INSERT INTO derived SELECT ... FROM ... (perform all your derived
calculations here);
COMMIT;
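
	For instance, with the ratings/movs tables from earlier in this thread
(column names inferred from the UPDATE above, so treat them as assumptions),
the one-pass version might look like:

BEGIN;
CREATE TABLE ratings_derived AS
    SELECT r.movie_id, r.rating, m.avg - r.rating AS offset1
    FROM ratings r
    JOIN movs m ON m.mid = r.movie_id;
COMMIT;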

	Basically, updating the entire table several times to add a few simple  
columns is a bad idea. If you can compute all the data you need in one  
query, like above, it will be much faster. Especially if you join one  
large table to several smaller ones, and as long as the huge data set  
doesn't need to be sorted (check the query plan using EXPLAIN). Try to do  
as much as possible in one query to scan the large dataset only once.


	Note that the above will be faster than updating the entire table since
it needs to write much less data: it doesn't need to delete the old rows,
and it doesn't need to write the transaction log, since if the transaction
rolls back, the table never existed anyway.  Also, since your newly created
table doesn't have any indexes, they won't need to be updated.


	If you really need to update an entire table multiple times, you will  
need to :


	- Use hardware that can handle disk writes at a decent speed (that isn't
a characteristic of a laptop drive)
	- use MyISAM, yes (but if you need to make complex queries on the data
afterwards, it could suck).





[PERFORM] Very slow INFORMATION_SCHEMA

2008-05-02 Thread Ernesto


Hi,

I'm porting an application written with pretty portable SQL, but tested 
almost exclusively on MySQL.


I'm wondering why this query would take about 90 seconds to return 74 rows?


SELECT INFORMATION_SCHEMA.TABLE_CONSTRAINTS.CONSTRAINT_NAME, 
INFORMATION_SCHEMA.TABLE_CONSTRAINTS.TABLE_NAME, 
INFORMATION_SCHEMA.TABLE_CONSTRAINTS.TABLE_NAME, 
INFORMATION_SCHEMA.TABLE_CONSTRAINTS.CONSTRAINT_TYPE,
   INFORMATION_SCHEMA.KEY_COLUMN_USAGE.COLUMN_NAME, 
INFORMATION_SCHEMA.KEY_COLUMN_USAGE.REFERENCED_TABLE_NAME, 
INFORMATION_SCHEMA.KEY_COLUMN_USAGE.REFERENCED_COLUMN_NAME
   FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS, 
INFORMATION_SCHEMA.KEY_COLUMN_USAGE
   WHERE 
INFORMATION_SCHEMA.TABLE_CONSTRAINTS.CONSTRAINT_NAME=INFORMATION_SCHEMA.KEY_COLUMN_USAGE.CONSTRAINT_NAME
   AND 
INFORMATION_SCHEMA.TABLE_CONSTRAINTS.CONSTRAINT_SCHEMA=INFORMATION_SCHEMA.KEY_COLUMN_USAGE.CONSTRAINT_SCHEMA
   AND 
INFORMATION_SCHEMA.TABLE_CONSTRAINTS.CONSTRAINT_SCHEMA='mydbname'
   AND 
INFORMATION_SCHEMA.TABLE_CONSTRAINTS.CONSTRAINT_TYPE='FOREIGN KEY'
   ORDER BY INFORMATION_SCHEMA.TABLE_CONSTRAINTS.TABLE_NAME, 
INFORMATION_SCHEMA.KEY_COLUMN_USAGE.ORDINAL_POSITION


An equivalent query with the same data set on the same server takes a
couple of milliseconds on MySQL 5.
Is it something I'm doing wrong, or is it just that PostgreSQL's
INFORMATION_SCHEMA is not optimized for speed?  BTW, what I'm trying to
do is get some info on every FOREIGN KEY in a database.


It's PostgreSQL 8.2.7 on Fedora 8 64, Athlon 64 X2 3600+.

Ernesto




Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread Alexy Khrabrov

On May 2, 2008, at 2:23 PM, Greg Smith wrote:


On Fri, 2 May 2008, Alexy Khrabrov wrote:


I created several indices for the primary table, yes.


That may be part of your problem.  All of the indexes all are being  
updated along with the main data in the row each time you touch a  
record. There's some optimization there in 8.3 but it doesn't make  
index overhead go away completely.  As mentioned already, the  
optimal solution to problems in this area is to adjust table  
normalization as much as feasible to limit what you're updating.


I was wondering about that, too -- intuitively I'd like to say: stop all
indexing until the column is added, then reindex.  Is that doable?  Or
would it take longer anyway?  Since I don't index on that new column, I'd
assume my old indices would do -- do they change because of row
deletions/insertions, with the effective new row addresses?


Cheers,
Alexy



Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread Kevin Grittner
 Alexy Khrabrov wrote: 
 
 Since I don't index on that
 new column, I'd assume my old indices would do -- do they change  
 because of rows deletions/insertions, with the effective new rows  
 addresses?
 
Every update is a delete and insert.  The new version of the row must
be added to the index.  Every access through the index then has to
look at both versions of the row to see which one is current for its
transaction.  Vacuum will make the space used by the dead rows
available for reuse, as well as removing the old index entries and
making that space available for new index entries.
 
-Kevin
 




Re: [PERFORM] Very slow INFORMATION_SCHEMA

2008-05-02 Thread Tom Lane
Ernesto [EMAIL PROTECTED] writes:
 I'm wondering why this query would take about 90 seconds to return 74 rows?

EXPLAIN ANALYZE might tell you something.

Is this really the query you're running?  Because these two columns
don't exist:

 INFORMATION_SCHEMA.KEY_COLUMN_USAGE.REFERENCED_TABLE_NAME, 
 INFORMATION_SCHEMA.KEY_COLUMN_USAGE.REFERENCED_COLUMN_NAME

Leaving those out, I get sub-second runtimes for 70-odd foreign key
constraints, on much slower hardware than I think you are using.
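
For reference, a rough PostgreSQL equivalent that pulls the referenced
table/column through constraint_column_usage -- note the pairing is only
exact for single-column foreign keys, since that view doesn't preserve
column order for composite keys:

SELECT tc.constraint_name, tc.table_name, kcu.column_name,
       ccu.table_name  AS referenced_table_name,
       ccu.column_name AS referenced_column_name
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
  ON kcu.constraint_name = tc.constraint_name
 AND kcu.constraint_schema = tc.constraint_schema
JOIN information_schema.constraint_column_usage ccu
  ON ccu.constraint_name = tc.constraint_name
 AND ccu.constraint_schema = tc.constraint_schema
WHERE tc.constraint_type = 'FOREIGN KEY'
ORDER BY tc.table_name, kcu.ordinal_position;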

regards, tom lane



Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread Alexy Khrabrov

On May 2, 2008, at 2:43 PM, Kevin Grittner wrote:


Alexy Khrabrov wrote:



SInce I don't index on that
new column, I'd assume my old indices would do -- do they change
because of rows deletions/insertions, with the effective new rows
addresses?


Every update is a delete and insert.  The new version of the row must
be added to the index.  Every access through the index then has to
look at both versions of the row to see which one is current for its
transaction.  Vacuum will make the space used by the dead rows
available for reuse, as well as removing the old index entries and
making that space available for new index entries.


OK.  I've cancelled all previous attempts at UPDATE and will now create
some derived tables.  I see no changes in the previous huge table -- the
added column was completely empty, so I dropped it.  Should I vacuum just
in case, or am I guaranteed not to have any extra rows since no UPDATE
actually went through and none are showing?


Cheers,
Alexy



Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread Kevin Grittner
 Alexy Khrabrov wrote: 
 
 OK.  I've cancelled all previous attempts at UPDATE and will now  
 create some derived tables.  See no changes in the previous huge
table  
 -- the added column was completely empty.  Dropped it.  Should I  
 vacuum just in case, or am I guaranteed not to have any extra rows  
 since no UPDATE actually went through and none are showing?
 
The canceled attempts would have left dead space.  If you have
autovacuum running, it probably made the space available for reuse,
but depending on exactly how you got to where you are, you may have
bloat.  Personally, I would do a VACUUM ANALYZE VERBOSE and capture
the output.  If bloat is too bad, you may want to CLUSTER the table
(if you have the free disk space for a temporary extra copy of the
table) or VACUUM FULL followed by REINDEX (if you don't have that much
free disk space).
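 
A sketch of those options (the table and index names here are assumed,
since they weren't given in your post):
 
VACUUM VERBOSE ANALYZE ratings;
-- if bloat is bad and there is free disk space for a temporary copy:
CLUSTER ratings USING ratings_pkey;
-- otherwise:
VACUUM FULL ratings;
REINDEX TABLE ratings;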
 
Let us know if you need help interpreting the VERBOSE output.
 
-Kevin
 

