Re: [PERFORM] two memory-consuming postgres processes

2008-05-03 Thread Merlin Moncure
On Fri, May 2, 2008 at 4:51 PM, Alexy Khrabrov <[EMAIL PROTECTED]> wrote:
>
>  On May 2, 2008, at 1:40 PM, Scott Marlowe wrote:
>
> > Again, a database protects your data from getting scrambled should the
> > program updating it quit halfway through etc...
> >
>
>  Right -- but this is data mining work: I add a derived column to a row,
> and it's computed from that very row and a small second table which should
> fit in RAM.

A full-table update of a single field is one of the worst possible
operations with PostgreSQL.  MySQL is better at this because its lack of
proper transactions and its full table locking allow the rows to be
(mostly) updated in place.  Ideally, you should be leveraging the
power of PostgreSQL so that you can avoid the full table update if
possible.  If you step back and think about the problem, you may
be able to come up with a more efficient solution.

Also, if you must do it this way, do (as others have suggested) CREATE TABLE
new_table AS SELECT ..., then create keys, and drop the old table when
done.  This is much faster than UPDATE.
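For example, a rough sketch (the table, column, and key names here are only
placeholders):

CREATE TABLE new_table AS
    SELECT o.*, (o.some_col * 2.0) AS derived_col
    FROM old_table o;
ALTER TABLE new_table ADD PRIMARY KEY (id);    -- recreate keys/indexes
DROP TABLE old_table;                          -- once you've verified the result
ALTER TABLE new_table RENAME TO old_table;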

merlin



Re: [PERFORM] two memory-consuming postgres processes

2008-05-03 Thread Matthew Wakeling

On Fri, 2 May 2008, PFC wrote:
CREATE TABLE derived AS SELECT ... FROM ... (perform all your derived 
calculations here)


Given what you have said (that you really want all the data in one table) 
it may be best to proceed like this:


First, take your original table, create an index on the primary key field, 
and CLUSTER on that index.


CREATE TABLE derived AS SELECT ... FROM ... ORDER BY primary key field
CREATE INDEX derived_pk ON derived(primary key field)

Repeat those last two commands ad nauseam.

Then, when you want a final full table, run:

CREATE TABLE new_original AS SELECT * FROM original, derived, derived2,
  ... WHERE original.pk = derived.pk ...

That should be a merge join, which should run really quickly, and you can 
then create all the indexes you want on it.
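Spelled out with made-up names (assume the key column is id and each pass
computes one new value), the recipe looks roughly like:

CREATE INDEX original_id ON original (id);
CLUSTER original USING original_id;    -- 8.3 syntax; older: CLUSTER original_id ON original

CREATE TABLE derived AS
    SELECT id, some_col * 2.0 AS val1 FROM original ORDER BY id;
CREATE INDEX derived_pk ON derived (id);
-- ...repeat for derived2 (val2), derived3, ...

CREATE TABLE new_original AS
    SELECT o.*, d.val1, d2.val2
    FROM original o, derived d, derived2 d2
    WHERE o.id = d.id AND o.id = d2.id;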


Matthew

--
When I first started working with sendmail, I was convinced that the cf
file had been created by someone bashing their head on the keyboard. After
a week, I realised this was, indeed, almost certainly the case.
   -- Unknown



Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread Alexy Khrabrov

On May 2, 2008, at 2:43 PM, Kevin Grittner wrote:


Alexy Khrabrov wrote:



Since I don't index on that
new column, I'd assume my old indices would do -- do they change
because of row deletions/insertions, with the new rows' effective
addresses?


Every update is a delete and insert.  The new version of the row must
be added to the index.  Every access through the index then has to
look at both versions of the row to see which one is "current" for its
transaction.  Vacuum will make the space used by the dead rows
available for reuse, as well as removing the old index entries and
making that space available for new index entries.


OK.  I've cancelled all previous attempts at UPDATE and will now
create some derived tables.  I see no changes in the previous huge table
-- the added column was completely empty.  Dropped it.  Should I
vacuum just in case, or am I guaranteed not to have any extra rows,
since no UPDATE actually went through and none are showing?


Cheers,
Alexy



Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread Kevin Grittner
>>> Alexy Khrabrov wrote: 
 
> OK.  I've cancelled all previous attempts at UPDATE and will now
> create some derived tables.  I see no changes in the previous huge table
> -- the added column was completely empty.  Dropped it.  Should I
> vacuum just in case, or am I guaranteed not to have any extra rows,
> since no UPDATE actually went through and none are showing?
 
The canceled attempts would have left dead space.  If you have
autovacuum running, it probably made the space available for reuse,
but depending on exactly how you got to where you are, you may have
bloat.  Personally, I would do a VACUUM ANALYZE VERBOSE and capture
the output.  If bloat is too bad, you may want to CLUSTER the table
(if you have the free disk space for a temporary extra copy of the
table) or VACUUM FULL followed by REINDEX (if you don't have that much
free disk space).
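For reference, the commands in question, using the poster's ratings table as
an example (the ratings_pkey index name is an assumption):

VACUUM VERBOSE ANALYZE ratings;        -- (VERBOSE goes before ANALYZE)
-- if bloat is bad and you have the disk space:
CLUSTER ratings USING ratings_pkey;
-- if disk space is tight:
VACUUM FULL ratings;
REINDEX TABLE ratings;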
 
Let us know if you need help interpreting the VERBOSE output.
 
-Kevin
 




Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread Kevin Grittner
>>> Alexy Khrabrov wrote: 
 
> Since I don't index on that
> new column, I'd assume my old indices would do -- do they change
> because of row deletions/insertions, with the new rows' effective
> addresses?
 
Every update is a delete and insert.  The new version of the row must
be added to the index.  Every access through the index then has to
look at both versions of the row to see which one is "current" for its
transaction.  Vacuum will make the space used by the dead rows
available for reuse, as well as removing the old index entries and
making that space available for new index entries.
 
-Kevin
 




Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread Alexy Khrabrov

On May 2, 2008, at 2:23 PM, Greg Smith wrote:


On Fri, 2 May 2008, Alexy Khrabrov wrote:


I created several indices for the primary table, yes.


That may be part of your problem.  All of the indexes are being
updated along with the main data in the row each time you touch a
record.  There's some optimization there in 8.3 but it doesn't make
index overhead go away completely.  As mentioned already, the
optimal solution to problems in this area is to adjust table
normalization as much as feasible to limit what you're updating.


I was wondering about it, too -- intuitively I'd like to say "stop all
indexing" until the column is added, then say "reindex".  Is that
doable?  Or would it take longer anyway?  Since I don't index on that
new column, I'd assume my old indices would do -- do they change
because of row deletions/insertions, with the new rows' effective
addresses?


Cheers,
Alexy



Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread PFC


I created several indices for the primary table, yes.  Sure, I can do a
table for a volatile column, but then I'll have to create a new such
table for each derived column -- that's why I tried to add a column to
the existing table.  Yet seeing this is really slow, and I need to do
many derived analyses like this -- which are later scanned in other
computations, so should persist -- I indeed see no other way but to
procreate derived tables with the same key, one column per each...


	OK, so in that case, you could do all of your derived column
calculations in one query, like this:


CREATE TABLE derived AS SELECT ... FROM ... (perform all your derived  
calculations here)


or :

BEGIN;  -- important: creating the table in the same transaction avoids writing xlog
CREATE TABLE derived (...);
INSERT INTO derived SELECT ... FROM ... (perform all your derived
calculations here);
COMMIT;

	Basically, updating the entire table several times to add a few simple  
columns is a bad idea. If you can compute all the data you need in one  
query, like above, it will be much faster. Especially if you join one  
large table to several smaller ones, and as long as the huge data set  
doesn't need to be sorted (check the query plan using EXPLAIN). Try to do  
as much as possible in one query to scan the large dataset only once.


	Note that the above will be faster than updating the entire table since  
it needs to write much less data: it doesn't need to delete the old rows,
and it doesn't need to write the transaction log, since if the transaction  
rolls back, the table never existed anyway. Also since your newly created  
table doesn't have any indexes, they won't need to be updated.
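	As a concrete illustration with the tables from the original post
(ratings and movs, offset1 = avg - rating; only the columns visible in the
poster's UPDATE are shown, and the new table name is made up):

CREATE TABLE ratings_derived AS
    SELECT r.movie_id, r.rating, m.avg - r.rating AS offset1
    FROM ratings r
    JOIN movs m ON m.mid = r.movie_id;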


	If you really need to update an entire table multiple times, you will
need to:

	- Use hardware that can handle disk writes at a decent speed (that isn't
a characteristic of a laptop drive)
	- Use MyISAM, yes (but if you need to make complex queries on the data
afterwards, it could suck).





Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread Greg Smith

On Fri, 2 May 2008, Alexy Khrabrov wrote:


I created several indices for the primary table, yes.


That may be part of your problem.  All of the indexes all are being 
updated along with the main data in the row each time you touch a record. 
There's some optimization there in 8.3 but it doesn't make index overhead 
go away completely.  As mentioned already, the optimal solution to 
problems in this area is to adjust table normalization as much as feasible 
to limit what you're updating.


Basically, the derived data is not critical at all -- can I (1) turn
off transactional behavior for an UPDATE,


What you can do is defer transaction commits to only happen periodically
rather than all the time, by turning off synchronous_commit and increasing
wal_writer_delay; see
http://www.postgresql.com.cn/docs/8.3/static/wal-async-commit.html
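For example, in postgresql.conf (a sketch -- the values are illustrative, and
you accept losing the last second or so of commits if the server crashes):

synchronous_commit = off      # commits return before WAL is flushed to disk
wal_writer_delay = 1000ms     # flush accumulated WAL roughly once per second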


(2) should I care about vacuuming being done on the fly when saving RAM,
or do I need to defer it / manage it manually?


It's hard to speculate from here about what optimal vacuum behavior will 
be.  You might find it more efficient to turn autovacuum off when doing 
these large updates.  The flip side is that you'll be guaranteed to end up 
with more dead rows in the table and that has its own impact later.
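If you try that, a minimal sketch is to set the following in postgresql.conf
(and reload) for the duration of the bulk work, then VACUUM ANALYZE the table
manually afterwards:

autovacuum = off      # only while the big updates run; re-enable afterwards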


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread Alexy Khrabrov


On May 2, 2008, at 2:02 PM, Craig James wrote:

On Fri, May 2, 2008 at 2:26 PM, Alexy Khrabrov  
<[EMAIL PROTECTED]> wrote:

I naively thought that if I have a 100,000,000 row table, of the form
(integer,integer,smallint,date), and add a real column to it, it will scroll
through the memory reasonably fast.


In Postgres, an update is the same as a delete/insert.  That means  
that changing the data in one column rewrites ALL of the columns for  
that row, and you end up with a table that's 50% dead space, which  
you then have to vacuum.


Sometimes if you have a "volatile" column that goes with several  
"static" columns, you're far better off to create a second table for  
the volatile data, duplicating the primary key in both tables.  In  
your case, it would mean the difference between 10^8 inserts of
(int, float), which is very fast, and what you're doing now, which is
10^8 inserts and 10^8 deletes of (int, int, smallint, date, float),
followed by a big vacuum/analyze (also slow).


The down side of this design is that later on, it requires a join to  
fetch all the data for each key.


You do have a primary key on your data, right?  Or some sort of index?


I created several indices for the primary table, yes.  Sure, I can do a
table for a volatile column, but then I'll have to create a new such
table for each derived column -- that's why I tried to add a column to
the existing table.  Yet seeing this is really slow, and I need to do
many derived analyses like this -- which are later scanned in other
computations, so should persist -- I indeed see no other way but to
procreate derived tables with the same key, one column per each...


Cheers,
Alexy



Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread Craig James

On Fri, May 2, 2008 at 2:26 PM, Alexy Khrabrov <[EMAIL PROTECTED]> wrote:

 I naively thought that if I have a 100,000,000 row table, of the form
(integer,integer,smallint,date), and add a real column to it, it will scroll
through the memory reasonably fast.


In Postgres, an update is the same as a delete/insert.  That means that 
changing the data in one column rewrites ALL of the columns for that row, and 
you end up with a table that's 50% dead space, which you then have to vacuum.

Sometimes if you have a "volatile" column that goes with several "static" 
columns, you're far better off to create a second table for the volatile data, duplicating the 
primary key in both tables.  In your case, it would mean the difference between 10^8 inserts of 
(int, float), very fast, compared to what you're doing now, which is 10^8 insert and 10^8 deletes 
of (int, int, smallint, date, float), followed by a big vacuum/analyze (also slow).

The down side of this design is that later on, it requires a join to fetch all 
the data for each key.
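A sketch of that layout (the table and column names, and the single-column
key, are assumptions for illustration):

-- static columns, written once
CREATE TABLE ratings (
    id        integer PRIMARY KEY,
    movie_id  integer,
    rating    smallint,
    rated_on  date
);

-- volatile/derived column, kept in a narrow table with the same key
CREATE TABLE ratings_offset (
    id       integer PRIMARY KEY,
    offset1  real
);

-- later, a join on the shared key brings the full row back together
SELECT r.*, o.offset1
FROM ratings r JOIN ratings_offset o USING (id);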

You do have a primary key on your data, right?  Or some sort of index?

Craig



Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread Alexy Khrabrov


On May 2, 2008, at 1:40 PM, Scott Marlowe wrote:

Again, a database protects your data from getting scrambled should the
program updating it quit halfway through etc...


Right -- but this is data mining work: I add a derived column to a
row, and it's computed from that very row and a small second table
which should fit in RAM.



Have you been vacuuming between these update attempts?  Each one has
created millions of dead rows and bloated your data store.  vacuum
full / cluster / reindex may be needed.


I've read postgresql.conf more carefully and see that autovacuum = on is
commented out, so it's on by default.  That explains why shutting down was
taking so long -- it had to shut autovacuum down too.


Basically, the derived data is not critical at all -- can I (1) turn
off transactional behavior for an UPDATE, and (2) should I care about
vacuuming being done on the fly when saving RAM, or do I need to defer
it / manage it manually?


I wonder what MySQL would do here on MyISAM tables without
transactional behavior -- perhaps this is a case better suited to
them?


Cheers,
Alexy



Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread Scott Marlowe
On Fri, May 2, 2008 at 2:26 PM, Alexy Khrabrov <[EMAIL PROTECTED]> wrote:
>
>  So how should I divide, say, 512 MB between shared_buffers and, um, what
> else?  (new to pg tuning :)

Don't worry so much about the rest of the settings.  Maybe increase
sort_mem (a.k.a. work_mem) to something like 16MB or so.  That's about it.
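e.g. (a sketch; set it per session, or put the equivalent line in
postgresql.conf):

SET work_mem = '16MB';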

>  I naively thought that if I have a 100,000,000 row table, of the form
> (integer,integer,smallint,date), and add a real column to it, it will scroll
> through the memory reasonably fast.

This is a database.  It makes changes on disk in such a way that they
won't be lost should power be cut off.  If you're just gonna be batch
processing data that it's ok to lose halfway through, then python /
perl / php etc might be a better choice.

>  Yet when I had shared_buffers=128 MB,
> it was hanging there 8 hours before I killed it, and now with 1500MB is
> paging again for several hours with no end in sight.

You went from kinda small to WAY too big.  512M should be a happy medium.

>  Why can't it just add
> a column to a row at a time and be done with it soon enough? :)

Adding a column is instantaneous.  Populating it is not.
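In other words, with the statements from the original post (the ALTER TABLE
form is an assumption; only the UPDATE appeared in the thread):

ALTER TABLE ratings ADD COLUMN offset1 real;    -- instant: catalog-only change
UPDATE ratings SET offset1 = avg - rating
    FROM movs WHERE mid = movie_id;             -- slow: rewrites all 10^8 rows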

> It takes
> inordinately long compared to a FORTRAN or even python program and there's
> no index usage for this table, a sequential scan, why all the paging?

Again, a database protects your data from getting scrambled should the
program updating it quit halfway through etc...

Have you been vacuuming between these update attempts?  Each one has
created millions of dead rows and bloated your data store.  vacuum
full / cluster / reindex may be needed.



Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread Alexy Khrabrov
Interestingly, after shutting down the server with
shared_buffers=1500MB in the middle of that UPDATE, I see this:


bash-3.2$ /opt/bin/pg_ctl -D /data/pgsql/ stop
waiting for server to shut down
LOG:  received smart shutdown request
LOG:  autovacuum launcher shutting down
... failed
pg_ctl: server does not shut down
bash-3.2$ /opt/bin/pg_ctl -D /data/pgsql/ stop
waiting for server to shut down..
LOG:  shutting down
LOG:  database system is shut down
 done
server stopped

-- I had to do it twice; the box was paging for a minute or two.

Should I do something about autovacuum, e.g. turn it off
completely?  I thought it wasn't on, as all of it was still commented
out in postgresql.conf as shipped; I only tweaked a few numbers, as
reported before.


Cheers,
Alexy




Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread Alexy Khrabrov

On May 2, 2008, at 1:22 PM, Greg Smith wrote:


On Fri, 2 May 2008, Alexy Khrabrov wrote:

I have an UPDATE query updating a 100 million row table, and  
allocate enough memory via shared_buffers=1500MB.


In addition to reducing that as you've been advised, you'll probably  
need to increase checkpoint_segments significantly from the default  
(3) in order to get good performance on an update that large.   
Something like 30 would be a reasonable starting point.


I'd suggest doing those two things, seeing how things go, and  
reporting back if you still think performance is unacceptable.  We'd  
need to know your PostgreSQL version in order to really target  
future suggestions.


PostgreSQL 8.3.1, compiled from source on Mac OS X 10.5.2 (Leopard).
I saw the checkpoint_segments warning every ~20 sec and have already
increased it to 100.  Will see what 512 MB buys me, but 128 MB was paging
miserably.


Cheers,
Alexy



Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread Alexy Khrabrov


On May 2, 2008, at 1:13 PM, Tom Lane wrote:

I don't think you should figure on more than 1GB being
usefully available to Postgres, and you can't give all or even most of
that space to shared_buffers.



So how should I divide, say, 512 MB between shared_buffers and, um,
what else?  (new to pg tuning :)


I naively thought that if I have a 100,000,000 row table, of the form
(integer,integer,smallint,date), and add a real column to it, it will
scroll through the memory reasonably fast.  Yet when I had
shared_buffers=128 MB, it was hanging there 8 hours before I killed
it, and now with 1500MB it is paging again for several hours with no end
in sight.  Why can't it just add the column one row at a time and be
done with it soon enough? :)  It takes inordinately long compared to a
FORTRAN or even Python program, and there's no index usage for this
table, just a sequential scan -- why all the paging?


Cheers,
Alexy



Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread Greg Smith

On Fri, 2 May 2008, Alexy Khrabrov wrote:

I have an UPDATE query updating a 100 million row table, and 
allocate enough memory via shared_buffers=1500MB.


In addition to reducing that as you've been advised, you'll probably need 
to increase checkpoint_segments significantly from the default (3) in 
order to get good performance on an update that large.  Something like 30 
would be a reasonable starting point.
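That is, in postgresql.conf (a sketch; 30 is just the suggested starting
point):

checkpoint_segments = 30      # default is 3; each WAL segment file is 16MB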


I'd suggest doing those two things, seeing how things go, and reporting 
back if you still think performance is unacceptable.  We'd need to know 
your PostgreSQL version in order to really target future suggestions.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread Tom Lane
"Scott Marlowe" <[EMAIL PROTECTED]> writes:
> On Fri, May 2, 2008 at 1:38 PM, Alexy Khrabrov <[EMAIL PROTECTED]> wrote:
>> I randomly increased values in postgresql.conf to
>> 
>> shared_buffers = 1500MB
>> max_fsm_pages = 200
>> max_fsm_relations = 1

> On a laptop with 2G ram, 1.5Gig shared buffers is probably WAY too high.

s/probably/definitely/, especially seeing that OS X is a bit of a memory
hog itself.  I don't think you should figure on more than 1GB being
usefully available to Postgres, and you can't give all or even most of
that space to shared_buffers.

> No, the background writer reads through the shared buffers for dirty
> ones and writes them out.  So it's not really using MORE memory, it's
> just showing that it's attached to the ginormous shared_buffer pool
> you've set up.

Yeah.  You have to be aware of top's quirky behavior for shared memory:
on most platforms it will count the shared memory against *each*
process, but only as much of the shared memory as that process has
touched so far.  So over time the reported size of any PG process will
tend to climb to something over the shared memory size, but most of that
isn't "real".

I haven't directly checked whether OS X's top behaves that way, but
given your report I think it does.

regards, tom lane



Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread Scott Marlowe
On Fri, May 2, 2008 at 1:38 PM, Alexy Khrabrov <[EMAIL PROTECTED]> wrote:
>
>
>  On May 2, 2008, at 12:30 PM, Scott Marlowe wrote:
>
>
> > On Fri, May 2, 2008 at 1:24 PM, Alexy Khrabrov <[EMAIL PROTECTED]>
> wrote:
> >
> > > Greetings -- I have an UPDATE query updating a 100 million row table,
> and
> > > allocate enough memory via shared_buffers=1500MB.  However, I see two
> > > processes in top, the UPDATE process eating about 850 MB and the writer
> > > process eating about 750 MB.  The box starts paging.   Why is there the
> > > writer taking almost as much space as the UPDATE, and how can I shrink
> it?
> > >
> >
> > Shared_buffers is NOT the main memory pool for all operations in
> > pgsql, it is simply the buffer pool used to hold data being operated
> > on.
> >
> > Things like sorts etc. use other memory and can exhaust your machine.
> > However, I'd like to see the output of vmstat 1 or top while this is
> > happening.
> >
> > How much memory does this machine have?
> >
>
>  It's a 2GB RAM MacBook.  Here's the top for postgres
>
>  Processes:  117 total, 2 running, 6 stuck, 109 sleeping... 459 threads   12:34:27
>  Load Avg:  0.27,  0.24,  0.32   CPU usage:  8.41% user, 11.06% sys, 80.53% idle
>  SharedLibs: num =   15, resident =   40M code, 2172K data, 3172K linkedit.
>  MemRegions: num = 20719, resident =  265M +   12M private, 1054M shared.
>  PhysMem:  354M wired, 1117M active,  551M inactive, 2022M used,   19M free.
>  VM: 26G + 373M   1176145(160) pageins, 1446482(2) pageouts
>
>   PID COMMAND  %CPU   TIME   #TH #PRTS #MREGS RPRVT  RSHRD  RSIZE  VSIZE
>  51775 postgres 6.8%  2:40.16   1     9    39  1504K  896M   859M+ 1562M
>  51767 postgres 0.0%  0:39.74   1     8    28   752K  896M   752M  1560M

SOME snipping here.

>  I randomly increased values in postgresql.conf to
>
>  shared_buffers = 1500MB
>  max_fsm_pages = 200
>  max_fsm_relations = 1

On a laptop with 2G ram, 1.5Gig shared buffers is probably WAY too high.

>  Should I set the background writer parameters somehow to decrease the RAM
> consumed by the writer?

No, the background writer reads through the shared buffers for dirty
ones and writes them out.  So it's not really using MORE memory, it's
just showing that it's attached to the ginormous shared_buffer pool
you've set up.

Lower your shared_buffers to about 512M or so and see how it works.
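i.e., in postgresql.conf (a sketch; a shared_buffers change needs a server
restart to take effect):

shared_buffers = 512MB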



Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread Alexy Khrabrov


On May 2, 2008, at 12:30 PM, Scott Marlowe wrote:

On Fri, May 2, 2008 at 1:24 PM, Alexy Khrabrov  
<[EMAIL PROTECTED]> wrote:
Greetings -- I have an UPDATE query updating a 100 million row table, and
allocate enough memory via shared_buffers=1500MB.  However, I see two
processes in top, the UPDATE process eating about 850 MB and the writer
process eating about 750 MB.  The box starts paging.  Why is there the
writer taking almost as much space as the UPDATE, and how can I shrink it?


Shared_buffers is NOT the main memory pool for all operations in
pgsql, it is simply the buffer pool used to hold data being operated
on.

Things like sorts etc. use other memory and can exhaust your machine.
However, I'd like to see the output of vmstat 1 or top while this is
happening.

How much memory does this machine have?


It's a 2GB RAM MacBook.  Here's the top for postgres

Processes:  117 total, 2 running, 6 stuck, 109 sleeping... 459 threads   12:34:27
Load Avg:  0.27,  0.24,  0.32   CPU usage:  8.41% user, 11.06% sys, 80.53% idle
SharedLibs: num =   15, resident =   40M code, 2172K data, 3172K linkedit.
MemRegions: num = 20719, resident =  265M +   12M private, 1054M shared.
PhysMem:  354M wired, 1117M active,  551M inactive, 2022M used,   19M free.
VM: 26G + 373M   1176145(160) pageins, 1446482(2) pageouts

  PID COMMAND  %CPU   TIME   #TH #PRTS #MREGS RPRVT  RSHRD  RSIZE  VSIZE
51775 postgres 6.8%  2:40.16   1     9    39  1504K  896M   859M+ 1562M
51767 postgres 0.0%  0:39.74   1     8    28   752K  896M   752M  1560M


the first is the UPDATE, the second is the writer.

The query is very simple,

netflix=> create index movs_mid_idx on movs(mid);
CREATE INDEX
netflix=> update ratings set offset1=avg-rating from movs where mid=movie_id;


where the table ratings has about 100 million rows, movs has about  
20,000.


I randomly increased values in postgresql.conf to

shared_buffers = 1500MB
max_fsm_pages = 200
max_fsm_relations = 1

Should I set the background writer parameters somehow to decrease the  
RAM consumed by the writer?


Cheers,
Alexy



Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread Scott Marlowe
On Fri, May 2, 2008 at 1:24 PM, Alexy Khrabrov <[EMAIL PROTECTED]> wrote:
> Greetings -- I have an UPDATE query updating a 100 million row table, and
> allocate enough memory via shared_buffers=1500MB.  However, I see two
> processes in top, the UPDATE process eating about 850 MB and the writer
> process eating about 750 MB.  The box starts paging.   Why is there the
> writer taking almost as much space as the UPDATE, and how can I shrink it?

Shared_buffers is NOT the main memory pool for all operations in
pgsql, it is simply the buffer pool used to hold data being operated
on.

Things like sorts etc. use other memory and can exhaust your machine.
However, I'd like to see the output of vmstat 1 or top while this is
happening.

How much memory does this machine have?
