Re: [GENERAL] Tuning Postgres for single user manipulating large amounts of data

2010-12-10 Thread Reid Thompson
On Thu, 2010-12-09 at 17:40 +, Paul Taylor wrote:
 what have I got to be careful of?

I think that was in reference to turning fsync off, not work_mem values.


Re: [GENERAL] Tuning Postgres for single user manipulating large amounts of data

2010-12-10 Thread tv
 Hi (sorry for the double posting - I thought I'd used the wrong email
 address, but both have been posted anyway). As far as the db is concerned
 I'm just reading data and then writing it to a Lucene search index (which
 is outside of the database), but my laptop is just a test machine; I want
 to run the same code in production. Why would work_mem not be safe at 64MB
 if I have 2GB of memory - what have I got to be careful of?

 Paul

Well, the problem with work_mem is that it's 'per operation', so a query
may actually need several work_mem-sized chunks at once. And it's not just
sorting: a query containing a hash join, a hash aggregate and a sort may
consume up to 3x work_mem.
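For example (just a sketch - the database, table and column names below are
made up, not from your schema), each of the memory-hungry nodes in a plan
like this can grab up to work_mem on its own:

psql -d test <<'SQL'
SET work_mem = '64MB';
EXPLAIN ANALYZE
SELECT r.artist_id, count(*)
FROM releases r
JOIN tracks t USING (release_id)   -- Hash Join: one work_mem-sized hash table
GROUP BY r.artist_id               -- HashAggregate: a second work_mem
ORDER BY count(*) DESC;            -- Sort: a third work_mem
SQL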

And if you have a lot of concurrent users running such queries, you can
easily run out of memory - in that case the feared OOM killer comes along
and kills one of the processes (usually the postmaster, which means the
database goes belly up). I'm not sure how the OOM killer behaves on Mac OS X.

But since, as you said, there will be a single user running queries on the
database, you can set work_mem pretty high. It depends on the queries,
though - a complicated query may still consume a lot of memory.

Tomas




Re: [GENERAL] Tuning Postgres for single user manipulating large amounts of data

2010-12-09 Thread Andy Colson

On 12/9/2010 6:25 AM, Paul Taylor wrote:

Hi, I'm using Postgres 8.3 on a MacBook Pro laptop.
I'm using the database with just one db connection to build a Lucene
search index from some of the data, and I'm trying to improve
performance. The key thing is that I'm only a single user but I'm
manipulating large amounts of data, i.e. processing tables with up to 10
million rows in them, so I think I want to configure Postgres so that it
can create large temporary tables in memory.

I've tried changing various parameters such as shared_buffers, work_mem
and checkpoint_segments, but I don't really understand what the values
are, the documentation seems to be aimed towards configuring for
multiple users, and my changes make things worse. For example, my machine
has 2GB of memory and I read that for a dedicated server you should
set shared memory to 40% of total memory, but when I increase it to more
than 30MB Postgres will not start, complaining about my SHMMAX limit.

Paul



You need to bump up your SHMMAX is your OS.  I'm sure google knows how 
to do it.  (in linux use sysctl, so it may be similar in macos).
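For what it's worth, a minimal sketch of the Linux version (run as root;
the numbers are only an example, and the macOS keys are different - see the
doc link later in the thread):

sysctl -w kernel.shmmax=1073741824   # max size of one shared memory segment, in bytes
sysctl -w kernel.shmall=262144       # total shared memory allowed, in 4 kB pages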


checkpoint_segments: I've bumped it up to 10, but only when inserting
a huge amount of data; not sure how much it'll help otherwise.


shared_buffers: this is the big one.  Set it big, 1GB maybe.

work_mem: this is for temp work a query might need to do, like sorting,
merging, etc.  A big value (100MB or so) would be OK.  It's per sort/hash
operation, not per user, but since there is only one of you, splurge.


There is also effective_cache_size (or something like that): it's the
amount of memory PG can assume is being used for the OS disk cache.  It's
not something that'll be allocated.  So you have 2GB: 1GB for PG, 300MB for
the OS and other stuff, leaving roughly 700MB for effective_cache_size?
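Pulling those numbers together, a sketch of what the postgresql.conf lines
might look like on your 2GB laptop - the values are just the rough
suggestions above, not measured settings, $PGDATA is assumed to point at
your data directory, and appending works because the last occurrence of a
setting in postgresql.conf wins:

cat >> "$PGDATA/postgresql.conf" <<'EOF'
shared_buffers = 1GB            # needs SHMMAX raised first, and a restart
work_mem = 100MB                # per sort/hash operation
checkpoint_segments = 10        # mainly helps bulk inserts
effective_cache_size = 700MB    # planner hint, not an allocation
EOF
pg_ctl -D "$PGDATA" restart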


On Linux I use free to see how much memory is being used for disk cache,
and set effective_cache_size to that.
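Something along these lines (Linux only - macOS has no free(1), but vm_stat
reports similar page counts):

free -m   # the "cached" (or "buff/cache") column is the current disk cache;
          # use that as a starting point for effective_cache_size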



-Andy



Re: [GENERAL] Tuning Postgres for single user manipulating large amounts of data

2010-12-09 Thread Andy Colson

On 12/9/2010 8:50 AM, Andy Colson wrote:

On 12/9/2010 6:25 AM, Paul Taylor wrote:

Hi, I'm using Postgres 8.3 on a MacBook Pro laptop.
I'm using the database with just one db connection to build a Lucene
search index from some of the data, and I'm trying to improve
performance. The key thing is that I'm only a single user but I'm
manipulating large amounts of data, i.e. processing tables with up to 10
million rows in them, so I think I want to configure Postgres so that it
can create large temporary tables in memory.

I've tried changing various parameters such as shared_buffers, work_mem
and checkpoint_segments, but I don't really understand what the values
are, the documentation seems to be aimed towards configuring for
multiple users, and my changes make things worse. For example, my machine
has 2GB of memory and I read that for a dedicated server you should
set shared memory to 40% of total memory, but when I increase it to more
than 30MB Postgres will not start, complaining about my SHMMAX limit.

Paul



You need to bump up your SHMMAX is your OS.


sorry: SHMMAX _in_ your OS.

it's an OS setting, not a PG one.

-Andy




Re: [GENERAL] Tuning Postgres for single user manipulating large amounts of data

2010-12-09 Thread Reid Thompson
On 12/09/2010 09:59 AM, Andy Colson wrote:
 On 12/9/2010 8:50 AM, Andy Colson wrote:
 On 12/9/2010 6:25 AM, Paul Taylor wrote:

 You need to bump up your SHMMAX is your OS.
 
 sorry: SHMMAX _in_ your OS.
 
 it's an OS setting, not a PG one.
 
 -Andy
 
 
scroll down to the section on OSX
http://developer.postgresql.org/pgdocs/postgres/kernel-resources.html
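From that OS X section, a sketch of the persistent version - the values
below are only an example, sized for roughly 1GB of shared memory, and on
some OS X releases all five keys apparently have to be set together or the
change is ignored:

sudo tee -a /etc/sysctl.conf <<'EOF'
kern.sysv.shmmax=1073741824
kern.sysv.shmmin=1
kern.sysv.shmmni=32
kern.sysv.shmseg=8
kern.sysv.shmall=262144
EOF

A reboot then picks the new limits up.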



Re: [GENERAL] Tuning Postgres for single user manipulating large amounts of data

2010-12-09 Thread tv
 Hi, I'm using Postgres 8.3 on a MacBook Pro laptop.
 I'm using the database with just one db connection to build a Lucene
 search index from some of the data, and I'm trying to improve
 performance. The key thing is that I'm only a single user but I'm
 manipulating large amounts of data, i.e. processing tables with up to 10
 million rows in them, so I think I want to configure Postgres so that it
 can create large temporary tables in memory.

I'm not sure what exactly you mean by manipulating data - does that mean
reading or writing?

I'd definitely increase shared_buffers, work_mem (don't be afraid to set
work_mem to say 32MB or 64MB - this should be safe with a single
connection, although it depends on the queries). To improve writes,
increase checkpoint_segments etc.
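Since there's only the one batch connection, some of this can also be done
per session rather than in postgresql.conf - just a sketch, with "mydb" as
a placeholder database name:

psql -d mydb <<'SQL'
SET work_mem = '64MB';                 -- per sort/hash, in this session only
SET maintenance_work_mem = '256MB';    -- only matters if you build indexes here
-- ... run the big read queries ...
SQL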

If you really don't need extra safety - e.g. if you have all the data
backed up and just need to run some ad-hoc analysis (and it does not
matter if it crashes as you can recover it from backup), you can disable
fsync. This will make writes much faster, but it won't be safe in case of
a crash.

DON'T DO THIS IF YOU NEED TO KEEP YOUR DATA SAFE!

But as you're running the app on your laptop, I guess you can live with
fsync=off. You'll lose the consistency guarantees in a crash, but you'll
get better performance.
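If you do go that route, it's a one-liner plus a reload - a sketch only,
with $PGDATA assumed to point at the data directory (fsync can be changed
with a reload, no full restart needed):

echo "fsync = off" >> "$PGDATA/postgresql.conf"
pg_ctl -D "$PGDATA" reload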

 I've tried changing various parameters such as shared_buffers, work_mem and
 checkpoint_segments, but I don't really understand what the values are, the
 documentation seems to be aimed towards configuring for multiple
 users, and my changes make things worse. For example, my machine has 2GB
 of memory and I read that for a dedicated server you should set
 shared memory to 40% of total memory, but when I increase it to more than
 30MB Postgres will not start, complaining about my SHMMAX limit.

You're heading in the right direction, I think, but you're hitting kernel
limits. A process can't allocate more shared memory (shared buffers) than
the SHMMAX limit, so you need to bump this up.

See this - http://www.postgresql.org/docs/9.0/static/kernel-resources.html
There's even a section for MacOS X (which is the OS you're running, I
guess).

Tomas




Re: [GENERAL] Tuning Postgres for single user manipulating large amounts of data

2010-12-09 Thread Paul Taylor

On 09/12/2010 15:12, Reid Thompson wrote:

On 12/09/2010 09:59 AM, Andy Colson wrote:

On 12/9/2010 8:50 AM, Andy Colson wrote:

On 12/9/2010 6:25 AM, Paul Taylor wrote:
You need to bump up your SHMMAX is your OS.

sorry: SHMMAX _in_ your OS.

it's an OS setting, not a PG one.

-Andy



scroll down to the section on OSX
http://developer.postgresql.org/pgdocs/postgres/kernel-resources.html

Thanks guys, but one thing I don't get is why setting shared_buffers
to 40MB breaks the kernel limit - I mean, 40MB doesn't sound like very much
at all.


Paul



Re: [GENERAL] Tuning Postgres for single user manipulating large amounts of data

2010-12-09 Thread Reid Thompson
On 12/09/2010 12:36 PM, Paul Taylor wrote:
 On 09/12/2010 15:12, Reid Thompson wrote:
 On 12/09/2010 09:59 AM, Andy Colson wrote:
 On 12/9/2010 8:50 AM, Andy Colson wrote:
 On 12/9/2010 6:25 AM, Paul Taylor wrote:
 You need to bump up your SHMMAX is your OS.
 sorry: SHMMAX _in_ your OS.

 it's an OS setting, not a PG one.

 -Andy


 scroll down to the section on OSX
 http://developer.postgresql.org/pgdocs/postgres/kernel-resources.html

 Thanks guys, but one thing I don't get is why setting shared_buffers to
 40MB breaks the kernel limit - I mean, 40MB doesn't sound
 like very much at all.
 
 Paul

It's not -- from the same page (near the top):

17.4.1. Shared Memory and Semaphores

Shared memory and semaphores are collectively referred to as System V IPC
(together with message queues, which are not relevant for PostgreSQL).
Almost all modern operating systems provide these features, but many of
them don't have them turned on or sufficiently sized by default, especially
as available RAM and the demands of database applications grow.

...and most of these system defaults originated when the RAM available in a
typical system was much smaller.
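You can see how small the stock limit actually is with sysctl - on older
OS X releases shmmax is typically only a few megabytes, which is why even
40MB of shared_buffers trips it:

sysctl kern.sysv.shmmax kern.sysv.shmall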



Re: [GENERAL] Tuning Postgres for single user manipulating large amounts of data

2010-12-09 Thread Paul Taylor

On 09/12/2010 15:51, t...@fuzzy.cz wrote:

Hi, I'm using Postgres 8.3 on a MacBook Pro laptop.
I'm using the database with just one db connection to build a Lucene
search index from some of the data, and I'm trying to improve
performance. The key thing is that I'm only a single user but I'm
manipulating large amounts of data, i.e. processing tables with up to 10
million rows in them, so I think I want to configure Postgres so that it
can create large temporary tables in memory.

I'm not sure what exactly you mean by manipulating data - does that mean
reading or writing?

I'd definitely increase shared_buffers, work_mem (don't be afraid to set
work_mem to say 32MB or 64MB - this should be safe with a single
connection, although it depends on the queries). To improve writes,
increase checkpoint_segments etc.

If you really don't need extra safety - e.g. if you have all the data
backed up and just need to run some ad-hoc analysis (and it does not
matter if it crashes as you can recover it from backup), you can disable
fsync. This will make writes much faster, but it won't be safe in case of
a crash.

DON'T DO THIS IF YOU NEED TO KEEP YOUR DATA SAFE!

But as you're running the app on your laptop, I guess you can live with
fsync=off. You'll lose the consistency guarantees in a crash, but you'll
get better performance.

Hi (sorry for the double posting - I thought I'd used the wrong email
address, but both have been posted anyway).
As far as the db is concerned I'm just reading data and then writing it
to a Lucene search index (which is outside of the database), but my
laptop is just a test machine; I want to run the same code in production.
Why would work_mem not be safe at 64MB if I have 2GB of memory - what
have I got to be careful of?


Paul



Re: [GENERAL] Tuning Postgres for single user manipulating large amounts of data

2010-12-09 Thread Scott Marlowe
On Thu, Dec 9, 2010 at 5:25 AM, Paul Taylor ij...@fastmail.fm wrote:
 Hi, I'm using Postgres 8.3 on a MacBook Pro laptop.
 I'm using the database with just one db connection to build a Lucene search
 index from some of the data, and I'm trying to improve performance. The key
 thing is that I'm only a single user but I'm manipulating large amounts of
 data, i.e. processing tables with up to 10 million rows in them, so I think
 I want to configure Postgres so that it can create large temporary tables
 in memory.

 I've tried changing various parameters such as shared_buffers, work_mem and
 checkpoint_segments, but I don't really understand what the values are, and
 the documentation seems to be aimed towards configuring for multiple users,
 and my changes make things worse. For example, my machine has 2GB of memory
 and I read that for a dedicated server you should set shared memory to
 40% of total memory, but when I increase it to more than 30MB Postgres will
 not start, complaining about my SHMMAX limit.

So you're pretty much batch processing.  Not PostgreSQL's strongest
point.  But we'll see what we can do.  Large shared buffers aren't
gonna help a lot here, since your OS will be caching files as well,
and you've only got one process running.  You do want a large enough
shared_buffers to hold everything you're working on at one point in
time, so getting it into the hundred-or-so-megabyte range will likely
help.  After that you'll be stealing memory that could be used for OS
caching or work_mem, so don't go crazy, especially on a machine with
only 2 gigs of RAM.  Note I just picked up 8 gigs of DDR3 RAM for $99 on
Newegg, so if your MBP can handle more memory, now's the time to
splurge.

Crank up work_mem to something pretty big, in the 60 to 200 MB range.
Note that work_mem is PER sort, not total or per connection.  So if
your single user runs a query with three sorts, it could use 3x
work_mem.  Once it allocates too much memory your machine will start
swapping and slow to a crawl, so don't overshoot.
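A back-of-the-envelope budget for the 2GB laptop (all of these numbers are
assumptions, not measurements): leave something like 500MB for the OS plus
the Lucene indexer, 500MB for shared_buffers, and allow for ~3 sort/hash
nodes in one query:

echo $(( (2048 - 500 - 500) / 3 ))   # ~349, so a work_mem around 200MB leaves headroom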

Assuming you can recreate your db should things go horribly wrong, you
can turn off fsync.  Also crank up WAL segments to 32 or 64 or so.

Make sure access-time updates are turned off for the file system
(the noatime mount option).
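On Linux that's an fstab entry plus a remount - the device and mount point
below are placeholders, and macOS handles this differently, so treat it as
a sketch:

# /etc/fstab line (placeholder device and mount point):
# /dev/sda2   /var/lib/pgsql   ext4   noatime   0 2
mount -o remount,noatime /var/lib/pgsql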

-- 
To understand recursion, one must first understand recursion.
