Re: [PERFORM] Performance Woes

2007-05-09 Thread Tom Lane
Jeff Davis <[EMAIL PROTECTED]> writes:
> On Wed, 2007-05-09 at 17:29 -0700, Joshua D. Drake wrote:
>> Sounds to me like you just need to up the total amount of open files 
>> allowed by the operating system.

> It looks more like the opposite, here's the docs for
> max_files_per_process:

I think Josh has got the right advice.  The manual is just saying that
you can reduce max_files_per_process to avoid the failure, but it's not
making any promises about the performance penalty for doing that.
Apparently Ralph's app needs a working set of between 800 and 1000 open
files to have reasonable performance.

> That is a lot of tables. Maybe a different OS will handle it better?
> Maybe there's some way that you can use fewer connections and then the
> OS could still handle it?

Also, it might be worth rethinking the database structure to reduce the
number of tables.  But for a quick-fix, increasing the kernel limit
seems like the easiest answer.

regards, tom lane
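
As a rough back-of-the-envelope check of that quick fix (an editorial sketch; the
figures are the ones reported elsewhere in this thread, 30 connections and
max_files_per_process of 1000):

# Worst case, every backend fills its per-process quota of kernel FDs.
$ echo $(( 30 * 1000 ))
30000
# Compare against the system-wide ceiling reported later in the thread:
$ cat /proc/sys/fs/file-max
297834

On those numbers the system-wide file table has plenty of headroom, so any hard
limit being hit is more likely a per-process one.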



Re: [PERFORM] Performance Woes

2007-05-09 Thread Alvaro Herrera
Ralph Mason wrote:

> I have several databases.  They are each about 35gb in size and have about
> 10.5K relations (count from pg_stat_all_tables) in them.  pg_class is about
> 26k rows and the data directory contains about 70k files.  These are busy
> machines; they run about 50 xactions per second (approx. insert / update /
> delete about 500 rows per second).

Is the problematic trigger always the same one?  Is it just PL/pgSQL,
or do you have something else?  Something that may be opening additional
files behind your back, for example PL/Perl with funky operators or temp
files?

Also, what PG version is this?
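
For example, a quick way to see which procedural languages the functions in the
database actually use (a generic catalog query, not something from the original
report; the database name is a placeholder):

$ psql -d yourdb -c "
    SELECT l.lanname, count(*) AS functions
    FROM pg_proc p JOIN pg_language l ON l.oid = p.prolang
    GROUP BY l.lanname
    ORDER BY functions DESC;"

Anything beyond plpgsql and the built-in languages (sql, c, internal) would be
worth a second look.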

> So we decreased the max_files_per_process to 800.  This took care of the
> error *BUT* about quadrupled the IO wait that is happening on the machine.
> It went from a peak of about 50% to peaks of over 200% (4 processor
> machines, 4 gigs ram, raid).  The load on the machine remained constant.

The max_files_per_process setting controls how many actual file
descriptors each process is allowed to have.  Postgres internally uses
"virtual file descriptors" (VFDs), each of which normally has one kernel
file descriptor open.  However, if your transactions need to access lots of
files, the VFDs will close their kernel FDs to allow other VFDs to open theirs.

So it sounds like your transaction has more than 800 files open.  The
extra IO wait could be caused by the additional system calls to open and
close those files as needed.  I would actually expect it to cause extra
"system" load (as opposed to "user") rather than IO, but I'm not sure.

-- 
Alvaro Herrera                          http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [PERFORM] Performance Woes

2007-05-09 Thread Scott Mohekey

Just adding a bit of relevant information:

We have the kernel file-max setting set to 297834 (256 per 4 MB of RAM).

/proc/sys/fs/file-nr tells us that we have roughly 13000 allocated handles,
of which zero are free at any given time.
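
For readers following along, the three fields in that file are the number of
allocated file handles, the number allocated but currently unused, and the
system-wide maximum (fs.file-max), e.g.:

$ cat /proc/sys/fs/file-nr
13000   0   297834
#  ^allocated   ^allocated-but-free   ^fs.file-max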



On 10/05/07, Jeff Davis <[EMAIL PROTECTED]> wrote:


On Wed, 2007-05-09 at 17:29 -0700, Joshua D. Drake wrote:
> > 2007-05-09 03:07:50.083 GMT 1146975740: LOCATION:  BasicOpenFile,
> > fd.c:471
> >
> > 2007-05-09 03:07:50.091 GMT 0: LOG:  0: duration: 12.362 ms
> >
> > 2007-05-09 03:07:50.091 GMT 0: LOCATION:  exec_simple_query,
> > postgres.c:1090
> >
> >
> >
> > So we decreased the max_files_per_process to 800.  This took care
> > of the error **BUT** about quadrupled the IO wait that is happening
> > on the machine. It went from a peak of about 50% to peaks of over
> > 200% (4 processor machines, 4 gigs ram, raid).  The load on the
> > machine remained constant.
> >
>
> Sounds to me like you just need to up the total amount of open files
> allowed by the operating system.

It looks more like the opposite, here's the docs for
max_files_per_process:

"Sets the maximum number of simultaneously open files allowed to each
server subprocess. The default is one thousand files. If the kernel is
enforcing a safe per-process limit, you don't need to worry about this
setting. But on some platforms (notably, most BSD systems), the kernel
will allow individual processes to open many more files than the system
can really support when a large number of processes all try to open that
many files. If you find yourself seeing "Too many open files" failures,
try reducing this setting. This parameter can only be set at server
start."

To me, that means that his machine is allowing the new FD to be created,
but then can't really support that many so it gives an error.

Ralph, how many connections do you have open at once? It seems like the
machine perhaps just can't handle that many FDs in all of those
processes at once.

That is a lot of tables. Maybe a different OS will handle it better?
Maybe there's some way that you can use fewer connections and then the
OS could still handle it?

Regards,
Jeff Davis







--
Scott Mohekey
Systems Administrator
Telogis
Intelligent Location Technologies


Re: [PERFORM] Performance Woes

2007-05-09 Thread Ralph Mason
>To me, that means that his machine is allowing the new FD to be created,
>but then can't really support that many so it gives an error.

file-max is 297834
ulimit is 100

(doesn't make sense but there you go)

What I don’t really understand is with max_files_per_process at 800 we don't
get the problem, but with 1000 we do.

$lsof | wc -l 
14944


$cat /proc/sys/fs/file-nr 
12240   0   297834
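
One thing worth double-checking (a suggestion only, not something established in
the thread): the limit a fresh 'su postgres' shell reports is not necessarily
the limit the running postmaster inherited when it was started, and raising it
only helps backends once the postmaster is restarted from a session that already
has the new limit.  A sketch, assuming the limit is managed through pam_limits;
the values are placeholders:

# /etc/security/limits.conf -- hypothetical entries
# postgres  soft  nofile  4096
# postgres  hard  nofile  8192

# Verify from a fresh login shell before restarting the postmaster:
$ su - postgres -c 'ulimit -Sn; ulimit -Hn'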


>Ralph, how many connections do you have open at once? It seems like the
>machine perhaps just can't handle that many FDs in all of those
>processes at once.

There are only 30 connections - of those, probably only 10 are really active.
It doesn't seem like we should be stressing this machine.


>That is a lot of tables. Maybe a different OS will handle it better?
>Maybe there's some way that you can use fewer connections and then the
>OS could still handle it?

It would be fewer, but then you can't maintain the db because of the constant
vacuuming needed :-(  I think the Linux folks would get up in arms if you
told them they couldn't handle that many open files ;-)

Thanks,
Ralph





Re: [PERFORM] Performance Woes

2007-05-09 Thread Jeff Davis
On Wed, 2007-05-09 at 17:29 -0700, Joshua D. Drake wrote:
> > 2007-05-09 03:07:50.083 GMT 1146975740: LOCATION:  BasicOpenFile,
> > fd.c:471
> > 
> > 2007-05-09 03:07:50.091 GMT 0: LOG:  0: duration: 12.362 ms
> > 
> > 2007-05-09 03:07:50.091 GMT 0: LOCATION:  exec_simple_query,
> > postgres.c:1090
> > 
> >  
> > 
> > So we decreased the max_files_per_process to 800.  This took care
> > of the error **BUT** about quadrupled the IO wait that is happening
> > on the machine. It went from a peak of about 50% to peaks of over
> > 200% (4 processor machines, 4 gigs ram, raid).  The load on the
> > machine remained constant.
> > 
> 
> Sounds to me like you just need to up the total amount of open files 
> allowed by the operating system.

It looks more like the opposite, here's the docs for
max_files_per_process:

"Sets the maximum number of simultaneously open files allowed to each
server subprocess. The default is one thousand files. If the kernel is
enforcing a safe per-process limit, you don't need to worry about this
setting. But on some platforms (notably, most BSD systems), the kernel
will allow individual processes to open many more files than the system
can really support when a large number of processes all try to open that
many files. If you find yourself seeing "Too many open files" failures,
try reducing this setting. This parameter can only be set at server
start."

To me, that means that his machine is allowing the new FD to be created,
but then can't really support that many so it gives an error.

Ralph, how many connections do you have open at once? It seems like the
machine perhaps just can't handle that many FDs in all of those
processes at once.

That is a lot of tables. Maybe a different OS will handle it better?
Maybe there's some way that you can use fewer connections and then the
OS could still handle it?

Regards,
Jeff Davis
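
One way to get to fewer connections without changing the application much is an
external connection pooler.  A minimal sketch using pgbouncer (named here as one
option, not something discussed in the thread; every name and number below is a
placeholder):

# /etc/pgbouncer/pgbouncer.ini -- illustrative only
# [databases]
# mydb = host=127.0.0.1 port=5432 dbname=mydb
#
# [pgbouncer]
# listen_addr       = 127.0.0.1
# listen_port       = 6432
# auth_type         = md5
# auth_file         = /etc/pgbouncer/userlist.txt
# pool_mode         = transaction
# max_client_conn   = 100
# default_pool_size = 10

$ pgbouncer -d /etc/pgbouncer/pgbouncer.ini   # clients then connect to port 6432

Fewer real backends means fewer processes each holding up to
max_files_per_process kernel FDs.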




Re: [PERFORM] Performance Woes

2007-05-09 Thread Joshua D. Drake

Hello,

You likely need to increase your kernel file-max parameter via sysctl.conf.
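
For concreteness, a sketch of what that looks like on Linux (the value is a
placeholder, not a recommendation from this thread):

# /etc/sysctl.conf -- illustrative entry
# fs.file-max = 500000

$ sysctl -w fs.file-max=500000   # apply immediately
$ sysctl -p                      # or reload /etc/sysctl.conf
$ cat /proc/sys/fs/file-max      # verify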

Sincerely,

Joshua D. Drake
--

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/




Re: [PERFORM] Performance Woes

2007-05-09 Thread Ralph Mason
 

 

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of CAJ CAJ
Sent: 10 May 2007 12:26
To: Ralph Mason
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance Woes

I have several databases.  They are each about 35gb in size and have about
10.5K relations (count from pg_stat_all_tables) in them.  pg_class is about
26k rows and the data directory contains about 70k files.  These are busy
machines; they run about 50 xactions per second (approx. insert / update /
delete about 500 rows per second).

We started getting errors about the number of open file descriptors

: 2007-05-09 03:07:50.083 GMT 1146975740: LOG:  53000: out of file
descriptors: Too many open files; release and retry

2007-05-09 03:07:50.083 GMT 1146975740: CONTEXT:  SQL statement "insert …..
"

PL/pgSQL function "trigfunc_whatever" line 50 at execute statement

2007-05-09 03:07:50.083 GMT 1146975740: LOCATION:  BasicOpenFile, fd.c:471

2007-05-09 03:07:50.091 GMT 0: LOG:  0: duration: 12.362 ms

2007-05-09 03:07:50.091 GMT 0: LOCATION:  exec_simple_query, postgres.c:1090

 

So we decreased the max_files_per_process to 800.  This took care of the
error *BUT* about quadrupled the IO wait that is happening on the machine.
It went from a peak of about 50% to peaks of over 200% (4 processor
machines, 4 gigs ram, raid).  The load on the machine remained constant.

>What version of Pg/OS? What is your hardware config? 

>I had seen these errors with earlier versions of Pg 7.4.x, which were fixed
in later releases according to the changelogs



"PostgreSQL 8.1.4 on x86_64-redhat-linux-gnu, compiled by GCC
x86_64-redhat-linux-gcc (GCC) 4.1.0 20060304 (Red Hat 4.1.0-3)"

 

su postgres -c 'ulimit -a'

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
max nice                        (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 49152
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 100
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
max rt priority                 (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 49152
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Seems like I should be able to use lots and lots of open files.

Machines are quad processor opterons, 4gb ram with raid 5 data and logging
to a raid 0.

Ralph

 




Re: [PERFORM] Performance Woes

2007-05-09 Thread Joshua D. Drake



2007-05-09 03:07:50.083 GMT 1146975740: LOCATION:  BasicOpenFile,
fd.c:471

2007-05-09 03:07:50.091 GMT 0: LOG:  0: duration: 12.362 ms

2007-05-09 03:07:50.091 GMT 0: LOCATION:  exec_simple_query,
postgres.c:1090

 


So we decreased the max_files_per_process to 800.  This took care
of the error **BUT** about quadrupled the IO wait that is happening
on the machine. It went from a peak of about 50% to peaks of over
200% (4 processor machines, 4 gigs ram, raid).  The load on the
machine remained constant.



Sounds to me like you just need to up the total amount of open files 
allowed by the operating system.


Joshua D. Drake

--

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/




Re: [PERFORM] Performance Woes

2007-05-09 Thread CAJ CAJ

I have several databases.  They are each about 35gb in size and have about
10.5K relations (count from pg_stat_all_tables) in them.  pg_class is
about 26k rows and the data directory contains about 70k files.  These are
busy machines; they run about 50 xactions per second (approx. insert /
update / delete about 500 rows per second).

We started getting errors about the number of open file descriptors

: 2007-05-09 03:07:50.083 GMT 1146975740: LOG:  53000: out of file
descriptors: Too many open files; release and retry

2007-05-09 03:07:50.083 GMT 1146975740: CONTEXT:  SQL statement "insert
….. "

PL/pgSQL function "trigfunc_whatever" line 50 at execute statement

2007-05-09 03:07:50.083 GMT 1146975740: LOCATION:  BasicOpenFile, fd.c:471

2007-05-09 03:07:50.091 GMT 0: LOG:  0: duration: 12.362 ms

2007-05-09 03:07:50.091 GMT 0: LOCATION:  exec_simple_query, postgres.c
:1090



So we decreased the max_files_per_process to 800.  This took care of the
error **BUT** about quadrupled the IO wait that is happening on the
machine. It went from a peak of about 50% to peaks of over 200% (4 processor
machines, 4 gigs ram, raid).  The load on the machine remained constant.




What version of Pg/OS? What is your hardware config?

I had seen these errors with earlier versions of Pg 7.4.x, which were fixed in
later releases according to the changelogs


[PERFORM] Performance Woes

2007-05-09 Thread Ralph Mason
Hi,

 

I have several databases.  They are each about 35gb in size and have about
10.5K relations (count from pg_stat_all_tables) in them.  pg_class is about
26k rows and the data directory contains about 70k files.  These are busy
machines; they run about 50 xactions per second (approx. insert / update /
delete about 500 rows per second).

 

We started getting errors about the number of open file descriptors

 

: 2007-05-09 03:07:50.083 GMT 1146975740: LOG:  53000: out of file
descriptors: Too many open files; release and retry

2007-05-09 03:07:50.083 GMT 1146975740: CONTEXT:  SQL statement "insert …..
"

PL/pgSQL function "trigfunc_whatever" line 50 at execute statement

2007-05-09 03:07:50.083 GMT 1146975740: LOCATION:  BasicOpenFile, fd.c:471

2007-05-09 03:07:50.091 GMT 0: LOG:  0: duration: 12.362 ms

2007-05-09 03:07:50.091 GMT 0: LOCATION:  exec_simple_query, postgres.c:1090

 

So we decreased the max_files_per_process to 800.  This took care of the
error *BUT* about quadrupled the IO wait that is happening on the machine.
It went from a peak of about 50% to peaks of over 200% (4 processor
machines, 4 gigs ram, raid).  The load on the machine remained constant.

 

I am really trying to get an understanding of exactly what this setting is,
what exactly is ‘out of file descriptors’, and how I can fix that.  I need to
bring that IO back down.
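
One way to test whether the extra IO wait really is open()/close() churn from
the smaller FD budget (a diagnostic sketch; the backend PID is a placeholder
you would take from ps or pg_stat_activity):

# Attach to one busy backend and count file-related syscalls for a while.
$ strace -c -p <backend_pid> -e trace=open,close
# Let it run through a busy period, then press Ctrl-C; strace prints a table
# of call counts and time spent in open() and close().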

 

 

Thanks for any help.

Ralph

 

 

 

