Re: [HACKERS] Performance monitor signal handler

2001-03-22 Thread Jan Wieck

Bruce Momjian wrote:
> I have talked to Jan over the phone, and he has convinced me that UDP is
> the proper way to communicate stats to the collector, rather than my
> shared memory idea.
>
> The advantage of his UDP approach is that the collector can sleep on
> the UDP socket rather than having the collector poll the shared memory
> area.  It also has the auto-discard option.  He will make logging
> configurable on a per-database level, so it can be turned off when not
> in use.
>
> He has a trial UDP implementation that he will post soon.  Also, I asked
> him to try DGRAM Unix-domain sockets for performance reasons.  My
> Stevens book says they should be supported.  He can put the socket
> file in /data.

"Trial" implementation attached :-)

First attachment is a patch for various backend files, plus two new
source files it generates. If your patch(1) doesn't put 'em there
automatically, they go to src/include/pgstat.h and
src/backend/postmaster/pgstat.c.

BTW: tgl on 2/99 was right, the hash_destroy() really crashes.
Maybe we want to pull out the fix I've done (it includes a new
feature for hash table memory allocation) and apply that to 7.1?

Second attachment is a tarfile that should unpack into
contrib/pgstat_tmp. I've placed the SQL-level functions into a
shared module for now. The SQL script also creates a couple of
views.

-   pgstat_all_tables shows scan- and tuple-based statistics
    for all tables. pgstat_sys_tables and pgstat_user_tables
    filter out (you guess what) system or user tables.

-   pgstatio_all_tables, pgstatio_sys_tables and
    pgstatio_user_tables show buffer IO statistics for
    tables.

-   pgstat_*_indexes and pgstatio_*_indexes are similar to
    the above, except that they give detailed info about each
    single index.

-   pgstatio_*_sequences shows buffer IO statistics about -
    right, sequences. Since sequences aren't scanned
    regularly, they have no scan- and tuple-related view.

-   pgstat_activity shows information about all currently
    running backends of the entire instance. The underlying
    function for displaying the actual query always returns
    NULL for non-superusers.

-   pgstat_database shows transaction commit/abort counts and
    accumulated buffer IO statistics for all existing
    databases.

The collector frequently writes a file, data/pgstat.stat
(approximately every 500 milliseconds, as long as there is
something to tell; nothing is done if the entire installation
sleeps). It also reads this file on startup, so collected
statistics survive postmaster restarts.
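
For illustration, a minimal sketch of how such a periodic dump can be
made safe against half-written files - write to a temporary file,
then rename(2) it into place. The record layout and names here are
assumptions for illustration, not the actual pgstat.c code:

#include <stdio.h>

/* Hypothetical per-table counter record; the real layout lives in pgstat.c. */
typedef struct TableStats
{
    unsigned int oid;           /* table identifier */
    long         seq_scans;     /* sequential scans counted so far */
    long         tuples_returned;
} TableStats;

/*
 * Dump the current snapshot.  Writing to a temporary file and renaming
 * it into place is atomic: a reader (or the collector itself, on
 * restart) always sees either the old or the new file, never a
 * half-written one.
 */
static int
dump_stats(const TableStats *stats, int n)
{
    FILE   *fp = fopen("data/pgstat.stat.tmp", "w");
    int     i;

    if (fp == NULL)
        return -1;

    for (i = 0; i < n; i++)
        fprintf(fp, "%u %ld %ld\n",
                stats[i].oid, stats[i].seq_scans, stats[i].tuples_returned);

    if (fclose(fp) != 0)
        return -1;

    return rename("data/pgstat.stat.tmp", "data/pgstat.stat");
}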

TODO:

-   Are PF_UNIX SOCK_DGRAM sockets supported on all the
    platforms we support? If not, what's wrong with the
    current implementation?

-   There  is  no way yet to tell the collector about objects
(relations and  databases)  removed  from  the  database.
Basically  that  could be done with messages too, but who
will send them and how can we guarantee that  they'll  be
generated  even if somebody never queries the statistics?
Thus, the current collector will grow, and grow, and grow
until   you   remove   the  pgstat.stat  file  while  the
postmaster is down.

-   Also there aren't functions or  messages  implemented  to
explicitly reset statistics.

-   Possible additions would be to remember when the backends
started and collect resource usage (rstat(2)) information
as well.

-   The   entire  thing  needs  an  additional  attribute  in
pg_database that tells the  backends  what  to  tell  the
collector at all. Just to make them quiet again.

So much for an actual snapshot. Comments?


Jan




Attachments: pgstat_tmp.tar.gz, pgstat.diff.gz





Re: [HACKERS] Performance monitor signal handler

2001-03-20 Thread Bruce Momjian

I have talked to Jan over the phone, and he has convinced me that UDP is
the proper way to communicate stats to the collector, rather than my
shared memory idea.

The advantage of his UDP approach is that the collector can sleep on
the UDP socket rather than having the collector poll the shared memory
area.  It also has the auto-discard option.  He will make logging
configurable on a per-database level, so it can be turned off when not
in use.

He has a trial UDP implementation that he will post soon.  Also, I asked
him to try DGRAM Unix-domain sockets for performance reasons.  My
Stevens book says they should be supported.  He can put the socket
file in /data.
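
As a rough sketch, creating such a Unix-domain datagram socket under
the data directory could look like this (the path, names and error
handling are illustrative assumptions, not the posted patch):

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Create a PF_UNIX SOCK_DGRAM socket bound to a file in the data dir. */
static int
create_stats_socket(const char *path)
{
    struct sockaddr_un addr;
    int     sock;

    if ((sock = socket(PF_UNIX, SOCK_DGRAM, 0)) < 0)
    {
        perror("socket(2)");
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);

    unlink(path);               /* remove a stale socket from a previous run */
    if (bind(sock, (struct sockaddr *) &addr, sizeof(addr)) < 0)
    {
        perror("bind(2)");
        close(sock);
        return -1;
    }
    return sock;
}

Backends inheriting this descriptor over fork(2) could then send
their stats datagrams without any connection setup.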



> > > I figured it could just wake up every few seconds and check.  It will
> > > remember the loop counter and current pointer, and read any new
> > > information.  I was thinking of a 20k buffer, which could cover about 4k
> > > events.
> > 
> > Here  I  wonder what your EVENT is. With an Oid as identifier
> > and a 1 byte (even if it'd be another 32-bit value), how  many
> > messages do you want to generate to get these statistics:
> > 
> > -   Number of sequential scans done per table.
> > -   Number of tuples returned via sequential scans per table.
> > -   Number of buffer cache lookups  done  through  sequential
> > scans per table.
> > -   Number  of  buffer  cache  hits  for sequential scans per
> > table.
> > -   Number of tuples inserted per table.
> > -   Number of tuples updated per table.
> > -   Number of tuples deleted per table.
> > -   Number of index scans done per index.
> > -   Number of index tuples returned per index.
> > -   Number of buffer cache lookups  done  due  to  scans  per
> > index.
> > -   Number of buffer cache hits per index.
> > -   Number  of  valid heap tuples returned via index scan per
> > index.
> > -   Number of buffer cache lookups done for heap fetches  via
> > index scan per index.
> > -   Number  of  buffer  cache hits for heap fetches via index
> > scan per index.
> > -   Number of buffer cache lookups not accountable for any of
> > the above.
> > -   Number  of  buffer  cache hits not accountable for any of
> > the above.
> > 
> > What I see is that there's a difference in what we  two  want
> > to see in the statistics. You're talking about looking at the
> > actual querystring and such. That's  information  useful  for
> > someone   actually  looking  at  a  server,  to  see  what  a
> > particular backend  is  doing.  On  my  notebook  a  parallel
> > regression  test  (containing >4,000 queries) passes by under
> > 1:30, that's more than 40 queries per second. So that doesn't
> > tell me much.
> > 
> > What I'm after is to collect the above data over a week or so
> > and then generate a report to identify the hot spots  of  the
> > schema.  Which tables/indices cause the most disk I/O, what's
> > the average percentage of tuples returned in scans (not  from
> > the  query, I mean from the single scan inside of the joins).
> > That's the information I need  to  know  where  to  look  for
> > possibly  better  qualifications, useless indices that aren't
> > worth maintaining and the like.
> > 
> 
> I was going to have the per-table stats insert a stat record every time
> it does a sequential scan, so it would be [oid][sequential_scan_value]
> and allow the collector to gather that and aggregate it.
> 
> I didn't think we wanted each backend to do the aggregation per oid. 
> Seems expensive. Maybe we would need a count for things like "number of
> rows returned" so it would be [oid][stat_type][value].
> 






Re: [HACKERS] Performance monitor signal handler

2001-03-19 Thread Bruce Momjian

> > I figured it could just wake up every few seconds and check.  It will
> > remember the loop counter and current pointer, and read any new
> > information.  I was thinking of a 20k buffer, which could cover about 4k
> > events.
> 
> Here  I  wonder what your EVENT is. With an Oid as identifier
> and a 1 byte (even if it'd be another 32-bit value), how  many
> messages do you want to generate to get these statistics:
> 
> -   Number of sequential scans done per table.
> -   Number of tuples returned via sequential scans per table.
> -   Number of buffer cache lookups  done  through  sequential
> scans per table.
> -   Number  of  buffer  cache  hits  for sequential scans per
> table.
> -   Number of tuples inserted per table.
> -   Number of tuples updated per table.
> -   Number of tuples deleted per table.
> -   Number of index scans done per index.
> -   Number of index tuples returned per index.
> -   Number of buffer cache lookups  done  due  to  scans  per
> index.
> -   Number of buffer cache hits per index.
> -   Number  of  valid heap tuples returned via index scan per
> index.
> -   Number of buffer cache lookups done for heap fetches  via
> index scan per index.
> -   Number  of  buffer  cache hits for heap fetches via index
> scan per index.
> -   Number of buffer cache lookups not accountable for any of
> the above.
> -   Number  of  buffer  cache hits not accountable for any of
> the above.
> 
> What I see is that there's a difference in what we  two  want
> to see in the statistics. You're talking about looking at the
> actual querystring and such. That's  information  useful  for
> someone   actually  looking  at  a  server,  to  see  what  a
> particular backend  is  doing.  On  my  notebook  a  parallel
> regression  test  (containing >4,000 queries) passes by under
> 1:30, that's more than 40 queries per second. So that doesn't
> tell me much.
> 
> What I'm after is to collect the above data over a week or so
> and then generate a report to identify the hot spots  of  the
> schema.  Which tables/indices cause the most disk I/O, what's
> the average percentage of tuples returned in scans (not  from
> the  query, I mean from the single scan inside of the joins).
> That's the information I need  to  know  where  to  look  for
> possibly  better  qualifications, useless indices that aren't
> worth maintaining and the like.
> 

I was going to have the per-table stats insert a stat record every time
it does a sequential scan, so it would be [oid][sequential_scan_value]
and allow the collector to gather that and aggregate it.

I didn't think we wanted each backend to do the aggregation per oid. 
Seems expensive. Maybe we would need a count for things like "number of
rows returned" so it would be [oid][stat_type][value].





Re: [HACKERS] Performance monitor signal handler

2001-03-19 Thread Jan Wieck

Bruce Momjian wrote:
> > Bruce Momjian <[EMAIL PROTECTED]> writes:
> > > Only shared memory gives us near-zero cost for write/read.  99% of
> > > backends will not be using stats, so it has to be cheap.
> >
> > Not with a circular buffer it's not cheap, because you need interlocking
> > on writes.  Your claim that you can get away without that is simply
> > false.  You won't just get lost messages, you'll get corrupted messages.
>
> How do I get corrupt messages if they are all five bytes?  If I write
> five bytes, and another does the same, I guess the assembler could
> intersperse the writes so the oid gets to be a corrupt value.  Any cheap
> way around this, perhaps by skipping/clearing the write on a collision?
>
> >
> > > The collector program can read the shared memory stats and keep hashed
> > > values of accumulated stats.  It uses the "Loops" variable to know if it
> > > has read the current information in the buffer.
> >
> > And how does it sleep until the counter has been advanced?  Seems to me
> > it has to busy-wait (bad) or sleep (worse; if the minimum sleep delay
> > is 10 ms then it's guaranteed to miss a lot of data under load).
>
> I figured it could just wake up every few seconds and check.  It will
> remember the loop counter and current pointer, and read any new
> information.  I was thinking of a 20k buffer, which could cover about 4k
> events.

Here  I  wonder what your EVENT is. With an Oid as identifier
and a 1 byte (even if it'd be another 32-bit value), how many
messages do you want to generate to get these statistics:

-   Number of sequential scans done per table.
-   Number of tuples returned via sequential scans per table.
-   Number of buffer cache lookups  done  through  sequential
scans per table.
-   Number  of  buffer  cache  hits  for sequential scans per
table.
-   Number of tuples inserted per table.
-   Number of tuples updated per table.
-   Number of tuples deleted per table.
-   Number of index scans done per index.
-   Number of index tuples returned per index.
-   Number of buffer cache lookups  done  due  to  scans  per
index.
-   Number of buffer cache hits per index.
-   Number  of  valid heap tuples returned via index scan per
index.
-   Number of buffer cache lookups done for heap fetches  via
index scan per index.
-   Number  of  buffer  cache hits for heap fetches via index
scan per index.
-   Number of buffer cache lookups not accountable for any of
the above.
-   Number  of  buffer  cache hits not accountable for any of
the above.

What I see is that there's a difference in what we  two  want
to see in the statistics. You're talking about looking at the
actual querystring and such. That's  information  useful  for
someone   actually  looking  at  a  server,  to  see  what  a
particular backend  is  doing.  On  my  notebook  a  parallel
regression  test  (containing >4,000 queries) passes by under
1:30, that's more than 40 queries per second. So that doesn't
tell me much.

What I'm after is to collect the above data over a week or so
and then generate a report to identify the hot spots  of  the
schema.  Which tables/indices cause the most disk I/O, what's
the average percentage of tuples returned in scans (not  from
the  query, I mean from the single scan inside of the joins).
That's the information I need  to  know  where  to  look  for
possibly  better  qualifications, useless indices that aren't
worth maintaining and the like.


Jan









Re: [HACKERS] Performance monitor signal handler

2001-03-19 Thread Bruce Momjian

I have a new statistics collection proposal.

I suggest three shared memory areas:

One per backend to hold the query string and other per-backend stats
One global area to hold accumulated stats for all backends
One global circular buffer to hold per-table/object stats

The circular buffer will look like:

    (Loops)  Start --------------------------------- End
                            |
                     current pointer

Loops is incremented every time the pointer reaches "end".

Each statistics record will have a length of five bytes made up of
oid(4) and action(1).  By having the same length for all statistics
records, we don't need to perform any locking of the buffer.  A backend
will grab the current pointer, add five to it, and write into the
reserved 5-byte area.  If two backends write at the same time, one
overwrites the other, but this is just statistics information, so it is
not a great loss.
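
A minimal sketch of that lockless write path (struct layout and names
are assumptions for illustration; as Tom points out below, the
unlocked pointer update is exactly where records can get corrupted):

#include <string.h>

#define STAT_BUF_SIZE 20480     /* 20k buffer, roughly 4k 5-byte events */

typedef struct StatBuffer
{
    volatile int loops;         /* incremented on each wraparound */
    volatile int current;       /* next write offset */
    char         data[STAT_BUF_SIZE];
} StatBuffer;

/*
 * Write one 5-byte record (4-byte oid plus 1-byte action) with no
 * locking at all.  Two backends that read the same "current" value
 * will overwrite each other; the proposal accepts that as a lost
 * statistics record.
 */
static void
stat_write(StatBuffer *buf, unsigned int oid, unsigned char action)
{
    int     pos = buf->current; /* racy read: no interlock */

    if (pos + 5 > STAT_BUF_SIZE)
    {
        pos = 0;
        buf->loops++;           /* tells the collector we wrapped */
    }
    buf->current = pos + 5;     /* racy update */

    memcpy(buf->data + pos, &oid, 4);
    buf->data[pos + 4] = (char) action;
}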

Only shared memory gives us near-zero cost for write/read.  99% of
backends will not be using stats, so it has to be cheap.

The collector program can read the shared memory stats and keep hashed
values of accumulated stats.  It uses the "Loops" variable to know if it
has read the current information in the buffer.  When it receives a
signal, it can dump its stats to a file in standard COPY format.
It can also reset its counters with a signal.

Comments?





Re: [HACKERS] Performance monitor signal handler

2001-03-19 Thread Bruce Momjian

> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > Only shared memory gives us near-zero cost for write/read.  99% of
> > backends will not be using stats, so it has to be cheap.
> 
> Not with a circular buffer it's not cheap, because you need interlocking
> on writes.  Your claim that you can get away without that is simply
> false.  You won't just get lost messages, you'll get corrupted messages.

How do I get corrupt messages if they are all five bytes?  If I write
five bytes, and another does the same, I guess the assembler could
intersperse the writes so the oid gets to be a corrupt value.  Any cheap
way around this, perhaps by skipping/clearing the write on a collision?

> 
> > The collector program can read the shared memory stats and keep hashed
> > values of accumulated stats.  It uses the "Loops" variable to know if it
> > has read the current information in the buffer.
> 
> And how does it sleep until the counter has been advanced?  Seems to me
> it has to busy-wait (bad) or sleep (worse; if the minimum sleep delay
> is 10 ms then it's guaranteed to miss a lot of data under load).

I figured it could just wake up every few seconds and check.  It will
remember the loop counter and current pointer, and read any new
information.  I was thinking of a 20k buffer, which could cover about 4k
events.

Should we think about doing these writes into an OS file, and only
enabling the writes when we know there is a collector reading them,
perhaps using a /tmp file to activate recording?  We could allocate
1MB and be sure not to miss anything, even with a circular setup.
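
A sketch of the cheap "is anybody collecting?" test this implies (the
trigger file name is a made-up placeholder):

#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical check: record stats only while a trigger file exists. */
static int
stats_recording_enabled(void)
{
    struct stat st;

    return stat("/tmp/pgstat_enabled", &st) == 0;
}

The backends would skip the stats writes entirely whenever this
returns 0.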






Re: [HACKERS] Performance monitor signal handler

2001-03-19 Thread Tom Lane

Bruce Momjian <[EMAIL PROTECTED]> writes:
> Only shared memory gives us near-zero cost for write/read.  99% of
> backends will not be using stats, so it has to be cheap.

Not with a circular buffer it's not cheap, because you need interlocking
on writes.  Your claim that you can get away without that is simply
false.  You won't just get lost messages, you'll get corrupted messages.

> The collector program can read the shared memory stats and keep hashed
> values of accumulated stats.  It uses the "Loops" variable to know if it
> has read the current information in the buffer.

And how does it sleep until the counter has been advanced?  Seems to me
it has to busy-wait (bad) or sleep (worse; if the minimum sleep delay
is 10 ms then it's guaranteed to miss a lot of data under load).

regards, tom lane




Re: [HACKERS] Performance monitor signal handler

2001-03-18 Thread Tom Lane

Jan Wieck <[EMAIL PROTECTED]> writes:
> Just  to  get  some  evidence  at hand - could some owners of
> different platforms compile and run  the  attached  little  C
> source please?
> (The  program  tests how much data can be stuffed into a pipe
> or a Sys-V message queue before the writer would block or get
> an EAGAIN error).

One final followup on this --- I wasted a fair amount of time just
now trying to figure out why Perl 5.6.0 was silently hanging up
in its self-tests (at op/taint, which seems pretty unrelated...).

The upshot: Jan's test program had left a 16k SysV message queue
hanging about, and that queue was filling all available SysV message
space on my machine.  Seems Perl tries to test message-queue sending,
and it was patiently waiting for some message space to come free.

In short, the SysV message queue limits are so tiny that not only
are you quite likely to get bollixed up if you use messages, but
you're likely to bollix anything else that's using message queues too.

regards, tom lane




Re: [HACKERS] Performance monitor signal handler

2001-03-18 Thread Patrick Welche

On Fri, Mar 16, 2001 at 05:25:24PM -0500, Jan Wieck wrote:
> Jan Wieck wrote:
...
> Just  to  get  some  evidence  at hand - could some owners of
> different platforms compile and run  the  attached  little  C
> source please?
... 
> Seems Tom is (unfortunately) right. The pipe blocks at 4K.

On NetBSD-1.5S/i386 with just the highly conservative shmem defaults:

Pipe buffer is 4096 bytes
Sys-V message queue buffer is 2048 bytes

Cheers,

Patrick




Re: [HACKERS] Performance monitor signal handler

2001-03-17 Thread Jan Wieck

Tom Lane wrote:
> Samuel Sieb <[EMAIL PROTECTED]> writes:
> > Just as another suggestion, what about sending the data to a different
> > computer, so instead of tying up the database server with processing the
> > statistics, you have another computer that has some free time to do the
> > processing.
>
> > Some drawbacks are that you can't automatically start/restart it from the
> > postmaster and it will put a little more load on the network,
>
> ... and a lot more load on the CPU.  Same-machine "network" connections
> are much cheaper (on most kernels, anyway) than real network
> connections.
>
> I think all of this discussion is vast overkill.  No one has yet
> demonstrated that it's not sufficient to have *one* collector process
> and a lossy transmission method.  Let's try that first, and if it really
> proves to be unworkable then we can get out the lily-gilding equipment.
> But there is tons more stuff to do before we have useful stats at all,
> and I don't think that this aspect is the most critical part of the
> problem.

Well,

back  to my initial approach with the UDP socket collector. I
now have a collector simply reading  all  messages  from  the
socket.  It  doesn't  do  anything useful except for counting
their number.

Every backend sends a couple of 1K junk messages at the
beginning of the main loop. Up to 16 messages, there is no
delay measurable with time(1) in the execution of the "make
runcheck".

The dummy collector can keep up during the parallel
regression test until the backends send 64 messages each
time; at that number it lost 1.25% of the messages. That is
an amount of statistics data of >256MB to be collected. Most
of the test queries will never generate 1K of messages, so
there should be some headroom here.

My plan  now  is  to  add  some  real  functionality  to  the
collector and the backend, to see if that has an impact.
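
For reference, a minimal sketch of such a counting collector,
including the bind-to-127.0.0.1:0 trick described elsewhere in this
thread (the report interval and all names are illustrative
assumptions, not the actual test code):

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

int
main(void)
{
    struct sockaddr_in addr;
    socklen_t   addrlen = sizeof(addr);
    char        buf[1024];
    long        count = 0;
    int         sock;

    if ((sock = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
    {
        perror("socket(2)");
        return 1;
    }

    /* Bind to 127.0.0.1 with port 0: the kernel assigns a free port. */
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;
    if (bind(sock, (struct sockaddr *) &addr, sizeof(addr)) < 0)
    {
        perror("bind(2)");
        return 1;
    }

    /* getsockname(2) reveals which port the kernel picked. */
    if (getsockname(sock, (struct sockaddr *) &addr, &addrlen) < 0)
    {
        perror("getsockname(2)");
        return 1;
    }
    fprintf(stderr, "collector listening on UDP port %d\n",
            (int) ntohs(addr.sin_port));

    /* Dummy collector: do nothing useful, just count the datagrams. */
    for (;;)
    {
        if (recv(sock, buf, sizeof(buf), 0) < 0)
        {
            perror("recv(2)");
            return 1;
        }
        if (++count % 10000 == 0)
            fprintf(stderr, "received %ld messages\n", count);
    }
}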


Jan









Re: [HACKERS] Performance monitor signal handler

2001-03-17 Thread Bruce Momjian

> > But per-table stats aren't something that people will look at often,
> > right?  They can sit in the collector's memory for quite a while.  I see
> > people wanting to look at per-backend stuff frequently, and that is why
> > I thought shared memory should be good, and a global area for aggregate
> > stats for all backends.
> 
> >> I think you missed the point that somebody made a little while ago
> >> about waiting for functions that can return tuple sets.  Once we have
> >> that, the stats tables can be *virtual* tables, ie tables that are
> >> computed on-demand by some function.  That will be a lot less overhead
> >> than physically updating an actual table.
> 
> > Yes, but do we want to keep these stats between postmaster restarts? 
> > And what about writing them to tables when our storage of table stats
> > gets too big?
> 
> All those points seem to me to be arguments in *favor* of a virtual-
> table approach, not arguments against it.
> 
> Or are you confusing the method of collecting stats with the method
> of making the collected stats available for use?

Maybe I am confusing them.  I didn't see a distinction in the
discussion.

I assumed the UDP/message passing of information to the collector was
the way statistics were collected, and I don't understand why a
per-backend area and global area, with some kind of circular buffer for
per-table stuff isn't the cheapest, cleanest solution.





Re: [HACKERS] Performance monitor signal handler

2001-03-17 Thread Tom Lane

Bruce Momjian <[EMAIL PROTECTED]> writes:
> Even better, have an SQL table updated with the per-table stats
> periodically.
>> 
>> That will be horribly expensive, if it's a real table.

> But per-table stats aren't something that people will look at often,
> right?  They can sit in the collector's memory for quite a while.  I see
> people wanting to look at per-backend stuff frequently, and that is why
> I thought shared memory should be good, and a global area for aggregate
> stats for all backends.

>> I think you missed the point that somebody made a little while ago
>> about waiting for functions that can return tuple sets.  Once we have
>> that, the stats tables can be *virtual* tables, ie tables that are
>> computed on-demand by some function.  That will be a lot less overhead
>> than physically updating an actual table.

> Yes, but do we want to keep these stats between postmaster restarts? 
> And what about writing them to tables when our storage of table stats
> gets too big?

All those points seem to me to be arguments in *favor* of a virtual-
table approach, not arguments against it.

Or are you confusing the method of collecting stats with the method
of making the collected stats available for use?

regards, tom lane




Re: [HACKERS] Performance monitor signal handler

2001-03-17 Thread Bruce Momjian

> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > The only open issue is per-table stuff, and I would like to see some
> > circular buffer implemented to handle that, with a collection process
> > that has access to shared memory.
> 
> That will get us into locking/contention issues.  OTOH, frequent trips
> to the kernel to send stats messages --- regardless of the transport
> mechanism chosen --- don't seem all that cheap either.

I am confused.  Reading/writing shared memory is not a kernel call,
right?

I agree on the locking contention problems of a circular buffer.

> 
> > Even better, have an SQL table updated with the per-table stats
> > periodically.
> 
> That will be horribly expensive, if it's a real table.

But per-table stats aren't something that people will look at often,
right?  They can sit in the collector's memory for quite a while.  I see
people wanting to look at per-backend stuff frequently, and that is why
I thought shared memory should be good, and a global area for aggregate
stats for all backends.

> I think you missed the point that somebody made a little while ago
> about waiting for functions that can return tuple sets.  Once we have
> that, the stats tables can be *virtual* tables, ie tables that are
> computed on-demand by some function.  That will be a lot less overhead
> than physically updating an actual table.

Yes, but do we want to keep these stats between postmaster restarts? 
And what about writing them to tables when our storage of table stats
gets too big?





Re: [HACKERS] Performance monitor signal handler

2001-03-17 Thread Tom Lane

Bruce Momjian <[EMAIL PROTECTED]> writes:
> The only open issue is per-table stuff, and I would like to see some
> circular buffer implemented to handle that, with a collection process
> that has access to shared memory.

That will get us into locking/contention issues.  OTOH, frequent trips
to the kernel to send stats messages --- regardless of the transport
mechanism chosen --- don't seem all that cheap either.

> Even better, have an SQL table updated with the per-table stats
> periodically.

That will be horribly expensive, if it's a real table.

I think you missed the point that somebody made a little while ago
about waiting for functions that can return tuple sets.  Once we have
that, the stats tables can be *virtual* tables, ie tables that are
computed on-demand by some function.  That will be a lot less overhead
than physically updating an actual table.

regards, tom lane




Re: [HACKERS] Performance monitor signal handler

2001-03-17 Thread Bruce Momjian

> ... and a lot more load on the CPU.  Same-machine "network" connections
> are much cheaper (on most kernels, anyway) than real network
> connections.
> 
> I think all of this discussion is vast overkill.  No one has yet
> demonstrated that it's not sufficient to have *one* collector process
> and a lossy transmission method.  Let's try that first, and if it really
> proves to be unworkable then we can get out the lily-gilding equipment.
> But there is tons more stuff to do before we have useful stats at all,
> and I don't think that this aspect is the most critical part of the
> problem.

Agreed.  Sounds like overkill.

How about a per-backend shared memory area for stats, plus a global
shared memory area that each backend can add to when it exits?  That
covers most of the problem.

The only open issue is per-table stuff, and I would like to see some
circular buffer implemented to handle that, with a collection process
that has access to shared memory.  Even better, have an SQL table
updated with the per-table stats periodically.  How about a collector
process that periodically reads through the shared memory and UPDATEs
SQL tables with the information.





Re: [HACKERS] Performance monitor signal handler

2001-03-17 Thread Tom Lane

Samuel Sieb <[EMAIL PROTECTED]> writes:
> Just as another suggestion, what about sending the data to a different
> computer, so instead of tying up the database server with processing the
> statistics, you have another computer that has some free time to do the
> processing.

> Some drawbacks are that you can't automatically start/restart it from the
> postmaster and it will put a little more load on the network,

... and a lot more load on the CPU.  Same-machine "network" connections
are much cheaper (on most kernels, anyway) than real network
connections.

I think all of this discussion is vast overkill.  No one has yet
demonstrated that it's not sufficient to have *one* collector process
and a lossy transmission method.  Let's try that first, and if it really
proves to be unworkable then we can get out the lily-gilding equipment.
But there is tons more stuff to do before we have useful stats at all,
and I don't think that this aspect is the most critical part of the
problem.

regards, tom lane




Re: [HACKERS] Performance monitor signal handler

2001-03-17 Thread Samuel Sieb

On Sat, Mar 17, 2001 at 09:33:03AM -0500, Jan Wieck wrote:
> 
> The  general  problem  remains.  We  only  have  one  central
> collector with a limited receive capacity.  The more load  is
> on  the  machine,  the  smaller its capacity gets.  The more
> complex the DB schemas get  and  the  more  load  is  on  the
> system,  the  more interesting accurate statistics get.  Both
> factors are counterproductive. More complex schema means  more
> tables  and  thus  bigger  messages.  More  load  means  more
> messages.  Having good statistics on a toy system while  they
> get  worse  for  a  web  backend  server  that's really under
> pressure is braindead from the start.
> 
Just as another suggestion, what about sending the data to a different
computer, so instead of tying up the database server with processing the
statistics, you have another computer that has some free time to do the
processing.

Some drawbacks are that you can't automatically start/restart it from the
postmaster and it will put a little more load on the network, but it seems
to mostly solve the issues of blocked pipes and using too much CPU time
on the database server.





Re: [HACKERS] Performance monitor signal handler

2001-03-17 Thread Jan Wieck

Philip Warner wrote:
> At 13:49 16/03/01 -0500, Jan Wieck wrote:
> >
> >Similar problem as with shared  memory  -  size.  If  a  long
> >running  backend  of  a multithousand table database needs to
> >send access stats per table - and had accessed them all up to
> >now - it'll be alot of wasted bandwidth.
>
> Not if you only send totals for individual counters when they change; some
> stats may never be resynced, but for the most part it will work. Also, does
> Unix allow interrupts to occur as a result of data arriving in a pipe? If
> so, how about:
>
> - All backends to do *blocking* IO to collector.

The  general  problem  remains.  We  only  have  one  central
collector with a limited receive capacity.  The more load  is
on  the  machine,  the  smaller its capacity gets.  The more
complex the DB schemas get  and  the  more  load  is  on  the
system,  the  more interesting accurate statistics get.  Both
factors are counterproductive. More complex schema means  more
tables  and  thus  bigger  messages.  More  load  means  more
messages.  Having good statistics on a toy system while  they
get  worse  for  a  web  backend  server  that's really under
pressure is braindead from the start.

We don't want the backends to block,  so  that  they  can  do
THEIR work. That's to process queries, nothing else.

Pipes  seem  to  be  inappropriate  because  their  buffer is
limited to 4K on Linux and most BSD flavours. Message  queues
are too, because they are limited to 2K on most BSDs. So only
sockets remain.

If we have multiple processes that try to  receive  from  the
UDP  socket,  condense  the  received  packets  into  summary
messages and send them to the central collector,  this  might
solve the problem.


Jan









Re: [HACKERS] Performance monitor signal handler

2001-03-17 Thread Philip Warner

At 13:49 16/03/01 -0500, Jan Wieck wrote:
>
>Similar problem as with shared  memory  -  size.  If  a  long
>running  backend  of  a multithousand table database needs to
>send access stats per table - and had accessed them all up to
>now - it'll be alot of wasted bandwidth.

Not if you only send totals for individual counters when they change; some
stats may never be resynced, but for the most part it will work. Also, does
Unix allow interrupts to occur as a result of data arriving in a pipe? If
so, how about:

- All backends to do *blocking* IO to collector.

- Collector to receive an interrupt when a message arrives; while in the
interrupt it reads the buffer into a local queue, and returns from the
interrupt.

- Main line code processes the queue and writes it to a memory mapped file
for durability.

- If collector dies, postmaster starts another immediately, which clears
the backlog of data in the pipe and then remaps the file.

- Each backend has its own local copy of its counters, which the
collector can *possibly* ask for when it restarts.









Re: [HACKERS] Performance monitor signal handler

2001-03-16 Thread Larry Rosenman

* Larry Rosenman <[EMAIL PROTECTED]> [010316 20:47]:
> * Jan Wieck <[EMAIL PROTECTED]> [010316 16:35]:
> $ ./queuetest
> Pipe buffer is 32768 bytes
> Sys-V message queue buffer is 4096 bytes
> $ uname -a
> UnixWare lerami 5 7.1.1 i386 x86at SCO UNIX_SVR5
> $ 
> 
> I think some of these are configurable...
They both are.  FIFOBLKSIZE and MSGMNB or some such kernel tunable.

I can get more info if you need it.

LER





Re: [HACKERS] Performance monitor signal handler

2001-03-16 Thread Larry Rosenman

* Jan Wieck <[EMAIL PROTECTED]> [010316 16:35]:
> Jan Wieck wrote:
> > Tom Lane wrote:
> > > Now this would put a pretty tight time constraint on the collector:
> > > fall more than 4K behind, you start losing data.  I am not sure if
> > > a UDP socket would provide more buffering or not; anyone know?
> >
> > Looks  like Linux has something around 16-32K of buffer space
> > for UDP sockets. Just from eyeballing the  fprintf(3)  output
> > of my destructively hacked postleprechaun.
> 
> Just  to  get  some  evidence  at hand - could some owners of
> different platforms compile and run  the  attached  little  C
> source please?
> 
> (The  program  tests how much data can be stuffed into a pipe
> or a Sys-V message queue before the writer would block or get
> an EAGAIN error).
> 
> My output on RedHat6.1 Linux 2.2.17 is:
> 
> Pipe buffer is 4096 bytes
> Sys-V message queue buffer is 16384 bytes
> 
> Seems Tom is (unfortunately) right. The pipe blocks at 4K.
> 
> So a Sys-V message queue, with the ability to distribute
> messages from the collector to individual backends with
> kernel support via "mtype", is four times better here, at an
> unestimated cost in complexity.  What does your system say?
> 
> I really never thought that Sys-V IPC is a good way to go  at
> all.  I hate its incompatibility with the select(2) system
> call and all these OS/installation-dependent restrictions.
> But I'm tempted to reevaluate it "for this case".
> 
> 
> Jan
$ ./queuetest
Pipe buffer is 32768 bytes
Sys-V message queue buffer is 4096 bytes
$ uname -a
UnixWare lerami 5 7.1.1 i386 x86at SCO UNIX_SVR5
$ 

I think some of these are configurable...

LER





Re: [HACKERS] Performance monitor signal handler

2001-03-16 Thread Giles Lean


> Just  to  get  some  evidence  at hand - could some owners of
> different platforms compile and run  the  attached  little  C
> source please?

$ uname -srm
FreeBSD 4.1.1-STABLE
$ ./jan
Pipe buffer is 16384 bytes
Sys-V message queue buffer is 2048 bytes

$ uname -srm
NetBSD 1.5 alpha
$ ./jan
Pipe buffer is 4096 bytes
Sys-V message queue buffer is 2048 bytes

$ uname -srm
NetBSD 1.5_BETA2 i386
$ ./jan
Pipe buffer is 4096 bytes
Sys-V message queue buffer is 2048 bytes

$ uname -srm
NetBSD 1.4.2 i386
$ ./jan
Pipe buffer is 4096 bytes
Sys-V message queue buffer is 2048 bytes

$ uname -srm
NetBSD 1.4.1 sparc
$ ./jan
Pipe buffer is 4096 bytes
Bad system call (core dumped)   # no SysV IPC in running kernel

$ uname -srm
HP-UX B.11.11 9000/800
$ ./jan
Pipe buffer is 8192 bytes
Sys-V message queue buffer is 16384 bytes

$ uname -srm
HP-UX B.11.00 9000/813
$ ./jan
Pipe buffer is 8192 bytes
Sys-V message queue buffer is 16384 bytes

$ uname -srm
HP-UX B.10.20 9000/871
$ ./jan
Pipe buffer is 8192 bytes
Sys-V message queue buffer is 16384 bytes

HP-UX can also use STREAMS based pipes if the kernel parameter
streampipes is set.  Using STREAMS based pipes increases the pipe
buffer size by a lot:

# uname -srm 
HP-UX B.11.11 9000/800
# ./jan
Pipe buffer is 131072 bytes
Sys-V message queue buffer is 16384 bytes

# uname -srm
HP-UX B.11.00 9000/800
# ./jan
Pipe buffer is 131072 bytes
Sys-V message queue buffer is 16384 bytes

Regards,

Giles




Re: [HACKERS] Performance monitor signal handler

2001-03-16 Thread Tom Lane

Jan Wieck <[EMAIL PROTECTED]> writes:
> Just  to  get  some  evidence  at hand - could some owners of
> different platforms compile and run  the  attached  little  C
> source please?

HPUX 10.20:

Pipe buffer is 8192 bytes
Sys-V message queue buffer is 16384 bytes

regards, tom lane




Re: [HACKERS] Performance monitor signal handler

2001-03-16 Thread Jan Wieck

Jan Wieck wrote:
> Tom Lane wrote:
> > Now this would put a pretty tight time constraint on the collector:
> > fall more than 4K behind, you start losing data.  I am not sure if
> > a UDP socket would provide more buffering or not; anyone know?
>
> Looks  like Linux has something around 16-32K of buffer space
> for UDP sockets. Just from eyeballing the  fprintf(3)  output
> of my destructively hacked postleprechaun.

Just  to  get  some  evidence  at hand - could some owners of
different platforms compile and run  the  attached  little  C
source please?

(The  program  tests how much data can be stuffed into a pipe
or a Sys-V message queue before the writer would block or get
an EAGAIN error).

My output on RedHat6.1 Linux 2.2.17 is:

Pipe buffer is 4096 bytes
Sys-V message queue buffer is 16384 bytes

Seems Tom is (unfortunately) right. The pipe blocks at 4K.

So a Sys-V message queue, with the ability to distribute
messages from the collector to individual backends with
kernel support via "mtype", is four times better here, at an
unestimated cost in complexity.  What does your system say?

I really never thought that Sys-V IPC is a good way to go  at
all.  I hate its incompatibility with the select(2) system
call and all these OS/installation-dependent restrictions.
But I'm tempted to reevaluate it "for this case".


Jan





#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>


typedef struct test_message
{
    long    mtype;
    char    mtext[512 - sizeof(long)];
} test_message;


static int  test_pipe(void);
static int  test_msg(void);


int
main(int argc, char *argv[])
{
    if (test_pipe() < 0)
        return 1;

    if (test_msg() < 0)
        return 1;

    return 0;
}


static int
test_pipe(void)
{
    int     p[2];
    char    buf[512];
    int     done;
    int     rc;

    if (pipe(p) < 0)
    {
        perror("pipe(2)");
        return -1;
    }

    if (fcntl(p[1], F_SETFL, O_NONBLOCK) < 0)
    {
        perror("fcntl(2)");
        return -1;
    }

    for (done = 0; ; )
    {
        if ((rc = write(p[1], buf, sizeof(buf))) != sizeof(buf))
        {
            if (rc < 0)
            {
                if (errno == EAGAIN)
                {
                    printf("Pipe buffer is %d bytes\n", done);
                    return 0;
                }

                perror("write(2)");
                return -1;
            }

            fprintf(stderr, "whatever happened - rc = %d on write(2)\n", rc);
            return -1;
        }
        done += rc;
    }

    fprintf(stderr, "Endless write loop returned - what's that?\n");
    return -1;
}


static int
test_msg(void)
{
    int             mq;
    test_message    msg;
    int             done;

    if ((mq = msgget(IPC_PRIVATE, IPC_CREAT | 0600)) < 0)
    {
        perror("msgget(2)");
        return -1;
    }

    for (done = 0; ; )
    {
        msg.mtype = 1;
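        /*
         * Note: msgsz below is passed as the full struct size (512
         * bytes), four more than mtext actually holds, so each
         * msgsnd(2) reads slightly past the struct and the reported
         * byte count includes the mtype word.  Strictly it should be
         * sizeof(msg.mtext).
         */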
        if (msgsnd(mq, &msg, sizeof(msg), IPC_NOWAIT) < 0)
        {
            if (errno == EAGAIN)
            {
                printf("Sys-V message queue buffer is %d bytes\n", done);
                /* remove the queue so it doesn't linger after exit */
                msgctl(mq, IPC_RMID, NULL);
                return 0;
            }

            perror("msgsnd(2)");
            msgctl(mq, IPC_RMID, NULL);
            return -1;
        }
        done += sizeof(msg);
    }

    fprintf(stderr, "Endless write loop returned - what's that?\n");
    return -1;
}








Re: [HACKERS] Performance monitor signal handler

2001-03-16 Thread Jan Wieck

Tom Lane wrote:
> Now this would put a pretty tight time constraint on the collector:
> fall more than 4K behind, you start losing data.  I am not sure if
> a UDP socket would provide more buffering or not; anyone know?

Looks  like Linux has something around 16-32K of buffer space
for UDP sockets. Just from eyeballing the  fprintf(3)  output
of my destructively hacked postleprechaun.


Jan









Re: [HACKERS] Performance monitor signal handler

2001-03-16 Thread Jan Wieck

Tom Lane wrote:
> Jan Wieck <[EMAIL PROTECTED]> writes:
> > Does  a pipe guarantee that a buffer, written with one atomic
> > write(2), never can get intermixed with  other  data  on  the
> > readers  end?
>
> Yes.  The HPUX man page for write(2) sez:
>
>   o  Write requests of {PIPE_BUF} bytes or less will not be
>  interleaved with data from other processes doing writes on the
>  same pipe.  Writes of greater than {PIPE_BUF} bytes may have
>  data interleaved, on arbitrary boundaries, with writes by
>  other processes, whether or not the O_NONBLOCK flag of the
>  file status flags is set.
>
> Stevens' _UNIX Network Programming_ (1990) states this is true for all
> pipes (nameless or named) on all flavors of Unix, and furthermore states
> that PIPE_BUF is at least 4K on all systems.  I don't have any relevant
> Posix standards to look at, but I'm not worried about assuming this to
> be true.

That's good news - and maybe a Good Assumption (TM).

> > With message queues, this is guaranteed. Also, message queues
> > would  make  it  easy  to query the collected statistics (see
> > below).
>
> I will STRONGLY object to any proposal that we use message queues.
> We've already had enough problems with the ridiculously low kernel
> limits that are commonly imposed on shmem and SysV semaphores.
> We don't need to buy into that silliness yet again with message queues.
> I don't believe they gain us anything over pipes anyway.

   OK.

> The real problem with either pipes or message queues is that backends
> will block if the collector stops collecting data.  I don't think we
> want that.  I suppose we could have the backends write a pipe with
> O_NONBLOCK and ignore failure, however:
>
>   o  If the O_NONBLOCK flag is set, write() requests will  be
>  handled differently, in the following ways:
>
>  -  The write() function will not block the process.
>
>  -  A write request for {PIPE_BUF} or fewer bytes  will have
> the following effect:  If there is sufficient space
> available in the pipe, write() will transfer all the data
> and return the number of bytes  requested.  Otherwise,
> write() will transfer no data and return -1 with errno set
> to EAGAIN.
>
> Since we already ignore SIGPIPE, we don't need to worry about losing the
> collector entirely.

That's  not  what  the manpage said. It said that in the case
you're inside PIPE_BUF size and using O_NONBLOCK, you  either
send complete messages or nothing, getting an EAGAIN then.

So  we  could  do the same here and write to the pipe. In the
case we cannot, just count up and try  again  next  year  (or
so).
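
A sketch of that write-or-count idea (a hypothetical helper, not code
from the patch): since writes of PIPE_BUF or fewer bytes on an
O_NONBLOCK pipe are all-or-nothing, a failed send can simply be
counted and the data folded into a later message.

#include <errno.h>
#include <unistd.h>

/* Messages dropped so far because the pipe was full. */
static long stat_msgs_dropped = 0;

/*
 * Try to send one statistics message.  len must be <= PIPE_BUF, so
 * the write is atomic: all bytes or none.  On EAGAIN just count the
 * loss; the backend never blocks on the collector.
 */
static void
stat_send(int pipe_fd, const void *msg, size_t len)
{
    if (write(pipe_fd, msg, len) < 0)
    {
        if (errno == EAGAIN)
            stat_msgs_dropped++;    /* pipe full: drop, retry later */

        /* other errors (EPIPE etc.) are ignored too: losing stats
         * must never hurt query processing */
    }
}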

>
> Now this would put a pretty tight time constraint on the collector:
> fall more than 4K behind, you start losing data.  I am not sure if
> a UDP socket would provide more buffering or not; anyone know?

Again, this ain't what the manpage said. That there is
"sufficient space available in the pipe", in combination with
PIPE_BUF being at least 4K, doesn't necessarily mean that the
pipe's buffer space is only 4K.

Well,  what  I'm  missing  is  the  ability  to  filter   out
statistics reports on the backend side via msgrcv(2)'s msgtype
:-(


Jan









Re: [HACKERS] Performance monitor signal handler

2001-03-16 Thread Tom Lane

Jan Wieck <[EMAIL PROTECTED]> writes:
> Does  a pipe guarantee that a buffer, written with one atomic
> write(2), never can get intermixed with  other  data  on  the
> readers  end?

Yes.  The HPUX man page for write(2) sez:

  o  Write requests of {PIPE_BUF} bytes or less will not be
 interleaved with data from other processes doing writes on the
 same pipe.  Writes of greater than {PIPE_BUF} bytes may have
 data interleaved, on arbitrary boundaries, with writes by
 other processes, whether or not the O_NONBLOCK flag of the
 file status flags is set.

Stevens' _UNIX Network Programming_ (1990) states this is true for all
pipes (nameless or named) on all flavors of Unix, and furthermore states
that PIPE_BUF is at least 4K on all systems.  I don't have any relevant
Posix standards to look at, but I'm not worried about assuming this to
be true.

> With message queues, this is guaranteed. Also, message queues
> would  make  it  easy  to query the collected statistics (see
> below).

I will STRONGLY object to any proposal that we use message queues.
We've already had enough problems with the ridiculously low kernel
limits that are commonly imposed on shmem and SysV semaphores.
We don't need to buy into that silliness yet again with message queues.
I don't believe they gain us anything over pipes anyway.

The real problem with either pipes or message queues is that backends
will block if the collector stops collecting data.  I don't think we
want that.  I suppose we could have the backends write a pipe with
O_NONBLOCK and ignore failure, however:

  o  If the O_NONBLOCK flag is set, write() requests will  be
 handled differently, in the following ways:

 -  The write() function will not block the process.

 -  A write request for {PIPE_BUF} or fewer bytes  will have
the following effect:  If there is sufficient space
available in the pipe, write() will transfer all the data
and return the number of bytes  requested.  Otherwise,
write() will transfer no data and return -1 with errno set
to EAGAIN.

Since we already ignore SIGPIPE, we don't need to worry about losing the
collector entirely.

Now this would put a pretty tight time constraint on the collector:
fall more than 4K behind, you start losing data.  I am not sure if
a UDP socket would provide more buffering or not; anyone know?

regards, tom lane




Re: [HACKERS] Performance monitor signal handler

2001-03-16 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [010316 10:06] wrote:
> Jan Wieck <[EMAIL PROTECTED]> writes:
> > Uh - not much time to spend if the statistics should at least
> > be  half  accurate. And it would become worse in SMP systems.
> > So that was a nifty idea, but I think it'd  cause  much  more
> > statistic losses than I assumed at first.
> 
> > Back to drawing board. Maybe a SYS-V message queue can serve?
> 
> That would be the same as a pipe: backends would block if the collector
> stopped accepting data.  I do like the "auto discard" aspect of this
> UDP-socket approach.
> 
> I think Philip had the right idea: each backend should send totals,
> not deltas, in its messages.  Then, it doesn't matter (much) if the
> collector loses some messages --- that just means that sometimes it
> has a slightly out-of-date idea about how much work some backends have
> done.  It should be easy to design the software so that that just makes
> a small, transient error in the currently displayed statistics.

MSGSND(3)  FreeBSD Library Functions Manual  MSGSND(3)


ERRORS
 msgsnd() will fail if:

 [EAGAIN]   There was no space for this message either on the
queue, or in the whole system, and IPC_NOWAIT was set
in msgflg.






Re: [HACKERS] Performance monitor signal handler

2001-03-16 Thread Jan Wieck


Tom Lane wrote:
> Jan Wieck <[EMAIL PROTECTED]> writes:
> > Uh - not much time to spend if the statistics should at least
> > be  half  accurate. And it would become worse in SMP systems.
> > So that was a nifty idea, but I think it'd  cause  much  more
> > statistic losses than I assumed at first.
>
> > Back to drawing board. Maybe a SYS-V message queue can serve?
>
> That would be the same as a pipe: backends would block if the collector
> stopped accepting data.  I do like the "auto discard" aspect of this
> UDP-socket approach.

Does a pipe guarantee that a buffer, written with one atomic
write(2), never can get intermixed with other data on the
reader's end?  I know that you know what I mean, but for the
broader audience: Let's define a message to the collector to
be 4-byte-len, len-bytes.  Now hundreds of backends hammer
messages into the (shared) writing end of the pipe, all with
different sizes.  Is it GUARANTEED that a read(4 bytes),
read(n bytes) sequence will always return one complete
message and never intermixed parts of different write(2)s?

With message queues, this is guaranteed. Also, message queues
would  make  it  easy  to query the collected statistics (see
below).

> I think Philip had the right idea: each backend should send totals,
> not deltas, in its messages.  Then, it doesn't matter (much) if the
> collector loses some messages --- that just means that sometimes it
> has a slightly out-of-date idea about how much work some backends have
> done.  It should be easy to design the software so that that just makes
> a small, transient error in the currently displayed statistics.

If we use two message queues (IPC_PRIVATE is enough here),
one toward the collector and one toward the backends, this'd
be an easy way to collect and query statistics.

The backends send delta stats messages to the collector on
one queue.  Message queues block by default, but the backend
could use IPC_NOWAIT, just carry on and accumulate locally,
as long as it finally uses a blocking call before exiting.
We'll lose statistics for backends that go down in flames
(coredump), but who cares about statistics then?
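
A sketch of that send side under the scheme just described - deltas
pile up locally whenever IPC_NOWAIT reports the queue full (message
layout and names invented):

    /* Accumulate deltas locally; flush with IPC_NOWAIT during normal
     * operation and with a blocking msgsnd() just before exit.
     */
    #include <errno.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/msg.h>

    struct stat_msg
    {
        long        mtype;          /* 1 = message for the collector */
        long        seqscan_delta;  /* example counter */
    };

    static struct stat_msg pending = {1, 0};

    static void
    stats_flush(int qid, long seqscans, int blocking)
    {
        pending.seqscan_delta += seqscans;
        if (msgsnd(qid, &pending, sizeof(pending) - sizeof(long),
                   blocking ? 0 : IPC_NOWAIT) == 0)
            pending.seqscan_delta = 0;  /* delivered, reset the deltas */
        /* on EAGAIN we just keep collecting and retry next time */
    }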

To query statistics, we have a set of new builtin functions.
All functions share a global statistics snapshot in the
backend.  If, at function call time, the snapshot doesn't
exist or was generated by another XACT/commandcounter, the
backend sends a statistics request for its database ID to the
collector and waits for the messages to arrive on the second
message queue.  It can pick up the messages meant for it via
the message type, which equals its backend number + 1, because
the collector will send 'em as such.  For table access stats,
for example, the snapshot will have slots identified by the
table's OID, so a function pg_get_tables_seqscan_count(oid)
should be easy to implement.  And setting up views that
present access stats in readable format is a no-brainer.
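
And a sketch of the pick-up side, assuming the collector stamps every
reply with mtype = backend number + 1 as described (names invented):

    /* Receive only the messages addressed to this backend: msgrcv()
     * with a positive msgtyp argument returns the first message whose
     * mtype equals that value.
     */
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/msg.h>

    struct snap_msg
    {
        long        mtype;          /* backend number + 1 */
        int         is_last;        /* set by the collector on the final slice */
        char        chunk[512];     /* one slice of the snapshot */
    };

    static void
    fetch_snapshot(int reply_qid, int backend_no)
    {
        struct snap_msg m;

        do
        {
            if (msgrcv(reply_qid, &m, sizeof(m) - sizeof(long),
                       (long) (backend_no + 1), 0) < 0)
                break;              /* error: give up on this snapshot */
            /* merge m.chunk into the backend-local snapshot here */
        } while (!m.is_last);
    }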

Now  we  have communication only between the backends and the
collector.  And we're  certain  that  only  someone  able  to
SELECT from a system view will ever see this information.


Jan

--

#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #







Re: [HACKERS] Performance monitor signal handler

2001-03-16 Thread Jan Wieck

Alfred Perlstein wrote:
> * Jan Wieck <[EMAIL PROTECTED]> [010316 08:08] wrote:
> > Philip Warner wrote:
> > >
> > > But I prefer the UDP/Collector model anyway; it gives us greater
> > > flexibility + the ability to keep stats past backend termination, and, as
> > > you say, removes any possible locking requirements from the backends.
> >
> > OK, did some tests...
> >
> > The postmaster can create a SOCK_DGRAM socket at startup and
> > bind(2) it to "127.0.0.1:0", which causes the kernel to assign
> > a non-privileged port number that can then be read with
> > getsockname(2).  No other process can have a socket with the
> > same port number for the lifetime of the postmaster.
> >
> > When the socket gets ready, the collector reads one backend
> > message from it with recvfrom(2).  The fromaddr must be
> > "127.0.0.1:xxx" where xxx is the port number the kernel
> > assigned to the above socket.  Yes, this is its own address,
> > shared with the postmaster and all backends.  So both the
> > postmaster and the backends can use this one UDP socket,
> > which the backends inherit on fork(2), to send messages to
> > the collector.  If such a UDP packet really came from a
> > process other than the postmaster or a backend, well, then
> > the sysadmin has a more severe problem than manipulated DB
> > runtime statistics :-)
>
> Doing this is a bad idea:
>
> a) it allows any program to start spamming localhost:randport with
> messages and screw with the postmaster.
>
> b) it may even allow remote people to mess with it (see recent
> bugtraq articles about this)

So  it's  possible  for  a  UDP socket to recvfrom(2) and get
packets with  a  fromaddr  localhost:my_own_non_SO_REUSE_port
that really came from somewhere else?

If  that's  possible,  the  packets  must  be coming over the
network.  Otherwise it's the local superuser sending them, and
in  that case it's not worth any more discussion because root
on your system has more powerful possibilities to muck around
with  your  database. And if someone outside the local system
is doing it, it's time for some filter rules, isn't it?

> You should use a unix domain socket (at least when possible).

Unix domain UDP?

>
> > Running a 500MHz P-III, 192MB, RedHat 6.1 Linux 2.2.17 here,
> > I didn't lose a single message during the parallel
> > regression test if each backend sends one 1K-sized message
> > per query executed and the collector simply sucks them out
> > of the socket.  Message losses start if the collector does a
> > per-message idle loop like this:
> >
> > for (i=0,sum=0;i<25;i++,sum+=1);
> >
> > Uh - not much time to spend if the statistics should at least
> > be half accurate.  And it would become worse on SMP systems.
> > So that was a nifty idea, but I think it'd cause much heavier
> > statistics loss than I assumed at first.
> >
> > Back to the drawing board.  Maybe a SYS-V message queue can serve?
>
> I wouldn't say back to the drawing board, I would say two steps back.
>
> What about instead of sending deltas, you send totals?  This would
> allow you to lose messages and still maintain accurate stats.

Similar problem as with shared memory - size.  If a long
running backend of a multi-thousand-table database needs to
send access stats per table - and has accessed them all up to
now - it'll be a lot of wasted bandwidth.

>
> You can also enable SIGIO on the socket, then have a signal handler
> buffer packets that arrive when not actively select()ing on the
> UDP socket.  You can then use sigsetmask(2) to provide mutual
> exclusion with your SIGIO handler and general select()ing on the
> socket.

I had already thought of prioritizing the socket drain this
way: there is a fairly big receive buffer.  If the buffer is
empty, the collector does a blocking select(2).  If it's not,
it does a non-blocking (0-timeout) one, and only if that
reports no new messages waiting does it process one buffered
message and try to receive again.

Will give it a shot.
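
Presumably something along these lines - buffer plumbing elided and
names invented:

    /* Drain the socket with priority: block in select(2) only when the
     * local buffer is empty; otherwise poll with a zero timeout and
     * process one buffered message per pass.
     */
    #include <sys/types.h>
    #include <sys/time.h>
    #include <unistd.h>

    extern int  n_buffered;             /* messages held in the big buffer */
    extern void buffer_one_message(int sock);
    extern void process_one_buffered(void);

    static void
    collector_loop(int sock)
    {
        for (;;)
        {
            fd_set          rfds;
            struct timeval  zero = {0, 0};

            FD_ZERO(&rfds);
            FD_SET(sock, &rfds);
            if (select(sock + 1, &rfds, NULL, NULL,
                       n_buffered > 0 ? &zero : NULL) > 0)
                buffer_one_message(sock);   /* the socket always wins */
            else if (n_buffered > 0)
                process_one_buffered();
        }
    }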


Jan

--

#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #








Re: [HACKERS] Performance monitor signal handler

2001-03-16 Thread Philip Warner

At 17:10 15/03/01 -0800, Alfred Perlstein wrote:
>> 
>> Which is why the backends should not do anything other than maintain the
>> raw data. If there is atomic data that can cause inconsistency, then a
>> dropped UDP packet will do the same.
>
>The UDP packet (a COPY) can contain a consistent snapshot of the data.
>If you have dependencies, you fit a consistent snapshot into a single
>packet.

If we were going to go the shared memory way, then yes, as soon as we start
collecting dependent data we would need locking, but IOs, locking stats,
flushes, cache hits/misses are not really in this category.

But I prefer the UDP/Collector model anyway; it gives us greater
flexibility + the ability to keep stats past backend termination, and, as
you say, removes any possible locking requirements from the backends.




Philip Warner
Albatross Consulting Pty. Ltd.  (A.B.N. 75 008 659 498)
Tel: (+61) 0500 83 82 81  |  Fax: (+61) 0500 83 82 82
Http://www.rhyme.com.au
PGP key available upon request, and from pgp5.ai.mit.edu:11371




Re: [HACKERS] Performance monitor signal handler

2001-03-16 Thread Tom Lane

Jan Wieck <[EMAIL PROTECTED]> writes:
> Uh - not much time to spend if the statistics should at least
> be half accurate.  And it would become worse on SMP systems.
> So that was a nifty idea, but I think it'd cause much heavier
> statistics loss than I assumed at first.

> Back to the drawing board.  Maybe a SYS-V message queue can serve?

That would be the same as a pipe: backends would block if the collector
stopped accepting data.  I do like the "auto discard" aspect of this
UDP-socket approach.

I think Philip had the right idea: each backend should send totals,
not deltas, in its messages.  Then, it doesn't matter (much) if the
collector loses some messages --- that just means that sometimes it
has a slightly out-of-date idea about how much work some backends have
done.  It should be easy to design the software so that that just makes
a small, transient error in the currently displayed statistics.

regards, tom lane




Re: [HACKERS] Performance monitor signal handler

2001-03-16 Thread Alfred Perlstein

* Jan Wieck <[EMAIL PROTECTED]> [010316 08:08] wrote:
> Philip Warner wrote:
> >
> > But I prefer the UDP/Collector model anyway; it gives us greater
> > flexibility + the ability to keep stats past backend termination, and, as
> > you say, removes any possible locking requirements from the backends.
> 
> OK, did some tests...
> 
> The postmaster can create a SOCK_DGRAM socket at startup and
> bind(2) it to "127.0.0.1:0", which causes the kernel to assign
> a non-privileged port number that can then be read with
> getsockname(2).  No other process can have a socket with the
> same port number for the lifetime of the postmaster.
>
> When the socket gets ready, the collector reads one backend
> message from it with recvfrom(2).  The fromaddr must be
> "127.0.0.1:xxx" where xxx is the port number the kernel
> assigned to the above socket.  Yes, this is its own address,
> shared with the postmaster and all backends.  So both the
> postmaster and the backends can use this one UDP socket,
> which the backends inherit on fork(2), to send messages to
> the collector.  If such a UDP packet really came from a
> process other than the postmaster or a backend, well, then
> the sysadmin has a more severe problem than manipulated DB
> runtime statistics :-)

Doing this is a bad idea:

a) it allows any program to start spamming localhost:randport with
messages and screw with the postmaster.

b) it may even allow remote people to mess with it (see recent
bugtraq articles about this)

You should use a unix domain socket (at least when possible).

> Running a 500MHz P-III, 192MB, RedHat 6.1 Linux 2.2.17 here,
> I didn't lose a single message during the parallel
> regression test if each backend sends one 1K-sized message
> per query executed and the collector simply sucks them out
> of the socket.  Message losses start if the collector does a
> per-message idle loop like this:
> 
> for (i=0,sum=0;i<25;i++,sum+=1);
> 
> Uh - not much time to spend if the statistics should at least
> be half accurate.  And it would become worse on SMP systems.
> So that was a nifty idea, but I think it'd cause much heavier
> statistics loss than I assumed at first.
>
> Back to the drawing board.  Maybe a SYS-V message queue can serve?

I wouldn't say back to the drawing board, I would say two steps back.

What about instead of sending deltas, you send totals?  This would
allow you to lose messages and still maintain accurate stats.

You can also enable SIGIO on the socket, then have a signal handler
buffer packets that arrive when not actively select()ing on the
UDP socket.  You can then use sigsetmask(2) to provide mutual
exclusion with your SIGIO handler and general select()ing on the
socket.
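
Roughly like so, presumably - the fcntl(F_SETOWN)/FASYNC setup on the
socket is omitted, sigprocmask(2) stands in for the older
sigsetmask(2), and all names are invented:

    /* The SIGIO handler stashes packets that arrive while the main
     * code is not in select(); the main code blocks SIGIO around its
     * own accesses to the side buffer for mutual exclusion.
     */
    #include <signal.h>
    #include <sys/types.h>
    #include <sys/socket.h>

    static int  stats_sock;         /* non-blocking UDP socket */

    extern void side_buffer_add(const char *buf, int len);

    static void
    sigio_handler(int signo)
    {
        char    buf[1024];
        int     n;

        /* recv() is async-signal-safe; loop until the socket is dry */
        while ((n = (int) recv(stats_sock, buf, sizeof(buf), 0)) > 0)
            side_buffer_add(buf, n);
    }

    static void
    with_sigio_blocked(void (*fn) (void))
    {
        sigset_t    set,
                    old;

        sigemptyset(&set);
        sigaddset(&set, SIGIO);
        sigprocmask(SIG_BLOCK, &set, &old);     /* keep the handler out */
        fn();                                   /* touch the side buffer */
        sigprocmask(SIG_SETMASK, &old, NULL);
    }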

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]





Re: [HACKERS] Performance monitor signal handler

2001-03-16 Thread Jan Wieck

Philip Warner wrote:
>
> But I prefer the UDP/Collector model anyway; it gives us greater
> flexibility + the ability to keep stats past backend termination, and, as
> you say, removes any possible locking requirements from the backends.

OK, did some tests...

The postmaster can create a SOCK_DGRAM socket at startup and
bind(2) it to "127.0.0.1:0", which causes the kernel to assign
a non-privileged port number that can then be read with
getsockname(2).  No other process can have a socket with the
same port number for the lifetime of the postmaster.

When the socket gets ready, the collector reads one backend
message from it with recvfrom(2).  The fromaddr must be
"127.0.0.1:xxx" where xxx is the port number the kernel
assigned to the above socket.  Yes, this is its own address,
shared with the postmaster and all backends.  So both the
postmaster and the backends can use this one UDP socket,
which the backends inherit on fork(2), to send messages to
the collector.  If such a UDP packet really came from a
process other than the postmaster or a backend, well, then
the sysadmin has a more severe problem than manipulated DB
runtime statistics :-)
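
Condensed into a sketch (minimal error handling, names invented):

    /* Create the shared stats socket: bind to 127.0.0.1 with port 0 so
     * the kernel picks a free non-privileged port, then read the
     * assignment back with getsockname(2).
     */
    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    static int
    make_stats_socket(struct sockaddr_in *self)
    {
        int                 sock = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in  addr;
        socklen_t           alen = sizeof(*self);

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        addr.sin_port = 0;                  /* kernel assigns a free port */
        if (sock < 0 ||
            bind(sock, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
            getsockname(sock, (struct sockaddr *) self, &alen) < 0)
            return -1;
        return sock;                /* self->sin_port now holds the port */
    }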

Running a 500MHz P-III, 192MB, RedHat 6.1 Linux 2.2.17 here,
I didn't lose a single message during the parallel
regression test if each backend sends one 1K-sized message
per query executed and the collector simply sucks them out
of the socket.  Message losses start if the collector does a
per-message idle loop like this:

for (i=0,sum=0;i<25;i++,sum+=1);

Uh - not much time to spend if the statistics should at least
be half accurate.  And it would become worse on SMP systems.
So that was a nifty idea, but I think it'd cause much heavier
statistics loss than I assumed at first.

Back to the drawing board.  Maybe a SYS-V message queue can serve?


Jan

--

#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #








Re: [HACKERS] Performance monitor signal handler

2001-03-15 Thread Philip Warner

At 16:55 15/03/01 -0800, Alfred Perlstein wrote:
>* Philip Warner <[EMAIL PROTECTED]> [010315 16:46] wrote:
>> At 16:17 15/03/01 -0800, Alfred Perlstein wrote:
>> >
>> >Lost data is probably better than incorrect data.  Either use locks
>> >or a copying mechanism.  People will depend on the data returned
>> >making sense.
>> >
>> 
>> But with per-backend data, there is only ever *one* writer to a given set
>> of counters. Everyone else is a reader.
>
>This doesn't prevent a reader from getting an inconsistent view.
>
>Think about a 64-bit counter on a 32-bit machine.  If you charged per
>megabyte, wouldn't it upset you to have a small chance of losing
>4 billion units of sale?
>
>(i.e., doing a read after an addition that wraps the low 32 bits
>but before the carry is done to the topmost significant 32 bits?)

I assume this means we cannot rely on the existence of any kind of
interlocked add on 64-bit machines?


>Ok, what if everything can be read atomically by itself?
>
>You're still busted the minute you need to export any sort of
>compound stat.

Which is why the backends should not do anything other than maintain the
raw data. If there is atomic data that can cause inconsistency, then a
dropped UDP packet will do the same.





Philip Warner
Albatross Consulting Pty. Ltd.  (A.B.N. 75 008 659 498)
Tel: (+61) 0500 83 82 81  |  Fax: (+61) 0500 83 82 82
Http://www.rhyme.com.au
PGP key available upon request, and from pgp5.ai.mit.edu:11371




Re: [HACKERS] Performance monitor signal handler

2001-03-15 Thread Alfred Perlstein

* Philip Warner <[EMAIL PROTECTED]> [010315 17:08] wrote:
> At 16:55 15/03/01 -0800, Alfred Perlstein wrote:
> >* Philip Warner <[EMAIL PROTECTED]> [010315 16:46] wrote:
> >> At 16:17 15/03/01 -0800, Alfred Perlstein wrote:
> >> >
> >> >Lost data is probably better than incorrect data.  Either use locks
> >> >or a copying mechanism.  People will depend on the data returned
> >> >making sense.
> >> >
> >> 
> >> But with per-backend data, there is only ever *one* writer to a given set
> >> of counters. Everyone else is a reader.
> >
> >This doesn't prevent a reader from getting an inconsistent view.
> >
> >Think about a 64-bit counter on a 32-bit machine.  If you charged per
> >megabyte, wouldn't it upset you to have a small chance of losing
> >4 billion units of sale?
> >
> >(i.e., doing a read after an addition that wraps the low 32 bits
> >but before the carry is done to the topmost significant 32 bits?)
> 
> I assume this means we cannot rely on the existence of any kind of
> interlocked add on 64-bit machines?
> 
> 
> >Ok, what if everything can be read atomically by itself?
> >
> >You're still busted the minute you need to export any sort of
> >compound stat.
> 
> Which is why the backends should not do anything other than maintain the
> raw data. If there is atomic data that can cause inconsistency, then a
> dropped UDP packet will do the same.

The UDP packet (a COPY) can contain a consistent snapshot of the data.
If you have dependencies, you fit a consistent snapshot into a single
packet.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]





Re: [HACKERS] Performance monitor signal handler

2001-03-15 Thread Alfred Perlstein

* Philip Warner <[EMAIL PROTECTED]> [010315 16:46] wrote:
> At 16:17 15/03/01 -0800, Alfred Perlstein wrote:
> >
> >Lost data is probably better than incorrect data.  Either use locks
> >or a copying mechanism.  People will depend on the data returned
> >making sense.
> >
> 
> But with per-backend data, there is only ever *one* writer to a given set
> of counters. Everyone else is a reader.

This doesn't prevent a reader from getting an inconsistent view.

Think about a 64-bit counter on a 32-bit machine.  If you charged per
megabyte, wouldn't it upset you to have a small chance of losing
4 billion units of sale?

(i.e., doing a read after an addition that wraps the low 32 bits
but before the carry is done to the topmost significant 32 bits?)

Ok, what if everything can be read atomically by itself?

You're still busted the minute you need to export any sort of
compound stat.

If A, B, and C need to add up to 100, you have a read race.
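
To make the wrap/carry race concrete, here is a hypothetical 64-bit
counter kept as two 32-bit halves:

    /* A reader that interleaves between the two stores below sees the
     * counter off by 2^32: new low half, old high half.
     */
    typedef unsigned int uint32;            /* assuming a 32-bit int */

    struct counter64
    {
        volatile uint32 lo;
        volatile uint32 hi;
    };

    static void
    writer_add(struct counter64 *c, uint32 n)
    {
        uint32  old = c->lo;

        c->lo = old + n;                    /* store #1 */
        if (c->lo < old)                    /* low half wrapped ... */
            c->hi++;                        /* ... store #2, the late carry */
    }

    /* not atomic: hi and lo may come from different updates */
    static unsigned long long
    reader_get(struct counter64 *c)
    {
        return ((unsigned long long) c->hi << 32) | c->lo;
    }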

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]





Re: [HACKERS] Performance monitor signal handler

2001-03-15 Thread Philip Warner

At 16:17 15/03/01 -0800, Alfred Perlstein wrote:
>
>Lost data is probably better than incorrect data.  Either use locks
>or a copying mechanism.  People will depend on the data returned
>making sense.
>

But with per-backend data, there is only ever *one* writer to a given set
of counters. Everyone else is a reader.



Philip Warner
Albatross Consulting Pty. Ltd.  (A.B.N. 75 008 659 498)
Tel: (+61) 0500 83 82 81  |  Fax: (+61) 0500 83 82 82
Http://www.rhyme.com.au
PGP key available upon request, and from pgp5.ai.mit.edu:11371




Re: [HACKERS] Performance monitor signal handler

2001-03-15 Thread Philip Warner

At 06:57 15/03/01 -0500, Jan Wieck wrote:
>
>And  shared  memory has all the interlocking problems we want
>to avoid.

I suspect that if we keep per-backend data in a separate area, then we
don't need locking since there is only one writer. It does not matter if a
reader gets an inconsistent view, the same as if you drop a few UDP packets.


>What about a collector daemon, fired up by the postmaster and
>receiving UDP packets from the backends. 

This does sound appealing; it means that individual backend data (IO etc)
will survive past the termination of the backend. I'd like to see the stats
survive the death of the collector if possible, possibly even survive a
stop/start of the postmaster.


>Now whatever the backend has to tell the collector, it simply
>throws a UDP packet in its direction. Whether the collector
>catches it or not is not the backend's problem.

If we get the backends to keep the stats they are sending in local counters
as well, then they can send the counter value (not the delta) each time, which
would mean that the collector would not 'miss' anything - just its
operations/sec might see a hiccough. This could have a side benefit that (if
we wanted to?) we could allow a client to query its own counters to get an
idea of the costs of its queries.
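
A sketch of that backend side, with invented counter names and
assuming the socket was connect(2)ed to the collector - the key point
being that nothing is ever reset on send:

    /* Keep running totals locally and send the whole block each time;
     * a dropped datagram then costs freshness, never correctness.
     */
    #include <sys/types.h>
    #include <sys/socket.h>

    struct backend_totals
    {
        int     backend_no;
        long    xact_commits;       /* running totals, never reset */
        long    blocks_read;
    };

    static struct backend_totals my_totals;

    static void
    report_totals(int sock)
    {
        /* fire and forget: the next datagram carries the same counters
         * plus whatever has happened since
         */
        send(sock, &my_totals, sizeof(my_totals), 0);
    }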

When we need to reset the counters, that should be done explicitly, I think.



Philip Warner
Albatross Consulting Pty. Ltd.  (A.B.N. 75 008 659 498)
Tel: (+61) 0500 83 82 81  |  Fax: (+61) 0500 83 82 82
Http://www.rhyme.com.au
PGP key available upon request, and from pgp5.ai.mit.edu:11371




Re: [HACKERS] Performance monitor signal handler

2001-03-15 Thread Alfred Perlstein

* Philip Warner <[EMAIL PROTECTED]> [010315 16:14] wrote:
> At 06:57 15/03/01 -0500, Jan Wieck wrote:
> >
> >And  shared  memory has all the interlocking problems we want
> >to avoid.
> 
> I suspect that if we keep per-backend data in a separate area, then we
> don't need locking since there is only one writer. It does not matter if a
> reader gets an inconsistent view, the same as if you drop a few UDP packets.

No, this is completely different.

Lost data is probably better than incorrect data.  Either use locks
or a copying mechanism.  People will depend on the data returned
making sense.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]





Re: [HACKERS] Performance monitor signal handler

2001-03-15 Thread Jan Wieck

Tom Lane wrote:
> Jan Wieck <[EMAIL PROTECTED]> writes:
> > What about a collector daemon, fired up by the postmaster and
> > receiving UDP packets from the backends.  Under heavy load, it
> > might miss some statistics messages, well, but that's not as
> > bad as having locks causing backends to lose performance.
>
> Interesting thought, but we don't want UDP I think; that just opens
> up a whole can of worms about checking access permissions and so forth.
> Why not a simple pipe?  The postmaster creates the pipe and the
> collector daemon inherits one end, while all the backends inherit the
> other end.

I don't think so.  I haven't tested the following yet, but
AFAIR it's correct.

Have the postmaster create two UDP sockets before it forks
off the collector.  It can examine the peer addresses of both,
so they don't need well-known port numbers; they can be the
random ones assigned by the kernel.  Thus, we don't need
SO_REUSE on them either.

Now, since the collector is forked off by the postmaster, it
knows the peer address of the other socket.  And since all
backends get forked off from the postmaster as well, they'll
all use the same peer address, won't they?  So all the
collector has to look at is the sender address, including port
number, of the packets.  It needs to be what the postmaster
examined; anything else is from someone else and goes to bit
heaven.  The same way, the backends know where to send their
statistics.

If I'm right that in the case of fork() all children share
the same socket with the same peer address, then it's even
safe in case the collector dies.  The postmaster can still
hold the collector's socket and will notice that the collector
died (due to a wait() returning its PID) and can fire up
another one.  Again some packets get lost (plus all the
statistics collected so far, hmmm - ain't that a cool way to
reset statistics counters - killing the collector?), but it
doesn't disturb any live backend in any way.  They never get
any signal, don't care about what's done with their statistics
and such.  They just do their work...
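
The check the collector would do, sketched with invented names
(expected_addr being what the postmaster read back with
getsockname(2) before forking):

    /* Accept a packet only if its source address matches the shared
     * socket's own address; anything else goes to bit heaven.
     */
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    static struct sockaddr_in expected_addr;    /* set by the postmaster */

    static int
    collector_recv(int sock, void *buf, size_t len)
    {
        struct sockaddr_in  from;
        socklen_t           flen = sizeof(from);
        ssize_t             n;

        n = recvfrom(sock, buf, len, 0, (struct sockaddr *) &from, &flen);
        if (n < 0)
            return -1;
        if (from.sin_addr.s_addr != expected_addr.sin_addr.s_addr ||
            from.sin_port != expected_addr.sin_port)
            return 0;                   /* stranger: ignore the packet */
        return (int) n;
    }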


Jan

--

#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #





Re: [HACKERS] Performance monitor signal handler

2001-03-15 Thread Tom Lane

Jan Wieck <[EMAIL PROTECTED]> writes:
> What about a collector daemon, fired up by the postmaster and
> receiving UDP packets from the backends.  Under heavy load, it
> might miss some statistics messages, well, but that's not as
> bad as having locks causing backends to lose performance.

Interesting thought, but we don't want UDP I think; that just opens
up a whole can of worms about checking access permissions and so forth.
Why not a simple pipe?  The postmaster creates the pipe and the
collector daemon inherits one end, while all the backends inherit the
other end.

regards, tom lane




Re: [HACKERS] Performance monitor signal handler

2001-03-15 Thread Jan Wieck

Bruce Momjian wrote:
>
> Yes, it seems storing info in shared memory and having a system table to
> access it is the way to go.

Depends,

first of all we need to know WHAT we want to collect.  If we
talk about block read/write statistics and such on a per-table
basis, which is IMHO the most accurate thing for tuning
purposes, then we're talking about a potentially unbounded
amount of shared memory.

And  shared  memory has all the interlocking problems we want
to avoid.

What about a collector daemon, fired up by the postmaster and
receiving UDP packets from the backends.  Under heavy load, it
might miss some statistics messages, well, but that's not as
bad as having locks causing backends to lose performance.

The postmaster could already provide the UDP socket for the
backends, so the collector knows the only peer address from
which to accept statistics messages.  Any message from
another peer address is simply ignored.  For getting the
statistics out of it, the collector has its own server
socket, using TCP and providing some lookup protocol.

Now whatever the backend has to tell the collector, it simply
throws a UDP packet in its direction.  Whether the collector
catches it or not is not the backend's problem.


Jan

--

#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #








Re: [HACKERS] Performance monitor signal handler

2001-03-13 Thread Bruce Momjian

> At 13:34 12/03/01 -0800, Alfred Perlstein wrote:
> >Is it possible
> >to have a spinlock over it so that an external utility can take a snapshot
> >of it with the spinlock held?
> 
> I'd suggest that locking the stats area might be a bad idea; there is only
> one writer for each backend-specific chunk, and it won't matter a hell of a
> lot if a reader gets inconsistent views (since I assume they will be
> re-reading every second or so). All the stats area should contain is a
> bunch of counters with timestamps, I think, and the cost of writing to it
> should be kept to an absolute minimum.
> 
> 
> >
> >just some ideas..
> >
> 
> Unfortunately, based on prior discussions, Bruce seems quite opposed to a
> shared memory solution.

No, I like the shared memory idea.  But first, such an idea will have to
wait for 7.2, and second, there are limits to how much shared memory I can
use.

Eventually, I think shared memory will be the way to go.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026




Re: [HACKERS] Performance monitor signal handler

2001-03-13 Thread Alfred Perlstein

* Philip Warner <[EMAIL PROTECTED]> [010313 06:42] wrote:
> >
> >This ought to always give a consistent snapshot of the file to
> >whomever opens it.
> >
> 
> I think Tom has previously stated that there are technical reasons not to
> do IO in signal handlers, and I have philosophical problems with
> performance monitors that ask 50 backends to do file IO. I really do think
> shared memory is TWTG.

I wasn't really suggesting any of those courses of action; all I
suggested was using rename(2) to give a separate application a
consistent snapshot of the stats.

Actually, what makes the most sense (although it may be a performance
killer) is to have the backends update a system table that the external
app can query.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/





Re: [HACKERS] Performance monitor signal handler

2001-03-13 Thread Philip Warner

>
>This ought to always give a consistent snapshot of the file to
>whomever opens it.
>

I think Tom has previously stated that there are technical reasons not to
do IO in signal handlers, and I have philosophical problems with
performance monitors that ask 50 backends to do file IO. I really do think
shared memory is TWTG.





Philip Warner
Albatross Consulting Pty. Ltd.  (A.B.N. 75 008 659 498)
Tel: (+61) 0500 83 82 81  |  Fax: (+61) 0500 83 82 82
Http://www.rhyme.com.au
PGP key available upon request, and from pgp5.ai.mit.edu:11371




Re: [HACKERS] Performance monitor signal handler

2001-03-13 Thread Bruce Momjian

> >
> >This ought to always give a consistent snapshot of the file to
> >whomever opens it.
> >
> 
> I think Tom has previously stated that there are technical reasons not to
> do IO in signal handlers, and I have philosophical problems with
> performance monitors that ask 50 backends to do file IO. I really do think
> shared memory is TWTG.

The good news is that right now pgmonitor gets all its information from
'ps', and only shows the query when the user asks for it.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026




Re: [HACKERS] Performance monitor signal handler

2001-03-13 Thread Bruce Momjian

> > I think Tom has previously stated that there are technical reasons not to
> > do IO in signal handlers, and I have philosophical problems with
> > performance monitors that ask 50 backends to do file IO. I really do think
> > shared memory is TWTG.
> 
> I wasn't really suggesting any of those courses of action; all I
> suggested was using rename(2) to give a separate application a
> consistent snapshot of the stats.
> 
> Actually, what makes the most sense (although it may be a performance
> killer) is to have the backends update a system table that the external
> app can query.

Yes, it seems storing info in shared memory and having a system table to
access it is the way to go.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026




Re: [HACKERS] Performance monitor signal handler

2001-03-13 Thread Alfred Perlstein

* Philip Warner <[EMAIL PROTECTED]> [010312 18:56] wrote:
> At 13:34 12/03/01 -0800, Alfred Perlstein wrote:
> >Is it possible
> >to have a spinlock over it so that an external utility can take a snapshot
> >of it with the spinlock held?
> 
> I'd suggest that locking the stats area might be a bad idea; there is only
> one writer for each backend-specific chunk, and it won't matter a hell of a
> lot if a reader gets inconsistent views (since I assume they will be
> re-reading every second or so). All the stats area should contain is a
> bunch of counters with timestamps, I think, and the cost of writing to it
> should be kept to an absolute minimum.
> 
> 
> >
> >just some ideas..
> >
> 
> Unfortunately, based on prior discussions, Bruce seems quite opposed to a
> shared memory solution.

Ok, here's another nifty idea.

On receipt of the info signal, the backends collaborate to piece
together a status file.  The status file is given a temporary name.
When complete, the status file is rename(2)'d over a well-known
file.

This ought to always give a consistent snapshot of the file to
whomever opens it.
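
The trick in miniature, with illustrative paths - rename(2) is atomic,
so a reader opening the well-known name always sees either the old or
the new file, complete:

    /* Write the snapshot under a temporary name, then atomically
     * replace the well-known file.
     */
    #include <stdio.h>

    static int
    publish_status(const char *contents)
    {
        FILE   *f = fopen("status.tmp", "w");

        if (f == NULL)
            return -1;
        fputs(contents, f);
        if (fclose(f) != 0)
            return -1;
        return rename("status.tmp", "status");  /* atomic replace */
    }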

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/





Re: [HACKERS] Performance monitor signal handler

2001-03-12 Thread Philip Warner

At 13:34 12/03/01 -0800, Alfred Perlstein wrote:
>Is it possible
>to have a spinlock over it so that an external utility can take a snapshot
>of it with the spinlock held?

I'd suggest that locking the stats area might be a bad idea; there is only
one writer for each backend-specific chunk, and it won't matter a hell of a
lot if a reader gets inconsistent views (since I assume they will be
re-reading every second or so). All the stats area should contain is a
bunch of counters with timestamps, I think, and the cost of writing to it
should be kept to an absolute minimum.


>
>just some ideas..
>

Unfortunately, based on prior discussions, Bruce seems quite opposed to a
shared memory solution.



Philip Warner
Albatross Consulting Pty. Ltd.  (A.B.N. 75 008 659 498)
Tel: (+61) 0500 83 82 81  |  Fax: (+61) 0500 83 82 82
Http://www.rhyme.com.au
PGP key available upon request, and from pgp5.ai.mit.edu:11371




Re: [HACKERS] Performance monitor signal handler

2001-03-12 Thread Alfred Perlstein

* Bruce Momjian <[EMAIL PROTECTED]> [010312 12:12] wrote:
> I was going to implement the signal handler like we do with Cancel,
> where the signal sets a flag and we check the status of the flag in
> various _safe_ places.
> 
> Can anyone think of a better way to get information out of a backend?

Why not use a static area of the shared memory segment?  Is it possible
to have a spinlock over it so that an external utility can take a snapshot
of it with the spinlock held?

Also, this could work for other stuff as well: instead of overloading
a lot of signal handlers, one could just periodically poll a region of
the shared segment.

just some ideas..

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/





[HACKERS] Performance monitor signal handler

2001-03-12 Thread Bruce Momjian

I was going to implement the signal handler like we do with Cancel,
where the signal sets a flag and we check the status of the flag in
various _safe_ places.

Can anyone think of a better way to get information out of a backend?
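
For reference, the Cancel-style pattern in miniature (names invented):

    /* The handler only sets a flag; the real work happens at the next
     * safe point in the main loop.
     */
    #include <signal.h>

    static volatile sig_atomic_t info_requested = 0;

    static void
    info_handler(int signo)
    {
        info_requested = 1;         /* nothing else is safe in here */
    }

    /* called from the _safe_ places, e.g. between queries */
    static void
    check_info_request(void)
    {
        if (info_requested)
        {
            info_requested = 0;
            /* emit/update the status information here */
        }
    }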

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026
