Re: [HACKERS] Rethinking stats communication mechanisms

2006-06-17 Thread Douglas McNaught
Greg Stark <[EMAIL PROTECTED]> writes:

> PFC <[EMAIL PROTECTED]> writes:
>
>> - Will only be of use if the command is taking a long, long time. So,
>> it need not be realtime; no problem if the data comes with a little
>> delay, or not at all if the command executes quickly.
>
> I would dispute this point. Picture a system running a very short query
> very, very often. It may still be the main problem, and may even be
> taking 90+% of the CPU time. If you got an accurate snapshot of all the
> currently executing queries you'd see it popping up suspiciously often.

Yeah, but if you turn on query logging in that case you'll see the
bajillions of short queries, so you don't need the accurate snapshot
to diagnose that.

-Doug

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Rethinking stats communication mechanisms

2006-06-17 Thread Greg Stark

PFC <[EMAIL PROTECTED]> writes:

> - Will only be of use if the command is taking a long, long time. So,
> it need not be realtime; no problem if the data comes with a little
> delay, or not at all if the command executes quickly.

I would dispute this point. Picture a system running a very short query
very, very often. It may still be the main problem, and may even be
taking 90+% of the CPU time. If you got an accurate snapshot of all the
currently executing queries you'd see it popping up suspiciously often.

-- 
greg




R: [HACKERS] Per-server univocal identifier

2006-06-17 Thread Giampaolo Tomassoni
> I assume by 'univocal' you mean unequivocal.

Yes, sorry about that: I'm writing italish...


> Can you set it up in a table per server? Or in a file? Or would you
> rather use a UUID?

A per-server table will probably be my way.


> And how is this to be made available?

Well, a function would be fine.


> And is it to be unique per machine, or per cluster (since you can have 
> many postgresql clusters on one machine).

If it is a per-machine discriminator, it will be a per-node
discriminator as well...

Also, it will be useful to people not running a cluster (like me), since
they only need a multi-master capability on a table for a legacy app...


> cheers
> 
> andrew

Cheers,

giampaolo


> Giampaolo Tomassoni wrote:
>
> >Dears,
> >
> >I'm looking for a way to univocally identify the server on which
> >a sql function or statement is running. My idea would be
> >something close to the value returned by a 'host -f' under linux:
> >the FQDN of the host, but even a serial code or a number would be
> >fine to me. It needs only to be immutable, I guess.
> >
> >I know there is something suitable under Oracle and, even worse,
> >under mysql...
> >
> >The purpose is mostly related to a light replication problem I
> >have, in which I need to 'emulate' a multi-master replication on a table.
> >
> >I placed a question on the IRC list and I found a couple of
> >unreplied messages asking the same thing in the pgsql-general list.




Re: [HACKERS] Per-server univocal identifier

2006-06-17 Thread Tom Lane
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> And is it to be unique per machine, or per cluster (since you can have 
> many postgresql clusters on one machine).

Actually, there are *lots* of ambiguities there.  For instance, if you
pg_dump and reload a cluster do you want the ID to change or stay the
same?  How about copying the $PGDATA tree to another server?  How about
redirecting the same cluster to listen on a new port number?

regards, tom lane



Re: [HACKERS] Per-server univocal identifier

2006-06-17 Thread Andrew Dunstan


I assume by 'univocal' you mean unequivocal.

Can you set it up in a table per server? Or in a file? Or would you
rather use a UUID?


And how is this to be made available?

And is it to be unique per machine, or per cluster (since you can have 
many postgresql clusters on one machine).


cheers

andrew

Giampaolo Tomassoni wrote:


Dears,

I'm looking for a way to univocally identify the server on which a sql function 
or statement is running. My idea would be something close to the value returned 
by a 'host -f' under linux: the FQDN of the host, but even a serial code or a 
number would be fine to me. It needs only to be immutable, I guess.

I know there is something suitable under Oracle and, even worse, under mysql...

The purpose is mostly related to a light replication problem I have, in which I 
need to 'emulate' a multi-master replication on a table.

I placed a question on the IRC list and I found a couple of unreplied messages 
asking the same thing in the pgsql-general list.


 





R: [HACKERS] Per-server univocal identifier

2006-06-17 Thread Giampaolo Tomassoni
> ...snip...
>
> Perhaps inet_server_addr() and inet_server_port() would answer.  These
> aren't super-useful on local connections, however.

No, in fact they aren't. Mine are local connections...


> How "immutable" do you want it to be exactly?  The system_identifier
> embedded in pg_control might be interesting if you want something that
> will change at initdb.

The same immutability as a 'host -f' would be fine for me.


> I don't think there's a way to look at that from
> SQL but you could write a C function to access it.

I would prefer to avoid writing an external module: that would mean more
administration effort when upgrading the postgres installation.

Well, I'll probably resort to putting a unique value in a table.

Or... can I put a custom variable in postgresql.conf?

---
Giampaolo Tomassoni - IT Consultant
Piazza VIII Aprile 1948, 4
I-53044 Chiusi (SI) - Italy
Ph: +39-0578-21100




Re: [HACKERS] Per-server univocal identifier

2006-06-17 Thread Tom Lane
"Giampaolo Tomassoni" <[EMAIL PROTECTED]> writes:
> I'm looking for a way to univocally identify the server on which a sql 
> function or statement is running. My idea would be something close to the 
> value returned by a 'host -f' under linux: the FQDN of the host, but even a 
> serial code or a number would be fine to me. It needs only to be immutable, I 
> guess.

Perhaps inet_server_addr() and inet_server_port() would answer.  These
aren't super-useful on local connections, however.

How "immutable" do you want it to be exactly?  The system_identifier
embedded in pg_control might be interesting if you want something that
will change at initdb.  I don't think there's a way to look at that from
SQL but you could write a C function to access it.

regards, tom lane



[HACKERS] Per-server univocal identifier

2006-06-17 Thread Giampaolo Tomassoni
Dears,

I'm looking for a way to univocally identify the server on which a sql function 
or statement is running. My idea would be something close to the value returned 
by a 'host -f' under linux: the FQDN of the host, but even a serial code or a 
number would be fine to me. It needs only to be immutable, I guess.

I know there is something suitable under Oracle and, even worse, under mysql...

The purpose is mostly related to a light replication problem I have, in which I 
need to 'emulate' a multi-master replication on a table.

I placed a question on the IRC list and I found a couple of unreplied messages 
asking the same thing in the pgsql-general list.

Regards,

---
Giampaolo Tomassoni - IT Consultant
Piazza VIII Aprile 1948, 4
I-53044 Chiusi (SI) - Italy
Ph: +39-0578-21100




Re: [HACKERS] Rethinking stats communication mechanisms

2006-06-17 Thread Tom Lane
PFC <[EMAIL PROTECTED]> writes:
> So, the proposal:
> On executing a command, the backend stores the command string, then
> overwrites the counter with (counter + 1) and with the timestamp of
> command start.
> Periodically, like every N seconds, a separate process reads the
> counter, then reads the data, then reads the counter again.

Well, it wouldn't be "periodic", it'd be "whenever someone reads
pg_stat_activity".  I was considering solutions like this, but I'm a
little disturbed by the possibility that the would-be reader might
loop indefinitely if the source backend is constantly changing its
entry.  Still, slow reads of pg_stat_activity might be a good tradeoff
for taking overhead out of the update operation.

BTW, I think the writer would actually need to bump the counter twice,
once before and once after it modifies its stats area.  Else there's
no way to detect that you've copied a partially-updated stats entry.
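
This double-bump protocol is essentially a sequence lock: the counter is
odd while an update is in flight, and a reader retries until it sees the
same even value before and after copying the entry. A minimal sketch of
the idea — the struct and function names are illustrative stand-ins, not
actual PostgreSQL code, and the memory barriers a real implementation
would need are omitted:

```c
#include <string.h>

#define CMD_MAX 256

/* Hypothetical per-backend status slot (illustrative only). */
typedef struct
{
    volatile unsigned counter;  /* odd while an update is in progress */
    char command[CMD_MAX];
} BackendStatus;

/* Writer (the backend itself): bump the counter to an odd value before
 * touching the entry, and to an even value afterwards. */
static void
status_update(BackendStatus *slot, const char *cmd)
{
    slot->counter++;            /* now odd: update in progress */
    strncpy(slot->command, cmd, CMD_MAX - 1);
    slot->command[CMD_MAX - 1] = '\0';
    slot->counter++;            /* now even: update complete */
}

/* Reader (whoever scans pg_stat_activity): retry until the counter is
 * even and unchanged across the copy, i.e. a consistent snapshot. */
static void
status_read(const BackendStatus *slot, char *out)
{
    for (;;)
    {
        unsigned before = slot->counter;

        if (before & 1)
            continue;           /* writer is mid-update; retry */
        memcpy(out, slot->command, CMD_MAX);
        if (slot->counter == before)
            return;             /* nothing changed while we copied */
    }
}
```

The writer's fast path is just two increments and a copy; readers pay
the retry cost, and the unbounded retry loop is exactly the indefinite
looping risk raised above.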

> If the backend process itself should update its process title, and
> this operation is costly, it should only be done if the current query
> has been running for more than T seconds.

Managing that would cost more than just doing it, I think.

regards, tom lane



Re: [HACKERS] Exporting type OID macros in a cleaner fashion

2006-06-17 Thread Martijn van Oosterhout
On Fri, Jun 16, 2006 at 10:58:05PM -0400, Tom Lane wrote:
> The alternative I'm currently thinking about is to build and install an
> auto-generated file comparable to fmgroids.h, containing *only* the type
> OID macro #defines extracted from pg_type.h.  This would require just a
> trivial amount of sed hacking.

This is a good idea. It would be nice to have stuff useful for the
frontend available without having to pull in everything for the backend.

> I'm not entirely clear where to install such a thing though.  The
> fmgroids.h precedent suggests server/utils/fmgroids.h, but if this is
> intended for client-side use it shouldn't go under /server.  I'm
> tempted to install it as "pgtypeoids.h" at the top level of the
> installation include directory ... but then I'm not clear which source
> directory ought to generate it.

At first glance I'd have include/catalog build it and install it.
pgtypeoids.h sounds like a good name to me, and it should be at the top
level.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.




Re: [HACKERS] Rethinking stats communication mechanisms

2006-06-17 Thread PFC



> It strikes me that we are using a single communication mechanism to
> handle what are really two distinct kinds of data:


Interesting.

I recently read a paper on how to get rid of locks for this kind of
pattern.


* For the Command String

- Problem: need to display the currently executing command in the ps
  list.

- Will only be of use if the command is taking a long, long time. So, it
  need not be realtime; no problem if the data comes with a little
  delay, or not at all if the command executes quickly. People are only
  interested in the currently executing command to answer questions like
  "what query has my server grinding?"

Point: the currently executing query is only interesting to display if
it's currently executing. If it's not, it's in the log (if enabled).


So, the proposal:

Each backend has a shared memory area in which to store:
- the currently executing command (like in your proposal)
- a timestamp
- a counter

On executing a command, the backend stores the command string, then
overwrites the counter with (counter + 1) and with the timestamp of
command start.

Periodically, like every N seconds, a separate process reads the
counter, then reads the data, then reads the counter again. If the
counter value changed, the process is repeated. If the counter value did
not change, the command string did not change either, so it's valid and
can be used.


Other thoughts:

If the backend process itself should update its process title, and this
operation is costly, it should only be done if the current query has
been running for more than T seconds. However, syscalls for getting the
current time are costly. A separate process can update a counter in
shared memory with the current time every N seconds, and the backend can
check it.

The main point is that if this value is written every few seconds but
read often by only one process, or written often but read seldom, there
will not be a lot of interprocessor cache thrashing on it.




Re: [HACKERS] [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL community

2006-06-17 Thread Andrew Dunstan



Jim Nasby wrote:

> On Jun 16, 2006, at 12:01 PM, Josh Berkus wrote:
>
>> First thing as soon as I have a login, of course, is to set up a
>> Buildfarm instance.
>
> Keep in mind that buildfarm clients and benchmarking stuff don't
> usually mix well.

On a fast machine like this a buildfarm run is not going to take very
long. You could run those once a day at times of low demand. Or even
once or twice a week.


cheers

andrew



[HACKERS] Rethinking stats communication mechanisms

2006-06-17 Thread Tom Lane
In view of my oprofile results
http://archives.postgresql.org/pgsql-hackers/2006-06/msg00859.php
I'm thinking we need some major surgery on the way that the stats
collection mechanism works.

It strikes me that we are using a single communication mechanism to
handle what are really two distinct kinds of data:

* Current-state information, eg, what backends are alive and what
commands they are currently working on.  Ideally we'd like this type of
info to be 100% up-to-date.  But once a particular bit of information
(eg a command string) is obsolete, it's not of interest anymore.

* Event counts.  These accumulate and so past information is still
important.  On the other hand, it's not so critical that the info be
completely up-to-date --- the central counters can lag behind a bit,
so long as events eventually get counted.

I believe the stats code was designed with the second case in mind,
but we've abused it to handle the first case, and that's why we've
now got performance problems.

If we are willing to assume that the current-state information is of
fixed maximum size, we could store it in shared memory.  (This
suggestion already came up in the recent thread about ps_status,
and I think it's been mentioned before too --- but my point here is
that we have to separate this case from the event-counting case.)
The only real restriction we'd be making is that we can only show the
first N characters of the current command string, but we're already
accepting that limitation in the existing stats code.  (And we could
make N whatever we wanted, without worrying about UDP datagram limits.)
I'm envisioning either adding fields to the PGPROC array, or perhaps
better using a separate array with an entry for each backend ID.
Backends would write their status info into this array and any
interested backend could read it out again.  The stats collector
process needn't be involved at all AFAICS.  This eliminates any
process-dispatch overhead to report command start or command
termination.  Instead we'd have some locking overhead, but contention
ought to be low enough that that's not a serious problem.  I'm assuming
a separate lock for each array entry so that backends don't contend with
each other to update their entries; contention occurs only when someone
is actively reading the information.  We should probably use LWLocks not
spinlocks because the time taken to copy a long command string into the
shared area would be longer than we ought to hold a spinlock (but this
seems a bit debatable given the expected low contention ... any
thoughts?)
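
A rough sketch of the kind of per-backend status array being proposed —
one fixed-size slot per backend ID, each with its own lock. All names
here are illustrative stand-ins, not actual PostgreSQL structures, and
the int lock field merely marks where LWLockAcquire/LWLockRelease would
go:

```c
#include <string.h>

#define MAX_BACKENDS 128
#define CMD_MAX      256

/* Hypothetical layout of a shared-memory status entry. */
typedef struct
{
    int  lock;              /* stands in for the per-entry LWLock */
    int  backend_pid;
    long activity_start;    /* time the current command started */
    char command[CMD_MAX];  /* first CMD_MAX-1 chars of the query */
} BackendActivity;

/* The array itself would live in shared memory, indexed by backend ID. */
BackendActivity activity_array[MAX_BACKENDS];

/* Only the owning backend ever writes its own slot, so updates never
 * contend with each other; contention arises only when some backend is
 * copying the entry out to show in pg_stat_activity. */
void
report_activity(int backend_id, int pid, long now, const char *cmd)
{
    BackendActivity *e = &activity_array[backend_id];

    e->lock = 1;            /* LWLockAcquire() in a real backend */
    e->backend_pid = pid;
    e->activity_start = now;
    strncpy(e->command, cmd, CMD_MAX - 1);
    e->command[CMD_MAX - 1] = '\0';
    e->lock = 0;            /* LWLockRelease() */
}
```

Under this layout the stats collector is bypassed entirely for
current-state data: reporting a command start is one lock acquire, a
copy, and a release, with no process dispatch.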

The existing stats collection mechanism seems OK for event counts,
although I'd propose two changes: one, get rid of the separate
buffer process, and two, find a way to emit event reports in a
time-driven way rather than once per transaction commit.  I'm a bit
vague about how to do the latter at the moment.

Comments?

regards, tom lane



Re: [HACKERS] MultiXacts & WAL

2006-06-17 Thread Tom Lane
Josh Berkus  writes:
>> Please dump some of the WAL segments with xlogdump so we can get a
>> feeling for what's in there.

> OK, will do on Monday's test run.   Is it possible for me to run this at the 
> end of the test run, or do I need to freeze it in the middle to get useful 
> data?

I'd just copy off a random sample of WAL segment files while the run is
proceeding.  You don't need very many, half a dozen at most.

> Also, we're toying with the idea of testing full_page_writes=off for Solaris.
> The Solaris engineers claim that it should be safe on Sol10 + Sun hardware.  
> I'm not entirely sure that's true; is there a destruction test of the bug 
> that caused us to remove that option?

The bug that made us turn it off in the 8.1 branch had nothing to do
with hardware reliability or the lack thereof.  As for testing, will
they let you yank the power cord?

regards, tom lane



Re: [HACKERS] MultiXacts & WAL

2006-06-17 Thread Josh Berkus
Tom,

> Please dump some of the WAL segments with xlogdump so we can get a
> feeling for what's in there.

OK, will do on Monday's test run.   Is it possible for me to run this at the 
end of the test run, or do I need to freeze it in the middle to get useful 
data?

Also, we're toying with the idea of testing full_page_writes=off for Solaris.  
The Solaris engineers claim that it should be safe on Sol10 + Sun hardware.  
I'm not entirely sure that's true; is there a destruction test of the bug 
that caused us to remove that option?

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [HACKERS] [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL community

2006-06-17 Thread Jim Nasby

On Jun 16, 2006, at 12:01 PM, Josh Berkus wrote:

> First thing as soon as I have a login, of course, is to set up a
> Buildfarm instance.

Keep in mind that buildfarm clients and benchmarking stuff don't usually
mix well.

--
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com      work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461





Re: [HACKERS] [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL community

2006-06-17 Thread Jim Nasby

On Jun 16, 2006, at 12:01 PM, Josh Berkus wrote:

> Folks,
>
> I am thrilled to inform you all that Sun has just donated a fully
> loaded T2000 system to the PostgreSQL community, and it's being set up
> by Corey Shields at OSL (osuosl.org) and should be online probably
> early next week. The system has
>
> So this system will be hosted by Open Source Lab in Oregon. It's going
> to be "donated" to Software In the Public Interest, who will own it for
> the PostgreSQL fund.
>
> We'll want to figure out a scheduling system to schedule performance
> and compatibility testing on this machine; I'm not sure exactly how
> that will work. Suggestions welcome. As a warning, Gavin Sherry and I
> have a bunch of pending tests already to run.
>
> First thing as soon as I have a login, of course, is to set up a
> Buildfarm instance.
>
> --
> --Josh
>
> Josh Berkus
> PostgreSQL @ Sun
> San Francisco




--
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com      work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461





Re: [HACKERS] MultiXacts & WAL

2006-06-17 Thread Tom Lane
Josh Berkus  writes:
> I would like to see some checking of this, though.  Currently I'm doing 
> testing of PostgreSQL under very large numbers of connections (2000+) and am 
> finding that there's a huge volume of xlog output ... far more than 
> comparable RDBMSes.   So I think we are logging stuff we don't really have 
> to.

Please dump some of the WAL segments with xlogdump so we can get a
feeling for what's in there.

regards, tom lane



Re: [HACKERS] MultiXacts & WAL

2006-06-17 Thread Josh Berkus
Tom, Paolo,

> Yeah, it's difficult to believe that multixact stuff could form a
> noticeable fraction of the total WAL load, except perhaps under really
> pathological circumstances, because the code just isn't supposed to be
> exercised often.  So I don't think this is worth pursuing.  Paolo's free
> to try to prove the opposite of course ... but I'd want to see numbers
> not speculation.

I would like to see some checking of this, though.  Currently I'm doing 
testing of PostgreSQL under very large numbers of connections (2000+) and am 
finding that there's a huge volume of xlog output ... far more than 
comparable RDBMSes.   So I think we are logging stuff we don't really have 
to.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [HACKERS] MultiXacts & WAL

2006-06-17 Thread Tom Lane
paolo romano <[EMAIL PROTECTED]> writes:
> Concerning the prepare state of two phase commit, as I was pointing out in my 
> previous post, shared locks can safely be released once a transaction gets 
> precommitted, hence they do not have to be made durable.

The above statement is plainly wrong.  It would for example allow
violation of FK constraints.

regards, tom lane



Re: [HACKERS] [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL community

2006-06-17 Thread Josh Berkus
Tom,

> 18% in s_lock is definitely bad :-(.  Were you able to determine which
> LWLock(s) are accounting for the contention?

Gavin Sherry and Tom Daly (Sun) are currently working on identifying the 
problem lock using DLWLOCK_STATS.  Any luck, Gavin?

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [HACKERS] MultiXacts & WAL

2006-06-17 Thread Heikki Linnakangas

On Sat, 17 Jun 2006, paolo romano wrote:

> The original point I was making is whether there is any concrete
> reason (which I still can't see) to require multixact recoverability
> (by means of logging). Concerning the prepare state of two-phase
> commit, as I was pointing out in my previous post, shared locks can
> safely be released once a transaction gets precommitted, hence they do
> not have to be made durable.


No, it's not safe to release them until the 2nd phase of commit.

Imagine table foo and table bar. Table bar has a foreign key reference
to foo.

1. Transaction A inserts a row into bar, referencing row R in foo. This
   acquires a shared lock on R.
2. Transaction A precommits, releasing the lock.
3. Transaction B deletes R. The new row inserted by A is not visible to
   B, so the delete succeeds.
4. Transactions A and B commit. Oops, the new row in bar references R,
   which doesn't exist anymore.

Holding the lock until the true end of transaction, the 2nd phase of
commit, blocks B from deleting R.


- Heikki



[HACKERS] oprofile results for stats collector test

2006-06-17 Thread Tom Lane
I've gotten some insight into the stats collection issues by monitoring
Bruce's test case with oprofile (http://oprofile.sourceforge.net/).

Test conditions: PG CVS HEAD, built with --enable-debug --disable-cassert
(debug symbols are needed for oprofile), on current Fedora Core 5
(Linux kernel 2.6.16-1.2133_FC5 #1 SMP, gcc 4.1.1), running on dual Xeon
EM64T with hyperthreading enabled.  All configuration parameters are at
defaults except for stats_command_string.  I extended the test script to
be 24 "SELECT 1;" commands so that it would run long enough to get a
trustworthy profile.  In the profile data below, I cut off each listing
where the percentages fell below about 1% of global runtime.


With stats_command_string off:

21.7 sec to run script (with oprofile off)
22.7 sec to run script (with oprofile active)
These numbers stay pretty consistent over repeated trials.

oprofile global report:
CPU: P4 / Xeon with 2 hyper-threads, speed 2793.03 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped)
with a unit mask of 0x01 (mandatory) count 24
GLOBAL_POWER_E...|
  samples|  %|
--
   156544 47.2570 /usr/lib/debug/lib/modules/2.6.16-1.2133_FC5/vmlinux
84114 25.3921 /home/tgl/testversion/bin/postgres
63873 19.2818 /lib64/libc-2.4.so
13621  4.1119 /home/tgl/testversion/bin/psql
10101  3.0493 /home/tgl/testversion/lib/libpq.so.5.0
 1324  0.3997 /usr/bin/oprofiled

kernel (note percentages here are percentage of kernel time not total time):
samples  %symbol name
19326    12.3454  schedule
11738 7.4982  _raw_spin_lock
11631 7.4299  __switch_to
9897  6.3222  mwait_idle
7153  4.5693  __find_first_bit
6395  4.0851  try_to_wake_up
3972  2.5373  unix_stream_recvmsg
3813  2.4357  system_call
3739  2.3885  thread_return
3612  2.3073  find_busiest_group
3446  2.2013  resched_task

postgres:
samples  %symbol name
6415  7.6266  base_yyparse
6115  7.2699  AllocSetAlloc
3525  4.1907  hash_search

libc:
samples  %symbol name
9276 14.5226  memcpy
8686 13.5989  _int_malloc
5865  9.1823  strncpy
5538  8.6703  strlen
5286  8.2758  strspn
4351  6.8120  vfprintf


With stats_command_string on:

45.2 sec to run script (with oprofile off)
47.5 sec to run script (with oprofile active)
Also repeatable.

oprofile global report:
CPU: P4 / Xeon with 2 hyper-threads, speed 2793.03 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped)
with a unit mask of 0x01 (mandatory) count 24
GLOBAL_POWER_E...|
  samples|  %|
--
   785756 71.9386 /usr/lib/debug/lib/modules/2.6.16-1.2133_FC5/vmlinux
   135820 12.4348 /home/tgl/testversion/bin/postgres
   105828  9.6889 /lib64/libc-2.4.so
19903  1.8222 /home/tgl/testversion/lib/libpq.so.5.0
19293  1.7663 /home/tgl/testversion/bin/psql
 7489  0.6856 /ip_conntrack
 4555  0.4170 /usr/bin/oprofiled

kernel:
samples  %symbol name
62337 7.9334  security_port_sid
61566 7.8353  schedule
37897 4.8230  _raw_spin_lock
36410 4.6338  __switch_to
26580 3.3827  mwait_idle
21797 2.7740  __find_first_bit
19118 2.4331  avc_has_perm_noaudit
18929 2.4090  sys_select
17005 2.1642  system_call
15979 2.0336  do_select
15762 2.0060  fget
14838 1.8884  try_to_wake_up
13488 1.7166  find_busiest_group
11956 1.5216  _raw_read_lock
11685 1.4871  thread_return
10639 1.3540  resched_task
10542 1.3416  copy_user_generic

postgres:
samples  %symbol name
7817  5.7554  PgstatBufferMain
7541  5.5522  base_yyparse
7247  5.3357  AllocSetAlloc
6725  4.9514  hash_search

libc:
samples  %symbol name
15012    14.1853  strncpy
12265    11.5896  _int_malloc
11949    11.2910  memcpy


This system is running with SELinux enabled (in permissive mode).
I presume that if I turned it off, security_port_sid() and
avc_has_perm_noaudit() would drop off the radar screen but things
wouldn't change much otherwise.

Also, you'd expect to see PgstatCollectorMain() somewhere but it's
not in the oprofile report at all.  I believe that gcc sees that it's
a static singly-referenced function and inlines it into
PgstatBufferMain; so the time shown above for PgstatBufferMain should
be read as the joint time for those two functions.  Also it looks like
all of the pgstat_recv_foo functions got inlined into it too.  The
only pgstat functions that show separately in the oprofile output are

1214  0.8938  pgstat_add_backend
860   0.6332  pgstat_report_tabstat
749   0.5515  pgstat_report_activity
434   0.3195  pgstat_write_statsfile
330   0.2430  pgstat_setheader
196   0.1443  pgstat_count_xact_commit
106   0.0780  pgstat_send
91    0.0670  pgstat_get_db_entry

So the total time spent directly in pgstat.c is less than 9% of the
postgres executable's time, or only about 1% of the total runtime.

Re: [HACKERS] postgresql and process titles

2006-06-17 Thread Jim Nasby

On Jun 13, 2006, at 9:42 PM, Kris Kennaway wrote:

>> BTW, there's another FBSD performance oddity I've run across. Running
>>
>> pg_dump -t email_contrib -COx stats | bzip2 > ec.sql.bz2 &
>>
>> which dumps the email_contrib table to bzip2 then to disk, the OS
>> won't use more than 1 CPU on an SMP system... unless the data is
>> cached. According to both gstat and systat -v, the system isn't I/O
>> bound; both are reporting the RAID10 with that table on it as only
>> about 10% busy. If I let that command run for a bit then cancel it
>> and re-start it so that the beginning of that table is in cache, it
>> will use one entire CPU for bzip2, which is what I'd expect to happen.
>
> Hmm, odd... maybe something with the scheduler. I'd need access to a
> test case to be able to figure it out though.

Well, pg_dump of any sizeable database piped through bzip2 or gzip
should show this. Try:

pg_dump large_database | bzip2 > databasedump.sql.bz2
--
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com      work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461





Re: [HACKERS] postgresql and process titles

2006-06-17 Thread Jim Nasby

Moving to osdldbt-general and dropping Tom and Marc.

On Jun 13, 2006, at 1:18 PM, Kris Kennaway wrote:

> On Tue, Jun 13, 2006 at 12:29:14PM -0500, Jim C. Nasby wrote:
>
>> Unless supersmack has improved substantially, you're unlikely to find
>> much interest. Last I heard it was a pretty brain-dead benchmark.
>> DBT2/3 (http://sourceforge.net/projects/osdldbt) is much more
>> realistic (based on TPC-C and TPC-H).
>
> Have you tried to compile this on FreeBSD? It looks like it (dbt1 at
> least) will need a moderate amount of hacking - there are some Linux
> assumptions in the source and the configure script makes assumptions
> about where things are installed that cannot be overridden on the
> command line.


Yeah, there are a number of issues that would need to be addressed. The
biggest problem as I see it is that all the framework to run tests is
pretty tied together, without any modularity. The run script itself
launches all the stats collecting stuff, assumes it needs to happen on
the local machine, etc.

There is consensus that this kind of thing needs to be done to improve
dbt*, it's just a matter of doing it. Also, it's desirable to improve
the benchmarks so that they can store results in a database, and to set
them up so that the monitoring stuff is common to all the different
tests.






Re: [HACKERS] Parallel index build during COPY

2006-06-17 Thread Jim Nasby

On Jun 15, 2006, at 9:45 PM, Toru SHIMOGAKI wrote:

NTT has some ideas about index creation during large data loads. Our
approach is the following: index tuples are created at the same time as
heap tuples and fed into a heap sort. In addition, we use the old index
tuples as an already-sorted list if the target table already contains
data, so the loader does not need to sort all the index tuples including
the old ones. After only the new index tuples are sorted, both sorted
lists are merged and the whole index is built. This can save both CPU
resources and disk accesses dramatically, especially if the target table
already holds many tuples.
This approach needs to acquire a table lock, which is unlike COPY's lock
mode, so we have developed it as a separate bulk-load tool. We will talk
about it at the PostgreSQL Anniversary Conference in Toronto. Thank you
to Josh for his coordination.


So does that mean you're able to do all that without hacking the back- 
end? Impressive. :)


I look forward to hearing about it.
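The merge step described in the quoted message (sort only the new index tuples, then merge them with the already-sorted existing index) can be sketched as a toy model in Python; this is purely illustrative, not backend code, which of course works on real index tuples:

```python
import heapq

def rebuild_index(old_index_keys, new_heap_keys):
    """Toy model of the bulk-load idea: the existing index is already
    sorted, so only the newly loaded tuples' keys need sorting; the
    whole index is then produced by one linear merge of the two lists."""
    new_sorted = sorted(new_heap_keys)                    # sort only the new tuples
    return list(heapq.merge(old_index_keys, new_sorted))  # linear merge

# Existing index (sorted) plus a fresh batch loaded in arbitrary order:
old = [1, 4, 9, 16]
batch = [7, 2, 25]
print(rebuild_index(old, batch))  # [1, 2, 4, 7, 9, 16, 25]
```

This is why the savings grow with table size: the O(n log n) sort cost is paid only on the new batch, while the old entries are consumed in order.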



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] MultiXacts & WAL

2006-06-17 Thread paolo romano
Tom Lane <[EMAIL PROTECTED]> wrote:

> Yeah, it's difficult to believe that multixact stuff could form a
> noticeable fraction of the total WAL load, except perhaps under really
> pathological circumstances, because the code just isn't supposed to be
> exercised often.  So I don't think this is worth pursuing.  Paolo's free
> to try to prove the opposite of course ... but I'd want to see numbers
> not speculation.
>
>   regards, tom lane

Tom is right, mine are indeed just plain speculations, motivated by my
original doubt concerning whether there were hidden reasons for requiring
multixact recoverability. I don't know if I'll find the time to do some
performance tests, at least in the short term, but I've enjoyed exchanging
my views with you all, so thanks a lot for your feedback!

Just a curiosity: what kind of benchmarks would you use to evaluate this
effect? I am quite familiar with TPC-C and TPC-W, but I am a newbie in the
PostgreSQL community, so I was wondering whether you use any reference
benchmark.

Re: [HACKERS] Test request for Stats collector performance improvement

2006-06-17 Thread Tom Lane
Bruce Momjian  writes:
> 1)  Run this script and record the time reported:
>   ftp://candle.pha.pa.us/pub/postgresql/mypatches/stat.script

One thing you neglected to specify is that the test must be done on a
NON ASSERT CHECKING build of CVS HEAD (or recent head, at least).
On these trivial "SELECT 1" commands, an assert-checking backend is
going to spend over 50% of its time doing end-of-transaction assert
checks.  I was reminded of this upon trying to do oprofile:

CPU: P4 / Xeon with 2 hyper-threads, speed 2793.03 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped)
with a unit mask of 0x01 (mandatory) count 24
samples  %        symbol name
129870   37.0714  AtEOXact_CatCache
67112    19.1571  AllocSetCheck
16611 4.7416  AtEOXact_Buffers
10054 2.8699  base_yyparse
7499  2.1406  hash_seq_search
7037  2.0087  AllocSetAlloc
4267  1.2180  hash_search
4060  1.1589  AtEOXact_RelationCache
2537  0.7242  base_yylex
1984  0.5663  grouping_planner
1873  0.5346  LWLockAcquire
1837  0.5244  AllocSetFree
1808  0.5161  exec_simple_query
1763  0.5032  ExecutorStart
1527  0.4359  PostgresMain
1464  0.4179  MemoryContextAllocZeroAligned

Let's be sure we're all measuring the same thing.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] MultiXacts & WAL

2006-06-17 Thread paolo romano
> In PostgreSQL, shared locks are not taken when just reading data. They're
> used to enforce foreign key constraints. When inserting a row to a table
> with a foreign key, the row in the parent table is locked to keep another
> transaction from deleting it. It's not safe to release the lock before
> end of transaction.

Releasing shared locks (whether used for plain reading or enforcing
foreign keys) before transaction end would be clearly wrong. The original
point I was making is whether there is any concrete reason (which I still
can't see) to require multixact recoverability (by means of logging).
Concerning the prepared state of two-phase commit, as I pointed out in my
previous post, shared locks can safely be released once a transaction is
precommitted, hence they do not have to be made durable.

Re: [HACKERS] PG_MODULE_MAGIC

2006-06-17 Thread Thomas Hallgren

Tom Lane wrote:

No, each major release (8.2, 8.3, etc).  There are hardly ever any major
releases where you wouldn't need a new compilation anyway ...

  

True. I'm all in favor of a magic used this way. It will save me some grief.

Regards,
Thomas Hallgren




Re: [HACKERS] MultiXacts & WAL

2006-06-17 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> Also, multixacts are only used when two transactions hold a shared lock 
> on the same row.

Yeah, it's difficult to believe that multixact stuff could form a
noticeable fraction of the total WAL load, except perhaps under really
pathological circumstances, because the code just isn't supposed to be
exercised often.  So I don't think this is worth pursuing.  Paolo's free
to try to prove the opposite of course ... but I'd want to see numbers
not speculation.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] PG_MODULE_MAGIC

2006-06-17 Thread Tom Lane
Thomas Hallgren <[EMAIL PROTECTED]> writes:
> A module magic patch was added recently and I'm a bit uncertain what the 
> implications are 
> for the external PL modules. Does it affect them at all?

Yes.

> Will I need to provide separate
> binaries for each bug fix release even though the APIs do not change?

No, each major release (8.2, 8.3, etc).  There are hardly ever any major
releases where you wouldn't need a new compilation anyway ...

regards, tom lane



Re: [HACKERS] MultiXacts & WAL

2006-06-17 Thread Heikki Linnakangas

On Sat, 17 Jun 2006, paolo romano wrote:

* Reduced I/O activity during transaction processing: current workloads
are typically dominated by reads (rather than updates)... and reads give
rise to multixacts (if there are at least two transactions reading the
same page, or if an explicit lock request is performed through
heap_lock_tuple). And (long) transactions can read a lot of tuples,
which directly translates into (long) multixact logging sooner or later.
To accurately estimate the possible performance gain one should perform
some profiling, but at first glance ISTM that there is good potential.


Read-only transactions don't acquire shared locks. And updating
transactions emit WAL records anyway; the additional I/O caused by
multixact records is negligible.


Also, multixacts are only used when two transactions hold a shared lock 
on the same row.


* Reduced recovery time: because of shorter logs and fewer data
structures to rebuild... and reducing recovery time helps improve
system availability, so it should not be overlooked.


I doubt the multixact stuff makes much difference compared to all other 
WAL traffic.


In fact, logging the multixact stuff could be skipped when no two-phase 
transactions are involved. The problem is, you don't know if a transaction is one 
phase or two phase before you see COMMIT or PREPARE TRANSACTION.
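The deferral idea in the last paragraph could in principle be handled by buffering the multixact records until the transaction's fate is known. A hypothetical sketch (all names and structure invented for illustration; this is not how the backend is actually organized):

```python
class TxnWalBuffer:
    """Defer multixact-style records: they reach the (simulated) WAL only
    if the transaction turns out to be two-phase (PREPARE TRANSACTION);
    a plain COMMIT discards them, since the shared locks die with the
    transaction anyway."""

    def __init__(self, wal):
        self.wal = wal       # shared append-only log (a plain list here)
        self.pending = []    # multixact records held back

    def log_multixact(self, record):
        self.pending.append(record)   # don't write yet: fate unknown

    def prepare(self):
        # Two-phase: locks must survive a crash, so flush before PREPARE.
        self.wal.extend(self.pending)
        self.wal.append("PREPARE")
        self.pending.clear()

    def commit(self):
        # One-phase: the held-back records were never needed.
        self.pending.clear()
        self.wal.append("COMMIT")

wal = []
t = TxnWalBuffer(wal)
t.log_multixact("multixact:1001")
t.commit()
print(wal)  # ['COMMIT'] -- no multixact traffic in the one-phase case
```

The catch is exactly the one noted above: the held-back records must be kept around, with their ordering intact, until COMMIT or PREPARE TRANSACTION is finally seen.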


- Heikki



Re: [HACKERS] MultiXacts & WAL

2006-06-17 Thread Heikki Linnakangas

On Sat, 17 Jun 2006, paolo romano wrote:

When a transaction enters (successfully) the prepared state it only
retains its exclusive locks and releases any shared locks (i.e.
multixacts)... or, at least, that's how it should be in principle
according to serialization theory; I haven't yet checked whether this is
what is done in PostgreSQL.


In PostgreSQL, shared locks are not taken when just reading data. They're 
used to enforce foreign key constraints. When inserting a row to a table 
with a foreign key, the row in the parent table is locked to 
keep another transaction from deleting it. It's not safe to release the 
lock before end of transaction.
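To make that concrete: several inserting transactions can hold the parent row's shared lock at once (the case where a multixact is created), and a DELETE must wait until all of them have ended. A toy model of those semantics in Python (pure illustration, not PostgreSQL's actual lock manager):

```python
class RowLock:
    """Minimal shared/exclusive row lock: many sharers (FK checks on
    inserting transactions), and a deleter is admitted only when no
    other sharer remains."""

    def __init__(self):
        self.sharers = set()   # xids holding the shared lock ("multixact")

    def lock_shared(self, xid):
        self.sharers.add(xid)

    def release(self, xid):
        self.sharers.discard(xid)   # at transaction end, not before

    def try_delete(self, xid):
        # An exclusive lock conflicts with any shared holder but our own.
        return not (self.sharers - {xid})

parent_row = RowLock()
parent_row.lock_shared(xid=100)   # txn 100 inserts a child row
parent_row.lock_shared(xid=101)   # txn 101 does too: multixact {100, 101}
print(parent_row.try_delete(xid=102))  # False: the delete must wait
parent_row.release(100)
parent_row.release(101)
print(parent_row.try_delete(xid=102))  # True: both FK holders have ended
```

Releasing a sharer early would let the delete slip in while the inserting transaction could still commit a child row, which is why the lock must be held to end of transaction.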


- Heikki



Re: [HACKERS] Preventing DELETE and UPDATE without a WHERE clause?

2006-06-17 Thread Mark Woodward
> On 6/16/06, Mark Woodward <[EMAIL PROTECTED]> wrote:
>> > Chris Campbell <[EMAIL PROTECTED]> writes:
>> >> I heard an interesting feature request today: preventing the
>> >> execution of a DELETE or UPDATE query that does not have a WHERE
>> clause.
>> >
>> > These syntaxes are required by the SQL spec.  Furthermore, it's easy
>> > to imagine far-more-probable cases in which the system wouldn't detect
>> > that you'd made a mistake, eg
>> >
>> >   DELETE FROM tab WHERE key > 1
>> >
>> > where you meant to type
>> >
>> >   DELETE FROM tab WHERE key > 1000
>> >
>> > I suggest counseling your client to learn how to use BEGIN/ROLLBACK.
>> > This proposal strikes me as falling squarely within the rule about
>> > "design a system that even a fool can use, and only a fool will want
>> > to use it".
>> >
>> Just a theory, couldn't a trigger be set up that would cause the query to
>> tank if it touches too many rows?
>>
>
> I haven't tried, but maybe a FOR STATEMENT trigger AFTER the event can
> ask ROW_COUNT using GET DIAGNOSTICS?

Well, if you *can't* do it in a trigger, maybe that's a valid modification
for Hackers to consider.
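Whatever the trigger mechanics turn out to be, the guard logic itself is simple: let the statement run, count the rows it touched, and roll back if the count exceeds a threshold. A sketch of that logic in Python (purely illustrative; a real guard would live in plpgsql or the backend):

```python
def guarded_delete(table, predicate, max_rows):
    """Delete rows matching predicate, but refuse (leaving the table
    untouched, like a rolled-back statement) if it would affect more
    than max_rows rows."""
    doomed = [row for row in table if predicate(row)]
    if len(doomed) > max_rows:
        raise RuntimeError(
            f"DELETE would touch {len(doomed)} rows (limit {max_rows})")
    table[:] = [row for row in table if not predicate(row)]
    return len(doomed)

keys = list(range(10))
print(guarded_delete(keys, lambda k: k > 7, max_rows=5))  # 2
try:
    guarded_delete(keys, lambda k: k >= 0, max_rows=5)    # whole table
except RuntimeError as e:
    print(e)  # refused, and keys is left untouched
```

Note that this catches both the missing-WHERE case and Tom's `key > 1` typo, at the price of inventing a threshold that legitimate bulk deletes would then have to work around.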



[HACKERS] PG_MODULE_MAGIC

2006-06-17 Thread Thomas Hallgren
A module magic patch was added recently and I'm a bit uncertain what the implications are
for the external PL modules. Does it affect them at all? Will I need to provide separate
binaries for each bug fix release even though the APIs do not change? Exactly how is the
magic determined?


Regards,
Thomas Hallgren




Re: [HACKERS] MultiXacts & WAL

2006-06-17 Thread paolo romano
Tom Lane <[EMAIL PROTECTED]> wrote:

> paolo romano <[EMAIL PROTECTED]> writes:
>> The point i am missing is the need to be able to completely recover
>> multixacts offsets and members data. These carry information about
>> current transactions holding shared locks on db tuples, which should
>> not be essential for recovery purposes.
>
> This might be optimizable if we want to assume that multixacts will never
> be used for any purpose except holding locks, but that seems a bit
> shortsighted.  Is there any actually significant advantage to not logging
> this information?
>
>   regards, tom lane

I can see two main advantages:

* Reduced I/O activity during transaction processing: current workloads
are typically dominated by reads (rather than updates)... and reads give
rise to multixacts (if there are at least two transactions reading the
same page, or if an explicit lock request is performed through
heap_lock_tuple). And (long) transactions can read a lot of tuples, which
directly translates into (long) multixact logging sooner or later. To
accurately estimate the possible performance gain one should perform some
profiling, but at first glance ISTM that there is good potential.

* Reduced recovery time: because of shorter logs and fewer data
structures to rebuild... and reducing recovery time helps improve system
availability, so it should not be overlooked.

Regards,
  Paolo