Re: [HACKERS] 8.3 GSS Issues
On Thu, Oct 25, 2007 at 05:39:37PM -0700, Henry B. Hotz wrote: On Oct 25, 2007, at 3:27 PM, Stephen Frost wrote: * Henry B. Hotz ([EMAIL PROTECTED]) wrote: What you're asking for is basically a krb_match_realm parameter, or do I understand you wrong? I'm asking for name matching to be done in accordance with the GSSAPI recommendations. That's all I want, but it's actually necessary for this feature to be at all usable in my environment. If we don't, then I suggest we pull this feature until it can be done correctly. I know what you want, that's fairly obvious. I'm only asking about *how* to do it the best way. What the krb5 method does is IMO a documented bug. The realm name is part of the name. As I explained at some length, you cannot assume the username (first component of the principal) has any meaning by itself, except in small deployments with no external trust agreements. Kerberos (and AD) are designed to support larger infrastructures with multiple organizations. This isn't unexpected for PG as the current krb5 support does this. I'm not a big fan of it, but at the same time I don't feel it's justification to drop it from 8.3. Having it only allow the default realm would be an option which could work in 8.3, imv. I don't think the fact that the existing krb5 code does the wrong thing (and can't be used in an environment with cross-realm agreements) is justification for doing the wrong thing in a new capability. The code in my original patch would do the latter (default realm only). More precisely: if you do a gss_import_name() on smith and [EMAIL PROTECTED] you get the same internal representation, and gss_compare_name() will tell you they're the same. Also gss_compare_name() will tell you [EMAIL PROTECTED] is different from either of the first two. Wouldn't using a specific parameter like krb_match_realm=YOUR.REALM that you set in the config file be more flexible?
In that it actually allows scenarios like server/resource domains (not sure how common they are in unix krb setups, but they're certainly not unfamiliar in the Windows AD world)? If we don't use gss_compare_name(), or some similar mechanism, to compare connection names to PG usernames, then I don't think GSSAPI support should be included in 8.3. I think that's a horrible idea, given that it works perfectly fine the way it is now for the vast majority of users. That said, we should certainly fix it in one way or another for 8.3. But if that fails, I see no reason at all to pull the feature. Longer term (since it's likely too late to be accepted now), as I think has been discussed in the past, PG could really use a .k5login-esque, either admin-only (a la pg_hba.conf / ident map) or per-user (some sort of ALTER ROLE that a user could do on himself?), mapping functionality. There has been discussion of a general mapping layer between authentication names and authorization/role names. I think that's the way to go. I haven't thought about who or where the administration of the mapping ought to be. Yeah, I agree that something like that would be a good long-term solution. For a proper discussion of this topic I recommend the section starting on page 64 of Sun's Security for Developers Guide, document 816-4863. Note that there is a discussion of how to do compares efficiently. IIRC my patch did things the easy way described on page 67. In the long run it's possible we'd want to do it the fast way described on page 69, but that's merely an optimization and might not be needed. Do you have a URL for this? Or is it a book one has to buy? //Magnus ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] module archive
On 10/25/07, Andrew Dunstan [EMAIL PROTECTED] wrote: From time to time people have raised the idea of a CPAN-like mechanism for downloading, building and installing extensions and the like (types, functions, sample dbs, anything not requiring Postgres itself to be rebuilt), and I have been thinking on this for the last few days. What sort of requirements would people have of such a mechanism? How do people envision it working? From a user's point of view, it would work like this:

$ pgsql-pkg list
Installed packages:
--
pgpool-II v.1.2.1
plR v.8.2.0.5
$ cd /usr/src/pgsql-packages/
$ cvs update
$ cd languages/plR
$ make upgrade
$ cd ../../compatibility/orafce
$ make install
$ pgsql-pkg list
Installed packages:
--
pgpool-II v.1.2.1
plR v.8.2.0.7
orafce v.2.1.2

A starting point worth considering could be pkgsrc, the packaging system used by NetBSD ( http://www.netbsd.org/docs/software/packages.html ). Pkgsrc is not limited to NetBSD: it works on other OSes as well (Linux, *BSD, Solaris, HP-UX, ...). Pkgsrc is based on a directory tree organized in categories. For each package there is a directory that contains the Makefile, the description, and sometimes patches. The user can do: make build, make install, make upgrade, etc. The sources are downloaded from the server where the project is originally hosted: NetBSD is not duplicating the source archives on their server. If necessary, the build system can apply patches before compiling the package. This allows the NetBSD folks to adapt the package without requiring modifications to the original sources. So, the mechanism for PostgreSQL extensions could work this way:
(A) The source tree would be an adapted version of pkgsrc:
- the extensions are organized into directories (based on categories)
- each extension would be a directory that basically contains a Makefile with enough information for downloading the source tarball, optionally applying patches, and installing the package (extension).
- probably it will be necessary to create a separate source tree for each PostgreSQL version, so that the build system wouldn't be cluttered with keeping information about which package version is compatible with which PostgreSQL version.
- with such a system it is possible to include an extension even if the maintainer of that particular project doesn't modify anything to make it compatible with the extensions system.
(B) Some extensions are only sql scripts that install new objects into a database, while others could also contain binaries, libraries, or maybe other stuff as well. Therefore, the installation probably has to be two-step:
step 1: 'make install' would copy the files into a new subdirectory called extensions:
/usr/local/pgsql/extensions/bin
/usr/local/pgsql/extensions/lib
/usr/local/pgsql/extensions/install_scripts
/usr/local/pgsql/extensions/remove_scripts
/usr/local/pgsql/extensions/examples
and add the extension to the list of 'installed extensions'.
step 2: to activate the extension inside a particular database, the user would execute the extension's install_script.
The first step is installing the extension at the database cluster level, while the second step is installing the extension inside a particular database. I think that this distinction is an important one: the user will need full control to install the extensions only in the databases where she/he wants to.
(C) It would be nice to have a table containing information about the extensions currently installed.
Cheers, Adrian Maier
Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris
Josh Berkus [EMAIL PROTECTED] writes: Actually, 32 made a significant difference as I recall ... do you still have the figures for that, Jignesh? Well it made a difference but it didn't remove the bottleneck, it just moved it. IIRC under that benchmark Jignesh was able to run with x sessions efficiently with 8 clog buffers, x + 100 or so sessions with 16 clog buffers and x + 200 or so sessions with 32 clog buffers. It happened that x + 200 was the number of sessions he wanted to run the benchmark at so it helped the benchmark results quite a bit. But that was just an artifact of how many sessions the benchmark needed. A user who needs 1200 sessions or who has a different transaction load might find he needs more clog buffers to alleviate the bottleneck. And of course most (all?) normal users use far fewer sessions and won't run into this bottleneck at all. Raising NUM_CLOG_BUFFERS just moves around the arbitrary bottleneck. This benchmark is useful in that it gives us an idea where the bottleneck lies for various values of NUM_CLOG_BUFFERS but it doesn't tell us what value realistic users are likely to bump into. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] PostgreSQL 8.3, libpq and WHERE CURRENT OF
Sebastien FLAESCH wrote: Just looked at the new features of 8.3 and realized that positioned updates/deletes is now possible with this new release... We would use that if we could define the cursor name with a libpq function. I don't understand. When you open a cursor with DECLARE CURSOR, you give it a name. Doesn't that do what you want? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] module archive
On Thu, Oct 25, 2007 at 12:31:30PM -0400, Andrew Dunstan wrote: From time to time people have raised the idea of a CPAN-like mechanism for downloading, building and installing extensions and the like (types, functions, sample dbs, anything not requiring Postgres itself to be rebuilt), and I have been thinking on this for the last few days. What sort of requirements would people have of such a mechanism? How do people envision it working? There's been some discussion before: http://archives.postgresql.org/pgsql-patches/2006-05/msg00327.php The most useful feature IMHO would be if the installed modules could be recognised as a block (using pg_depend) so that: 1. pg_dump can see them and emit only INSTALL module 2. Uninstall removes everything that was installed This only covers installation though, not building or compiling. Have a nice day, -- Martijn van Oosterhout [EMAIL PROTECTED] http://svana.org/kleptog/ From each according to his ability. To each according to his ability to litigate.
Re: [HACKERS] module archive
On Thursday, October 25, 2007, Andrew Dunstan wrote: From time to time people have raised the idea of a CPAN-like mechanism for downloading, building and installing extensions and the like (types, functions, sample dbs, anything not requiring Postgres itself to be rebuilt), and I have been thinking on this for the last few days. What sort of requirements would people have of such a mechanism? How do people envision it working? Downloading, building, and installing extensions is actually fairly standardized already (well, perhaps there are 2 or 3 standards, but CPAN has that as well). I think the inhibitions relate more to the management of what is installed. I imagine we need a package manager inside of PostgreSQL to manage installation, setup, removal, dependencies and so on. Much like rpm or dpkg really. That should replace the current run this .sql file mechanism, much like rpm and dpkg replaced the run make install and trust me mechanism. I have some of this mapped out in my head if there is interest. We'd also need easy integration with the real rpm and dpkg, so that distribution packages can be built easily and I can run apt-get install postgresql extension1 extension2 and it's all set up. -- Peter Eisentraut http://developer.postgresql.org/~petere/
[HACKERS] PostgreSQL 8.3, libpq and WHERE CURRENT OF
Hi hackers, Just looked at the new features of 8.3 and realized that positioned updates/deletes are now possible with this new release... We would use that if we could define the cursor name with a libpq function. Something similar to ODBC's SQLSetCursorName() function... For now we must use OIDs to emulate WHERE CURRENT OF, but since 8.1, OIDs are no longer created by default and require additional configuration when setting up a PostgreSQL server. If I missed something, please point me to the docs where I can find this. Understand this is not an isolated project: We are a Development Tool vendor and have a bunch of customers migrating legacy applications from Informix to PostgreSQL... Thanks a lot! Sebastien FLAESCH Database Interfaces Four J's Development Tools
Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes
Gokulakannan Somasundaram wrote: As far as Load Test is concerned, I have tried to provide all the relevant details. Please inform me, if I have left any. Thanks! How large were the tables? Did you run all the queries concurrently? At this point, I think it'd be better to run them separately so that you can look at the impact on each kind of operation in isolation. What kind of an I/O system does the server have? It'd be interesting to get the cache hit/miss ratios, as well as the output of iostat (or similar) during the test. How much of the benefit is due to reduced random I/O? What do the numbers look like if the tables are small enough to fit in RAM? You should do some tuning; the PostgreSQL default configuration is not tuned for maximum performance. At least increase checkpoint_segments and checkpoint_timeout and shared_buffers. Though I noticed that you're running on Windows; I don't think anyone's done any serious performance testing or tuning on Windows yet, so I'm not sure how you should tune that. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes
On 10/26/07, Heikki Linnakangas [EMAIL PROTECTED] wrote: Gokulakannan Somasundaram wrote: As far as Load Test is concerned, I have tried to provide all the relevant details. Please inform me, if I have left any. Thanks! How large were the tables? It is in the Performance test report. They contain 2 million records, 6 columns wide, 3 text and 3 numeric. The same set of tables was used for both tests, after a refresh from a file. Did you run all the queries concurrently? At this point, I think it'd be better to run them separately so that you can look at the impact on each kind of operation in isolation. Performance tests are run against a workload, and I have taken the workload of a small scale partitioning setup. Running the queries individually has already been done and the count of logical reads has been verified. I have already suggested that. For some reason, I have not been able to convince anyone that for simple index scans, logical reads are a good measure of performance. What kind of an I/O system does the server have? It's a normal desktop system. The model no. is ST3400633A, 7200 RPM. It'd be interesting to get the cache hit/miss ratios, as well as the output of iostat (or similar) during the test. How much of the benefit is due to reduced random I/O? Good suggestion. I have run the test against Windows. Let me try perfmon in the next performance test, to monitor the run. What do the numbers look like if the tables are small enough to fit in RAM? I don't know whether this is a valid production setup, against which we need to benchmark. But if you insist, I will do that and get back to you next time. You should do some tuning; the PostgreSQL default configuration is not tuned for maximum performance. At least increase checkpoint_segments and checkpoint_timeout and shared_buffers. Though I noticed that you're running on Windows; I don't think anyone's done any serious performance testing or tuning on Windows yet, so I'm not sure how you should tune that.
What we are trying to do here is to try and compare the performance of two indexing structures. AFAIK, a performance test done to compare two software implementations should not have parameter settings favorable to one. I have not done any settings change favorable to the thick index. But I have a limited setup, from which I am trying to contribute. So please don't ask me to run the tests against large scale servers. I think a better idea would be to form a performance testing workload mix (taking into account the QoS parameters used in the normal database, purging frequency, and typical workload models used in the industry), with freedom in the choice of hardware/software. That might solve some of the load test riddles. -- Thanks, Gokul. CertoSQL Project, Allied Solution Groups. (www.alliedgroups.com)
Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes
Gokulakannan Somasundaram wrote: On 10/26/07, Heikki Linnakangas [EMAIL PROTECTED] wrote: Gokulakannan Somasundaram wrote: As far as Load Test is concerned, I have tried to provide all the relevant details. Please inform me, if I have left any. Thanks! How large were the tables? It is in the Performance test report. They contain 2 million records, 6 columns wide, 3 text and 3 numeric. The same set of tables was used for both tests, after a refresh from a file. I meant in megabytes. How wide is the data in the text and numeric fields? Did you run all the queries concurrently? At this point, I think it'd be better to run them separately so that you can look at the impact on each kind of operation in isolation. Performance tests are run against a workload, and I have taken the workload of a small scale partitioning setup. Running the queries individually has already been done and the count of logical reads has been verified. I have already suggested that. For some reason, I have not been able to convince anyone that for simple index scans, logical reads are a good measure of performance. I wouldn't expect any performance gain for simple, not index-only, scans. They have to hit the heap anyway. What do the numbers look like if the tables are small enough to fit in RAM? I don't know whether this is a valid production setup, against which we need to benchmark. But if you insist, I will do that and get back to you next time. A lot of people run databases that fit in RAM. And a lot of people don't. Both cases are interesting. I'm particularly curious about that because you've argued that the number of logical reads is important, even if they don't become physical reads. Hannu also suggested that swapping pages in/out of shared_buffers is relatively expensive; if that's the case, we should see index-only scans performing much better than regular index scans, even when there's no physical I/O. You should do some tuning; the PostgreSQL default configuration is not tuned for maximum performance.
At least increase checkpoint_segments and checkpoint_timeout and shared_buffers. Though I noticed that you're running on Windows; I don't think anyone's done any serious performance testing or tuning on Windows yet, so I'm not sure how you should tune that. What we are trying to do here is to try and compare the performance of two indexing structures. AFAIK, a performance test done to compare two software implementations should not have parameter settings favorable to one. I have not done any settings change favorable to the thick index. The tuning I suggested is just basic tuning any knowledgeable Postgres DBA will do. It's not particularly in favor of any indexing scheme. With the default checkpoint settings, for example, the system is going to be busy doing checkpoints all the time if you have a reasonable rate of updates. But I have a limited setup, from which I am trying to contribute. So please don't ask me to run the tests against large scale servers. Understood. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
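For reference, the basic tuning Heikki mentions amounts to a few lines of postgresql.conf. The values below are illustrative 8.3-era starting points only, not recommendations for this particular benchmark:

```
shared_buffers = 256MB        # 8.3 default is 32MB
checkpoint_segments = 32      # default is 3
checkpoint_timeout = 15min    # default is 5min
```

Raising checkpoint_segments and checkpoint_timeout spreads checkpoints out, which matters for any update-heavy comparison regardless of the indexing scheme being tested.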
Re: [HACKERS] 8.2.3: Server crashes on Windows using Eclipse/Junit
On Mon, Oct 22, 2007 at 01:19:24PM -0700, Trevor Talbot wrote: On 10/22/07, Magnus Hagander [EMAIL PROTECTED] wrote: Trevor Talbot wrote: I'd probably take the approach of combining win32_waitpid() and threads. You'd end up with 1 thread per 64 backends; when something interesting happens the thread could push the info onto a queue, which the new win32_waitpid() would check. Use APCs to add new backends to threads with free slots. I was planning to make it even easier and let Windows do the job for us, just using RegisterWaitForSingleObject(). Does the same - one thread per 64 backends, but we don't have to deal with the queueing ourselves. Oh, good call -- I keep forgetting the native thread pool exists. Taking this one to -hackers once and for all now... Can you try the attached patch? See how many backends you can get up to. This patch changes from using a single thread for each backend started to using the builtin threadpool functionality. It also replaces the pid/handle arrays with an i/o completion port. The net result is also, imho, much more readable code :-) Beware - there's still plenty of debugging code in there :-) //Magnus Index: src/backend/postmaster/postmaster.c === RCS file: /projects/cvsroot/pgsql/src/backend/postmaster/postmaster.c,v retrieving revision 1.542 diff -c -r1.542 postmaster.c *** src/backend/postmaster/postmaster.c 26 Sep 2007 22:36:30 - 1.542 --- src/backend/postmaster/postmaster.c 26 Oct 2007 11:46:45 - *** *** 331,344 #ifdef EXEC_BACKEND #ifdef WIN32 - static void win32_AddChild(pid_t pid, HANDLE handle); - static void win32_RemoveChild(pid_t pid); static pid_t win32_waitpid(int *exitstatus); ! static DWORD WINAPI win32_sigchld_waiter(LPVOID param); ! static pid_t *win32_childPIDArray; ! static HANDLE *win32_childHNDArray; ! static unsigned long win32_numChildren = 0; HANDLEPostmasterHandle; #endif --- 331,347 #ifdef EXEC_BACKEND #ifdef WIN32 static pid_t win32_waitpid(int *exitstatus); ! 
static void WINAPI pgwin32_deadchild_callback(PVOID lpParameter, BOOLEAN TimerOrWaitFired); ! static HANDLE win32ChildQueue; ! ! typedef struct ! { ! HANDLE waitHandle; ! HANDLE procHandle; ! DWORD procId; ! } win32_deadchild_waitinfo; HANDLEPostmasterHandle; #endif *** *** 899,914 #ifdef WIN32 /* !* Initialize the child pid/HANDLE arrays for signal handling. */ ! win32_childPIDArray = (pid_t *) ! malloc(mul_size(NUM_BACKENDARRAY_ELEMS, sizeof(pid_t))); ! win32_childHNDArray = (HANDLE *) ! malloc(mul_size(NUM_BACKENDARRAY_ELEMS, sizeof(HANDLE))); ! if (!win32_childPIDArray || !win32_childHNDArray) ereport(FATAL, ! (errcode(ERRCODE_OUT_OF_MEMORY), !errmsg(out of memory))); /* * Set up a handle that child processes can use to check whether the --- 902,913 #ifdef WIN32 /* !* Initialize I/O completion port used to deliver list of dead children. */ ! win32ChildQueue = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 1); ! if (win32ChildQueue == NULL) ereport(FATAL, ! (errmsg(could not create I/O completion port for child queue))); /* * Set up a handle that child processes can use to check whether the *** *** 2072,2083 #define LOOPHEADER() (exitstatus = status.w_status) #else /* WIN32 */ #define LOOPTEST()((pid = win32_waitpid(exitstatus)) 0) ! /* !* We need to do this here, and not in CleanupBackend, since this is !* to be called on all children when we are done with them. Could move !* to LogChildExit, but that seems like asking for future trouble... !*/ ! #define LOOPHEADER() (win32_RemoveChild(pid)) #endif /* WIN32 */ #endif /* HAVE_WAITPID */ --- 2071,2077 #define LOOPHEADER() (exitstatus = status.w_status) #else /* WIN32 */ #define LOOPTEST()((pid = win32_waitpid(exitstatus)) 0) ! 
#define LOOPHEADER() #endif /* WIN32 */ #endif /* HAVE_WAITPID */ *** *** 3332,3343 int i; int j; charcmdLine[MAXPGPATH * 2]; - HANDLE childHandleCopy; - HANDLE waiterThread; HANDLE paramHandle; BackendParameters *param; SECURITY_ATTRIBUTES sa; charparamHandleStr[32]; /* Make sure caller set up argv properly */ Assert(argc = 3); --- 3326,3336 int i; int j; charcmdLine[MAXPGPATH * 2];
Re: [HACKERS] 8.2.3: Server crashes on Windows using Eclipse/Junit
On Fri, Oct 26, 2007 at 05:25:39AM -0700, Trevor Talbot wrote: On 10/26/07, Magnus Hagander [EMAIL PROTECTED] wrote: Can you try the attached patch? See how many backends you can get up to. This patch changes from using a single thread for each backend started to using the builtin threadpool functionality. It also replaces the pid/handle arrays with an i/o completion port. The net result is also, imho, much more readable code :-) The patch looks good; I'm not set up to build yet, but I should be able to test it sometime in the next week. I've uploaded a set of binary files to http://www.hagander.net/pgsql/pgsql_83_snapshot_win32child.zip. You'll need to get the dependency DLLs elsewhere, but you may have them already. //Magnus
Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes
On 10/26/07, Heikki Linnakangas [EMAIL PROTECTED] wrote: Gokulakannan Somasundaram wrote: On 10/26/07, Heikki Linnakangas [EMAIL PROTECTED] wrote: Gokulakannan Somasundaram wrote: As far as Load Test is concerned, I have tried to provide all the relevant details. Please inform me, if I have left any. Thanks! How large were the tables? It is in the Performance test report. They contain 2 million records, 6 columns wide, 3 text and 3 numeric. The same set of tables was used for both tests, after a refresh from a file. I meant in megabytes. How wide is the data in the text and numeric fields? I have observed the size of the PGDATA\base folder for size details:
Size of Tables : 367 MB
Size of Tables + thin indexes : 616 MB
Size of Tables + thick indexes : 720 MB
The numbers were simply running between 1 and 2 million in a serial fashion. I think I made a mistake here. This would have helped thin indexes in range scans: since the data is clustered in the table, the bitmap heap scan would have been more effective. So I hope thick indexes will be more effective if I uncluster the data, since the thin index has to go to more table buffers. The test columns are approx 10 characters in length. Did you run all the queries concurrently? At this point, I think it'd be better to run them separately so that you can look at the impact on each kind of operation in isolation. Performance tests are run against a workload, and I have taken the workload of a small scale partitioning setup. Running the queries individually has already been done and the count of logical reads has been verified. I have already suggested that. For some reason, I have not been able to convince anyone that for simple index scans, logical reads are a good measure of performance. I wouldn't expect any performance gain for simple, not index-only, scans. They have to hit the heap anyway. I just feel the above test didn't do much I/O, and yet the index-only scans are faster with thick indexes.
Since the size of RAM is 1GB and the size of the data is only 616MB, I hope most of it might have been OS cached. Maybe I am missing something here. What do the numbers look like if the tables are small enough to fit in RAM? I don't know whether this is a valid production setup, against which we need to benchmark. But if you insist, I will do that and get back to you next time. A lot of people run databases that fit in RAM. And a lot of people don't. Both cases are interesting. I'm particularly curious about that because you've argued that the number of logical reads is important, even if they don't become physical reads. Hannu also suggested that swapping pages in/out of shared_buffers is relatively expensive; if that's the case, we should see index-only scans performing much better than regular index scans, even when there's no physical I/O. So the above test has fit into the RAM. Now do we need a test with tables that won't fit into RAM? I feel that if the thick indexes were effective with data that would fit into RAM, then they will definitely be more effective with data that wouldn't fit into RAM. There is one performance bug with updates, where the caching strategy for the BTStack wasn't effective for the Varlena structures. I will fix that bug next time. Also, calls to HOT related stuff can be avoided if it happens to be a thick index. I think these two changes, if made, would further improve the performance of thick indexes. You should do some tuning; the PostgreSQL default configuration is not tuned for maximum performance. At least increase checkpoint_segments and checkpoint_timeout and shared_buffers. Though I noticed that you're running on Windows; I don't think anyone's done any serious performance testing or tuning on Windows yet, so I'm not sure how you should tune that. What we are trying to do here is to try and compare the performance of two indexing structures.
AFAIK, a performance test done to compare two software implementations should not have parameter settings favorable to one. I have not done any settings change favorable to the thick index. The tuning I suggested is just basic tuning any knowledgeable Postgres DBA will do. It's not particularly in favor of any indexing scheme. With the default checkpoint settings, for example, the system is going to be busy doing checkpoints all the time if you have a reasonable rate of updates. The inserts and updates were at the rate of 10 every 2 seconds (it's in the performance report), and the update was affecting two rows. I haven't got any warning to increase the checkpoint settings during the test. But my doubt is: if checkpoints had caused as much overhead as we think, how could the performance of thick indexes exceed thin indexes in index-only scans? As you might have observed, all the statistics (even the 90th and 95th percentile/median) were in milliseconds. So that might give a hint about the stress on the
Re: [HACKERS] 8.2.3: Server crashes on Windows using Eclipse/Junit
On 10/26/07, Magnus Hagander [EMAIL PROTECTED] wrote: Can you try the attached patch? See how many backends you can get up to. This patch changes from using a single thread for each backend started to using the builtin threadpool functionality. It also replaces the pid/handle arrays with an i/o completion port. The net result is also, imho, much more readable code :-) The patch looks good; I'm not set up to build yet, but I should be able to test it sometime in the next week.
Re: [HACKERS] PANIC caused by open_sync on Linux
Greg Smith [EMAIL PROTECTED] writes: On Fri, 26 Oct 2007, ITAGAKI Takahiro wrote: Mixed usage of buffered and direct i/o is legal, but enforces complexity on kernels. If we simplify it, things would be more relaxed. For example, dropping zero-filling and only using direct i/o. Is it possible? It's possible, but performance suffers considerably. I played around with this at one point when looking into doing all database writes as sync writes. Having to wait until the entire 16MB WAL segment made its way to disk before more WAL could be written can cause a nasty pause in activity, even with direct I/O sync writes. Even the current buffered zero-filled write of that size can be a bit of a drag on performance for the clients that get caught behind it; making it any sort of sync write will be far worse. This ties into a loose end we didn't get to yet: being more aggressive about creating future WAL segments. ISTM there is no good reason for clients ever to have to wait for WAL segment creation --- the bgwriter, or possibly the walwriter, ought to handle that in the background. But we only check for the case once per checkpoint and we don't create a segment unless there's very little space left. regards, tom lane
Re: [HACKERS] PostgreSQL 8.3, libpq and WHERE CURRENT OF
Thanks Heikki for this quick answer. Actually we do the following libpq calls: - PQprepare(... SELECT ... FROM ... FOR UPDATE ... ) - PQexecPrepared(...) - PQntuples(...) / PQgetvalue(...) i.e. we don't use the DECLARE CURSOR instruction, we just prepare/execute the plain SELECT statement (with potential parameters)... I can't remember why, but there was some limitation or problem with using DECLARE CURSOR in our context... must dig in my mail archive to give you more details... FYI, we actually develop a PostgreSQL driver for our runtime VM... Does a simple PQprepare() with a SELECT statement not create a cursor on the server side? If yes, would it not be possible to pass a cursor name as in ODBC? Best regards, Seb Heikki Linnakangas wrote: Sebastien FLAESCH wrote: Just looked at the new features of 8.3 and realized that positioned updates/deletes is now possible with this new release... We would use that if we could define the cursor name with a libpq function. I don't understand. When you open a cursor with DECLARE CURSOR, you give it a name. Doesn't that do what you want? ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris
I agree with Tom.. somehow I think increasing NUM_CLOG_BUFFERS just pushes the symptom out to a larger value.. I promise to look more into it before making any recommendations to increase NUM_CLOG_BUFFERS. Because though iGen showed improvements in that area from increasing num_clog_buffers, EAStress showed no improvement.. Plus the reason I think this is not the problem in 8.3beta1 is that the lock output clearly does not show CLOGControlFile as the issue, which I had seen in the earlier case. So I don't think that increasing NUM_CLOG_BUFFERS will change things here. Now, I don't understand the code very well yet, but I see three hotspots and am not sure if they are related to each other: * ProcArrayLock waits - causing waits as reported by the 83_lockwait.d script * SimpleLRUReadPage - causing read IOs as reported by iostat/rsnoop.d * GetSnapshotData - causing CPU utilization as reported by hotuser But I will shut up and do more testing. Regards, Jignesh Tom Lane wrote: Josh Berkus [EMAIL PROTECTED] writes: Actually, 32 made a significant difference as I recall ... do you still have the figures for that, Jignesh? I'd want to see a new set of test runs backing up any call for a change in NUM_CLOG_BUFFERS --- we've changed enough stuff around this area that benchmarks using code from a few months back shouldn't carry a lot of weight. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] PostgreSQL 8.3, libpq and WHERE CURRENT OF
Tom Lane wrote: Sebastien FLAESCH [EMAIL PROTECTED] writes: Does a simple PQPrepare() with a SELECT statement not create a cursor on the server side? No. A prepared statement is just a query plan, not a query-in-progress. Yes of course, I meant PQprepare() + PQexecPrepared() ... The Bind/Execute messages sent by PQexecPrepared create something akin to a cursor, but libpq doesn't expose any API for fetching one row at a time in that context, so there's no way to use the current row anyway. OK... that makes total sense (sorry, I jump from one database to another and sometimes I forget implementation details of a specific driver). Sounds like I need to rework my driver to enable row-by-row fetching with the DECLARE CURSOR + FETCH commands... right? As I wrote: I knew these commands before, but for some reason I did not use that solution because I faced problems. I started with PostgreSQL 7.1, so maybe there was some limitation that no longer exists in 8.x ... Maybe it was because there is no real OPEN command, so you can't distinguish the preparation phase from the execution phase with the DECLARE CURSOR instruction... Or maybe because cursors could only exist inside a transaction block? Just found this in the 7.1 doc: http://www.postgresql.org/docs/7.1/static/sql-declare.html ERROR: DECLARE CURSOR may only be used in begin/end transaction blocks This error occurs if the cursor is not declared within a transaction block. Anyway... Thanks a lot guys for these quick answers. That's a change from Oracle or SQL Server support. Seb ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris
The problem I saw was first highlighted by EAStress runs with PostgreSQL on Solaris with 120-150 users. I have since replicated it via the smaller internal benchmark that we use here to recreate that problem. EAStress should be just fine to highlight it.. Just put pg_clog on O_DIRECT or something so that all IOs go to disk, making it easier to observe. In the meantime I will try to get more information. Regards, Jignesh Tom Lane wrote: Gregory Stark [EMAIL PROTECTED] writes: Didn't we already go through this? He and Simon were pushing to bump up NUM_CLOG_BUFFERS and you were arguing that the test wasn't representative and that clog.c would have to be reengineered to scale well to larger values. AFAIR we never did get any clear explanation of what the test case is. I guess it must be write-mostly, else lazy XID assignment would have helped this by reducing the rate of XID consumption. It's still true that I'm leery of a large increase in the number of buffers without reengineering slru.c. That code was written on the assumption that there were few enough buffers that a linear search would be fine. I'd hold still for 16, or maybe even 32, but I dunno how much impact that will have for such a test case. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] PANIC caused by open_sync on Linux
On 10/26/07, Tom Lane [EMAIL PROTECTED] wrote: This ties into a loose end we didn't get to yet: being more aggressive about creating future WAL segments. ISTM there is no good reason for clients ever to have to wait for WAL segment creation --- the bgwriter, or possibly the walwriter, ought to handle that in the background. Agreed. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] PostgreSQL 8.3, libpq and WHERE CURRENT OF
Forget this one, I was just missing the WITH HOLD option... Must teach myself a bit more before sending further mails. Seb Sebastien FLAESCH wrote: Ok... Tested with 8.2.3: Actually you can't DECLARE a cursor outside a transaction: test1= declare c1 cursor for select * from dbit2; ERROR: DECLARE CURSOR may only be used in transaction blocks That's the main reason why we don't use DECLARE CURSOR... I understand we could use DECLARE CURSOR when a FOR UPDATE is detected, to allow UPDATE/DELETE WHERE CURRENT OF - these only make sense inside a transaction... But for normal cursors, we need to support multiple active result sets that can last outside a transaction block. Basically, we need everything you can do with ODBC cursors. Anyway, thanks for your help. Seb Tom Lane wrote: Sebastien FLAESCH [EMAIL PROTECTED] writes: Does a simple PQPrepare() with a SELECT statement not create a cursor on the server side? No. A prepared statement is just a query plan, not a query-in-progress. The Bind/Execute messages sent by PQexecPrepared create something akin to a cursor, but libpq doesn't expose any API for fetching one row at a time in that context, so there's no way to use the current row anyway. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] PostgreSQL 8.3, libpq and WHERE CURRENT OF
Ok... Tested with 8.2.3: Actually you can't DECLARE a cursor outside a transaction: test1= declare c1 cursor for select * from dbit2; ERROR: DECLARE CURSOR may only be used in transaction blocks That's the main reason why we don't use DECLARE CURSOR... I understand we could use DECLARE CURSOR when a FOR UPDATE is detected, to allow UPDATE/DELETE WHERE CURRENT OF - these only make sense inside a transaction... But for normal cursors, we need to support multiple active result sets that can last outside a transaction block. Basically, we need everything you can do with ODBC cursors. Anyway, thanks for your help. Seb Tom Lane wrote: Sebastien FLAESCH [EMAIL PROTECTED] writes: Does a simple PQPrepare() with a SELECT statement not create a cursor on the server side? No. A prepared statement is just a query plan, not a query-in-progress. The Bind/Execute messages sent by PQexecPrepared create something akin to a cursor, but libpq doesn't expose any API for fetching one row at a time in that context, so there's no way to use the current row anyway. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] PostgreSQL 8.3, libpq and WHERE CURRENT OF
Sebastien FLAESCH [EMAIL PROTECTED] writes: Does a simple PQPrepare() with a SELECT statement not create a cursor on the server side? No. A prepared statement is just a query plan, not a query-in-progress. The Bind/Execute messages sent by PQexecPrepared create something akin to a cursor, but libpq doesn't expose any API for fetching one row at a time in that context, so there's no way to use the current row anyway. regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] [PERFORM] 8.3beta1 testing on Solaris
Hi George, I actually first saw the 4M/sec problem during an EAStress-type run with only 150 connections. I just replicated that via my smaller internal benchmark that we use here to recreate that problem. I will try to do more of the testing that Tom has requested today. Regards, Jignesh Gregory Stark wrote: Jignesh K. Shah [EMAIL PROTECTED] writes: CLOG data is not cached in any PostgreSQL shared memory segments and hence becomes the bottleneck as it has to constantly go to the filesystem to get the read data. This is the same bottleneck you discussed earlier. CLOG reads are cached in the Postgres shared memory segment, but only NUM_CLOG_BUFFERS of them, which defaults to 8 buffers of 8kb each. With 1,000 clients and the transaction rate you're running, you needed a larger number of buffers. Using the filesystem buffer cache is also an entirely reasonable solution though. That's surely part of the logic behind not trying to keep more of the clog in shared memory. Do you have any measurements of how much time is being spent just doing the logical I/O to the buffer cache for the clog pages? 4MB/s seems like it's not insignificant, but your machine is big enough that perhaps I'm thinking at the wrong scale. I'm really curious whether you see any benefit from the vxid read-only transactions. I'm not sure how to get an apples to apples comparison though. Ideally just comparing it to CVS HEAD from immediately prior to the vxid patch going in. Perhaps calling some function which forces an xid to be allocated and seeing how much it slows down the benchmark would be a good substitute. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris
Tom, Here is what I did: I started aggregating all read information. First I also added a group by pid (arg0, arg1, pid) and the counts were all coming out as 1. Then I just grouped by filename and location (arg0, arg1 of reads) and the counts came back as:

# cat read.d
#!/usr/sbin/dtrace -s
syscall::read:entry
/execname=="postgres"/
{
        @read[fds[arg0].fi_pathname, arg1] = count();
}

# ./read.d
dtrace: script './read.d' matched 1 probe
^C

  /export/home0/igen/pgdata/pg_clog/0014   -2753028293472    1
  /export/home0/igen/pgdata/pg_clog/0014   -2753028277088    1
  /export/home0/igen/pgdata/pg_clog/0015   -2753028244320    2
  /export/home0/igen/pgdata/pg_clog/0015   -2753028268896   14
  /export/home0/igen/pgdata/pg_clog/0015   -2753028260704   25
  /export/home0/igen/pgdata/pg_clog/0015   -2753028252512   27
  /export/home0/igen/pgdata/pg_clog/0015   -2753028277088   28
  /export/home0/igen/pgdata/pg_clog/0015   -2753028293472   37

FYI I pressed ctrl-c within less than a second. So to me this seems that multiple processes are reading the same page from different pids (this was with about 600 users active). Apparently we do have a problem in that we are reading the same buffer address again (same as it not being cached anywhere, or not being found in any cache). I reran the lock wait script on a couple of processes and did not see CLogControlFileLock as a problem..

# ./83_lwlock_wait.d 14341

              Lock Id            Mode           Count
        WALInsertLock       Exclusive               1
        ProcArrayLock       Exclusive              16

              Lock Id   Combined Time (ns)
        WALInsertLock               383109
        ProcArrayLock            198866236

# ./83_lwlock_wait.d 14607

              Lock Id            Mode           Count
        WALInsertLock       Exclusive               2
        ProcArrayLock       Exclusive              15

              Lock Id   Combined Time (ns)
        WALInsertLock                55243
        ProcArrayLock             69700140
#

What will help you find out why it is reading the same page again? -Jignesh Jignesh K. Shah wrote: I agree with Tom.. somehow I think increasing NUM_CLOG_BUFFERS just pushes the symptom out to a larger value.. I promise to look more into it before making any recommendations to increase NUM_CLOG_BUFFERS. 
Because though iGen showed improvements in that area from increasing num_clog_buffers, EAStress showed no improvement.. Plus the reason I think this is not the problem in 8.3beta1 is that the lock output clearly does not show CLOGControlFile as the issue, which I had seen in the earlier case. So I don't think that increasing NUM_CLOG_BUFFERS will change things here. Now, I don't understand the code very well yet, but I see three hotspots and am not sure if they are related to each other: * ProcArrayLock waits - causing waits as reported by the 83_lockwait.d script * SimpleLRUReadPage - causing read IOs as reported by iostat/rsnoop.d * GetSnapshotData - causing CPU utilization as reported by hotuser But I will shut up and do more testing. Regards, Jignesh Tom Lane wrote: Josh Berkus [EMAIL PROTECTED] writes: Actually, 32 made a significant difference as I recall ... do you still have the figures for that, Jignesh? I'd want to see a new set of test runs backing up any call for a change in NUM_CLOG_BUFFERS --- we've changed enough stuff around this area that benchmarks using code from a few months back shouldn't carry a lot of weight. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] 8.2.3: Server crashes on Windows using Eclipse/Junit
Taking this one to -hackers once and for all now... Can you try the attached patch? See how many backends you can get up to. Regression tests run just fine, and I've run multiple pgbench runs with 3 and 4 sessions of 100 connections each*, with pgAdmin monitoring things at the same time. Saw up to 403 simultaneous connections in pg_stat_activity, and the system remained stable and responsive, albeit somewhat slower than normal. What was the memory space consumption of the postmaster process, and compared to without the patch? VM size in taskmgr should show that I think, and should show a much smaller footprint now.. /Magnus ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] 8.2.3: Server crashes on Windows using Eclipse/Junit
Magnus Hagander wrote: Taking this one to -hackers once and for all now... Can you try the attached patch? See how many backends you can get up to. Regression tests run just fine, and I've run multiple pgbench runs with 3 and 4 sessions of 100 connections each*, with pgAdmin monitoring things at the same time. Saw up to 403 simultaneous connections in pg_stat_activity, and the system remained stable and responsive, albeit somewhat slower than normal. So, 400 connections on a 2.33GHz MacBook Pro running XP Pro with 2GB RAM - that's not too shabby :-) /D * For those that weren't peering over Magnus' or Greg's shoulder during various IM discussions over the last few days, I've found that the ~125 connection ceiling I was hitting when running from a command prompt was actually an as yet unsolved problem in pgbench, not the server. Multiple pgbench sessions seem to run just fine if kept to around 100 connections each. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris
Also, to give perspective on the equivalent writes on CLOG, I used the following script, which runs for 10 sec to track all writes to the clog directory, and here is what it came up with... (This is with 500 users running.)

# cat write.d
#!/usr/sbin/dtrace -s
syscall::write:entry
/execname=="postgres" && dirname(fds[arg0].fi_pathname)=="/export/home0/igen/pgdata/pg_clog"/
{
        @write[fds[arg0].fi_pathname, arg1] = count();
}
tick-10sec
{
        exit(0);
}

# ./write.d
dtrace: script './write.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
  3   1026                      :tick-10sec

  /export/home0/igen/pgdata/pg_clog/001E   -2753028277088    1
#

I modified read.d to do a 5-sec read:

# ./read.d
dtrace: script './read.d' matched 3 probes
CPU     ID                    FUNCTION:NAME
  0      1                           :BEGIN
  0   1027                       :tick-5sec

  /export/home0/igen/pgdata/pg_clog/001F   -2753028268896    1
  /export/home0/igen/pgdata/pg_clog/001F   -2753028252512    1
  /export/home0/igen/pgdata/pg_clog/001F   -2753028285280    2
  /export/home0/igen/pgdata/pg_clog/001F   -2753028277088    3
  /export/home0/igen/pgdata/pg_clog/001F   -2753028236128    3
  /export/home0/igen/pgdata/pg_clog/001E   -2753028285280    5
  /export/home0/igen/pgdata/pg_clog/001E   -2753028236128    9
  /export/home0/igen/pgdata/pg_clog/001E   -2753028277088   13
  /export/home0/igen/pgdata/pg_clog/001E   -2753028268896   15
  /export/home0/igen/pgdata/pg_clog/001E   -2753028252512   27
#

So the ratio of reads vs writes to clog files is pretty huge.. -Jignesh Jignesh K. 
Shah wrote: Tom, Here is what I did: I started aggregating all read information. First I also added a group by pid (arg0, arg1, pid) and the counts were all coming out as 1. Then I just grouped by filename and location (arg0, arg1 of reads) and the counts came back as:

# cat read.d
#!/usr/sbin/dtrace -s
syscall::read:entry
/execname=="postgres"/
{
        @read[fds[arg0].fi_pathname, arg1] = count();
}

# ./read.d
dtrace: script './read.d' matched 1 probe
^C

  /export/home0/igen/pgdata/pg_clog/0014   -2753028293472    1
  /export/home0/igen/pgdata/pg_clog/0014   -2753028277088    1
  /export/home0/igen/pgdata/pg_clog/0015   -2753028244320    2
  /export/home0/igen/pgdata/pg_clog/0015   -2753028268896   14
  /export/home0/igen/pgdata/pg_clog/0015   -2753028260704   25
  /export/home0/igen/pgdata/pg_clog/0015   -2753028252512   27
  /export/home0/igen/pgdata/pg_clog/0015   -2753028277088   28
  /export/home0/igen/pgdata/pg_clog/0015   -2753028293472   37

FYI I pressed ctrl-c within less than a second. So to me this seems that multiple processes are reading the same page from different pids (this was with about 600 users active). Apparently we do have a problem in that we are reading the same buffer address again (same as it not being cached anywhere, or not being found in any cache). I reran the lock wait script on a couple of processes and did not see CLogControlFileLock as a problem..

# ./83_lwlock_wait.d 14341

              Lock Id            Mode           Count
        WALInsertLock       Exclusive               1
        ProcArrayLock       Exclusive              16

              Lock Id   Combined Time (ns)
        WALInsertLock               383109
        ProcArrayLock            198866236

# ./83_lwlock_wait.d 14607

              Lock Id            Mode           Count
        WALInsertLock       Exclusive               2
        ProcArrayLock       Exclusive              15

              Lock Id   Combined Time (ns)
        WALInsertLock                55243
        ProcArrayLock             69700140
#

What will help you find out why it is reading the same page again? -Jignesh Jignesh K. Shah wrote: I agree with Tom.. somehow I think increasing NUM_CLOG_BUFFERS just pushes the symptom out to a larger value.. I promise to look more into it before making any recommendations to increase NUM_CLOG_BUFFERS. 
Because though iGen showed improvements in that area from increasing num_clog_buffers, EAStress showed no improvement.. Plus the reason I think this is not the problem in 8.3beta1 is that the lock output clearly does not show CLOGControlFile as the issue, which I had seen in the earlier case. So I don't think that increasing NUM_CLOG_BUFFERS will change things here. Now, I don't understand the code very well yet, but I see three hotspots and am not sure if they are related to each other: * ProcArrayLock waits - causing waits as reported by the 83_lockwait.d script
Re: [HACKERS] 8.3 GSS Issues
On Oct 26, 2007, at 12:56 AM, Magnus Hagander wrote: On Thu, Oct 25, 2007 at 05:39:37PM -0700, Henry B. Hotz wrote: On Oct 25, 2007, at 3:27 PM, Stephen Frost wrote: * Henry B. Hotz ([EMAIL PROTECTED]) wrote: What you're asking for is basically a krb_match_realm parameter, or do I understand you wrong? I'm asking for name matching to be done i.a.w. the gssapi recommendations. That's all I want, but it's actually necessary for this feature to be at all usable in my environment. If we don't then I suggest we pull this feature until it can be done correctly. I know what you want, that's fairly obvious. I'm only asking about *how* to do it the best way. What the krb5 method does is IMO a documented bug. The realm name is part of the name. As I explained at some length you cannot assume the username (first component of the principal) has any meaning by itself, except in small deployments with no external trust agreements. Kerberos (and AD) are designed to support larger infrastructures with multiple organizations. This isn't unexpected for PG as the current krb5 support does this. I'm not a big fan of it but at the same time I don't feel it's justification to drop it from 8.3. Having it only allow the default realm would be an option which could work in 8.3, imv. I don't think the fact that the existing krb5 code does the wrong thing (and can't be used in an environment with cross-realm agreements) is justification for doing the wrong thing in a new capability. The code in my original patch would do the latter (default realm only). More precisely: if you do a gss_import_name() on smith and [EMAIL PROTECTED] you get the same internal representation, and gss_compare_name() will tell you they're the same. Also gss_compare_name() will tell you [EMAIL PROTECTED] is different from either of the first two. Wouldn't using a specific parameter like krb_match_realm=YOUR.REALM that you set in the config file be more flexible? 
In that it actually allows scenarios like server/resource domains (not sure how common they are in unix krb setups, but they're certainly not unfamiliar in the Windows AD world)? Yes and no. It certainly would have made it easier to test my original patch since the server was in a test realm and I couldn't use my normal production identity. I'd imagine deployments where the users are in a different realm from the servers are somewhat common. The counter is that (if done naively) it would prevent you from supporting users from multiple realms at all. I never completely tested this, but I think with my original patch you could define both smith (== [EMAIL PROTECTED]) and [EMAIL PROTECTED] as users to PG. They wouldn't be the same user (which you might want), but you could support both of them. Is there any (other) code in PG that would barf on long usernames that contain @ and/or .? If we don't use gss_compare_name(), or some similar mechanism, to compare connection names to PG usernames, then I don't think GSSAPI support should be included in 8.3. I think that's a horrible idea, given that it works perfectly fine the way it is now for the vast majority of users. That said, we should certainly fix it in one way or another for 8.3. But if that fails, I see no reason at all to pull the feature. If this isn't fixed then PG will never be a supported infrastructure service at JPL the way MySQL currently is. I had hoped to use the GSSAPI support as a feature to pry some people away from MySQL, but without the ability to integrate into a multi-realm infrastructure this won't fly. Of course even with proper support it still may never happen, so that isn't a threat. Longer term (since it's likely too late to be accepted now), as I think has been discussed in the past, PG could really use a .k5login-esque, either admin-only (ala pg_hba.conf / ident map) or per-user (some sort of ALTER ROLE that a user could do on himself?), mapping functionality. 
There has been discussion of a general mapping layer between authentication names and authorization/role names. I think that's the way to go. I haven't thought about who or where the administration of the mapping ought to be. Yeah, I agree that something like that would be a good long-term solution. For those on the periphery: the PG protocol already carries the PG username, and the Kerberos and GSSAPI auth methods carry the authentication name. If you define the PG username to be the authorization name then you have a pretty standard architecture. If anyone has used a kerberized ssh then you have a similar split. You can kinit as A and then ssh [EMAIL PROTECTED]. If you have put A into ~B/.k5login then you get in without a password prompt. For a proper discussion of this topic I recommend the section starting on page 64 of Sun's Security for Developers Guide, document 816-4863. Note that there is a discussion of
Re: [HACKERS] 8.2.3: Server crashes on Windows using Eclipse/Junit
Magnus Hagander wrote: VM size in taskmgr should show that I think, and should show a much smaller footprint now.. With patch: 4,492K. Without patch: 28,224K. That's with 3 x 100 pgbench connections. /D ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] 8.2.3: Server crashes on Windows using Eclipse/Junit
Dave Page wrote: Magnus Hagander wrote: VM size in taskmgr should show that I think, and should show a much smaller footprint now.. With patch: 4,492K. Without patch: 28,224K. That's with 3 x 100 pgbench connections. That's nice! But. That can't be address space usage, it has to be actual memory usage. Since each thread should chew up 4Mb of address space, and there's at least two threads in there :-) So looking at the VM column was obviously not correct. *looks up some docs* Right. You need to look at VM size in *process explorer*. VM size in task manager has nothing to do with VM size, it's the private bytes :-S And there is no way to see that info from task manager, I think. PE is your friend. Anyway. Other than a refresher on those, I'd be interested in two other important parts: * How many threads does it reach when you have 300 active backends? * Is there a handle leak? Meaning, once your 300 backends have exited, does the number of handles in the process drop down to the same value it had before? (sorry, wish I was in a position to run these tests myself, but I'm not right now) //Magnus ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris
Jignesh K. Shah [EMAIL PROTECTED] writes: So the ratio of reads vs writes to clog files is pretty huge.. It looks to me that the issue is simply one of not having quite enough CLOG buffers. Your first run shows 8 different pages being fetched and the second shows 10. Bearing in mind that we pin the latest CLOG page into buffers, there are only NUM_CLOG_BUFFERS-1 buffers available for older pages, so what we've got here is thrashing for the available slots. Try increasing NUM_CLOG_BUFFERS to 16 and see how it affects this test. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] 8.2.3: Server crashes on Windows using Eclipse/Junit
Dave Page wrote: Magnus Hagander wrote: Right. You need to look at VM size in *process explorer*. VM size in task manager has nothing to do with VM size, it's the private bytes :-S And there is no way to see that info from task manager, I think. PE is your friend. Anyway. Other than a refresher on those, I'd be interested in two other important parts: * How many threads does it reach when you have 300 active backends? * Is there a handle leak? meaning once your 300 backends have exited, does the number of handles in the process drop down to the same value it had before? Without patch: VM: 1,322,792K Idle threads: 6 Peak threads: 306 Handles at start: 576 Handles at end: 576 With patch: VM: 98,088K Idle threads: 3 Peak threads: 7 Handles at start: 576 Handles at end: 585 (585 again after second run). Ah, now we're talking. That's the kind of reduction I was looking for :-) I think the difference in handles is because the threadpool keeps some things around. As long as it stays at 585 and comes back down after a second run, we're fine at that - there's no leak. Attached is an updated version of the patch, currently being tested by both me and Dave. If it passes our tests, I'll apply this so it gets included for broader testing in beta2. //Magnus

Index: src/backend/postmaster/postmaster.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/postmaster/postmaster.c,v
retrieving revision 1.542
diff -c -r1.542 postmaster.c
*** src/backend/postmaster/postmaster.c 26 Sep 2007 22:36:30 -0000  1.542
--- src/backend/postmaster/postmaster.c 26 Oct 2007 20:09:35 -0000
***************
*** 331,344 ****
  #ifdef EXEC_BACKEND

  #ifdef WIN32
- static void win32_AddChild(pid_t pid, HANDLE handle);
- static void win32_RemoveChild(pid_t pid);
  static pid_t win32_waitpid(int *exitstatus);
! static DWORD WINAPI win32_sigchld_waiter(LPVOID param);
!
! static pid_t *win32_childPIDArray;
! static HANDLE *win32_childHNDArray;
! static unsigned long win32_numChildren = 0;

  HANDLE PostmasterHandle;
  #endif
--- 331,347 ----
  #ifdef EXEC_BACKEND

  #ifdef WIN32
  static pid_t win32_waitpid(int *exitstatus);
! static void WINAPI pgwin32_deadchild_callback(PVOID lpParameter, BOOLEAN TimerOrWaitFired);
!
! static HANDLE win32ChildQueue;
!
! typedef struct
! {
!     HANDLE waitHandle;
!     HANDLE procHandle;
!     DWORD procId;
! } win32_deadchild_waitinfo;

  HANDLE PostmasterHandle;
  #endif
***************
*** 899,914 ****

  #ifdef WIN32
      /*
!      * Initialize the child pid/HANDLE arrays for signal handling.
       */
!     win32_childPIDArray = (pid_t *)
!         malloc(mul_size(NUM_BACKENDARRAY_ELEMS, sizeof(pid_t)));
!     win32_childHNDArray = (HANDLE *)
!         malloc(mul_size(NUM_BACKENDARRAY_ELEMS, sizeof(HANDLE)));
!     if (!win32_childPIDArray || !win32_childHNDArray)
          ereport(FATAL,
!                 (errcode(ERRCODE_OUT_OF_MEMORY),
!                  errmsg("out of memory")));

      /*
       * Set up a handle that child processes can use to check whether the
--- 902,913 ----

  #ifdef WIN32
      /*
!      * Initialize I/O completion port used to deliver list of dead children.
       */
!     win32ChildQueue = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 1);
!     if (win32ChildQueue == NULL)
          ereport(FATAL,
!                 (errmsg("could not create I/O completion port for child queue")));

      /*
       * Set up a handle that child processes can use to check whether the
***************
*** 2072,2083 ****
  #define LOOPHEADER() (exitstatus = status.w_status)
  #else /* WIN32 */
  #define LOOPTEST() ((pid = win32_waitpid(&exitstatus)) > 0)
!     /*
!      * We need to do this here, and not in CleanupBackend, since this is
!      * to be called on all children when we are done with them. Could move
!      * to LogChildExit, but that seems like asking for future trouble...
!      */
! #define LOOPHEADER() (win32_RemoveChild(pid))
  #endif /* WIN32 */
  #endif /* HAVE_WAITPID */
--- 2071,2077 ----
  #define LOOPHEADER() (exitstatus = status.w_status)
  #else /* WIN32 */
  #define LOOPTEST() ((pid = win32_waitpid(&exitstatus)) > 0)
! #define LOOPHEADER()
  #endif /* WIN32 */
  #endif /* HAVE_WAITPID */
***************
*** 3332,3343 ****
      int i;
      int j;
      char cmdLine[MAXPGPATH * 2];
-     HANDLE childHandleCopy;
-     HANDLE waiterThread;
      HANDLE paramHandle;
      BackendParameters *param;
      SECURITY_ATTRIBUTES sa;
      char paramHandleStr[32];

      /* Make sure caller set up argv properly */
      Assert(argc >= 3);
--- 3326,3336 ----
      int i;
      int j;
      char cmdLine[MAXPGPATH * 2];
      HANDLE paramHandle;
      BackendParameters *param;
      SECURITY_ATTRIBUTES sa;
      char paramHandleStr[32];
+     win32_deadchild_waitinfo *childinfo;

      /* Make sure caller set up argv properly */
      Assert(argc >= 3);
***************
*** 3345,3359
Re: [HACKERS] 8.2.3: Server crashes on Windows using Eclipse/Junit
Magnus Hagander wrote: Right. You need to look at VM size in *process explorer*. VM size in task manager has nothing to do with VM size, it's the private bytes :-S And there is no way to see that info from task manager, I think. PE is your friend. Anyway. Other than a refresher on those, I'd be interested in two other important parts: * How many threads does it reach when you have 300 active backends? * Is there a handle leak? Meaning, once your 300 backends have exited, does the number of handles in the process drop down to the same value it had before?

Without patch: VM: 1,322,792K, Idle threads: 6, Peak threads: 306, Handles at start: 576, Handles at end: 576

With patch: VM: 98,088K, Idle threads: 3, Peak threads: 7, Handles at start: 576, Handles at end: 585 (585 again after second run).

/D ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] win32 threads patch vs beta2 - what to do?
Magnus Hagander wrote: Dave Page wrote: Magnus Hagander wrote: Right. You need to look at VM size in *process explorer*. VM size in task manager has nothing to do with VM size, it's the private bytes :-S And there is no way to see that info from task manager, I think. PE is your friend. Anyway. Other than a refresher on those, I'd be interested in two other important parts: * How many threads does it reach when you have 300 active backends? * Is there a handle leak? meaning once your 300 backends have exited, does the number of handles in the process drop down to the same value it had before? Without patch: VM: 1,322,792K Idle threads:6 Peak threads:306 Handles at start:576 Handles at end: 576 With patch: VM: 98,088K Idle threads:3 Peak threads:7 Handles at start:576 Handles at end: 585 (585 again after second run). Ah, now we're talking. That's the kind of reduction I was looking for :-) I think the difference in handles is because the threadpool keeps some things around. As long as it stays at 585 and comes back down after a second run, we're fine at that - there's no leak. Attached is an updated version of the patch, currently being tested by both me and Dave. If it passes our tests, I'll apply this so it gets included for broader testing in beta2. So of course, for all good patches, problems turn up :-( This patch doesn't work on mingw, at least not on all versions. The reason is that, as usual, the mingw libraries are not complete. We've dealt with this before, by dynamically loading the functions. We know this works. But I don't have time to fix that tonight, and I'll be offline much of this weekend. Now, given these great improvements, I'd very much like this in beta2, so it can get proper testing. This leaves us with a couple of choices: 1) Delay beta2 until the beginning of next week. I'll get this fixed sunday evening or monday at the latest. 
I know how to fix it, it's just that I don't have the time right now :( (This assumes that the plan still is to wrap beta2 today) 2) Apply the patch as-is. Then beta2 will work fine with the msvc build, which is used for the binary distribution. But it's broken on mingw until fixed, which of course includes the buildfarm stuff. Again, we know how to fix this. 2b) I apply this as-is, and someone else cleans up mingw before beta2 is wrapped. 3) We don't apply this, and wait until beta3 to have it tested. Depending on how many betas we end up with, that may leave us with very little testing before release. 2b is of course the best here, but then someone has to step up and volunteer to do that. I'm leaning towards applying the patch now, and hoping for 2b to happen. If it doesn't happen, the choice between 1 and 2 can be made when the time to wrap the beta approaches (at which point I will be offline, so I escape :-P) The patch that would go in is the one previously posted plus a couple of minor edits in comments as suggested on IM by Alvaro. Comments? //Magnus ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] win32 threads patch vs beta2 - what to do?
Magnus Hagander wrote: I'm leaning towards applying the patch now, and hoping for 2b to happen. I think we should live with the mingw BF breakage for a day or two. The patch is clearly an important improvement, but it should be as widely tested as possible. /D ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] win32 threads patch vs beta2 - what to do?
On Fri, 26 Oct 2007 21:58:03 +0100 Dave Page [EMAIL PROTECTED] wrote: Magnus Hagander wrote: I'm leaning towards applying the patch now, and hoping for 2b to happen. I think we should live with the mingw BF breakage for a day or two. The patch is clearly an important improvement, but it should be as widely tested as possible. I think this makes sense. I certainly don't want to hold back Beta2 and this patch so far is an obvious improvement. Sincerely, Joshua D. Drake /D ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240 PostgreSQL solutions since 1997 http://www.commandprompt.com/ UNIQUE NOT NULL Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
Re: [HACKERS] win32 threads patch vs beta2 - what to do?
Dave Page [EMAIL PROTECTED] writes: Magnus Hagander wrote: I'm leaning towards applying the patch now, and hoping for 2b to happen. I think we should live with the mingw BF breakage for a day or two. The patch is clearly an important improvement, but it should be as widely tested as possible. If Dave is intending to build the Windows installer as soon as beta2 is wrapped, then I agree with this plan. But my understanding was we'd missed his window for today and so that wouldn't happen till Monday. Assuming that's true, the idea Bruce and I had discussed on the phone was: 1. Wrap official beta2 tonight, so that other packagers can work from it over the weekend; 2. Magnus fixes his patch Sunday and applies it then; 3. Dave builds the Windows installer Monday *from CVS tip*. So the Windows version would be beta2-plus-a-little but we'd neither hold up work on other platforms nor break anything in buildfarm. Just an alternative thought. In any case I agree that we want Windows testing of beta2 to include this patch. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris
I changed CLOG buffers to 16. Running the test again:

# ./read.d
dtrace: script './read.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
  0   1027                       :tick-5sec
/export/home0/igen/pgdata/pg_clog/0024 -27530282192961
/export/home0/igen/pgdata/pg_clog/0025 -27530282111041

# ./read.d
dtrace: script './read.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
  1   1027                       :tick-5sec

# ./read.d
dtrace: script './read.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
  1   1027                       :tick-5sec

# ./read.d
dtrace: script './read.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
  0   1027                       :tick-5sec
/export/home0/igen/pgdata/pg_clog/0025 -27530281947201

So Tom seems to be correct that it is a case of CLOG buffer thrashing. And since I saw the same problem with two different workloads, I think the chance of people hitting this problem is pretty high. I am also a bit surprised that CLogControlFile did not show up as being hot... maybe because not many writes are going on, or maybe because I did not trace all 500 users to see their hot lock status. Dmitri has another workload to test; I might try that out later to see whether it causes a similar impact. Of course, I haven't seen my throughput go up yet, since I am already CPU bound... But this is good, since the number of IOPS to the disk is reduced (and hence system calls). If I take this as my baseline number, I can then proceed to hunt other bottlenecks. What's the view of the community? Hunt down CPU utilization or lock waits next? Your votes are crucial on where I put my focus. Another thing Josh B told me to check out was the wal_writer_delay setting. I have done two runs with almost equal performance (with the CLOG 16 setting), one at 100ms and the other at the default 200ms. Based on the runs, 100ms seemed slightly better than the default (plus the risk of losing data is reduced from 600ms to 300ms). Thanks. Regards, Jignesh

Tom Lane wrote: Jignesh K. Shah [EMAIL PROTECTED] writes: So the ratio of reads vs writes to clog files is pretty huge..
It looks to me that the issue is simply one of not having quite enough CLOG buffers. Your first run shows 8 different pages being fetched and the second shows 10. Bearing in mind that we pin the latest CLOG page into buffers, there are only NUM_CLOG_BUFFERS-1 buffers available for older pages, so what we've got here is thrashing for the available slots. Try increasing NUM_CLOG_BUFFERS to 16 and see how it affects this test. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] 8.2.3: Server crashes on Windows using Eclipse/Junit
Magnus Hagander [EMAIL PROTECTED] writes: Attached is an updated version of the patch, currently being tested by both me and Dave. If it passes our tests, I'll apply this so it gets included for broader testing in beta2. One question: what's this about? + #define _WIN32_WINNT 0x0500 This looks like it might be tightening our assumptions about which Windows flavors we can run on. I'm not necessarily against that, but it should be publicly discussed if it's happening. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] win32 threads patch vs beta2 - what to do?
Tom Lane wrote: Dave Page [EMAIL PROTECTED] writes: Magnus Hagander wrote: I'm leaning towards applying the patch now, and hoping for 2b to happen. I think we should live with the mingw BF breakage for a day or two. The patch is clearly an important improvement, but it should be as widely tested as possible. If Dave is intending to build the Windows installer as soon as beta2 is wrapped, then I agree with this plan. But my understanding was we'd missed his window for today and so that wouldn't happen till Monday. Assuming that's true, the idea Bruce and I had discussed on the phone was: 1. Wrap official beta2 tonight, so that other packagers can work from it over the weekend; 2. Magnus fixes his patch Sunday and applies it then; 3. Dave builds the Windows installer Monday *from CVS tip*. So the Windows version would be beta2-plus-a-little but we'd neither hold up work on other platforms nor break anything in buildfarm. Just an alternative thought. In any case I agree that we want Windows testing of beta2 to include this patch. If we do that, then what we label as beta2 in the installer is *not* the same as someone who has built beta2 from source. Can't say I like that one, and I know Dave doesn't like it (he said so before going offline). I'd rather see msvc working and mingw broken for beta2 really, because then we *know* that if someone says they're doing beta2 on mingw they're misinformed... //Magnus ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] 8.2.3: Server crashes on Windows using Eclipse/Junit
Tom Lane wrote: Magnus Hagander [EMAIL PROTECTED] writes: Attached is an updated version of the patch, currently being tested by both me and Dave. If it passes our tests, I'll apply this so it gets included for broader testing in beta2. One question: what's this about? + #define _WIN32_WINNT 0x0500 This looks like it might be tightening our assumptions about which Windows flavors we can run on. I'm not necessarily against that, but it should be publicly discussed if it's happening. It enables Windows 2000-specific headers. We already require Windows 2000 to run, so it doesn't restrict us anymore than we already are. It just exposes those parts of the header files. //Magnus ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] PANIC caused by open_sync on Linux
On Fri, Oct 26, 2007 at 08:34:49AM -0400, Tom Lane wrote: we only check for the case once per checkpoint and we don't create a segment unless there's very little space left. Sort of a filthy hack, but what about always having an _extra_ segment around? The bgwriter could do that, no? A -- Andrew Sullivan | [EMAIL PROTECTED] ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] win32 threads patch vs beta2 - what to do?
Joshua D. Drake wrote: On Fri, 26 Oct 2007 21:58:03 +0100 Dave Page [EMAIL PROTECTED] wrote: Magnus Hagander wrote: I'm leaning towards applying the patch now, and hoping for 2b to happen. I think we should live with the mingw BF breakage for a day or two. The patch is clearly an important improvement, but it should be as widely tested as possible. I think this makes sense. I certainly don't want to hold back Beta2 and this patch so far is an obvious improvement. I read the consensus of this thread as apply the patch as-is and let's fix mingw as soon as we can, so this is what I've done. If I got it wrong, feel free to back out :-) //Magnus ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [GENERAL] WAL archiving idle database
Jeff Davis [EMAIL PROTECTED] writes: What's happening is that you have a checkpoint_timeout of 5 minutes, and that checkpoint must write a checkpoint record in the WAL, prompting the archiving. If you want it to happen less frequently, it's often safe to have checkpoint timeout set to something larger by a reasonable amount. I think you're confusing checkpoint_timeout and archive_timeout... I seem to recall this behavior having been discussed before, but I can't find it in the archives right now. What is happening is that after each checkpoint_timeout, we test to see if we need to write a new checkpoint; which is determined by whether anything's been inserted into WAL since the start of the last checkpoint. And after each archive_timeout, we test to see if we need to flush the current WAL segment out to the archive; which is determined by whether the write pointer is currently exactly at the start of a segment or not. Which would be fine, except that the has anything been inserted since last checkpoint test is actually done by seeing if the WAL insert pointer has moved. Which it will have, if we did an archive flush. And that means that each of these activities makes it look to the other one like something has happened, and so you get a checkpoint record every checkpoint_timeout seconds, and then we flush the entire WAL file (containing only that record), even if the database is in reality completely idle. Lather, rinse, repeat. In the prior discussion that I seem to remember, we didn't think of a decent solution, and it kinda fell off the radar since zero-activity isn't too interesting to a lot of folks. However, chewing on it again I think I've come up with a good idea that will fix this and actually simplify the code a bit: * Add a boolean flag insertedXLog to XLogCtlInsert, which means at least one WAL record has been inserted since start of last checkpoint. 
Also add a flag completedCkpt somewhere in XLogCtlData, which means checkpoint successfully completed; this second flag is only used by checkpoint so it can be considered as being protected by the CheckpointLock. At startup we can initialize insertedXLog = false, completedCkpt = true. * XLogInsert sets insertedXLog to true while holding WALInsertLock, *except* when inserting either a checkpoint record or an xlog switch record; in those cases it doesn't change the flag. * CreateCheckpoint replaces its current rather complex test (lines 5693-5703 in CVS-tip xlog.c) with if insertedXLog is clear and completedCkpt is set, we need not checkpoint. If it does have to perform a checkpoint, it clears both flags before releasing WALInsertLock. * After successful completion of a checkpoint, completedCkpt gets set. Because insertedXLog is cleared at the same time the checkpoint's REDO pointer is determined, this will correctly implement the requirement of detecting whether anything has been inserted since the last REDO point. This replaces the current indirect test involving comparing the last checkpoint's REDO pointer to its own address. However we have to not set insertedXLog when we finally do insert the checkpoint record, thus the special case is needed in XLogInsert. The other special case of ignoring xlog switch is what's needed to fix the bug, and is obviously OK because an xlog switch doesn't represent a checkpointable change. The reason we need the completedCkpt flag is that if a checkpoint fails partway through, it would nonetheless have cleared insertedXLog, and we don't want that to cause us to not retry the checkpoint next time. This is slightly warty but it certainly seems a lot clearer than the current test in lines 5693-5703. The couple of lines to be added to XLogInsert should have negligible performance impact. Comments? regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [GENERAL] WAL archiving idle database
Jeff Davis [EMAIL PROTECTED] writes: Keep in mind that even in the current system, your configuration is variable based on the checkpoint_timeout setting. Yeah, and he has to keep this less than archive_timeout in order for it to work the way he wants, which is probably not good for performance. (Sane settings of checkpoint_timeout are probably higher, not lower, than what people are likely to use for archive_timeout.) I think my recommendation to Kevin would be to force some trivial transaction to occur a little before each expected archive_timeout, so that there will be something to be archived. This would have the additional advantage that the monitor is checking that the database is actually responding to queries, whereas just noting that it's spitting out WAL files doesn't really prove that --- especially not if mere no-op checkpoints can cause WAL files to be emitted. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] [GENERAL] WAL archiving idle database
On Fri, Oct 26, 2007 at 5:47 PM, in message [EMAIL PROTECTED], Tom Lane [EMAIL PROTECTED] wrote: And after each archive_timeout, we test to see if we need to flush the current WAL segment out to the archive; which is determined by whether the write pointer is currently exactly at the start of a segment or not. Hmmm... We would actually prefer to get the WAL file at the specified interval. We have software to ensure that the warm standby instances are not getting stale, and that's pretty simple with the current behavior. We don't have a bandwidth or storage space issue because we zero out the unused portion of the WAL file and gzip it -- an empty file's about 16 KB. Checking that the whole system is healthy gets a lot more complicated if we stop sending empty WAL files. Could this at least be a configurable option? -Kevin ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [GENERAL] WAL archiving idle database
Kevin Grittner [EMAIL PROTECTED] writes: On Fri, Oct 26, 2007 at 5:47 PM, in message [EMAIL PROTECTED], Tom Lane [EMAIL PROTECTED] wrote: And after each archive_timeout, we test to see if we need to flush the current WAL segment out to the archive; which is determined by whether the write pointer is currently exactly at the start of a segment or not. Hmmm... We would actually prefer to get the WAL file at the specified interval. Well, if it's a feature not a bug, that's fine with me. I wonder though how predictable the behavior will really be with 8.3's distributed checkpoints ... you might need to find another way anyhow. regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] [GENERAL] WAL archiving idle database
On Fri, 2007-10-26 at 18:06 -0500, Kevin Grittner wrote: On Fri, Oct 26, 2007 at 5:47 PM, in message [EMAIL PROTECTED], Tom Lane [EMAIL PROTECTED] wrote: And after each archive_timeout, we test to see if we need to flush the current WAL segment out to the archive; which is determined by whether the write pointer is currently exactly at the start of a segment or not. Hmmm... We would actually prefer to get the WAL file at the specified interval. We have software to ensure that the warm standby instances are not getting stale, and that's pretty simple with the current behavior. We don't have a bandwidth or storage space issue because we zero out the unused portion of the WAL file and gzip it -- an empty file's about 16 KB. Checking that the whole system is healthy gets a lot more complicated if we stop sending empty WAL files. Could this at least be a configurable option? A good point. Keep in mind that even in the current system, your configuration is variable based on the checkpoint_timeout setting. Regards, Jeff Davis ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] [GENERAL] WAL archiving idle database
On Fri, 2007-10-26 at 18:47 -0400, Tom Lane wrote: Jeff Davis [EMAIL PROTECTED] writes: What's happening is that you have a checkpoint_timeout of 5 minutes, and that checkpoint must write a checkpoint record in the WAL, prompting the archiving. If you want it to happen less frequently, it's often safe to have checkpoint timeout set to something larger by a reasonable amount. I think you're confusing checkpoint_timeout and archive_timeout... Thanks for clarifying it. The user-visible behavior, as I understand it, is that the time between archiving on an idle database is: MAX(archive_timeout,checkpoint_timeout) [ of course, there's no guarantee that the archive_command succeeds in that time ] It looks like checkpoint_timeout was the limiting factor, in his case. Regards, Jeff Davis ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] PANIC caused by open_sync on Linux
On Fri, 26 Oct 2007, Andrew Sullivan wrote: Sort of a filthy hack, but what about always having an _extra_ segment around? The bgwriter could do that, no? Now it could. The bgwriter in <= 8.2 stops executing when there's a checkpoint going on, and needing more WAL segments because a checkpoint is taking too long is one of the major failure cases where proactively creating additional segments would be most helpful. The 8.3 bgwriter keeps running even during checkpoints, so it's feasible to add such a feature now. But that only became true well into the 8.3 feature freeze, after some changes Heikki made just before the load distributed checkpoint patch was committed. Before that, it was hard to implement this feature; afterwards, it was too late to fit the change into the 8.3 release. Should be easy enough to add to 8.4 one day. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [GENERAL] WAL archiving idle database
On Fri, 2007-10-26 at 18:06 -0500, Kevin Grittner wrote: Hmmm... We would actually prefer to get the WAL file at the specified interval. We have software to ensure that the warm standby instances are not getting stale, and that's pretty simple with the current behavior. We don't have a bandwidth or storage Another thought: when you say it's pretty simple, what do you do now? My monitoring scripts for this particular situation employ some pretty ugly code. I think if this did get changed, I would change my script to monitor the pg_current_xlog_location() of the primary database and compare to the last restored log file... entry in the standby database's log. I would think if the current location does not end in all zeros, you should expect a new WAL segment to be archived soon. Although this assumes that an idle database would not advance that location at all, and I'm still trying to understand Tom's proposal well enough to know whether that would be true or not. If this doesn't get changed, I think we should archive every archive_timeout seconds, rather than MAX(archive_timeout,checkpoint_timeout), which is less obvious. Regards, Jeff Davis ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] [GENERAL] WAL archiving idle database
Jeff Davis [EMAIL PROTECTED] writes: I would think if the current location does not end in all zeros, you should expect a new WAL segment to be archived soon. Although this assumes that an idle database would not advance that location at all, and I'm still trying to understand Tom's proposal well enough to know whether that would be true or not. With my proposal, after the last activity, you'd get a checkpoint, and then at the next archive_timeout we'd advance the pointer to a segment boundary and archive the old segment, and then nothing more would happen until the next WAL-loggable update. So yeah, the master's pg_current_xlog_location could be expected to sit at a segment boundary while it was idle. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] PANIC caused by open_sync on Linux
Greg Smith [EMAIL PROTECTED] writes: The 8.3 bgwriter keeps running even during checkpoints, so it's feasible to add such a feature now. I wonder though whether the walwriter wouldn't be a better place for it. regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Feature Freeze date for 8.4
Tom, This seems pretty entirely orthogonal to the commit-fest proposal. I see no reason to think that snapshots taken at those times would be any better than any other nightly snapshot, nor any reason to memorialize them in an archive. I can see that. And it would be pretty hard to keep members of the public from treating the persistent snapshots as releases, and then flooding the lists when they break. So, scratch an idea with unintended consequences. I'll just have to keep the snapshots on our CVS slave. I do hope that the commit-fests will lead to more testing, though, just because people waiting for a specific feature will want to try out the initial commit of that feature. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco ---(end of broadcast)--- TIP 6: explain analyze is your friend
[HACKERS] Avoiding planning redundant backwards merges
While fooling around with the planner performance bug reported here: http://archives.postgresql.org/pgsql-bugs/2007-10/msg00173.php I noticed that even after fixing the bug, 8.3 seemed to plan this many-way join about 50% slower than 8.2 :-(. After some digging I understand the reason: 8.3 will almost always consider both a forward and a backward indexscan on each potentially useful index. This results in twice as many pre-sorted paths available to match_unsorted_outer(), and therefore about twice as much work generating paths that are exactly equivalent cost-wise but generate opposite output orders. These extra paths are not without use, now that we have the ability to declare DESC index columns: the only way to produce an indexed mergejoin between an ASC and a DESC index is to scan one of them backwards. So I don't want to lobotomize the code entirely. But we're paying a pretty high price for that capability. The idea I'm toying with is to make pathkeys_useful_for_merging() consider only ASC pathkeys as useful for merging --- that is, only pathkeys with pk_strategy = BTLessStrategyNumber. This would mean that only forward scans on ASC indexes and backward scans on DESC indexes would be considered to have interesting sort orders, and therefore in cases without any ORDER BY clause to worry about, the other indexscan path would not survive the initial competition in add_path. It'd be seen as having the same cost and worse ordering, and would be dropped. Now the tricky point in this is that if there's an ORDER BY on the query, then you might want a backwards mergejoin after all, if that means you come out with the right ordering for the ORDER BY. However, if there is such an ORDER BY, then pathkeys_useful_for_ordering will pick up on it and the opposite-direction indexscan will survive after all. (The number of pathkeys we keep for a path is the larger of these two functions' estimates, and paths with unequal pathkeys do not compete in add_path.) 
We'll plan out all the alternatives from both starting points, and eventually figure out at the top level that one of them avoids a final sort and is therefore cheaper. So in this not-too-common case, we'll be slower than 8.2 but also produce a better plan; I don't feel too bad about that. I admit this seems a bit Rube Goldbergian, but then again there is a whole lot of the planner's behavior that is emergent from the interplay of little pieces rather than being explicitly coded in one place. Comments, better ideas? regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] PANIC caused by open_sync on Linux
On Fri, 26 Oct 2007, Tom Lane wrote: The 8.3 bgwriter keeps running even during checkpoints, so it's feasible to add such a feature now. I wonder though whether the walwriter wouldn't be a better place for it. I do, too, but that wasn't available until too late in the 8.3 cycle to consider adding this feature to there either. There's a couple of potential to-do list ideas that build on the changes in this area in 8.3: -Aggressively pre-allocate WAL segments -Space out checkpoint fsync requests in addition to disk writes -Consider re-inserting a smarter bgwriter all-scan that writes sorted by usage count during idle periods -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[HACKERS] URGENT HELP about 'duration' stats
Hi,

I am doing my master's degree research using PostgreSQL 8.2.5, and I would appreciate any help, ASAP, with the following question: how can I obtain the time spent by the PostgreSQL EXECUTOR during a given time interval?

I am trying to get this information in the following way: using the BenchmarkSQL software, I submit random transactions (simulating the TPC-C benchmark) over a time interval (say 30 seconds). PostgreSQL then generates statistics for each phase in which a SQL command is executed. The excerpt below shows the duration of the EXECUTE phase of some of these commands:

. . . .
2007-10-11 15:49:05.221 BRT [46994] LOG:  duration: 0.012 ms  execute S_1: BEGIN
2007-10-11 15:49:05.430 BRT [46994] LOG:  duration: 94.073 ms  execute <unnamed>: SELECT c_discount, c_last, c_credit, w_tax FROM customer, warehouse WHERE w_id = $1 AND w_id = c_w_id AND c_d_id = $2 AND c_id = $3
2007-10-11 15:49:05.462 BRT [46994] LOG:  duration: 13.662 ms  execute <unnamed>: SELECT d_next_o_id, d_tax FROM district WHERE d_id = $1 AND d_w_id = $2 FOR UPDATE
2007-10-11 15:49:05.562 BRT [46994] LOG:  duration: 98.818 ms  execute <unnamed>: INSERT INTO NEW_ORDER (no_o_id, no_d_id, no_w_id) VALUES ($1, $2, $3)
2007-10-11 15:49:05.579 BRT [46994] LOG:  duration: 15.459 ms  execute <unnamed>: UPDATE district SET d_next_o_id = d_next_o_id + 1 WHERE d_id = $1 AND d_w_id = $2
2007-10-11 15:49:05.748 BRT [46994] LOG:  duration: 122.387 ms  execute <unnamed>: INSERT INTO OORDER (o_id, o_d_id, o_w_id, o_c_id, o_entry_d, o_ol_cnt, o_all_local) VALUES ($1, $2, $3, $4, $5, $6, $7)
2007-10-11 15:49:05.779 BRT [46994] LOG:  duration: 29.787 ms  execute <unnamed>: SELECT i_price, i_name, i_data FROM item WHERE i_id = $1
2007-10-11 15:49:05.846 BRT [46994] LOG:  duration: 64.963 ms  execute <unnamed>: SELECT s_quantity, s_data, s_dist_01, s_dist_02, s_dist_03, s_dist_04, s_dist_05, s_dist_06, s_dist_07, s_dist_08, s_dist_09, s_dist_10 FROM stock WHERE s_i_id = $1 AND s_w_id = $2 FOR UPDATE
2007-10-11 15:49:05.882 BRT [46994] LOG:  duration: 33.865 ms  execute <unnamed>: SELECT i_price, i_name, i_data FROM item WHERE i_id = $1
2007-10-11 15:49:05.931 BRT [46994] LOG:  duration: 48.075 ms  execute <unnamed>: SELECT s_quantity, s_data, s_dist_01, s_dist_02, s_dist_03, s_dist_04, s_dist_05, s_dist_06, s_dist_07, s_dist_08, s_dist_09, s_dist_10 FROM stock WHERE s_i_id = $1 AND s_w_id = $2 FOR UPDATE
2007-10-11 15:49:05.962 BRT [46994] LOG:  duration: 28.782 ms  execute <unnamed>: SELECT i_price, i_name, i_data FROM item WHERE i_id = $1
2007-10-11 15:49:06.002 BRT [46994] LOG:  duration: 38.026 ms  execute <unnamed>: SELECT s_quantity, s_data, s_dist_01, s_dist_02, s_dist_03, s_dist_04, s_dist_05, s_dist_06, s_dist_07, s_dist_08, s_dist_09, s_dist_10 FROM stock WHERE s_i_id = $1 AND s_w_id = $2 FOR UPDATE
2007-10-11 15:49:06.027 BRT [46994] LOG:  duration: 23.220 ms  execute <unnamed>: SELECT i_price, i_name, i_data FROM item WHERE i_id = $1
2007-10-11 15:49:06.043 BRT [46994] LOG:  duration: 14.308 ms  execute <unnamed>: SELECT s_quantity, s_data, s_dist_01, s_dist_02, s_dist_03, s_dist_04, s_dist_05, s_dist_06, s_dist_07, s_dist_08, s_dist_09, s_dist_10 FROM stock WHERE s_i_id = $1 AND s_w_id = $2 FOR UPDATE
. . . .

To answer my initial question, I am doing something like this:

1. From the statements' timestamps, I calculate the total time interval over which all the statements were executed: END TIMESTAMP (2007-10-11 15:49:06.043) - BEGIN TIMESTAMP (2007-10-11 15:49:05.221).

2. I then sum the 'duration' fields of the EXECUTE phases of all the statements executed. In theory, this gives me the amount of time the executor was in use during that total interval.

The problem I have encountered is that the sum of the executor's duration times is *sometimes* bigger than the total time interval in which the statements were executed!! And this makes no sense!
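As a cross-check of the two steps described above, here is a minimal Python sketch that parses duration lines in the log format shown in the excerpt and compares the wall-clock interval (last timestamp minus first) against the summed EXECUTE durations. The sample lines are abbreviated stand-ins for the real log, and the regex assumes the `timestamp TZ [pid] LOG: duration: N ms execute` layout seen above:

```python
import re
from datetime import datetime

# Abbreviated sample lines in the same layout as the log excerpt above.
LOG_LINES = [
    "2007-10-11 15:49:05.221 BRT [46994] LOG:  duration: 0.012 ms  execute S_1: BEGIN",
    "2007-10-11 15:49:05.430 BRT [46994] LOG:  duration: 94.073 ms  execute <unnamed>: SELECT ...",
    "2007-10-11 15:49:06.043 BRT [46994] LOG:  duration: 14.308 ms  execute <unnamed>: SELECT ...",
]

# Timestamp, timezone, backend pid, then the duration in milliseconds.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \w+ \[\d+\] "
    r"LOG:\s+duration: ([\d.]+) ms\s+execute"
)

def summarize(lines):
    """Return (wall-clock interval in ms, sum of logged durations in ms)."""
    stamps, total = [], 0.0
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip non-duration log lines
        stamps.append(datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f"))
        total += float(m.group(2))
    interval_ms = (max(stamps) - min(stamps)).total_seconds() * 1000.0
    return interval_ms, total

interval_ms, summed_ms = summarize(LOG_LINES)
print(f"interval: {interval_ms:.3f} ms, summed durations: {summed_ms:.3f} ms")
```

With a single backend this summed figure can never exceed the interval; the comparison only becomes surprising when several backends log durations over the same wall-clock window.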
In some tests I have done, I found these results:

- Total time interval during which BenchmarkSQL submitted queries = 30 seconds
- Sum of the executor's duration times over this period = 2 minutes and 36 seconds!! (5x longer than the total time)

How could the sum of the executor's duration times be longer than the duration of the interval in which the statements ran? Can anyone please help me solve this?

Many thanks,
Camilo
Re: [HACKERS] URGENT HELP about 'duration' stats
Camilo Porto [EMAIL PROTECTED] writes:
> The problem I have encountered is that the sum of the executor's
> duration times is *sometimes* bigger than the total time interval in
> which the statements were executed!! And this makes no sense!

Umm ... why not? If you have, say, two queries executing in parallel for 1 second, they'll each report a duration: of 1 second, thus summing to 2 seconds, but the elapsed time was only 1 second. If you don't always see that, then your benchmark program isn't trying very hard to run more than one query in parallel ...

			regards, tom lane
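The effect Tom describes can be reproduced with a toy sketch (not PostgreSQL itself): two workers each busy for about one second in parallel report summed durations of about two seconds, while the elapsed wall-clock time stays near one second. The `run_query` function here is a hypothetical stand-in for a backend executing a statement:

```python
import threading
import time

durations = []
lock = threading.Lock()

def run_query(seconds):
    """Stand-in for one backend running a query; records its own duration."""
    start = time.monotonic()
    time.sleep(seconds)  # pretend the executor is busy this long
    with lock:
        durations.append(time.monotonic() - start)

wall_start = time.monotonic()
workers = [threading.Thread(target=run_query, args=(1.0,)) for _ in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()
wall_elapsed = time.monotonic() - wall_start

# Two overlapping 1-second queries: ~2 s of summed duration in ~1 s of wall time.
print(f"summed durations: {sum(durations):.2f} s, wall clock: {wall_elapsed:.2f} s")
```

The same arithmetic explains the 30-second run reporting 2 minutes 36 seconds of executor time: roughly five sessions' worth of execution overlapped on average.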