Re: [PATCHES] [HACKERS] Fix linking of OpenLDAP libraries

2006-09-12 Thread Albe Laurenz
Kevin Brown wrote:
 Let me expand a little on some of the peculiarities of
 shared libraries on AIX:
 
 - A normal AIX shared library is called libXX.a
   It is an 'ar' archive that contains the shared object(s).
 
 Ah, so the problem really boils down to funny naming conventions.
 If they use .a for both shared and static libraries, how does anyone
 tell the difference?
 
 It sounds to me like there is no difference.  Notice how his example
 ldd output shows dependencies on specific .o entries within the
 various .a files that reside on the system, rather than on the .a
 files as a whole.  If those entries had been statically linked then
 they wouldn't have shown up in the ldd output at all.

That is not entirely true.
The difference between a static and a shared library on AIX
is that the *.o files in a dynamic library are dynamic objects,
produced by the linker (what is called *.so in Linux), and the
*.o files in a static library are the output of the compiler
(what is called *.o in Linux).

What IS true is that you can do a static build against a dynamic
library. Against a static library you can only do static builds.
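
For illustration, one way to tell the two kinds of .a apart (the library
name below is only an example, and the exact output of 'dump' varies by
AIX level):

    ar -t libpq.a       # list the archive members
    dump -H libpq.a     # members that are shared objects have a loader
                        # section header; plain compiler .o members do not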

Yours,
Laurenz Albe

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Lock partitions

2006-09-12 Thread Simon Riggs
On Mon, 2006-09-11 at 11:29 -0400, Tom Lane wrote:
 Mark Wong [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  It would be nice to see some results from the OSDL tests with, say, 4,
  8, and 16 lock partitions before we forget about the point though.
  Anybody know whether OSDL is in a position to run tests for us?
 
  Yeah, I can run some dbt2 tests in the lab.  I'll get started on it. 
  We're still a little bit away from getting the automated testing for 
  PostgreSQL going again though.
 
 Great, thanks.  The thing to twiddle is LOG2_NUM_LOCK_PARTITIONS in
 src/include/storage/lwlock.h.  You need a full backend recompile
 after changing it, but you shouldn't need to initdb, if that helps.

IIRC we did that already and the answer was 16...
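
For anyone wanting to repeat that experiment, here is a minimal sketch of
the procedure Tom describes (the default value shown in the comment is from
memory):

    # In src/include/storage/lwlock.h, change
    #     #define LOG2_NUM_LOCK_PARTITIONS  4    /* i.e. 16 partitions */
    # to the value under test (e.g. 2 for 4 partitions, 3 for 8), then:
    make clean && make && make install     # full backend recompile
    pg_ctl -D "$PGDATA" restart            # no initdb needed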

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Optimizer improvements: to do or not to do?

2006-09-12 Thread Simon Riggs
On Mon, 2006-09-11 at 06:20 -0700, Say42 wrote:
 I intend to play with some optimizer aspects. Just for fun. 

Cool. If you think it's fun (it is), you're halfway there.

 I'm a
 novice in the DBMS development so I can not promise any available
 results but if it can be useful even as yet another failed attempt I
 will try.

This type of work is 90% analysis, 10% coding. You'll need to do a lot
of investigation, lots of discussion and listening.

 That's what I want to do:
 1. Replace not very useful indexCorrelation with indexClustering.

An opinion such as "not very useful" isn't considered sufficient
explanation or justification for a change around here.

 2. Consider caching of inner table in a nested loops join during
 estimation total cost of the join.
 
 More details:
 1. During ANALYZE we have sample rows. For every N-th sample row we can
 scan indexes on a qual like 'value = index_first_column' and fetch the first
 N row TIDs. Estimating the count of fetched heap pages is not hard. To
 get the index clustering value, just divide the page count by the
 sample row count.
 2. This is much harder and may be impossible for me altogether. The main
 ideas:
 - split page-fetch cost and CPU cost into different variables and
 don't sum them before join estimation.
 - final path cost estimation should be done in the join cost estimation
 and take into account the number of inner table accesses (=K). CPU cost is
 directly proportional to K, but page fetches can be estimated by the
 Mackert and Lohman formula using the total tuple count (K *
 inner_table_selectivity * inner_table_total_tuples).
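
For reference, the Mackert and Lohman approximation mentioned above, as used
in PostgreSQL's cost model (quoted from memory, so treat the exact form with
care), estimates the pages fetched PF for a table of T pages and N tuples,
with scan selectivity s and b available buffer pages:

    PF = min(2TNs/(2T + Ns), T)                when T <= b
       = 2TNs/(2T + Ns)                        when T > b and Ns <= 2Tb/(2T - b)
       = b + (Ns - 2Tb/(2T - b)) * (T - b)/T   when T > b and Ns > 2Tb/(2T - b)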

I'd work on one thing at a time and go into it deeply.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Fixed length data types issue

2006-09-12 Thread Simon Riggs
On Mon, 2006-09-11 at 14:25 -0400, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  Is this an 8.2 thing?
 
 You are joking, no?

Confirming, using an open question, and a smile.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Buildfarm vs. Linux Distro classification

2006-09-12 Thread Gregory Stark

Alvaro Herrera [EMAIL PROTECTED] writes:

 Christopher Browne wrote:

 It seems to me that there is some value in putting together a script
 that tries to identify some of the interesting bits of the toolchain.

 Yeah; but why not just a bunch of commands, some of which are expected
 to work on any particular machine, and save the whole output as a single
 string?  It's not very clean, but should get the important details.  To
 support a new machine, just add more commands to the script.

 A simple version of this, based on your Mark 0, could be:

 uname -a
 $CC --version
 $CC -V
 $CC -v
 ls -l /lib/libc.so*

 No need to comment/uncomment anything.

I would have said "ldd postgres" would work on any ELF system and show you all
the library .so versions it depends on. I guess that only helps if it actually
builds and then fails the regression tests -- not if the build fails.
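
As a rough sketch of the "just run a bunch of probes" idea, combining
Alvaro's commands with the ldd suggestion (everything captured as one
string, failures simply ignored):

    #!/bin/sh
    CC=${CC:-cc}
    {
      uname -a
      $CC --version
      $CC -V
      $CC -v
      ls -l /lib/libc.so*
      ldd ./postgres     # on ELF systems, shows the .so versions actually linked
    } > sysinfo.txt 2>&1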

On Debian it would be useful to do something like the below. Though note that a)
this depends on having a postgres package installed, which the build machines
may not have, and b) it shows the libraries that package depends on, not the
versions of the *-dev packages installed.


[EMAIL PROTECTED]:~$ reportbug --offline --template postgresql-8.1 2>/dev/null | sed '1,/^-- System Information/d'
Debian Release: testing/unstable
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.17.7-swsusp2
Locale: LANG=en_GB, LC_CTYPE=en_GB (charmap=ISO-8859-1)

Versions of packages postgresql-8.1 depends on:
ii  libc6                  2.3.6.ds1-4  GNU C Library: Shared libraries
ii  libcomerr2             1.39-1       common error description library
ii  libkrb53               1.4.3-8      MIT Kerberos runtime libraries
ii  libpam0g               0.79-3.1     Pluggable Authentication Modules l
ii  libpq4                 8.1.4-5      PostgreSQL C client library
ii  libssl0.9.8            0.9.8b-2     SSL shared libraries
ii  postgresql-client-8.1  8.1.4-4      front-end programs for PostgreSQL
ii  postgresql-common      57           manager for PostgreSQL database cl

postgresql-8.1 recommends no packages.

-- no debconf information



-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] Bitmap index status

2006-09-12 Thread Heikki Linnakangas

Hi,

What's the status of the bitmap index patch? Have you worked on it since 
the last posted patch 
(http://archives.postgresql.org/pgsql-patches/2006-08/msg3.php)?


I've started to review it, to get it into CVS early in the 8.3 cycle. I 
just want to make sure that I'm working on the latest version.


Besides the issues already discussed, I found two minor bugs:
* pg_am says that the bitmap AM supports unique indexes, while it doesn't.
* There is a race condition in _bitmap_inserttuple if two backends try to insert
the same, new value. If they both find that there's no lov item for the
key, and try to create one, one backend will get a duplicate key error
on the lov index.


Also, vacuum actually does a reindex, which seems awfully wasteful. That 
needs to be looked at.


--
 Heikki Linnakangas
 EnterpriseDB   http://www.enterprisedb.com


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] Optimizer improvements: to do or not to do?

2006-09-12 Thread Gregory Stark
Simon Riggs [EMAIL PROTECTED] writes:

 That's what I want to do:
 1. Replace not very useful indexCorrelation with indexClustering.

 An opinion such as "not very useful" isn't considered sufficient
 explanation or justification for a change around here.

There's been some previous discussion about how correlation was not really
what we wanted to be measuring. But that discussion was in regard to
cross-column correlation. In that case we're trying to predict how selective
a clause will be: if we read x% of the table due to a restriction on X, what
percentage of the values of Y will be represented?

In this case I think we do need to know correlation or something like it.
That's because what we're trying to predict is how close to sequential the i/o
accesses will be. If there's no correlation between index order and disk order
then they'll be random. If they're highly correlated then accesses will be
close to sequential.

It's possible there's some sort of block-wise correlated measure which would
be even better for our needs. We don't care if all the high values are towards
the start and low values towards the end as long as each section is in order,
for example.

It's also possible that we could use something like what you describe to
predict how many physical i/os will happen altogether. If the table is highly
clustered but disordered then the io will be random access but the cache will
be more effective than if the table is highly correlated but not clustered
(though it would take a large table to make that possible I think).

In short I think what's needed is someone to review a lot of different stats
metrics for correlation and clustering and do some analysis of how each would
be useful for cost modelling. 

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Fixed length data types issue

2006-09-12 Thread Gregory Stark

Alvaro Herrera [EMAIL PROTECTED] writes:

 Gregory Stark wrote:
 
 Well, "char" doesn't have quite the same semantics as CHAR(1). If that's the
 consensus, though, then I can work on either fixing "char" semantics to match
 CHAR(1) or adding a separate type instead.

 What semantics?  

The main bit that comes to mind is that 32::CHAR(1) gives you '3' but 32::"char"
gives you ' '.

Really it makes more sense if you think of "char" as a 1-byte integer type
with some extra text casts and operators to make C programmers happy, not a
1-byte character type.
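
A quick way to see the difference from psql (written here as a shell
one-liner; connection options are whatever you normally use):

    psql -c 'SELECT 32::char(1) AS as_char1, 32::"char" AS as_char;'
    # as_char1 comes back as '3' (the text form truncated to one character)
    # as_char comes back as ' ' (the character with byte value 32, a space)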

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-12 Thread Stefan Kaltenbrunner

Tom Lane wrote:

Stefan Kaltenbrunner [EMAIL PROTECTED] writes:

btw - the "hashjoin is bad" claim was more or less based on the observation
that nearly all of the CPU is burned in hash-related functions in the
profile (when profiling over a longer period of time those accumulate
an even higher % of the time than in the short profile I included in the
original report)


[ shrug... ]  Two out of the three functions you mentioned are not used
by hash join, and anyway the other plan probably has a comparable
execution density in sort-related functions; does that make it bad?


hmm sorry for that - I should have checked the source before I made that 
assumption :-(




It's possible that the large time for ExecScanHashBucket has something
to do with skewed usage of the hash buckets due to an unfortunate data
distribution, but that's theorizing far in advance of the data.


http://www.kaltenbrunner.cc/files/4/

has preliminary data from the dbt3/scaling 10 run I did, which seems to
imply we have at least 4 queries in there that take an excessive amount
of time (query 5 is the one I started the complaint with).
However, those results have to be taken with a grain of salt, since there
is an apparent bug in the dbt3 code which seems to rely on
add_missing_from=on (as can be seen in some of the error logs of the
database), and towards the end of the throughput run I did some of the
explain analyzes for the report (those are the small 100% spikes in the
graph due to the box using the second CPU to run them).

I will redo those tests later this week though ...

Stefan

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Optimizer improvements: to do or not to do?

2006-09-12 Thread Say42
Simon Riggs wrote:
 This type of work is 90% analysis, 10% coding. You'll need to do a lot
 of investigation, lots of discussion and listening.

I absolutely agree with you and I am not about to rush into coding
right now. First of all I'm going to dig a lot into the PG sources,
READMEs and so on. It's a good way to learn the coding style and DBMS
internals.

  That's what I want to do:
  1. Replace not very useful indexCorrelation with indexClustering.

 An opinion such as "not very useful" isn't considered sufficient
 explanation or justification for a change around here.

Sometimes indexCorrelation is even misleading. There are many examples
of overestimation of index scan cost (data well-clustered but not
ordered - correlation is low) and some cases of underestimation when
tuples look well ordered with a high degree of correlation, but the
index scan actually causes random page fetches (1-3-2-4-6-5, for
example; on a server without RAID it is VERY slow, 25 times slower than
a bitmap index scan). If we have a special clustering measure we can
estimate the page count more precisely.
The next step could be to introduce 'ordering' as a measure of
page access sequentiality. Without the 'ordering', all we can do is
assume that pages are fetched in random order. Anyhow, if index access
cost is overestimated we can set random_page_cost=2. (Is that true in a
production database with smart RAID?)
Moreover, I think the problem is more complex. With the assumption that index
access is always random we run into another problem: overestimation of the
master table index scan. If it is small enough, PG can choose a seq scan
instead of an index scan even if the latter is actually much cheaper
because of caching. That is why caching should be taken into account
during join cost calculation.

  2. Consider caching of inner table in a nested loops join during
  estimation total cost of the join.

 I'd work on one thing at a time and go into it deeply.

Good news. So I'm very interested in what you think about my ideas.
Are they wrong, or too naive?


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] [HACKERS] Fix linking of OpenLDAP libraries

2006-09-12 Thread Albe Laurenz
Tom Lane wrote:
 In our case, we have libpq.a and libpq.so in the same directory,
 so unless you link with -brtl you will get a static link
 (because libpq.a is a static library).
 
 I wonder whether we ought to suppress building (or at least installing)
 our .a libraries at all on AIX.  Adding -brtl to LDFLAGS would help
 within the context of our own build, but external clients that link
 to libpq without saying that are going to get undesirable results.
 
 I think there's a reasonable argument that by installing a .a file that
 isn't a shared library, we are violating the platform's conventions.

The natural way in AIX would be:
- Create libpq.so
- Create libpq.a by 'rm -f libpq.a; ar -rc libpq.a libpq.so'
- Install only libpq.a

For a static build on AIX, you have to specify all the libraries and
give the linker -bstatic and -bI:/lib/syscalls.exp

 Should -brtl be added to src/template/aix?
 
 Sounds that way, but that'll only help for psql and other stuff built
 within our build.  Could you try this against CVS tip:
 
 * add -brtl to LDFLAGS in the template
 * Remove the AIX-specific hack on $(libpq) at lines 349-354 of
   src/Makefile.global.in
 * see if it configures and builds

I have done that (see the attached patch) and it works fine.
I don't have the native AIX C compiler, so I could only test
it with gcc.

I have taken the liberty of modifying the static link line
in Makefile.global.in to include the LDAP libraries; I hope
that's appropriate.

Yours,
Laurenz Albe


aix.link.patch
Description: aix.link.patch

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] dump / restore functionality

2006-09-12 Thread Naz Gassiep
At the risk of being chastised for reviving old issues, I was wondering
what the chances are of getting the dump / restore selectivity into
8.2? I am referring to the idea that, instead of the current 2 parts, a
dump could be broken up into 3 parts, namely tables, data and everything
else, so that data from one dump could be mixed and matched with schema
defs from another dump easily and scriptably.


I think the previous discussion concluded that the functionality would
be best implemented as a selective restore rather than a breakable dump,
due to the risk of inconsistent restores, so you could restore just the
tables, data or everything-else components from a given dump.
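
Something close to this is already scriptable with custom-format dumps, if
only at the granularity of pg_restore's table of contents (a rough sketch;
database names are placeholders):

    pg_dump -Fc sourcedb > sourcedb.dump
    pg_restore -l sourcedb.dump > items.list    # write out the TOC
    # edit items.list, keeping only the entries you want,
    # e.g. just the TABLE DATA entries to restore data alone
    pg_restore -L items.list -d targetdb sourcedb.dump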


Did this item make it onto the to-do list? If so, did anyone pick this 
up or will I be waiting until a future as-yet-undefined date?


More generally, is there a publicly accessible place where one can see the
to-do items, who has adopted which ones, and what the status is on them?
Sorry for asking this, but I am still a rather new participant here.


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Lock partitions

2006-09-12 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 On Mon, 2006-09-11 at 11:29 -0400, Tom Lane wrote:
 Great, thanks.  The thing to twiddle is LOG2_NUM_LOCK_PARTITIONS in
 src/include/storage/lwlock.h.  You need a full backend recompile
 after changing it, but you shouldn't need to initdb, if that helps.

 IIRC we did that already and the answer was 16...

No, no one has shown me any numbers from any real tests (anything
more than pgbench on a Dell PC ...).

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


AIX shared libraries (was Re: [PATCHES] [HACKERS] Fix linking of OpenLDAP libraries)

2006-09-12 Thread Tom Lane
Albe Laurenz [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 I think there's a reasonable argument that by installing a .a file that
 isn't a shared library, we are violating the platform's conventions.

 The natural way in AIX would be:
 - Create libpq.so
 - Create libpq.a by 'rm -f libpq.a; ar -rc libpq.a libpq.so'
 - Install only libpq.a

Hm.  This seems possible with some moderate hacking on Makefile.shlib
(certainly it'd be no more invasive than the existing Windows-specific
platform variants).  However, looking at what's already in
Makefile.shlib for AIX makes me doubt the above claim a bit, because
AFAICS libpq.so is produced from libpq.a on that platform.  Is it
possible that the rules have changed across AIX versions, and that the
code in there now is needful for older versions?

Another issue with installing only .a is that there's no provision
for versioning in .a library names ... what happens to someone who
needs two generations of libpq on his machine?

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] currentItemData currentMarkData

2006-09-12 Thread Heikki Linnakangas
It has bothered me for some time that we have currentItemData and
currentMarkData fields in IndexScanDescData, while they're really the indexam's
private data. They should be in the indexam opaque structure.


At the moment, they're used in gist and hash access methods, and they're 
not used in b-tree and gin. The new bitmap index code uses them, but 
only as an internal boolean flag indicating that the scan has been 
initialized, so that should be fixed anyway.


How about removing those fields from IndexScanDesc?

There's also the is_multiscan field, which is set but never read.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Bitmap index status

2006-09-12 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 What's the status of the bitmap index patch? Have you worked on it since 
 the last posted patch 
 (http://archives.postgresql.org/pgsql-patches/2006-08/msg3.php)?

Gavin and Jie have made major changes since that version (or at least
they'd better have something to show for the month since then ;-)).
I wouldn't recommend reviewing the patch until they post something
current ...

 Also, vacuum actually does a reindex, which seems awfully wasteful. That 
 needs to be looked at.

Yikes.  I imagine they've not tried to do anything about that; if you
want to help, maybe you could take that subproblem?

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] dump / restore functionality

2006-09-12 Thread Peter Eisentraut
On Tuesday, September 12, 2006 15:22, Naz Gassiep wrote:
 At the risk of being chastised for reviving old issues, I was wondering
 what the chances are of getting the dump / restore selectivity into
 8.2?

Zero, because feature freeze is over.

 Did this item make it onto the to-do list? If so, did anyone pick this
 up or will I be waiting until a future as-yet-undefined date?

If you find this feature interesting, you are free to drive the development 
yourself, independent of it appearing on any list.  To avoid tears later on, 
look for a consensus about the merit of the feature first, though.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Optimizer improvements: to do or not to do?

2006-09-12 Thread Peter Eisentraut
On Tuesday, September 12, 2006 12:48, Say42 wrote:
 That is why caching should be taken into account
 during join cost calculation.

If you know of a more effective way to do that beyond the effective_cache_size 
parameter that we have now, let us know.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Optimizer improvements: to do or not to do?

2006-09-12 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 It's possible there's some sort of block-wise correlated measure
 which would be even better for our needs.

Actually, it seems obvious to me that the correlation measure ought to
ignore within-block ordering, but right now it does not.  OTOH it's not
clear how important that is, as on a decent-size table you'll probably
not have more than one sample row in a block anyway.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] dump / restore functionality

2006-09-12 Thread Naz Gassiep




 Zero, because feature freeze is over.

Aah yes, fair enough.

 If you find this feature interesting, you are free to drive the development
 yourself, independent of it appearing on any list.  To avoid tears later on,
 look for a consensus about the merit of the feature first, though.

This has been discussed already, and there was a not insignificant
amount of support for it; IIRC Tom Lane agreed that such functionality
would be useful.


Tom, are you aware if this item made it onto the to-do list?

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] dump / restore functionality

2006-09-12 Thread Tom Lane
Naz Gassiep [EMAIL PROTECTED] writes:
 At the risk of being chastised for reviving old issues, I was wondering
 what the chances are of getting the dump / restore selectivity into
 8.2?

None, but feel free to start coding for 8.3.

 I am referring to the idea that, instead of the current 2 parts, a 
 dump could be broken up into 3 parts, namely tables, data and everything 
 else, so that data from one dump could be mixed and matched with schema 
 defs from another dump easily and scriptably.

That seems like a rather spectacular overstatement of the likely
benefits, not to mention a misdescription of what was discussed.

AFAIR what was discussed was separating
- schema stuff needed before loading data
- table data
- schema stuff needed after loading data
where the last category boils down to indexes and then foreign keys.
All the other stuff such as functions really needs to be in the
first part ... or at least there's no visible benefit to delaying
loading it.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] currentItemData currentMarkData

2006-09-12 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 It has bothered me for some time that we have currentItemData and
 currentMarkData fields in IndexScanDescData, while they're really the indexam's
 private data. They should be in the indexam opaque structure.

Can't get very excited about it, but if you want.

 There's also is_multiscan field which is set but never read.

It has been needed in the past and could plausibly be needed again
sometime --- an index AM's rescan method might want to know what kind
of scan it's setting up for.  Of course, if we get rid of amgetmulti
entirely as has been discussed, this'd go away too.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] dump / restore functionality

2006-09-12 Thread Joshua D. Drake



That seems like a rather spectacular overstatement of the likely
benefits, not to mention a misdescription of what was discussed.

AFAIR what was discussed was separating


Yes, that is what was discussed.


- schema stuff needed before loading data
- table data
- schema stuff needed after loading data
where the last category boils down to indexes and then foreign keys.
All the other stuff such as functions really needs to be in the
first part ... or at least there's no visible benefit to delaying
loading it.


Right. I still think this breakdown would be useful. An additional item
that would be useful is to allow pg_restore to restore plain-text dumps.


Sincerely,

Joshua D. Drake





regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match




--

   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
   Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Lock partitions

2006-09-12 Thread Strong, David
We can pass on what we've seen when running tests here with different
BufMapping and LockMgr partition sizes.

We use a TPC-C inspired benchmark. Currently it is configured to run 25
backend processes. The test runs for 16 minutes as this is the minimum
amount of time we can run and obtain useful information. This gives us
24,000 seconds (25 * 16 * 60) of processing time.

The following timings have been rounded to the nearest second and
represent the total time, across all backend processes, spent acquiring
and releasing locks. For example, a value of 2500 seconds would mean each
of the 25 backend processes took ~100 seconds to acquire or release locks.
Although, in reality, the time spent locking or releasing each partition
entry is not uniform and there are some definite hotspot entries. We can
pass on some of the lock output if anyone is interested.

When using 16 buffer and 16 lock partitions, we see that BufMapping
takes 809 seconds to acquire locks and 174 seconds to release locks. The
LockMgr takes 362 seconds to acquire locks and 26 seconds to release
locks.

When using 128 buffer and 128 lock partitions, we see that BufMapping
takes 277 seconds (532 seconds improvement) to acquire locks and 78
seconds (96 seconds improvement) to release locks. The LockMgr takes 235
seconds (127 seconds improvement) to acquire locks and 22 seconds (4
seconds improvement) to release locks.

Overall, 128 BufMapping partitions improves locking/releasing by 678
seconds, 128 LockMgr partitions improves locking/releasing by 131
seconds.

With the improvements in the various locking times, one might expect an
improvement in the overall benchmark result. However, a 16 partition run
produces a result of 198.74 TPS and a 128 partition run produces a
result of 203.24 TPS.

Part of the time saved from BufMapping and LockMgr partitions is
absorbed into the WALInsertLock lock. For a 16 partition run, the total
time to lock/release the WALInsertLock lock is 5845 seconds. For 128
partitions, the WALInsertLock lock takes 6172 seconds, an increase of
327 seconds. Perhaps we have our WAL configured incorrectly?

Other static locks are also affected, but not as much as the
WALInsertLock lock. For example, the ProcArrayLock lock increases from
337 seconds to 348 seconds. The SInvalLock lock increases from 317
seconds to 331 seconds.

Due to expansion of time in other locks, a 128 partition run only spends
403 seconds less in locking than a 16 partition run.

We can generate some OProfile statistics, but most of the time saved is
probably absorbed into functions such as HeapTupleSatisfiesSnapshot and
PinBuffer which seem to have a very high overhead.

David

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Simon Riggs
Sent: Tuesday, September 12, 2006 1:37 AM
To: Tom Lane
Cc: Mark Wong; Bruce Momjian; PostgreSQL-development
Subject: Re: [HACKERS] Lock partitions

On Mon, 2006-09-11 at 11:29 -0400, Tom Lane wrote:
 Mark Wong [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  It would be nice to see some results from the OSDL tests with, say, 4,
  8, and 16 lock partitions before we forget about the point though.
  Anybody know whether OSDL is in a position to run tests for us?
 
  Yeah, I can run some dbt2 tests in the lab.  I'll get started on it.
  We're still a little bit away from getting the automated testing for
  PostgreSQL going again though.
 
 Great, thanks.  The thing to twiddle is LOG2_NUM_LOCK_PARTITIONS in
 src/include/storage/lwlock.h.  You need a full backend recompile
 after changing it, but you shouldn't need to initdb, if that helps.

IIRC we did that already and the answer was 16...

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com


---(end of broadcast)---
TIP 6: explain analyze is your friend

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: AIX shared libraries (was Re: [PATCHES] [HACKERS] Fix linking of OpenLDAP libraries)

2006-09-12 Thread Albe Laurenz
Tom Lane wrote:
 The natural way in AIX would be:
 - Create libpq.so
 - Create libpq.a by 'rm -f libpq.a; ar -rc libpq.a libpq.so'
 - Install only libpq.a
 
 Hm.  This seems possible with some moderate hacking on Makefile.shlib
 (certainly it'd be no more invasive than the existing Windows-specific
 platform variants).  However, looking at what's already in
 Makefile.shlib for AIX makes me doubt the above claim a bit, because
 AFAICS libpq.so is produced from libpq.a on that platform.  Is it
 possible that the rules have changed across AIX versions, and that the
 code in there now is needful for older versions?

I don't think that this behaviour has changed. I remember it from
AIX 4.3.2.

Of course libpq.so is created from (the static) libpq.a.
But once you have the dynamic library, you can link statically
against it.

 Another issue with installing only .a is that there's no provision
 for versioning in .a library names ... what happens to someone who
 needs two generations of libpq on his machine?

Use different directories and set LIBPATH?
I don't know if there is a canonical way to do that. I'll investigate.
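
One ad-hoc arrangement (only a sketch; the paths are invented) would be to
install each generation under its own prefix and select one at run time
with LIBPATH:

    # e.g. /opt/pgsql-8.1/lib/libpq.a and /opt/pgsql-8.2/lib/libpq.a
    LIBPATH=/opt/pgsql-8.2/lib:$LIBPATH
    export LIBPATH
    ./myapp            # placeholder for the client program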

Yours,
Laurenz Albe

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Bitmap index status

2006-09-12 Thread Jie Zhang
Hi Heikki,

Gavin and I are trying to merge our changes together this week, and we will post
a new patch by the end of the week. This patch will include some style
fixes, bug fixes, and the stream bitmap implementation.

I will look into the problems you have mentioned in this email. Yes, vacuum
currently does a reindex. Gavin and I just talked about this yesterday and
are looking into ways to improve it. One way is not to do a reindex for
each vacuum: we maintain a list of updated TIDs along with the bitmap index,
and only when this list grows past a certain point does vacuum rebuild the index.

Thanks,
Jie

On 9/12/06 2:43 AM, Heikki Linnakangas [EMAIL PROTECTED] wrote:

 Hi,
 
 What's the status of the bitmap index patch? Have you worked on it since
 the last posted patch
 (http://archives.postgresql.org/pgsql-patches/2006-08/msg3.php)?
 
 I've started to review it, to get it into CVS early in the 8.3 cycle. I
 just want to make sure that I'm working on the latest version.
 
  Besides the issues already discussed, I found two minor bugs:
  * pg_am says that the bitmap AM supports unique indexes, while it doesn't.
  * There is a race condition in _bitmap_inserttuple if two backends try to insert
 the same, new value. If they both find that there's no lov item for the
 key, and try to create one, one backend will get a duplicate key error
 on the lov index.
 
 Also, vacuum actually does a reindex, which seems awfully wasteful. That
 needs to be looked at.



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Lock partitions

2006-09-12 Thread Tom Lane
Strong, David [EMAIL PROTECTED] writes:
 When using 16 buffer and 16 lock partitions, we see that BufMapping
 takes 809 seconds to acquire locks and 174 seconds to release locks. The
 LockMgr takes 362 seconds to acquire locks and 26 seconds to release
 locks.

 When using 128 buffer and 128 lock partitions, we see that BufMapping
 takes 277 seconds (532 seconds improvement) to acquire locks and 78
 seconds (96 seconds improvement) to release locks. The LockMgr takes 235
 seconds (127 seconds improvement) to acquire locks and 22 seconds (4
 seconds improvement) to release locks.

While I don't see any particular penalty to increasing
NUM_BUFFER_PARTITIONS, increasing NUM_LOCK_PARTITIONS carries a very
significant penalty (increasing PGPROC size as well as the work needed
during LockReleaseAll, which is executed at every transaction end).
I think 128 lock partitions is probably verging on the ridiculous
... particularly if your benchmark only involves touching half a dozen
tables.  I'd be more interested in comparisons between 4 and 16 lock
partitions.  Also, please vary the two settings independently rather
than confusing the issue by changing them both at once.
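
To make that concrete, a rough sketch of a run that varies only the lock
partitions (the constants live in src/include/storage/lwlock.h; the
benchmark driver name below is made up):

    for LOG2LOCK in 2 3 4; do            # 4, 8 and 16 lock partitions
        # edit lwlock.h so LOG2_NUM_LOCK_PARTITIONS is $LOG2LOCK, leaving
        # the buffer-mapping partition setting untouched
        make clean && make && make install
        pg_ctl -D "$PGDATA" restart
        ./run_benchmark.sh > lockparts_log2_${LOG2LOCK}.log
    done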

 With the improvements in the various locking times, one might expect an
 improvement in the overall benchmark result. However, a 16 partition run
 produces a result of 198.74 TPS and a 128 partition run produces a
 result of 203.24 TPS.

 Part of the time saved from BufMapping and LockMgr partitions is
 absorbed into the WALInsertLock lock. For a 16 partition run, the total
 time to lock/release the WALInsertLock lock is 5845 seconds. For 128
 partitions, the WALInsertLock lock takes 6172 seconds, an increase of
 327 seconds. Perhaps we have our WAL configured incorrectly?

I fear this throws your entire measurement procedure into question.  For
a fixed workload the number of acquisitions of WALInsertLock ought to be
fixed, so you shouldn't see any more contention for WALInsertLock if the
transaction rate didn't change materially.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Lock partitions

2006-09-12 Thread Strong, David
Tom,

Thanks for the feedback. We'll run a few tests with differing buffer and
lock partition sizes in the range you're interested in and we'll let you
know what we see.

Our workload is not fixed, however. Our benchmark does not follow the
strict TPC-C guideline of using think times etc. We throw as many
transactions at the database as we can. So, when any time is freed up,
we will fill it with another transaction. We simply want to stress as
much as we can. As one bottleneck is removed, the time saved obviously
flows to the next. 

Postgres 8.2 moves some of the time that used to be consumed by the single
BufMappingLock and LockMgrLock locks to the WALInsertLock lock. We have
run tests where we made XLogInsert a NOP, because we wanted to see where
the next bottleneck would be, and some of the time occupied by
WALInsertLock lock was absorbed by the SInvalLock lock. We have not
tried to remove the SInvalLock lock to see where time flows to next, but
we might.

David

-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, September 12, 2006 9:40 AM
To: Strong, David
Cc: PostgreSQL-development
Subject: Re: [HACKERS] Lock partitions 

Strong, David [EMAIL PROTECTED] writes:
 When using 16 buffer and 16 lock partitions, we see that BufMapping
 takes 809 seconds to acquire locks and 174 seconds to release locks. The
 LockMgr takes 362 seconds to acquire locks and 26 seconds to release
 locks.

 When using 128 buffer and 128 lock partitions, we see that BufMapping
 takes 277 seconds (532 seconds improvement) to acquire locks and 78
 seconds (96 seconds improvement) to release locks. The LockMgr takes 235
 seconds (127 seconds improvement) to acquire locks and 22 seconds (4
 seconds improvement) to release locks.

While I don't see any particular penalty to increasing
NUM_BUFFER_PARTITIONS, increasing NUM_LOCK_PARTITIONS carries a very
significant penalty (increasing PGPROC size as well as the work needed
during LockReleaseAll, which is executed at every transaction end).
I think 128 lock partitions is probably verging on the ridiculous
... particularly if your benchmark only involves touching half a dozen
tables.  I'd be more interested in comparisons between 4 and 16 lock
partitions.  Also, please vary the two settings independently rather
than confusing the issue by changing them both at once.

 With the improvements in the various locking times, one might expect an
 improvement in the overall benchmark result. However, a 16 partition run
 produces a result of 198.74 TPS and a 128 partition run produces a
 result of 203.24 TPS.

 Part of the time saved from BufMapping and LockMgr partitions is
 absorbed into the WALInsertLock lock. For a 16 partition run, the total
 time to lock/release the WALInsertLock lock is 5845 seconds. For 128
 partitions, the WALInsertLock lock takes 6172 seconds, an increase of
 327 seconds. Perhaps we have our WAL configured incorrectly?

I fear this throws your entire measurement procedure into question.  For
a fixed workload the number of acquisitions of WALInsertLock ought to be
fixed, so you shouldn't see any more contention for WALInsertLock if the
transaction rate didn't change materially.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Simplifying standby mode

2006-09-12 Thread Simon Riggs
On Wed, 2006-09-06 at 12:01 -0400, Bruce Momjian wrote:
 Simon Riggs wrote:
  1. Notes on restartable recovery

Previously submitted

  2. Notes on standby functionality
  3. discussion on rolling your own record-level polling using
  pg_xlogfile_name_offset()

Given below, but not in SGML yet. Looking for general pointers/feedback
before I drop those angle-brackets in place.


Warm Standby Servers for High Availability 
==========================================

Overview


Continuous Archiving can also be used to create a High Availability (HA)
cluster configuration with one or more Standby Servers ready to take
over operations in the case that the Primary Server fails. This
capability is more widely known as Warm Standby Log Shipping.

The Primary and Standby Server work together to provide this capability,
though the servers are only loosely coupled. The Primary Server operates
in Continuous Archiving mode, while the Standby Server operates in a
continuous Recovery mode, reading the WAL files from the Primary. No
changes to the database tables are required to enable this capability,
so it offers a low administration overhead in comparison with other
replication approaches. This configuration also has a very low
performance impact on the Primary server.

Directly moving WAL or log records from one database server to another
is typically described as Log Shipping. PostgreSQL implements file-based
Log Shipping, meaning WAL records are batched one file at a time. WAL
files can be shipped easily and cheaply over any distance, whether it be
to an adjacent system, another system on the same site or another system
on the far side of the globe. The bandwidth required for this technique
varies according to the transaction rate of the Primary Server.
Record-based Log Shipping is also possible with custom-developed
procedures, discussed in a later section. Future developments are likely
to include options for synchronous and/or integrated record-based log
shipping.

It should be noted that the log shipping is asynchronous, i.e. the WAL
records are shipped after transaction commit. As a result there can be a
small window of data loss, should the Primary Server suffer a
catastrophic failure. The window of data loss is minimised by the use of
the archive_timeout parameter, which can be set as low as a few seconds
if required. A very low setting can increase the bandwidth requirements
for file shipping.

The Standby server is not available for access, since it is continually
performing recovery processing. Recovery performance is sufficiently
good that the Standby will typically be only minutes away from full
availability once it has been activated. As a result, we refer to this
capability as a Warm Standby configuration that offers High
Availability. Restoring a server from an archived base backup and
rollforward can take considerably longer and so that technique only
really offers a solution for Disaster Recovery, not HA.

Other mechanisms for High Availability replication are available, both
commercially and as open-source software.  

In general, log shipping between servers running different release
levels will not be possible. However, it may be possible for servers
running different minor release levels e.g. 8.2.1 and 8.2.2 to
inter-operate successfully. No formal support for that is offered and
there may be minor releases where that is not possible, so it is unwise
to rely on that capability.

Planning


On the Standby server all tablespaces and paths will refer to similarly
named mount points, so it is important to create the Primary and Standby
servers so that they are as similar as possible, at least from the
perspective of the database server. Furthermore, any CREATE TABLESPACE
commands will be passed across as-is, so any new mount points must be
created on both servers before they are used on the Primary. Hardware
need not be the same, but experience shows that maintaining two
identical systems is easier than maintaining two dissimilar ones over
the whole lifetime of the application and system.

There is no special mode required to enable a Standby server. The
operations that occur on both Primary and Standby servers are entirely
normal continuous archiving and recovery tasks. The primary point of
contact between the two database servers is the archive of WAL files
that both share: Primary writing to the archive, Standby reading from
the archive. Care must be taken to ensure that WAL archives for separate
servers do not become mixed together or confused.

The magic that makes the two loosely coupled servers work together is
simply a restore_command that waits for the next WAL file to be archived
from the Primary. The restore_command is specified in the recovery.conf
file on the Standby Server. Normal recovery processing would request a
file from the WAL archive, causing an error if the file was unavailable.
For Standby processing it is normal for the next file to be unavailable,
so we must wait for it to appear.
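
An illustrative waiting restore_command might look like this (paths, file
names and the polling interval are placeholders; this is only a sketch of
the idea):

    restore_command = '/usr/local/bin/wait_for_wal.sh /mnt/archive %f %p'

with wait_for_wal.sh being something like:

    #!/bin/sh
    ARCHIVE="$1"; FILE="$2"; DEST="$3"
    # loop until the Primary has archived the requested WAL segment
    while [ ! -f "$ARCHIVE/$FILE" ]; do
        sleep 5
    done
    cp "$ARCHIVE/$FILE" "$DEST"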

Re: [HACKERS] Simplifying standby mode

2006-09-12 Thread Andrew Dunstan

Simon Riggs wrote:


In general, log shipping between servers running different release
levels will not be possible. However, it may be possible for servers
running different minor release levels e.g. 8.2.1 and 8.2.2 to
inter-operate successfully. No formal support for that is offered and
there may be minor releases where that is not possible, so it is unwise
to rely on that capability.

  


My memory is lousy at the best of times, but when have we had a minor 
release that would have broken this due to changed format? OTOH, the 
Primary and Backup servers need the same config settings (e.g. 
--enable-integer-datetimes), architecture, compiler, etc, do they not? 
Probably working from an identical set of binaries would be ideal.


cheers

andrew


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] Simplifying standby mode

2006-09-12 Thread Simon Riggs
On Tue, 2006-09-12 at 13:25 -0400, Andrew Dunstan wrote:
 Simon Riggs wrote:
 
  In general, log shipping between servers running different release
  levels will not be possible. However, it may be possible for servers
  running different minor release levels e.g. 8.2.1 and 8.2.2 to
  inter-operate successfully. No formal support for that is offered and
  there may be minor releases where that is not possible, so it is unwise
  to rely on that capability.

 My memory is lousy at the best of times, but when have we had a minor 
 release that would have broken this due to changed format? OTOH, the 
 Primary and Backup servers need the same config settings (e.g. 
 --enable-integer-datetimes), architecture, compiler, etc, do they not? 
 Probably working from an identical set of binaries would be ideal.

Not often, which is why I mention the possibility of having
interoperating minor release levels at all. If it was common, I'd just
put a blanket warning on doing that.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] Release notes

2006-09-12 Thread Bruce Momjian
I again will not be able to complete the release notes today as
promised.  My next target date is Monday, August 18.  Sorry.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Release notes

2006-09-12 Thread Andrew Dunstan

Bruce Momjian wrote:

I again will not be able to complete the release notes today as
promised.  My next target date is Monday, August 18.  Sorry.

  


Will that be in a few years, or are you traveling backwards in time? ;-)

cheers

andrew

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Release notes

2006-09-12 Thread Bruce Momjian
Andrew Dunstan wrote:
 Bruce Momjian wrote:
  I again will not be able to complete the release notes today as
  promised.  My next target date is Monday, August 18.  Sorry.
 

 
 Will that be in a few years, or are you traveling backwards in time? ;-)

Sorry, September 18.  I will probably be done before then, but it seems
best to set a date I know I will hit.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Release notes

2006-09-12 Thread Michael Fuhr
On Tue, Sep 12, 2006 at 02:31:22PM -0400, Bruce Momjian wrote:
 I again will not be able to complete the release notes today as
 promised.  My next target date is Monday, August 18.  Sorry.

The next Monday, August 18, is in 2008.  Surely that'll be
enough time ;-)

-- 
Michael Fuhr

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Release notes

2006-09-12 Thread Dave Page
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Michael Fuhr
 Sent: 12 September 2006 19:57
 To: Bruce Momjian
 Cc: PostgreSQL-development
 Subject: Re: [HACKERS] Release notes
 
 On Tue, Sep 12, 2006 at 02:31:22PM -0400, Bruce Momjian wrote:
  I again will not be able to complete the release notes today as
  promised.  My next target date is Monday, August 18.  Sorry.
 
 The next Monday, August 18, is in 2008.  Surely that'll be
 enough time ;-)

Someone will have to speak to Denis about getting Bruce more community
time :-)

Regards, Dave.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Release notes

2006-09-12 Thread Bruce Momjian
Dave Page wrote:
  
 
  -Original Message-
  From: [EMAIL PROTECTED] 
  [mailto:[EMAIL PROTECTED] On Behalf Of Michael Fuhr
  Sent: 12 September 2006 19:57
  To: Bruce Momjian
  Cc: PostgreSQL-development
  Subject: Re: [HACKERS] Release notes
  
  On Tue, Sep 12, 2006 at 02:31:22PM -0400, Bruce Momjian wrote:
   I again will not be able to complete the release notes today as
   promised.  My next target date is Monday, August 18.  Sorry.
  
  The next Monday, August 18, is in 2008.  Surely that'll be
  enough time ;-)
 
 Someone will have to speak to Denis about getting Bruce more community
 time :-)

It is more family activity that is causing my delays.  I was hoping to
carve out last weekend to work on it, but I couldn't.  I wish I could
blame Denis.  ;-)

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Simplifying standby mode

2006-09-12 Thread Gregory Stark
Simon Riggs [EMAIL PROTECTED] writes:

 My memory is lousy at the best of times, but when have we had a minor 
 release that would have broken this due to changed format? OTOH, the 
 Primary and Backup servers need the same config settings (e.g. 
 --enable-integer-datetimes), architecture, compiler, etc, do they not? 
 Probably working from an identical set of binaries would be ideal.

 Not often, which is why I mention the possibility of having
 interoperating minor release levels at all. If it was common, I'd just
 put a blanket warning on doing that.

I don't know that it's happened in the past but I wouldn't be surprised.

Consider that the bug being fixed in the point release may well be a bug in
WAL log formatting. 

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Simplifying standby mode

2006-09-12 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Simon Riggs [EMAIL PROTECTED] writes:
 My memory is lousy at the best of times, but when have we had a minor 
 release that would have broken this due to changed format?

 Not often, which is why I mention the possibility of having
 interoperating minor release levels at all. If it was common, I'd just
 put a blanket warning on doing that.

 I don't know that it's happened in the past but I wouldn't be surprised.
 Consider that the bug being fixed in the point release may well be a bug in
 WAL log formatting. 

This would be the exception, not the rule, and should not be documented
as if it were the rule.  It's not really different from telling people
to expect a forced initdb at a minor release: you are simply
misrepresenting the project's policy.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Release notes

2006-09-12 Thread Joshua D. Drake

Bruce Momjian wrote:

Dave Page wrote:
 


-Original Message-
From: [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] On Behalf Of Michael Fuhr

Sent: 12 September 2006 19:57
To: Bruce Momjian
Cc: PostgreSQL-development
Subject: Re: [HACKERS] Release notes

On Tue, Sep 12, 2006 at 02:31:22PM -0400, Bruce Momjian wrote:

I again will not be able to complete the release notes today as
promised.  My next target date is Monday, August 18.  Sorry.

The next Monday, August 18, is in 2008.  Surely that'll be
enough time ;-)

Someone will have to speak to Denis about getting Bruce more community
time :-)


It is more family activity that is causing my delays.  I was hoping to
carve out last weekend to work on it, but I couldn't.  I wish I could
blame Denis.  ;-)


Bah!! who needs family ;)

--

   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
   Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] UUID datatype progress

2006-09-12 Thread Gevik Babakhani
Folks,

During the past days I have been coding the UUID datatype, which has been fun
and a great learning curve for me. Here is the current progress:

- basic i/o functions are done.
- various records for pg_class, pg_operator, pg_amop etc. are added
- the additions pass the regression check :)
- btree and hash indexes work as expected (as far as I could test)
- currently 3 inputs and 1 general output are supported
  (as suggested by Tom)

remaining uuid todo list:
- binary receive and send functions are still in development.
- additional testing, testing and more testing.
- cast and convert functions have to be done
- develop some kind of a new_guid() function (if required)
- develop a SERIAL-like type (has to be discussed yet)

If everything goes okay, I should be able to submit a beta patch for
review during the next two weeks (and get shot at :))

Regards,
Gevik

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Simplifying standby mode

2006-09-12 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 This would be the exception, not the rule, and should not be documented
 as if it were the rule.  It's not really different from telling people
 to expect a forced initdb at a minor release: you are simply
 misrepresenting the project's policy.

Well, it's never been a factor before, so I'm not sure there is a policy. Is
there now a policy that WAL formats, like database formats, are as far as possible
not going to be changed in minor versions?

This means if there's a bug fix that affects WAL records the new point release
will generally have to be patched to recognise the broken WAL records and
process them correctly rather than simply generate corrected records. That
could be quite a burden.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Simplifying standby mode

2006-09-12 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Well, it's never been a factor before, so I'm not sure there is a
 policy. Is there now a policy that WAL formats, like database formats, are
 as far as possible not going to be changed in minor versions?

 This means if there's a bug fix that affects WAL records the new point
 release will generally have to be patched to recognise the broken WAL
 records and process them correctly rather than simply generate
 corrected records. That could be quite a burden.

Let's see, so if we needed a bug fix that forced a tuple header layout
change or datatype representation change or page header change, your
position would be what exactly?

The project policy has always been that we don't change on-disk formats
in minor releases.  I'm not entirely clear why you are so keen on
carving out an exception for WAL data.

While I can imagine bugs severe enough to make us violate that policy,
our track record of not having to is pretty good.  And I don't see any
reason at all to suppose that such a bug would be more likely to affect
WAL (and only WAL) than any other part of our on-disk structures.

But having said all that, I'm not sure why we are arguing about it in
this context.  There was an upthread mention that we ought to recommend
using identical executables on master and slave PITR systems, and I
think that's a pretty good recommendation in any case, because of the
variety of ways in which you could screw yourself through configuration
differences.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] dump / restore functionality

2006-09-12 Thread Naz Gassiep
 None, but feel free to start coding for 8.3.
 My coding skills are still nascent, but I shall do my best.

My coding skills are still pretty nascent, but I shall do my best.


That seems like a rather spectacular overstatement of the likely
benefits, not to mention a misdescription of what was discussed.
  
Once again I get pulled over by the semantics police :) Yes, you are 
right, that's what was discussed, and that is the functionality I am 
hoping for, as it would allow scripting the merging of a schema from one 
database with the table data from another.


Did this make it into the to-do list for 8.3 ?

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] dump / restore functionality

2006-09-12 Thread Alvaro Herrera
Naz Gassiep wrote:
 
 None, but feel free to start coding for 8.3.My coding skills are still 
 nascent, but I shall do my best.
 My coding skills are still pretty nascent, but I shall do my best.
 
 That seems like a rather spectacular overstatement of the likely
 benefits, not to mention a misdescription of what was discussed.
   
 Once again I get pulled over by the semantics police :) Yes, you are 
 right, that's what was discussed, and that is the functionality I am 
 hoping for, as it would allow scripting the merging of a schema from one 
 database with the table data from another.
 
 Did this make it into the to-do list for 8.3 ?

Don't worry about the to-do list too much.  If you care about it, post a
patch; if you keep a link to the archives pointing at this discussion,
you can later bang us over our heads if we reject the patch.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] dump / restore functionality

2006-09-12 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Naz Gassiep wrote:
 Did this make it into the to-do list for 8.3 ?

 Don't worry about the to-do list too much.

In particular, if you're imagining that being in the TODO list will
in itself cause anyone to work on it, you're much mistaken about how this
community operates.  Scratching your own itch is the general rule.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] UUID datatype progress

2006-09-12 Thread Joshua D. Drake




If everything goes okay, I should be able to submit a beta patch for
review within the next two weeks (and get shot at :))


/me starts loading...



Regards,
Gevik

--

   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
   Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-12 Thread Tom Lane
Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Could we see the actual EXPLAIN ANALYZE results for the slow plan?

 http://www.kaltenbrunner.cc/files/dbt3_explain_analyze.txt

Well, indeed it seems that the hash join is just an innocent bystander:
the bulk of the runtime (all but about 120 sec in fact) is spent here:

 ->  Nested Loop  (cost=13.65..1719683.85 rows=12000672 width=49) (actual time=60.325..24923860.713 rows=11897899 loops=1)
       ->  Merge Join  (cost=0.00..10248.66 rows=2 width=41) (actual time=16.654..2578.060 rows=19837 loops=1)
             ...
       ->  Bitmap Heap Scan on lineitem  (cost=13.65..77.16 rows=665 width=16) (actual time=13.492..1254.535 rows=600 loops=19837)
             Recheck Cond: (lineitem.l_suppkey = supplier.s_suppkey)
             ->  Bitmap Index Scan on i_l_suppkey  (cost=0.00..13.65 rows=665 width=0) (actual time=10.662..10.662 rows=600 loops=19837)
                   Index Cond: (lineitem.l_suppkey = supplier.s_suppkey)

I suppose that the profile result you showed was taken during the
startup transient where it was computing the hashtables that this loop's
results are joined to ... but that's not where the problem is.  The
problem is repeating that bitmap scan on lineitem for nearly 20000
different l_suppkeys.

Apparently we've made the planner a bit too optimistic about the savings
that can be expected from repeated indexscans occurring on the inside of
a join.  The other plan uses a different join order and doesn't try to
join lineitem until it's got orders.o_orderkey, whereupon it does a
mergejoin against an indexscan on lineitem:

 ->  Index Scan using i_l_orderkey on lineitem  (cost=0.00..2715943.34 rows=60003360 width=16) (actual time=32.868..123668.380 rows=59991868 loops=1)

The runtimes for the remainders of the plans are roughly comparable, so
it's the cost of joining lineitem that is hurting here.

Is lineitem sorted (or nearly sorted) by l_orderkey?  Part of the
problem could be overestimating the cost of this indexscan.

What are the physical sizes of lineitem and its indexes, and how do
those compare to your RAM?  What are you using for planner settings
(particularly effective_cache_size)?
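
For reference, one way to gather those numbers (this assumes the dbsize
functions that have been in core since 8.1, and the index names shown in the
plans above):

  -- physical sizes of the heap and the two indexes, plus the planner setting
  SELECT pg_size_pretty(pg_relation_size('lineitem'))     AS heap_size,
         pg_size_pretty(pg_relation_size('i_l_suppkey'))  AS suppkey_index,
         pg_size_pretty(pg_relation_size('i_l_orderkey')) AS orderkey_index;

  SHOW effective_cache_size;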

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] pgbench is badly broken since July

2006-09-12 Thread Tatsuo Ishii
 I have just realized that the recent patches in pgbench have altered its
 behavior in a way that destroys reproducibility of results --- I'm
 seeing reported TPS numbers about twice what they were before that.
 I'd love to say we did something in the past month that made the backend
 2X faster, but sadly, tain't so :-(
 
 I've only done limited investigation but what seems to be happening
 is that the -s scale factor is no longer being applied to adjust the
 range of the random variables, leading to only part of the database
 being touched.  For example, with -s = 10, the older code updates all 10
 branches rows but CVS HEAD only seems to touch bid = 1.

As far as I can tell the bug is there, but in a different way from what you
said. It seems -s works. The problem is that pgbench does not pick up the
default scaling factor (which is the same as the number of rows in the
branches table), and instead always uses the default scaling factor of 1.
I have fixed the problem in CVS HEAD.
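
For anyone checking an already-initialized database, the scaling factor can
be read back directly, since pgbench creates one branches row per unit of
scale (this is just a sanity-check query, not the pgbench code itself):

  -- should match the -s value the database was initialized with
  SELECT count(*) AS scaling_factor FROM branches;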

 I see that the code still thinks it is scaling the numbers, but the
 mechanism for doing that has changed drastically, and I suspect that's
 the problem.  I assume there's some garden-variety bug in there.
 
 At a more fundamental level, the recent patches seem to mean that
 pgbench is doing noticeably more work to issue its queries than it was
 before, and I wonder whether this isn't going to destroy cross-version
 reproducibility in its own way.  Have you done any timing studies to
 verify that the new and old code have comparable performance?  I don't
 mind enhancing pgbench's functionality for non-default tests, but I
 think it's important that the default case remain comparable over time.
 If there is a noticeable speed difference then I'd vote for going back
 to the klugy old code for adjusting the default script's values.

Let me do further performance testing.

 And at the nitpicking level, is :tps a good name for the variable that
 reflects the scaling factor?  It seems awfully easy to confuse that with
 the TPS numbers that pgbench reports.  Perhaps :scale or some such
 would be better.

I replaced all occurrences of tps with scale.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Release notes

2006-09-12 Thread Carlo Florendo

Bruce Momjian wrote:

Dave Page wrote:
  
 



-Original Message-
From: [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] On Behalf Of Michael Fuhr

Sent: 12 September 2006 19:57
To: Bruce Momjian
Cc: PostgreSQL-development
Subject: Re: [HACKERS] Release notes

On Tue, Sep 12, 2006 at 02:31:22PM -0400, Bruce Momjian wrote:
  

I again will not be able to complete the release notes today as
promised.  My next target date is Monday, August 18.  Sorry.


The next Monday, August 18, is in 2008.  Surely that'll be
enough time ;-)
  

Someone will have to speak to Denis about getting Bruce more community
time :-)



It is more family activity that is causing my delays.  I was hoping to
carve out last weekend to work on it, but I couldn't.  I wish I could
blame Denis.  ;-)

  
The family is more important than PostgreSQL.  Having fun with the
family indeed gives one the energy to work.  So, go family fun!


Best Regards,

Carlo Florendo
Astra Philippines Inc.
www.astra.ph


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Optimizer improvements: to do or not to do?

2006-09-12 Thread Say42
Peter Eisentraut wrote:

 If you know of a more effective way to do that beyond the effective_cache_size
 parameter that we have now, let us know.

I don't know of a better way, and that is not my goal at all. I am thinking
about more accurate cost estimation of nested-loop joins and subqueries.
A usual case in a query is joining a detail table and some master tables
into a single relation. Often the master tables are small and, after a few
nested-loop iterations, are well (perhaps wholly) cached. The cost
estimation of those tables' access paths does not take such caching into
account and so overestimates the cost. In some cases that can lead to
choosing a plan that is not the best.

An example from real life: the following query returns the count of national
calls from the call registration table.

select count(*) from conn.conn20060803 c
where
exists (select code from trunk_codes tc
where c.bnum = tc.code and c.bnum like tc.code || '%'
order by tc.code desc limit 1)

enable_seqscan = off:

Aggregate  (cost=103185258.68..103185258.69 rows=1 width=0) (actual time=13385.674..13385.676 rows=1 loops=1)
  ->  Seq Scan on conn20060803 c  (cost=1.00..103184640.52 rows=247264 width=0) (actual time=0.409..13307.254 rows=38739 loops=1)
        Filter: (subplan)
        SubPlan
          ->  Limit  (cost=0.00..6.42 rows=1 width=10) (actual time=0.020..0.020 rows=0 loops=494527)
                ->  Index Scan Backward using belg_mobile_pkey on belg_mobile tc  (cost=0.00..6.42 rows=1 width=10) (actual time=0.012..0.012 rows=0 loops=494527)
                      Index Cond: (($0)::text = (code)::text)
                      Filter: (($0)::text ~~ ((code)::text || '%'::text))
Total runtime: 13385.808 ms

enable_seqscan = on:

Aggregate  (cost=1101623.47..1101623.48 rows=1 width=0) (actual time=63724.508..63724.509 rows=1 loops=1)
  ->  Seq Scan on conn20060803 c  (cost=0.00..1101005.30 rows=247264 width=0) (actual time=2.244..63640.413 rows=38739 loops=1)
        Filter: (subplan)
        SubPlan
          ->  Limit  (cost=2.20..2.20 rows=1 width=10) (actual time=0.121..0.121 rows=0 loops=494527)
                ->  Sort  (cost=2.20..2.20 rows=1 width=10) (actual time=0.114..0.114 rows=0 loops=494527)
                      Sort Key: code
                      ->  Seq Scan on belg_mobile tc  (cost=0.00..2.19 rows=1 width=10) (actual time=0.096..0.099 rows=0 loops=494527)
                            Filter: ((($0)::text = (code)::text) AND (($0)::text ~~ ((code)::text || '%'::text)))
Total runtime: 63724.630 ms


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Optimizer improvements: to do or not to do?

2006-09-12 Thread Tom Lane
Say42 [EMAIL PROTECTED] writes:
 A usual case in a query is joining a detail table and some master tables
 into a single relation.

Optimizing on the basis of only one example is seldom a good idea...
and I think you are falling into that trap by supposing that there
is a usual case.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Optimizer improvements: to do or not to do?

2006-09-12 Thread Say42
Say42 wrote:

 select count(*) from conn.conn20060803 c
 where
 exists (select code from belg_mobile tc
...

Correction: replace 'trunk_codes' with 'belg_mobile'.


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Getting a move on for 8.2 beta

2006-09-12 Thread Jim C. Nasby
On Fri, Sep 01, 2006 at 03:28:36PM -0400, Stephen Frost wrote:
 Overall, I really think 8.2 is going to be an excellent release.  I wish
 autovacuum could have been enabled by default and I'd just like to ask,
 now and I'll try to remember again once 8.2 is out, please let's turn it
 on by default for 8.3 (and early on so we get some good testing of it).

Can someone put this on the TODO, just so we (hopefully) don't forget
about it?
-- 
Jim C. Nasby, Database Architect   [EMAIL PROTECTED]
512.569.9461 (cell) http://jim.nasby.net

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Optimizer improvements: to do or not to do?

2006-09-12 Thread Say42
Tom Lane wrote:
 Optimizing on the basis of only one example is seldom a good idea...
 and I think you are falling into that trap by supposing that there
 is a usual case.

Perhaps I am wrong, but I assume normalization is the usual case, and small
master (parent) tables are not very rare either.
Yes, my example is unusual, but it is _real_ and demonstrates a PG optimizer
inaccuracy. Why don't we make the PG optimizer match reality more closely if
we can? Is it so unnecessary, and am I making a mountain out of a molehill?


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org