Re: [HACKERS] Shared row locking

2004-12-16 Thread Bruce Momjian
Tom Lane wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > Using a B-tree
> 
> > At transaction end, nothing special happens (tuples are not unlocked
> > explicitly).
> 
> I don't think that works, because there is no guarantee that an entry
> will get cleaned out before the XID counter wraps around.  Worst case,
> you might think that a tuple is locked when the XID is left over from
> the previous cycle.  (Possibly this could be avoided by cleaning out old
> XIDs in this table whenever we truncate pg_clog, but that seems a tad
> messy.)  I'm also a bit concerned about how we avoid table bloat if
> there's no proactive cleanup at transaction end.
> 
> I think I like the pg_clog-modeled structure a bit better.  However it
> could be objected that that puts a hard limit of 4G share-locked tuples
> at any one time.
> 
> In the clog-modeled idea, it wasn't real clear how you decide whether to
> assign a new counter value to a previously locked row, or reuse its
> previous counter.  You must *not* assign a new value when the existing
> entry still has bits set, but you probably do want to be aggressive
> about assigning new values when you can; else it gets tough to be sure
> that the log can be truncated in a reasonable time.

I assume you check and if all the bits are zero, you don't reuse it and
get a new counter.  In fact you shouldn't reuse it in case the log is
being truncated while you are looking.  :-)

> ISTM that your description is conflating several orthogonal issues:
> how do we identify entries in this data structure (by CTID, or a shared
> counter that increments each time a new lock is acquired); how do we
> index the data structure (btree or linear array); and what is stored in
> each entry (array of XIDs, or bitmap indexed by BackendId).  Not all of
> the eight combinations work, but we do have more alternatives than the
> two offered, even without coming up with any new ideas ;-)

True.  The only advantage to a bitmap vs. just a counter of locked
backends is that you can clean out your own backend bits from the table
without having to record them in your memory.  However, because
recording your own counters in local memory doesn't require fixed shared
memory we might be better just recording the shared lock indexes in your
local backend memory and just use an int4 counter in the pg_clog-like
file that we can decrement on backend commit.  However I am unclear that
we can guarantee an exiting backend will do that.  Certainly it is
cleared on server start.

> > Note that to check whether a transaction is running we need to lock
> > SInvalLock.  To minimize the time we hold it, we save the BackendId so
> > we don't have to scan the whole shmInvalBuffer->procState array, only
> > the item that we need to look at.  Another possibility would be to use
> > stock TransactionIdIsInProgress and save the extra 4 bytes of storage.
> 
> I'm a bit worried about deadlocks and race conditions associated with
> the conflict between locking a page of this data structure and locking
> SInvalLock.
> 
> > At server restart, the btree is created empty (or just deleted).  There
> > is one btree per database.
> 
> One per cluster you meant, right?  (Else we can't do locking of rows in
> shared tables.)

He meant one per database, I think.  I suppose we would need another one
for global tables or disallow shared locking of them.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] Shared row locking

2004-12-16 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> You mean all empty/zero rows can be removed?  Can we guarantee that on
> commit we can clean up the bitmap?  If not the idea doesn't work.

For whatever data structure we use, we may reset the structure to empty
during backend-crash recovery.  So your objection boils down to "what if
a backend exits normally but forgets to clean up its locks?"  Assuming
that doesn't happen isn't any worse than assuming a backend will clean
up its shared memory state on non-crash exit, so I don't think it's a
serious concern.

That brings another thought: really what this is all about is working
around the fact that the standard lock manager can only cope with a
finite number of coexisting locks, because it's working in a fixed-size
shared memory arena.  Maybe we should instead think about ways to allow
the existing lock table to spill to disk when it gets too big.  That
would eliminate max_locks_per_transaction as a source of hard failures,
which would be a nice benefit.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Shared row locking

2004-12-16 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Using a B-tree

> At transaction end, nothing special happens (tuples are not unlocked
> explicitly).

I don't think that works, because there is no guarantee that an entry
will get cleaned out before the XID counter wraps around.  Worst case,
you might think that a tuple is locked when the XID is left over from
the previous cycle.  (Possibly this could be avoided by cleaning out old
XIDs in this table whenever we truncate pg_clog, but that seems a tad
messy.)  I'm also a bit concerned about how we avoid table bloat if
there's no proactive cleanup at transaction end.

I think I like the pg_clog-modeled structure a bit better.  However it
could be objected that that puts a hard limit of 4G share-locked tuples
at any one time.

In the clog-modeled idea, it wasn't real clear how you decide whether to
assign a new counter value to a previously locked row, or reuse its
previous counter.  You must *not* assign a new value when the existing
entry still has bits set, but you probably do want to be aggressive
about assigning new values when you can; else it gets tough to be sure
that the log can be truncated in a reasonable time.

ISTM that your description is conflating several orthogonal issues:
how do we identify entries in this data structure (by CTID, or a shared
counter that increments each time a new lock is acquired); how do we
index the data structure (btree or linear array); and what is stored in
each entry (array of XIDs, or bitmap indexed by BackendId).  Not all of
the eight combinations work, but we do have more alternatives than the
two offered, even without coming up with any new ideas ;-)

> Note that to check whether a transaction is running we need to lock
> SInvalLock.  To minimize the time we hold it, we save the BackendId so
> we don't have to scan the whole shmInvalBuffer->procState array, only
> the item that we need to look at.  Another possibility would be to use
> stock TransactionIdIsInProgress and save the extra 4 bytes of storage.

I'm a bit worried about deadlocks and race conditions associated with
the conflict between locking a page of this data structure and locking
SInvalLock.

> At server restart, the btree is created empty (or just deleted).  There
> is one btree per database.

One per cluster you meant, right?  (Else we can't do locking of rows in
shared tables.)

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] [INTERFACES] PL/Python: How do I use result methods?

2004-12-16 Thread Tom Lane
I wrote:
> Michael Fuhr <[EMAIL PROTECTED]> writes:
>> Any comments on this?  The 8.0.0rc1 PL/Python documentation,
>> Section 39.3 "Database Access", still mentions the nrows and
>> status methods, but they don't work.  Here's Oliver's original
>> message and my followup:

>> http://archives.postgresql.org/pgsql-interfaces/2004-10/msg00019.php
>> http://archives.postgresql.org/pgsql-interfaces/2004-10/msg00020.php

> It looks like someone #ifdef'd out those sections after observing that
> the PLy_result_methods table isn't used anyplace.  Perhaps the place
> where it should have been used got lost in some earlier patch?

> Just out of curiosity, what sort of patch would it take to enable these
> functions?  If it's at all nontrivial I'd vote to hold over to 8.1,
> but if it's a line or two of code that got lost at some point, it would
> seem like a reasonable bug fix ...

Comparing the result and plan method types made it pretty obvious how
those methods are supposed to be hooked up, and it was indeed a
one-liner omission in the original source code.  So I've fixed it.

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Shared row locking

2004-12-16 Thread Christopher Kings-Lynne
The SQL spec does not say anything on this respect (that I can find).
It only talks of "FOR UPDATE" and "FOR READ ONLY".  However, because the
FK code uses SPI to do the locking, we definitely have to expose the
funcionality through SQL.  So I think we need a new clause, which I
propose to be "FOR SHARE".
MySQL uses LOCK IN SHARE MODE:
http://dev.mysql.com/doc/mysql/en/InnoDB_locking_reads.html
---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Shared row locking

2004-12-16 Thread Bruce Momjian
Alvaro Herrera wrote:
> The btree idea:
> - does not need crash recovery.  Maybe we could use a stripped down
>   version of nbtree.  This could cause a maintanibility nightmare.

Are you saying the btree is an index with no heap?  If so, what about
the xid's?  Are they just in the btree?

How does the btree get cleaned up over time?

> The bitmap idea:
> - seems to have lower overhead
> 
> - can use the same lazy cleanup mechanism exposed for the btree idea (in
>   which case we don't need the list in local memory).

You mean all empty/zero rows can be removed?  Can we guarantee that on
commit we can clean up the bitmap?  If not the idea doesn't work.

> - What can happen in presence of large max_connections settings?  Is
>   this a real problem?

I thought about that.  50 backends is 7 bytes, 1000 backends is 128
bytes.  For a large number of backends you could just allow X concurrent
locks and use space X*4 bytes.

I think the basic issue is that the btree can be of variable length
while the bitmap has to be of a fixed length.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


[HACKERS] Shared row locking

2004-12-16 Thread Alvaro Herrera
Hi,

I've been thinking on how to do shared row locking.  There are some very
preliminar ideas on this issue.  Please comment; particularly if any
part of it sounds unworkable or too incomplete.

There are several problems to be solved here: the grammar, the internal
SelectStmt representation, how to store and share the info between
backends, how to clean up at transaction end, and how to clean up at
backend crash.

The Grammar
===

The SQL spec does not say anything on this respect (that I can find).
It only talks of "FOR UPDATE" and "FOR READ ONLY".  However, because the
FK code uses SPI to do the locking, we definitely have to expose the
funcionality through SQL.  So I think we need a new clause, which I
propose to be "FOR SHARE".


The Parser and SelectStmt
=

The parser uses for_update_clause and opt_for_update_clause
nonterminals.  I assume it's best to change them to (new)
locking_clause, which can in turn be for_update_clause or (new)
for_share_clause.

SelectStmt currently has a forUpdate field (a List to the to-be-locked
tables, or an empty list meaning all of them).  We could simply add
another list, say forShare, or use a common list and a flag saying that
it's one or the other.  I prefer adding a new list.  (Same with the
Query node.)


How to Store the Info
=

This is the really interesting part.  I have two ideas, one mine (btree)
and other Bruce's (bitmap).

Using a B-tree
--
When a backend wants to lock a tuple, it set a bit in its infomask.
Then it inserts to a btree in a special tablespace, using
RelationId-BlockNumber-OffsetNumber as key, and BackendId-TransactionId
as value; actually, an array with a single element containing those two
values.

When a backend wants to lock a tuple that is already locked, it goes to
the btree and inserts itself into the array.  To do this, it inserts a
new index item (the enlarged array) and delete the previous one.  No
other backend may want to insert simultaneously (thus causing an ugly
race condition), because we hold an exclusive lock on the tuple's heap
page's buffer.

At transaction end, nothing special happens (tuples are not unlocked
explicitly).

When someone wants to know if the tuple is locked (to mark it FOR
UPDATE, or to delete it), it checks the infomask.  If it says it's
locked, it goes to check the btree.  If the array contains only
BackendId-TransactionId pairs that are no longer running, then the 
tuple is not locked and can be deleted/marked (and the btree can be
cleaned up).  Else, it will have to wait, using XactLockTableWait, for
the first transaction in the array that is still running.  We can be
sure that no one will try to share-lock the tuple while we check the
btree because we hold an exclusive lock on the tuple's heap page's
buffer.

Note that to check whether a transaction is running we need to lock
SInvalLock.  To minimize the time we hold it, we save the BackendId so
we don't have to scan the whole shmInvalBuffer->procState array, only
the item that we need to look at.  Another possibility would be to use
stock TransactionIdIsInProgress and save the extra 4 bytes of storage.

At server restart, the btree is created empty (or just deleted).  There
is one btree per database.


Using a Bitmap
--
First we create a counter called shared lock row counter.  Then we
create a file like pg_clog, and each counter slot has a bit for every
backend.  When we want to shared lock a row we increment the counter and
put that counter value on the row, and set our backend bit in the new
file.  We also store that counter value in our backend local memory.  On
commit we go through that local memory list and reset all our bits for
those counters.  When a row has all zeros, it can be recycled like we do
with pg_clog.


Problems and random comments


There is possibility of starvation, if somebody wants to lock
exclusively a tuple and shared lockers are coming all the time.  Not
sure how to solve this.

The wakeup mechanism is not discussed ... is there a special need for
something beyond what we can do with XactLockTable{Insert,Wait} ?

Thanks to tablespaces, it's very easy to create special Relations that
can be dealt with by standard buffer and storage manager, etc.

The btree idea:
- does not need crash recovery.  Maybe we could use a stripped down
  version of nbtree.  This could cause a maintanibility nightmare.

- can't hold more than 300 or so simultaneous lockers (because of value
  length, limited to 1/3 of a page).  I doubt this is a real problem.

- could have problems (excessive storage requirements) in the long run
  because of empty or almost-empty pages.

The bitmap idea:
- seems to have lower overhead

- can use the same lazy cleanup mechanism exposed for the btree idea (in
  which case we don't need the list in local memory).

- What can happen in presence of large max_connections settings?  Is
  this a real problem?

-- 
Alvar

Re: [HACKERS] port report: [FAILURE] FreeBSD 6, Intel icc7

2004-12-16 Thread Tom Lane
Darcy Buskermolen <[EMAIL PROTECTED]> writes:
> On December 16, 2004 12:37 pm, Tom Lane wrote:
> I think you are in need of the local equivalent to GNU ld's -E or
> --export-dynamic switch, ie, make sure that all global symbols within
> the backend will be available to dynamically loaded libraries.

> xild just gets envoked as a wrapper to gnu ld by the looks of it.

Hmm.  I see in Makefile.freebsd:

ifdef ELF_SYSTEM
export_dynamic = -export-dynamic
rpath = -R$(rpathdir)
shlib_symbolic = -Wl,-Bsymbolic -lc
endif

Perhaps ELF_SYSTEM isn't getting defined?  Or maybe it's being picky
about seeing --export-dynamic instead of -export-dynamic ?  Or you need
to spell it like "-Wl,-E" to get it past icc and into the linker?
You did not show the build log, but take a look at how the postgres
executable is getting linked to see if -export-dynamic is getting in
there or not.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Port report: NetBSD 2.0 mac68k

2004-12-16 Thread Bruce Momjian

I will apply the ASM changes that affect only NetBSD mac68k ELF.

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---


Rémi Zara wrote:
> Hi,
> 
> Here is a port report for NetBSD 2.0 mac68k, with sources of  
> postgresql8.0.0rc1.
> 
> Here is the configure line used :
> ./configure --prefix=/data/postgresql/pgsql-8.0.0rc1 --with-openssl  
> --with-python --with-perl --with-tcl --with-krb5 --with-pam
> 
> But some tweaking was necessary to make it work:
>   * krb5.h is in /usr/include/krb5 on netbsd (set via CPPFLAGS)
>   * krb5_encrypt is to be found in  -lkrb5 -ldes -lasn1 -lroken -lcrypto
> 
> --enable-thread-safety does not work because the thread safety test  
> fails (src/tools/thread/thread_test)
> configure:18831: ./conftest
> conftest in free(): error: freelist is destroyed.
> [1]   Abort trap (core dumped) ./conftest${ac_e...
> 
> 
> Then the tas code in src/backend/storage/lmgr/s_lock.c cannot be  
> compiled and linked on this system without modification:
> the '_' in front of the tas symbol should be removes, and '%' added in  
> front of register names. I've attached a diff that makes these  
> modifications only for NetBSD mac68k ELF.
> 
> 
> With these modifications, make and make install are OK !
> 
> template1=# SELECT version();
>   version
>  
> -
>   PostgreSQL 8.0.0rc1 on m68k-unknown-netbsdelf2.0, compiled by GCC gcc  
> (GCC) 3.3.3 (NetBSD nb3 20040520)
> (1 row)
> 
> in make check, two tests fail: float8 and misc.
> I've attached the regression.diffs file.
> 
> Regards,
> 
> R?mi Zara
> 
> --
> R?mi Zara
> http://www.remi-zara.net/
> 
> 

[ Attachment, skipping... ]

> 
> 
> 
> 

[ Attachment, skipping... ]

> 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] LISTEN/NOTIFY improvements?

2004-12-16 Thread Alvaro Herrera
On Fri, Dec 17, 2004 at 09:56:05AM +1100, Gavin Sherry wrote:

> I've got a patch floating around that does this and also moves
> LISTEN/NOTIFY into the shared inval code 

Uh, what will happen with idle backends?

-- 
Alvaro Herrera (<[EMAIL PROTECTED]>)
Maybe there's lots of data loss but the records of data loss are also lost.
(Lincoln Yeoh)

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] LISTEN/NOTIFY improvements?

2004-12-16 Thread Gavin Sherry
On Thu, 16 Dec 2004, Aaron Hillegass wrote:

> I am a programmer who works on a couple of products that use PostgreSQL
> as their backend (http://www.nopali.com, http://www.iwanttops.com).
> Both my clients need to receive notifications when rows in the database
> change.  Thus far, we've been rolling our own because LISTEN/NOTIFY
> doesn't do what we need.
>
> What we need is on the TODO list:
> - Add optional textual message to NOTIFY: This would allow an
> informational message to be added to the notify message, perhaps
> indicating the row modified or other custom information.
>
> I want to help get this done for 8.1, but lack the deep understanding
> necessary.  Is there someone on this list that I can beg/bribe to make
> this happen?  I would be delighted to do stress-testing and
> benchmarking on the resulting code.
>
> (Incidentally, there are two related TODO items you might want to take
> care of while you are in there:
> - Allow NOTIFY in rules involving conditionals
> - Allow LISTEN/NOTIFY to store info in memory rather than tables?
> Currently LISTEN/NOTIFY information is stored in pg_listener. Storing
> such information in memory would improve performance.)
>
> Regardless, thank you for all the great work that you have been doing;
> PostgreSQL has made my life a better place.

I've got a patch floating around that does this and also moves
LISTEN/NOTIFY into the shared inval code -- which means its faster but
still suffers the problem of relying on a statically sized shared memory
buffer. The point is, I can get the patch into shape without too much
trouble and send it in for 8.1. The great thing is that the patch does not
break the FE/BE protocol because Tom had the foresight to allow optional
messages with NOTIFY events. You legend.

Thanks,

Gavin

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] port report: [FAILURE] FreeBSD 6, Intel icc7

2004-12-16 Thread Darcy Buskermolen
On December 16, 2004 02:28 pm, Tom Lane wrote:
> Darcy Buskermolen <[EMAIL PROTECTED]> writes:
> > On December 16, 2004 12:37 pm, Tom Lane wrote:
> >> I think you are in need of the local equivalent to GNU ld's -E or
> >> --export-dynamic switch, ie, make sure that all global symbols within
> >> the backend will be available to dynamically loaded libraries.
> >
> > basicly it's done via a
> > CC=icc LD=xild  ./configure ...
> >
> > and for refrence find attached the icc -help output.
>
> That's the wrong list --- what about the linker's options?

xild just gets envoked as a wrapper to gnu ld by the looks of it.

But just incase:

>
>   regards, tom lane

-- 
Darcy Buskermolen
Wavefire Technologies Corp.
ph: 250.717.0200
fx:  250.763.1759
http://www.wavefire.com
Usage: ld [options] file...
Options:
  -a KEYWORD  Shared library control for HP/UX compatibility
  -A ARCH, --architecture ARCH
  Set architecture
  -b TARGET, --format TARGET  Specify target for following input files
  -c FILE, --mri-script FILE  Read MRI format linker script
  -d, -dc, -dpForce common symbols to be defined
  -e ADDRESS, --entry ADDRESS Set start address
  -E, --export-dynamicExport all dynamic symbols
  -EB Link big-endian objects
  -EL Link little-endian objects
  -f SHLIB, --auxiliary SHLIB Auxiliary filter for shared object symbol table
  -F SHLIB, --filter SHLIBFilter for shared object symbol table
  -g  Ignored
  -G SIZE, --gpsize SIZE  Small data size (if no size, same as --shared)
  -h FILENAME, -soname FILENAME
  Set internal name of shared library
  -I PROGRAM, --dynamic-linker PROGRAM
  Set PROGRAM as the dynamic linker to use
  -l LIBNAME, --library LIBNAME
  Search for library LIBNAME
  -L DIRECTORY, --library-path DIRECTORY
  Add DIRECTORY to library search path
  -m EMULATIONSet emulation
  -M, --print-map Print map file on standard output
  -n, --nmagicDo not page align data
  -N, --omagicDo not page align data, do not make text readonly
  --no-omagic Page align data, make text readonly
  -o FILE, --output FILE  Set output file name
  -O  Optimize output file
  -Qy Ignored for SVR4 compatibility
  -q, --emit-relocs   Generate relocations in final output
  -r, -i, --relocatable   Generate relocatable output
  -R FILE, --just-symbols FILE
  Just link symbols (if directory, same as --rpath)
  -s, --strip-all Strip all symbols
  -S, --strip-debug   Strip debugging symbols
  --strip-discarded   Strip symbols in discarded sections
  --no-strip-discardedDo not strip symbols in discarded sections
  -t, --trace Trace file opens
  -T FILE, --script FILE  Read linker script
  -u SYMBOL, --undefined SYMBOL
  Start with undefined reference to SYMBOL
  --unique [=SECTION] Don't merge input [SECTION | orphan] sections
  -Ur Build global constructor/destructor tables
  -v, --version   Print version information
  -V  Print version and emulation information
  -x, --discard-all   Discard all local symbols
  -X, --discard-localsDiscard temporary local symbols (default)
  --discard-none  Don't discard any local symbols
  -y SYMBOL, --trace-symbol SYMBOL
  Trace mentions of SYMBOL
  -Y PATH Default search path for Solaris compatibility
  -(, --start-group   Start a group
  -), --end-group End a group
  --accept-unknown-input-arch Accept input files whose architecture cannot be 
determined
  --no-accept-unknown-input-arch
  Reject input files whose architecture is unknown
  -assert KEYWORD Ignored for SunOS compatibility
  -Bdynamic, -dy, -call_shared
  Link against shared libraries
  -Bstatic, -dn, -non_shared, -static
  Do not link against shared libraries
  -Bsymbolic  Bind global references locally
  --check-sectionsCheck section addresses for overlaps (default)
  --no-check-sections Do not check section addresses for overlaps
  --cref  Output cross reference table
  --defsym SYMBOL=EXPRESSION  Define a symbol
  --demangle [=STYLE] Demangle symbol names [using STYLE]
  --embedded-relocs   Generate embedded relocs
  -fini SYMBOLCall SYMBOL at unload-time
  --force-exe-suffix  Force generation of file with .exe suffix
  --gc-sections   Remove unused

Re: [HACKERS] port report: [FAILURE] FreeBSD 6, Intel icc7

2004-12-16 Thread Tom Lane
Darcy Buskermolen <[EMAIL PROTECTED]> writes:
> On December 16, 2004 12:37 pm, Tom Lane wrote:
>> I think you are in need of the local equivalent to GNU ld's -E or
>> --export-dynamic switch, ie, make sure that all global symbols within
>> the backend will be available to dynamically loaded libraries.
>> 
> basicly it's done via a
> CC=icc LD=xild  ./configure ...

> and for refrence find attached the icc -help output.

That's the wrong list --- what about the linker's options?

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] port report: [FAILURE] FreeBSD 6, Intel icc7

2004-12-16 Thread Darcy Buskermolen
On December 16, 2004 01:49 pm, Bruce Momjian wrote:
> Andrew Dunstan wrote:
> > Bruce Momjian wrote:
> > >Darcy Buskermolen wrote:
> > >>As per
> > >>http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=herring&dt=2004-12-16
> > >>%2018:46:18
> > >>
> > >>This combination of OS/compiler does not result in a working copy.
> > >>
> > >>Andrew and i have been digging into this for better than a week now,
> > >> and something just dosent look right.
> > >
> > >That is strange.  The problem line appears to be:
> > >
> > >   creating conversions ... FATAL:  could not load library
> > >   ... /postgresql/ascii_and_mic.so":dlopen
> > >   ... /postgresql/ascii_and_mic.so'failed.
> > >   ... /postgresql/ascii_and_mic.so:Undefined symbol "pg_mic2ascii")
> > >
> > >I wonder if running ldconfig would help.
> >
> > What would it do? This is failing at the "make check" stage, before it's
> > even installed (and buildfarm doesn't install to a standard location
> > anyway).
>
> Sometimes pre-existing libraries that are already installed override
> even the regression initdb on some platforms.  It is just a guess.

And in this case it's a "virgin", not having an installed copy of pg.

-- 
Darcy Buskermolen
Wavefire Technologies Corp.
ph: 250.717.0200
fx:  250.763.1759
http://www.wavefire.com

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[HACKERS] LISTEN/NOTIFY improvements?

2004-12-16 Thread Aaron Hillegass
I am a programmer who works on a couple of products that use PostgreSQL 
as their backend (http://www.nopali.com, http://www.iwanttops.com).  
Both my clients need to receive notifications when rows in the database 
change.  Thus far, we've been rolling our own because LISTEN/NOTIFY 
doesn't do what we need.

What we need is on the TODO list:
- Add optional textual message to NOTIFY: This would allow an 
informational message to be added to the notify message, perhaps 
indicating the row modified or other custom information.

I want to help get this done for 8.1, but lack the deep understanding 
necessary.  Is there someone on this list that I can beg/bribe to make 
this happen?  I would be delighted to do stress-testing and 
benchmarking on the resulting code.

(Incidentally, there are two related TODO items you might want to take 
care of while you are in there:
- Allow NOTIFY in rules involving conditionals
- Allow LISTEN/NOTIFY to store info in memory rather than tables?   
Currently LISTEN/NOTIFY information is stored in pg_listener. Storing 
such information in memory would improve performance.)

Regardless, thank you for all the great work that you have been doing;  
PostgreSQL has made my life a better place.

Sincerely,
Aaron Hillegass
Big Nerd Ranch, Inc.
404-210-5663

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] [PERFORM] UNION ALL vs INHERITANCE

2004-12-16 Thread Tom Lane
Adi Alurkar <[EMAIL PROTECTED]> writes:
> Why does the append resulting from a inheritance take longer than one  
> resulting from  UNION ALL?

The index scan is where the time difference is:

>   ->  Index Scan using fftiallbgrgfid_1102715649 on  
> f_f_all_base  (cost=0.00..3.52 rows=1 width=51) (actual  
> time=3.871..244.356 rows=28 loops=1)
> Index Cond: (group_id = 78745)
> Filter: (all_tidx @@ '\'mmcach\''::tsquery)

>   ->  Index Scan using fftiallbgrgfid_1102715649 on  
> f_f_all_base  (cost=0.00..3.52 rows=1 width=51) (actual  
> time=3.714..79.996 rows=28 loops=1)
> Index Cond: (group_id = 78745)
> Filter: (all_tidx @@ '\'mmcach\''::tsquery)

One would have to suppose this is a caching effect, ie, the data is
already in RAM on the second try and doesn't have to be read from disk
again.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


[HACKERS] Updates for beta/rc stamping

2004-12-16 Thread Bruce Momjian
FYI, Magnus just pointed out that src/include/pg_config.h.win32 had the
version string of 8.0devel rather than the correct 8.0.0rc1.

I have added a mention in tools/RELEASE_CHANGES that this file should be
updated during beta and rc as well as final release.  This file is used
by VC and BCC for non-configure-run win32-client-only builds.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] port report: [FAILURE] FreeBSD 6, Intel icc7

2004-12-16 Thread Darcy Buskermolen
On December 16, 2004 12:37 pm, Tom Lane wrote:
> Darcy Buskermolen <[EMAIL PROTECTED]> writes:
> > As per
> > http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=herring&dt=2004-12-16%2
> >018:46:18 This combination of OS/compiler does not result in a working
> > copy.
>
> The failure is
>   ascii_and_mic.so: Undefined symbol "pg_mic2ascii"
>
> I think you are in need of the local equivalent to GNU ld's -E or
> --export-dynamic switch, ie, make sure that all global symbols within
> the backend will be available to dynamically loaded libraries.
>
basicly it's done via a
CC=icc LD=xild  ./configure ...


and for refrence find attached the icc -help output.
> ---(end of broadcast)---
> TIP 8: explain analyze is your friend

-- 
Darcy Buskermolen
Wavefire Technologies Corp.
ph: 250.717.0200
fx:  250.763.1759
http://www.wavefire.com
   Intel(R) C++ Compiler Help
   ==

usage: icc [options] file1 [file2 ...]

  where options represents zero or more compiler options

fileN is a C/C++ source (.C .c .cc .cpp .cxx .i), assembly (.s),
object (.o), static library (.a), or other linkable file


  Commonly used options may be placed in the icc.cfg file.

   Compiler Option List
   
Performance
---
-O1enable optimizations (DEFAULT)
-O2same as -O1
-O3enable -O2 plus more aggressive optimizations that may not improve
   performance for all programs
-O0disable optimizations
-O same as -O1
-Ob control inline expansion:
n=0  disables inlining
n=1  inline functions declared with __inline, and perform C++ inlining
n=2  inline any function, at the compiler's discretion (same as -Qip)
-falias   assume aliasing in program (DEFAULT)
-fno-aliasassume no aliasing in program
-ffnalias assume aliasing within functions (DEFAULT)
-fno-fnalias  assume no aliasing within functions, but assume aliasing across
  calls
-nolib_inline disable inline expansion of intrinsic functions
-mp   maintain floating point precision (disables some optimizations)
-mp1  improve floating-point precision (speed impact is less than -mp)
-fp   disable using EBP as general purpose register
-prec_div improve precision of floating-point divides (some speed impact)
-fp_port  round fp results at assignments & casts (some speed impact)
-pc32 set internal FPU precision to 24 bit significand
-pc64 set internal FPU precision to 53 bit significand
-pc80 set internal FPU precision to 64 bit significand (DEFAULT)
-rcd  rounding mode to enable fast float-to-int conversions
-tpp5  optimize for Pentium(R) processor
-tpp6  optimize for Pentium(R) Pro, Pentium(R) II and Pentium(R) III processors
-tpp7  optimize for Pentium(R) 4 processor (DEFAULT)
-mcpu=  optimize for a specific cpu
   pentium- optimize for Pentium(R) processor
   pentiumpro - optimize for Pentium(R) Pro, Pentium(R) II and Pentium(R)
III processors
   pentium4   - optimize for Pentium(R) 4 processor (DEFAULT)
-ax generate code specialized for processor extensions specified by
while also generating generic IA-32 code.   includes
   one or more of the following characters:
i  Pentium Pro and Pentium II processor instructions
M  MMX(TM) instructions
K  streaming SIMD extensions
W  Pentium(R) 4 New Instructions
-x  generate specialized code to run exclusively on processors
   supporting the extensions indicated by  as described above.
-march=  generate code excusively for a given 
   pentiumpro - Pentium(R) Pro and Pentium(R) II processor instructions
   pentiumii  - MMX(TM)instructions
   pentiumiii - streaming SIMD extensions
   pentium4   - Pentium(R) 4 New Instructions

Advanced Performance

Enable and specify the scope of Interprocedural (IP) Optimizations:
-ip enable single-file IP optimizations (within files)
-ipoenable multi-file IP optimizations (between files)
-ipo_c  generate a multi-file object file (ipo_out.o)
-ipo_S  generate a multi-file assembly file (ipo_out.s)

Modify the behavior of IP:
-ip_no_inliningdisable full and partial inlining (requires -ip or -ipo)
-ip_no_pinlining   disable partial inlining (requires -ip or -ipo)
-ipo_obj   force generation of real object files (requires -ipo)

Other Advanced Performance Options:
-unroll[n]  set maximum number of times to unroll loops.  Omit n to use
default heuristics.  Use n=0 to disable loop unroller.
-prof_dirspecify directory for profiling output files (*.dyn and *.dpi)
-prof_file   specify file name for profiling summary file
-prof_gen[x]instrument program for profiling; with the x qualifier, extra
information is gathered
-prof_use   enable use

Re: [HACKERS] port report: [FAILURE] FreeBSD 6, Intel icc7

2004-12-16 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Andrew Dunstan wrote:
>> What would it do? This is failing at the "make check" stage, before it's 
>> even installed (and buildfarm doesn't install to a standard location 
>> anyway).

> Sometimes pre-existing libraries that are already installed override
> even the regression initdb on some platforms.  It is just a guess.

The "undefined" symbol is in the main backend, not in any shared
library.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] port report: [FAILURE] FreeBSD 6, Intel icc7

2004-12-16 Thread Andrew Dunstan

Tom Lane wrote:
Andrew Dunstan <[EMAIL PROTECTED]> writes:
 

To answer your question about how to choose it, you do it through the buildfarm member's config file - example (showing use of ccache) can be seen here:
http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/pgbuildfarm/client-code/build-farm.conf?rev=1.3&content-type=text/x-cvsweb-markup
   

I see.  Would it be reasonable for the buildfarm members to provide
links to the config files they're using?  You'd have to do something
about separating out the passwords :-(
 

Not a link, but I could easily have them report the contents of the 
config hash, with the password removed. Feature request filed on pgfoundry.

cheers
andrew
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] port report: [FAILURE] FreeBSD 6, Intel icc7

2004-12-16 Thread Tom Lane
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> To answer your question about how to choose it, you do it through the 
> buildfarm member's config file - example (showing use of ccache) can be seen 
> here:
> http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/pgbuildfarm/client-code/build-farm.conf?rev=1.3&content-type=text/x-cvsweb-markup

I see.  Would it be reasonable for the buildfarm members to provide
links to the config files they're using?  You'd have to do something
about separating out the passwords :-(

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] port report: [FAILURE] FreeBSD 6, Intel icc7

2004-12-16 Thread Bruce Momjian
Andrew Dunstan wrote:
> 
> 
> Bruce Momjian wrote:
> 
> >Darcy Buskermolen wrote:
> >  
> >
> >>As per 
> >>http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=herring&dt=2004-12-16%2018:46:18
> >>
> >>This combination of OS/compiler does not result in a working copy.
> >>
> >>Andrew and i have been digging into this for better than a week now, and 
> >>something just dosent look right.
> >>
> >>
> >
> >That is strange.  The problem line appears to be:
> > 
> > creating conversions ... FATAL:  could not load library
> > ... /postgresql/ascii_and_mic.so":dlopen
> > ... /postgresql/ascii_and_mic.so'failed.
> > ... /postgresql/ascii_and_mic.so:Undefined symbol "pg_mic2ascii")
> >
> >I wonder if running ldconfig would help.
> >  
> >
> 
> What would it do? This is failing at the "make check" stage, before it's 
> even installed (and buildfarm doesn't install to a standard location 
> anyway).

Sometimes pre-existing libraries that are already installed override
even the regression initdb on some platforms.  It is just a guess.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Port report: NetBSD 2.0 mac68k

2004-12-16 Thread Bruce Momjian

I am confused by the threading failure.  I don't see any free() call in
thread_test.c.   Would you go to the tools/thread directory and run the
program manually and use a debugger to see the failure line?  Is there
some threading flag NetBSD requires for compiles or linking?

---

Rémi Zara wrote:
> Hi,
> 
> Here is a port report for NetBSD 2.0 mac68k, with sources of  
> postgresql8.0.0rc1.
> 
> Here is the configure line used :
> ./configure --prefix=/data/postgresql/pgsql-8.0.0rc1 --with-openssl  
> --with-python --with-perl --with-tcl --with-krb5 --with-pam
> 
> But some tweaking was necessary to make it work:
>   * krb5.h is in /usr/include/krb5 on netbsd (set via CPPFLAGS)
>   * krb5_encrypt is to be found in  -lkrb5 -ldes -lasn1 -lroken -lcrypto
> 
> --enable-thread-safety does not work because the thread safety test  
> fails (src/tools/thread/thread_test)
> configure:18831: ./conftest
> conftest in free(): error: freelist is destroyed.
> [1]   Abort trap (core dumped) ./conftest${ac_e...
> 
> 
> Then the tas code in src/backend/storage/lmgr/s_lock.c cannot be  
> compiled and linked on this system without modification:
> the '_' in front of the tas symbol should be removes, and '%' added in  
> front of register names. I've attached a diff that makes these  
> modifications only for NetBSD mac68k ELF.
> 
> 
> With these modifications, make and make install are OK !
> 
> template1=# SELECT version();
>   version
>  
> -
>   PostgreSQL 8.0.0rc1 on m68k-unknown-netbsdelf2.0, compiled by GCC gcc  
> (GCC) 3.3.3 (NetBSD nb3 20040520)
> (1 row)
> 
> in make check, two tests fail: float8 and misc.
> I've attached the regression.diffs file.
> 
> Regards,
> 
> R?mi Zara
> 
> --
> R?mi Zara
> http://www.remi-zara.net/
> 
> 

[ Attachment, skipping... ]

> 
> 
> 
> 

[ Attachment, skipping... ]

> 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] port report: [FAILURE] FreeBSD 6, Intel icc7

2004-12-16 Thread Andrew Dunstan

Bruce Momjian wrote:
Darcy Buskermolen wrote:
 

As per 
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=herring&dt=2004-12-16%2018:46:18

This combination of OS/compiler does not result in a working copy.
Andrew and i have been digging into this for better than a week now, and 
something just dosent look right.
   

That is strange.  The problem line appears to be:

creating conversions ... FATAL:  could not load library
... /postgresql/ascii_and_mic.so":dlopen
... /postgresql/ascii_and_mic.so'failed.
... /postgresql/ascii_and_mic.so:Undefined symbol "pg_mic2ascii")
I wonder if running ldconfig would help.
 

What would it do? This is failing at the "make check" stage, before it's 
even installed (and buildfarm doesn't install to a standard location 
anyway).

cheers
andrew
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] port report: [FAILURE] FreeBSD 6, Intel icc7

2004-12-16 Thread Andrew Dunstan

Tom Lane wrote:
Darcy Buskermolen <[EMAIL PROTECTED]> writes:
 

As per 
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=herring&dt=2004-12-16%2018:46:18
This combination of OS/compiler does not result in a working copy.
   

The failure is
ascii_and_mic.so: Undefined symbol "pg_mic2ascii"
I think you are in need of the local equivalent to GNU ld's -E or 
--export-dynamic switch, ie, make sure that all global symbols within
the backend will be available to dynamically loaded libraries.

BTW, how did you get configure to choose icc?  If you used a CC
environment variable, it seems like the buildfarm output is missing
important information by not reporting it.
 

That log shows:
 ccache icc -g -fpic -DPIC -I../../../src/interfaces/libpq 
-I../../../src/include   -c -o regress.o regress.c
 xild -x -shared -o regress.so regress.o
The status page also says this regarding the member:
 FreeBSD  6-CURRENT  Intel(tm) CC  7.1   Build 20040901Z  ia32
I can highlight this and more info if required - just tell me what you want to 
know. I already have on my list your request for version numbers for changed 
files.
It's a bit of a balancing act between providing too much info and not enough.
To answer your question about how to choose it, you do it through the buildfarm 
member's config file - example (showing use of ccache) can be seen here:
http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/pgbuildfarm/client-code/build-farm.conf?rev=1.3&content-type=text/x-cvsweb-markup
cheers
andrew



---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [HACKERS] port report: [FAILURE] FreeBSD 6, Intel icc7

2004-12-16 Thread Tom Lane
Darcy Buskermolen <[EMAIL PROTECTED]> writes:
> As per 
> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=herring&dt=2004-12-16%2018:46:18
> This combination of OS/compiler does not result in a working copy.

The failure is
ascii_and_mic.so: Undefined symbol "pg_mic2ascii"

I think you are in need of the local equivalent to GNU ld's -E or 
--export-dynamic switch, ie, make sure that all global symbols within
the backend will be available to dynamically loaded libraries.

BTW, how did you get configure to choose icc?  If you used a CC
environment variable, it seems like the buildfarm output is missing
important information by not reporting it.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] port report: [FAILURE] FreeBSD 6, Intel icc7

2004-12-16 Thread Bruce Momjian
Darcy Buskermolen wrote:
> As per 
> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=herring&dt=2004-12-16%2018:46:18
> 
> This combination of OS/compiler does not result in a working copy.
> 
> Andrew and i have been digging into this for better than a week now, and 
> something just dosent look right.

That is strange.  The problem line appears to be:

creating conversions ... FATAL:  could not load library
... /postgresql/ascii_and_mic.so":dlopen
... /postgresql/ascii_and_mic.so'failed.
... /postgresql/ascii_and_mic.so:Undefined symbol "pg_mic2ascii")

I wonder if running ldconfig would help.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Port report: Fedora Core 3 x86_64

2004-12-16 Thread Tom Lane
Bernd Helmle <[EMAIL PROTECTED]> writes:
> Peter Eisentraut <[EMAIL PROTECTED]> wrote:
>> Tom Lane wrote:
>>> "Andrew Dunstan" <[EMAIL PROTECTED]> writes:
 Instead of doing that, do:
 --with-includes=/usr/include/et

>>> This same workaround is in the RPMs.  I wonder if it would be
>>> worthwhile for configure to assume the above when --with-krb5 is
>>> mentioned.  I don't know how widespread this particular file layout
>>> is ...

>> I have never seen it before.  I've never had to add any paths to compile
>> with Kerberos support, at least since they moved it out
>> of /usr/kerberos.

> FYI, i had the same issue here on Mandrake Linux 10.0/10.1 (32Bit), 
> Community Edition and Official.

Actually, I'm wondering why we directly include com_err.h at all.  At
least in the version of  I have here, that file is included by
krb5.h; so both backend/libpq/auth.c and interfaces/libpq/fe-auth.c
compile just fine with #include  diked out.  I suspect that
our problems in this area all stem from trying to include something
we are not supposed to include to begin with.

It's possible that com_err.h was considered an exported file in old
krb5 releases but no longer is.  So perhaps what we should do is make
the C code read
#ifdef HAVE_COM_ERR_H
#include 
#endif
and adjust configure to generate that symbol, but not require the header
to be found.

Comments?

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[HACKERS] port report: [FAILURE] FreeBSD 6, Intel icc7

2004-12-16 Thread Darcy Buskermolen
As per 
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=herring&dt=2004-12-16%2018:46:18

This combination of OS/compiler does not result in a working copy.

Andrew and i have been digging into this for better than a week now, and 
something just dosent look right.


-- 
Darcy Buskermolen
Wavefire Technologies Corp.
ph: 250.717.0200
fx:  250.763.1759
http://www.wavefire.com

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] join selectivity

2004-12-16 Thread Greg Stark

"Mark Cave-Ayland" <[EMAIL PROTECTED]> writes:

> Well at the moment PostGIS has a RESTRICT function that takes an expression
> of the formwhere column is a column consisting of
> geometries and constant is a bounding box. This is based upon histogram
> statistics and works well.

Are these functions that would be useful for GiST indexes in general? 

What's involved in pulling them into a system? I mean, for example, a database
using RTREE (or GiST I guess) boxes and the @ operator.

I didn't realize anyone really had any idea where to start with gathering
statistics or writing selectivity functions for geometric types. It's great
news to hear there's actually work in this area.

-- 
greg


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] join selectivity

2004-12-16 Thread Tom Lane
"Mark Cave-Ayland" <[EMAIL PROTECTED]> writes:
> ... But in the case of  
> , if we're estimating the number of rows to return then
> that becomes harder

I didn't say it was easy ;-).  The existing selectivity functions can't
do better than a rough guess in such cases, and I don't expect you can
either.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] join selectivity

2004-12-16 Thread Mark Cave-Ayland
Hi Tom, 

> -Original Message-
> From: Tom Lane [mailto:[EMAIL PROTECTED] 
> Sent: 16 December 2004 17:56
> To: Mark Cave-Ayland
> Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
> [EMAIL PROTECTED]
> Subject: Re: [HACKERS] join selectivity
> 
> 
> "Mark Cave-Ayland" <[EMAIL PROTECTED]> writes:
> > OK I think I've misunderstood something more fundamental 
> than that; I 
> > understood from what you said that the RESTRICT clause is used to 
> > evaluate the cost of table1.geom && table2.geom against 
> table2.geom && 
> > table1.geom (i.e. it is used to help decide which one should be seq 
> > scanned and which should be index scanned in a nested loop 
> node). So 
> > is the trick here for a commutative operator to simply 
> return the same 
> > value for both cases, as other factors such as index size costs are 
> > considered elsewhere?
> 
> If the operator is commutative then the result should be too. 
>  Really you should not be thinking about costs at all when 
> coding a selectivity
> estimator: its charter is to estimate how many rows will 
> match the condition, not to estimate costs per se.
> 
> Note however that these aren't really the "same case", as 
> you'd be referencing two different columns with presumably 
> different statistics.

Well at the moment PostGIS has a RESTRICT function that takes an expression
of the formwhere column is a column consisting of
geometries and constant is a bounding box. This is based upon histogram
statistics and works well.

The surprise came when writing the JOIN function and finding that the
RESTRICT clause was being called. Now I understand that this is part of the
nested loop and not the JOIN so that helps. But in the case of  
, if we're estimating the number of rows to return then
that becomes harder - I'm thinking pick a rectangle half the area of the
statistical rectangle for the column and return the number of rows within
that area.

> You should probably read the existing selectivity estimators 
> in utils/adt/selfuncs.c.  There's a fair amount of 
> infrastructure code in that file that you could borrow.  
> (It's not currently exported because it tends to change from 
> version to version, but maybe we could think about making 
> some of the routines global.)

OK will try and find some inspiration within.


Many thanks,

Mark.


WebBased Ltd
South West Technology Centre
Tamar Science Park
Plymouth
PL6 8BT 

T: +44 (0)1752 791021
F: +44 (0)1752 791023
W: http://www.webbased.co.uk



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] integer datetimes

2004-12-16 Thread Andrew Dunstan

Tom Lane wrote:
 

I'm probably going to add the flag enabling it to the default
buildfarm setup.
   

This should be selected for some buildfarm members but not all, just
like other configuration options.
 

We're very democratic - every member gets to choose their own config ;-)
cheers
andrew
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] integer datetimes

2004-12-16 Thread Tom Lane
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> Has any thought been given to making integer datetimes the default on 
> platforms that support it? Are there any performance implications?

I don't know that anyone's done any serious performance comparisons.
My guess is there wouldn't be a noticeable difference, but that's just
a guess.

The real reason it's not default is portability worries.  We do not
currently have any fundamental functionality that breaks if you don't
have a working int64 datatype, and I'd just as soon not start.

I'm not real thrilled with the idea of a platform-dependent default,
either.

> I'm probably going to add the flag enabling it to the default
> buildfarm setup.

This should be selected for some buildfarm members but not all, just
like other configuration options.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] [Testperf-general] BufferSync and bgwriter

2004-12-16 Thread Richard Huxton
Josh Berkus wrote:
Simon,

Clearly, OSDL-DBT2 is not a real world test! That is its benefit, since
it is heavily instrumented and we are able to re-run it many times
without different parameter settings. The application is well known and
doesn't suffer that badly from factors that would allow certain effects
to be swamped. If it had too much randomness or variation, it would be
difficult to interpret.

I don't think you followed me.   The issue is that for parameters designed to 
"smooth out spikes" like bgwriter and vacuum delay, it helps to have really 
bad spikes to begin with.   There's a possibility that the parameters (and 
calculations) that work well for for a "steady-state" OLTP application are 
actually bad for an application with much more erratic usage, just as a high 
sort_mem is good for DSS and bad for OLTP.
I'm a little concerned that in an erratic, or even just a changing 
environment there isn't going to be any set of values that are "correct".

If I've got this right, the behaviour we're trying to get is:
 1. Starting from the oldest dirty block,
 2. Write as many dirty blocks as you can, but don't...
 3. Re-write frequently used blocks too much (wasteful)
So, can we not just keep track of two numbers:
 1. Change in the number of dirty blocks this time vs last
 2. Number of re-writes we perform (count collisions in a hash or 
similar - doesn't need to be perfect).

If #1 is increasing, then we need to become more active (reduce 
bgwriter_delay, increase bgwriter_maxpages).
If #2 starts to go up, or goes past some threshold then we reduce 
activity (increase bgwriter_delay, decrease bgwriter_maxpages).
If of the last N blocks written, C have been collisions then assume 
we've run out of low-activity blocks to write, stop and sleep.

This has a downside that the figures will never be completely accurate, 
but has the advantage that it will automatically track activity.

I'm clearly beyond my technical knowledge here, so if I haven't 
understood / it's impractical / will never work, then don't be afraid to 
step up and let me know. If it helps, you could always think of me as an 
idiot savant who failed his savant exams :-)

--
  Richard Huxton
  Archonet Ltd
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] join selectivity

2004-12-16 Thread Tom Lane
"Mark Cave-Ayland" <[EMAIL PROTECTED]> writes:
> OK I think I've misunderstood something more fundamental than that; I
> understood from what you said that the RESTRICT clause is used to evaluate
> the cost of table1.geom && table2.geom against table2.geom && table1.geom
> (i.e. it is used to help decide which one should be seq scanned and which
> should be index scanned in a nested loop node). So is the trick here for a
> commutative operator to simply return the same value for both cases, as
> other factors such as index size costs are considered elsewhere?

If the operator is commutative then the result should be too.  Really
you should not be thinking about costs at all when coding a selectivity
estimator: its charter is to estimate how many rows will match the
condition, not to estimate costs per se.

Note however that these aren't really the "same case", as you'd be
referencing two different columns with presumably different statistics.

> My final question would be how would can we detect the difference between
> RESTRICT being called in this manner (as part ofwith
> an unknown constant) as opposed towith a known
> constant?

You should probably read the existing selectivity estimators in
utils/adt/selfuncs.c.  There's a fair amount of infrastructure code in
that file that you could borrow.  (It's not currently exported because
it tends to change from version to version, but maybe we could think
about making some of the routines global.)

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] integer datetimes

2004-12-16 Thread Andrew Dunstan
Has any thought been given to making integer datetimes the default on 
platforms that support it? Are there any performance implications? I saw 
the Tom Lane raised this when it was added, and it looks like the 
decision was deferred. I know that, ceteris paribus, I would rather have 
fixed microsecond precision across a slightly smaller range in every app 
I've ever worked on. Which platforms don't support it? I did some search 
on all these questions but didn't find much info. I'm probably going to 
add the flag enabling it to the default buildfarm setup.

cheers
andrew

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] join selectivity

2004-12-16 Thread Mark Cave-Ayland

> -Original Message-
> From: Tom Lane [mailto:[EMAIL PROTECTED]
> Sent: 16 December 2004 15:55
> To: Mark Cave-Ayland
> Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
> [EMAIL PROTECTED]
> Subject: Re: [HACKERS] join selectivity
> 
> 
> "Mark Cave-Ayland" <[EMAIL PROTECTED]> writes:
> > ...and with two indices RESTRICT is called four times. The part I 
> > find
> > confusing is why with one index that RESTRICT is called twice.
> 
> [ shrug... ]  clause_selectivity doesn't try to cache the result.


Hi Tom,

OK I think I've misunderstood something more fundamental than that; I
understood from what you said that the RESTRICT clause is used to evaluate
the cost of table1.geom && table2.geom against table2.geom && table1.geom
(i.e. it is used to help decide which one should be seq scanned and which
should be index scanned in a nested loop node). So is the trick here for a
commutative operator to simply return the same value for both cases, as
other factors such as index size costs are considered elsewhere?

My final question would be how would can we detect the difference between
RESTRICT being called in this manner (as part ofwith
an unknown constant) as opposed towith a known
constant?


Many thanks,

Mark.


WebBased Ltd
South West Technology Centre
Tamar Science Park
Plymouth
PL6 8BT 

T: +44 (0)1752 791021
F: +44 (0)1752 791023
W: http://www.webbased.co.uk


> > I was also thinking whether calling RESTRICT when comparing with an
> > unknown value is worth doing at all, however I did think 
> that perhaps
> > if you are using a cast to perform an operation on two
> datatypes, then
> > you may be able to imply something from the index, such as its
> > physical size, and hint that the planner should use a 
> particular index
> > in preference for the other.
> 
> That would be inappropriate; the index size is factored in elsewhere
> (gistcostestimate() to be specific).  Restriction selectivity
> shouldn't directly consider the existence of indexes at all.
> 
> > Would it be correct to assume that if returning the same value for
> > RESTRICT for both means that the planner will choose one at random?
> 
> If the tables/indexes are exactly the same size then you'd
> get the same cost and the choice would be effectively random.
> 
>   regards, tom lane
> 



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] [Testperf-general] BufferSync and bgwriter

2004-12-16 Thread Zeugswetter Andreas DAZ SD

> Hmmm, I've not seen this.  For example, with people who are having trouble 
> with checkpoint spikes on Linux, I've taken to recommending that they call 
> sync() (via cron) every 5-10 seconds (thanks, Bruce, for suggestion!).   
> Believe it or not, this does help smooth out the spikes and give better 
> overall performance in a many-small-writes situation.

The reason is imho that the checkpoint otherwise also syncs all other 
writes. These can be writes other backends had to do to replace a buffer.
Linux obviously lacks a mechanism to distribute the IO for cached writes 
over time ala bgwriter (or does not do it when already faced with an IO 
bottleneck).

Andreas

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] join selectivity

2004-12-16 Thread Tom Lane
"Mark Cave-Ayland" <[EMAIL PROTECTED]> writes:
> ...and with two indices RESTRICT is called four times. The part I find
> confusing is why with one index that RESTRICT is called twice.

[ shrug... ]  clause_selectivity doesn't try to cache the result.

> I was also thinking whether calling RESTRICT when comparing with an unknown
> value is worth doing at all, however I did think that perhaps if you are
> using a cast to perform an operation on two datatypes, then you may be able
> to imply something from the index, such as its physical size, and hint that
> the planner should use a particular index in preference for the other.

That would be inappropriate; the index size is factored in elsewhere
(gistcostestimate() to be specific).  Restriction selectivity shouldn't
directly consider the existence of indexes at all.

> Would it be correct to assume that if returning the same value for
> RESTRICT for both means that the planner will choose one at random?

If the tables/indexes are exactly the same size then you'd get the same
cost and the choice would be effectively random.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] join selectivity

2004-12-16 Thread Mark Cave-Ayland
Hi Tom,

> -Original Message-
> From: Tom Lane [mailto:[EMAIL PROTECTED] 
> Sent: 13 December 2004 17:16
> To: Mark Cave-Ayland
> Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
> [EMAIL PROTECTED]
> Subject: Re: [HACKERS] join selectivity
> 
> 
> "Mark Cave-Ayland" <[EMAIL PROTECTED]> writes:
> > For a query like this:
> > 
> > SELECT id FROM table1, table2
> > WHERE table1.geom && table2.geom;
> > 
> > RESTRICT selectivity is invoked twice and
> > JOIN selectivity is invoked once.
> 
> Hm, are you testing in a context where both tables have 
> indexes that are relevant to the && operator?
> 
> The estimated join result size is computed from the join 
> selectivity estimate for the && operator.  I was about to say 
> that restriction selectivity wouldn't be used at all, but on 
> second thought I believe that it would be invoked while 
> considering nestloop with inner indexscan plans.  That is, 
> we'd consider
> 
>   NestLoop
>   Seq Scan on table2
>   Indexscan on table1
>   IndexCond: table1.geom && outer.geom
> 
> and to determine the estimated cost of each indexscan, we 
> would invoke restriction selectivity for &&, with varRelid 
> referencing table1. Given this call you are supposed to treat 
> table2.geom as a constant of uncertain value, so the thing is 
> semantically sensible as a restriction clause for table1 
> (whether you can produce a really good estimate is another 
> question :-().
> 
> Similarly, we'd consider the reverse plan with table1 as 
> outer, and that would give rise to another restriction 
> selectivity check with varRelid = table2.

Just to clarify, here are the explain results from strk's query:


strk=# explain analyze select * from test1, test2 where test1.geom &&
test2.geom;
NOTICE:  LWGEOM_gist_joinsel called (returning 0.05)
  QUERY PLAN


--
 Nested Loop  (cost=3.27..105.84 rows=1 width=64) (actual time=0.217..39.305
rows=2700 loops=1)
   Join Filter: ("inner".geom && "outer".geom)
   ->  Seq Scan on test2  (cost=0.00..28.32 rows=132 width=32) (actual
time=0.081..1.111 rows=108 loops=1)
   ->  Materialize  (cost=3.27..3.52 rows=25 width=32) (actual
time=0.001..0.011 rows=25 loops=108)
 ->  Seq Scan on test1  (cost=0.00..3.25 rows=25 width=32) (actual
time=0.043..0.129 rows=25 loops=1)  Total runtime: 40.471 ms (6 rows)


 so with no indices the JOIN function is called once, RESTRICT never. I
can understand this :)


strk=# create index test2_gist on test2 using gist (geom gist_geometry_ops);
CREATE INDEX
strk=# explain analyze select * from test1, test2 where test1.geom &&
test2.geom;
NOTICE:  LWGEOM_gist_joinsel called (returning 0.05)
NOTICE:  LWGEOM_gist_sel called
NOTICE:   no constant arguments - returning default selectivity
NOTICE:  LWGEOM_gist_sel called
NOTICE:   no constant arguments - returning default selectivity
  QUERY PLAN


--
 Nested Loop  (cost=3.27..92.11 rows=1 width=64) (actual time=0.046..39.219
rows=2700 loops=1)
   Join Filter: ("inner".geom && "outer".geom)
   ->  Seq Scan on test2  (cost=0.00..28.08 rows=108 width=32) (actual
time=0.009..0.198 rows=108 loops=1)
   ->  Materialize  (cost=3.27..3.52 rows=25 width=32) (actual
time=0.000..0.013 rows=25 loops=108)
 ->  Seq Scan on test1  (cost=0.00..3.25 rows=25 width=32) (actual
time=0.002..0.052 rows=25 loops=1)  Total runtime: 40.307 ms (6 rows)


...with one index RESTRICT is called twice.


strk=# create index test1_gist on test1 using gist (geom gist_geometry_ops);
CREATE INDEX
strk=# explain analyze select * from test1, test2 where test1.geom &&
test2.geom;
NOTICE:  LWGEOM_gist_joinsel called (returning 0.05)
NOTICE:  LWGEOM_gist_sel called
NOTICE:   no constant arguments - returning default selectivity
NOTICE:  LWGEOM_gist_sel called
NOTICE:   no constant arguments - returning default selectivity
NOTICE:  LWGEOM_gist_sel called
NOTICE:   no constant arguments - returning default selectivity
NOTICE:  LWGEOM_gist_sel called
NOTICE:   no constant arguments - returning default selectivity
  QUERY PLAN


--
 Nested Loop  (cost=3.27..92.11 rows=1 width=64) (actual time=0.052..38.867
rows=2700 loops=1)
   Join Filter: ("inner".geom && "outer".geom)
   ->  Seq Scan on test2  (cost=0.00..28.08 rows=108 width=32) (actual
time=0.012..0.181 rows=108 loops=1)
   ->  Materialize  (cost=3.27..3.52 rows=25 width=32) (actual
time=0.000..0.010 rows=25 loops=108)
 ->  Seq Scan on test1  (cost=0.00..3.25 rows=25 width=32) (actual
time=0.002..0.032 rows=25 

Re: [Fwd: Re: [HACKERS] race condition for drop schema cascade?]

2004-12-16 Thread Jim Buttafuoco
I have rebuild the filesystem on my indy (MIPS) that Andrew reported on.  The 
first run completed 100%,  I would give 
it a couple more runs before we can say its the filesystem not Postgresql that 
was causing the drop to fail.
 


-- Original Message ---
From: Andrew Dunstan <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Wed, 15 Dec 2004 16:42:59 -0500
Subject: [Fwd: Re: [HACKERS] race condition for drop schema cascade?]

> Jim, please advise?
> 
> thanks
> 
> andrew
> 
>  Original Message 
> Subject:  Re: [HACKERS] race condition for drop schema cascade?
> Date: Wed, 15 Dec 2004 16:29:01 -0500
> From: Tom Lane <[EMAIL PROTECTED]>
> To:   Andrew Dunstan <[EMAIL PROTECTED]>
> CC:   PostgreSQL-development <[EMAIL PROTECTED]>
> References:   <[EMAIL PROTECTED]>
> 
> Andrew Dunstan <[EMAIL PROTECTED]> writes:
> > I have seen this failure several times, but not consistently, on the 
> > buildfarm member otter (Debian/MIPS) and possible on others, and am 
> > wondering if it indicates a possible race condition on DROP SCHEMA CASCADE.
> 
> Hard to see what, considering that there's only one backend touching
> that tablespace in the test.  I'd be inclined to wonder if there's
> a filesystem-level problem on that platform.  What filesystem are you
> running on anyway?
> 
>   regards, tom lane
--- End of Original Message ---


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] bgwriter changes

2004-12-16 Thread Neil Conway
Zeugswetter Andreas DAZ SD wrote:
This has the disadvantage of converging against 0 dirty pages.
A system that has less than maxpages dirty will write every page with 
every bgwriter run.
Yeah, I'm concerned about the bgwriter being overly aggressive if we 
disable bgwriter_percent. If we leave the settings as they are (delay = 
200, maxpages = 100, shared_buffers = 1000 by default), we will be 
writing all the dirty pages to disk every 2 seconds, which seems far too 
much.

It might also be good to reduce the delay, in order to more proactively 
keep the LRUs clean (e.g. scanning to find N dirty pages once per second 
is likely to reach father away from the LRU than scanning for N/M pages 
once per 1/M seconds). On the other hand the more often the bgwriter 
scans the buffer pool, the more times the BufMgrLock needs to be 
acquired -- and in a system in which pages aren't being dirtied very 
rapidly (or the dirtied pages tend to be very hot), each of those scans 
is going to take a while to find enough dirty pages using #2. So perhaps 
it is best to leave the delay as is for 8.0.

This might have the disadvantage of either leaving too much for the 
checkpoint or writing too many dirty pages in one run. Is writing a lot 
in one run actually a problem though ? Or does the bgwriter pause
periodically while writing the pages of one run ?
The bgwriter does not pause between writing pages. What would be the 
point of doing that? The kernel is going to be caching the write() anyway.

If this is expressed in pages it would naturally need to be more than the 
current maxpages (to accomodate for clean pages). The suggested 2% sounded 
way too low for me (that leaves 98% to the checkpoint).
I agree this might be a problem, but it doesn't necessarily leave 98% to 
be written at checkpoint: if the buffers in the LRU change over time, 
the set of pages searched by the bgwriter will also change. I'm not sure 
how quickly the pages near the LRU change in a "typical workload"; 
moreover, I think this would vary between different workloads.

-Neil
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] bgwriter changes

2004-12-16 Thread Zeugswetter Andreas DAZ SD

> > Only if you redefine the meaning of bgwriter_percent.  At present it's
> > defined by reference to the total number of dirty pages, and that can't
> > be known without collecting them all.
> > 
> > If it were, say, a percentage of the total length of the T1/T2 lists,
> > then we'd have some chance of stopping the scan early.

> The other way around would make sense. In order to avoid writing the 
> busiest buffers at all (except for checkpoinging), the parameter should 
> mean "don't scan the last x% of the queue at all".

Your meaning is 1 - above meaning (at least that is what Tom and I meant),
but is probably easier to understand (== Informix LRU_MIN_DIRTY).

> Still, we need to avoid scanning over all the clean blocks of a large 
> buffer pool, so there is need for a separate dirty-LRU.

Maybe a "may be dirty" bitmap would be easier to do without beeing deadlock 
prone ?

Andreas

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] getting 'order by' working with unicode locale? ICU?

2004-12-16 Thread Peter Eisentraut
Palle Girgensohn wrote:
> Not on FreeBSD, since collation is not implemented in unicode
> locales. One way would be to implement it in the OS, of course...

Try taking the locale definition files from another system and use 
localedef to build locale files for your local system.  The localedef 
source files are supposed to be portable.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 8: explain analyze is your friend