Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites

2004-11-14 Thread Dawid Kuroczko
On 15 Nov 2004 02:00:37 -0500, Greg Stark <[EMAIL PROTECTED]> wrote:
> I think people should get away from thinking about "order by + limit". That
> isn't going to work for anything with a GROUP BY. And it isn't going to work
> for anything more complex than a single min() or max().
> 
> min() only needs the first record from whatever set of records it's operating
> on as long as they're provided in a specified order. This is just as true for
> a min() applied to only a single GROUP as it is for a min() applied to an
> entire table.

But as far as I can tell there is no way of forcing such an order; at
least, ORDER BY queries are doomed to fail:

select max(val) from test_max order by val desc;
ERROR:  column "test_max.val" must appear in the GROUP BY clause or be
used in an aggregate function

Anyway, I think that such an optimization (say, an implicit ORDER BY
when min() or max() is the only requested column) would at least stop
people from using awkward syntax for performance reasons...
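For reference, a sketch of the rewrite in question (test_max is the table from the error above; the grp column in the second query is purely hypothetical):

```sql
-- The hand-written form that does use an index on val today:
SELECT val FROM test_max ORDER BY val DESC LIMIT 1;

-- But there is no comparable one-liner once grouping is involved,
-- which is why a planner-level rewrite is attractive:
-- SELECT grp, max(val) FROM test_max GROUP BY grp;
```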

Regards,
Dawid

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


[HACKERS] Odd plpgsql behaviour

2004-11-14 Thread Christopher Kings-Lynne
On 7.4:
This is what we wanted to do:
IF TG_OP = 'INSERT' OR (TG_OP = 'UPDATE' AND NEW.name != OLD.name) THEN
 EXECUTE x;
END IF;
However, we had to write it like this:
IF TG_OP = 'INSERT' THEN
 EXECUTE x;
ELSIF TG_OP = 'UPDATE' AND NEW.name != OLD.name THEN
 EXECUTE x;
END IF;
Because in the first case it would complain that OLD.name wasn't 
defined if the trigger was NOT an UPDATE.

OK, but the second case works??!?!  Is this a weird peculiarity of the 
pl/pgsql lazy evaluation rules?  Why doesn't the first one work if the 
second one does?
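For what it's worth, one single-path workaround (a sketch, assuming the trigger context of the example above; "fire" is a name I made up) is to compute the condition in a branch where OLD is known to exist:

```sql
DECLARE
    fire boolean;
BEGIN
    -- Only touch OLD.name in a branch where TG_OP guarantees OLD exists.
    fire := (TG_OP = 'INSERT');
    IF TG_OP = 'UPDATE' THEN
        fire := (NEW.name != OLD.name);
    END IF;
    IF fire THEN
        EXECUTE x;
    END IF;
END;
```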

Chris
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites

2004-11-14 Thread Greg Stark

Jan Wieck <[EMAIL PROTECTED]> writes:

> Both cases can be expressed with order by + limit queries, that would indeed
> utilize those indexes. But what's been discussed so far does not cover any of
> them.

I think people should get away from thinking about "order by + limit". That
isn't going to work for anything with a GROUP BY. And it isn't going to work
for anything more complex than a single min() or max().

min() only needs the first record from whatever set of records it's operating
on as long as they're provided in a specified order. This is just as true for
a min() applied to only a single GROUP as it is for a min() applied to an
entire table.

I don't think you want to use the existing Limit executor node. That will only
ever let you handle these simple aggregates that return the first value they
see. What you want is a normal Aggregate node, but the node feeding it should
be an altered index scan that knows it only needs to pull out the first and/or
last record for each GROUP.

That will let you handle min() and max() in the same query for example. It
might also leave open the door for other more complex data subsets. Say a
geometric data type where it needs all the bounding points of an area.
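As an illustration (bar and foo are from the thread; grp is a hypothetical grouping column, and the index is assumed), the kind of query such an Aggregate-over-altered-index-scan could serve:

```sql
-- With an index on bar (grp, foo), the altered scan would fetch only
-- the first and last index entries within each grp value:
SELECT grp, min(foo), max(foo) FROM bar GROUP BY grp;
```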

-- 
greg


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] psql \e broken again

2004-11-14 Thread joseph speigle
hi,

As for Mac OS X: this is on Darwin (the Mac OS X base system).
I ran a find command looking for file names containing spaces.

mechatronics:~ joe$ find . -name '* *'
./backups/dscpadmin/scriptsMay19/dscp/validate/List of Cancers.doc
./backups/pain/PAINjune/validate/List of Cancers.doc
./backups/untarhere/test/List of Cancers.doc
./Library/Address Book Plug-Ins
./Library/Application Support
./Library/Caches/Desktop/Desktop Pictures.dpImageCache
./Library/Caches/Desktop/Solid Colors.dpImageCache
./Library/Caches/iPhoto Cache
./Library/Caches/Software Update
./Library/FontCollections/Fixed Width.collection
./Library/iMovie/Sound Effects
./Library/Internet Plug-Ins
./Library/Keyboard Layouts
./Library/Preferences/QuickTime Preferences
./Library/Safari/Form Values
./Public/Drop Box

On Sun, Nov 14, 2004 at 10:14:06PM -0500, Bruce Momjian wrote:
> John Hansen wrote:
> > > > 1. Quote only on Windows.
> > > >
> > > > 2. Expect user to put quotes in the EDITOR value if it contains a 
> > > > space-containing path.
> > 
> > 
> > As far as I'm aware, the options on windows are very much like those on
> > unix:
> > 
> > "path containing spaces" or
> > path\ containing\ spaces
> 
> My guess is that we should only address spaces on Win32, but should
> document why we didn't do it on Unix.  Also, what about OS X?  Doesn't
> that have frequent spaces in file paths?
> 
> -- 
>   Bruce Momjian|  http://candle.pha.pa.us
>   [EMAIL PROTECTED]   |  (610) 359-1001
>   +  If your life is a hard drive, |  13 Roberts Road
>   +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073
> 
> ---(end of broadcast)---
> TIP 5: Have you checked our extensive FAQ?
> 
>http://www.postgresql.org/docs/faqs/FAQ.html

-- 
if you're christian, god bless; otherwise, good luck; and, if you dont believe 
in luck ... 

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] GiST: PickSplit and multi-attr indexes

2004-11-14 Thread Neil Conway
On Sun, 2004-11-14 at 18:54 -0500, Tom Lane wrote:
> It's probably just a hangover from the days when GiST didn't support
> multi-column indexes at all.  I agree it should be changed.

I'm not sure the right way to fix it (at least without significant
changes to the GiST API). At present, the PickSplit() method is passed a
vector of GISTENTRYs and fills in a GIST_SPLITVEC. The GISTENTRYs
correspond to the first attributes of all the tuples in the node to be
split. There is no provision for the GiST extension to be informed about
any additional attributes in the index tuples. Even if we changed the
API to allow that, the GiST extension would have a hard time making a
reasonable decision in the multi-attribute case: the additional index
attributes could well be implemented using a different GiST extension.

> But note you will then have to cope with NULL values.

Yes -- I'm not sure offhand why GiST does not allow leading NULL values
in index attributes, but that ought to be fixed anyway.

-Neil



---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] psql \e broken again

2004-11-14 Thread Bruce Momjian
John Hansen wrote:
> > > 1. Quote only on Windows.
> > >
> > > 2. Expect user to put quotes in the EDITOR value if it contains a 
> > > space-containing path.
> 
> 
> As far as I'm aware, the options on windows are very much like those on
> unix:
> 
> "path containing spaces" or
> path\ containing\ spaces

My guess is that we should only address spaces on Win32, but should
document why we didn't do it on Unix.  Also, what about OS X?  Doesn't
that have frequent spaces in file paths?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] IpcSemaphoreLock/Unlock and proc_exit on 7.2.6

2004-11-14 Thread Tom Lane
Kris Jurka <[EMAIL PROTECTED]> writes:
> On Sun, 14 Nov 2004, Tom Lane wrote:
>> Is your current value one less than a multiple of 16, by any chance?

> Currently 32.  It is unclear whether you think 31 is the failure case you're
> thinking of or whether 31 might help.

No, 32 is actually the best case (most slop) if I'm reading the code
correctly.

I'd suggest an update to 7.3 or later ...

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] IpcSemaphoreLock/Unlock and proc_exit on 7.2.6

2004-11-14 Thread Kris Jurka


On Sun, 14 Nov 2004, Tom Lane wrote:

> The comment in ProcGetNewSemIdAndNum suggests that you might be able to
> suppress the problem in 7.2 by using a different max_connections value.
> Is your current value one less than a multiple of 16, by any chance?
> 

Currently 32.  It is unclear whether you think 31 is the failure case you're
thinking of or whether 31 might help.

Kris Jurka

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] IpcSemaphoreLock/Unlock and proc_exit on 7.2.6

2004-11-14 Thread Tom Lane
Kris Jurka <[EMAIL PROTECTED]> writes:
> I have an underpowered server running 7.2.6 that backs a website which
> occasionally gets hit by a bunch of traffic and starts firing off "FATAL
> 1:  Sorry, too many clients already" messages.  This is all as expected,
> but sometimes it just crashes.  I had no clue what was going on until I
> checked the stderr log (because I had set it up to use syslog).  In there
> I find a whole bunch of these:

> IpcSemaphoreLock: semop(id=-1) failed: Invalid argument

[ eyeballs code... ]  It looks like this could happen in 7.2 during exit
from a backend that failed to acquire a semaphore --- ProcKill does
things like LockReleaseAll, which needs to acquire the lockmanager LWLock,
which could try to block using the process semaphore if there's
contention for the LWLock.  The problem should be gone in 7.3 and later
due to reorganization of the semaphore management code.  I'm not sure
it's worth trying to fix in 7.2.* --- the odds of introducing new
problems seem too high, and we're not really maintaining 7.2 anymore
anyway.

The comment in ProcGetNewSemIdAndNum suggests that you might be able to
suppress the problem in 7.2 by using a different max_connections value.
Is your current value one less than a multiple of 16, by any chance?

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] GiST: PickSplit and multi-attr indexes

2004-11-14 Thread Tom Lane
Neil Conway <[EMAIL PROTECTED]> writes:
> If I understand the code correctly, GiST will only pass the first 
> attribute of each index tuple to the user-defined PickSplit method when 
> it wants to split a node. (see circa line 1269 of gist.c)

> Is this a wise design decision?

It's probably just a hangover from the days when GiST didn't support
multi-column indexes at all.  I agree it should be changed.  But note
you will then have to cope with NULL values.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] German-style quotes in the source file

2004-11-14 Thread Tom Lane
"Serguei Mokhov" <[EMAIL PROTECTED]> writes:
> I was about to update initdb translation, but noticed
> that newly introduced error messages  in the code have 
> German-style quotes. These propagated to the .po files now... 

Good catch; fix committed.  A quick grep confirms these are
the only two cases.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] code question: storing INTO relation

2004-11-14 Thread Simon Riggs
On Sun, 2004-11-14 at 22:59, Neil Conway wrote:
> On Sun, 2004-11-14 at 11:06 +, Simon Riggs wrote:
> > HASH - works OK, but a pain to administer, no huge benefit in using
> 
> At least in theory, I think this could offer better performance for
> equality searches than b+-tree. Given how common those kinds of queries
> are, I still think hash indexes are worth putting some time into. My
> guess is that their relatively poor performance at present (relative to
> b+-trees) is just a reflection of how much more tuning and design work
> has gone into the b+-tree code than the hash code.

Can be faster for equality searches on a fairly static table; on a
growing table, it could be the same or worse. IMHO the theoretical
difference in speed doesn't seem worth the effort of spending additional
time in that part of the code, given the inherent pain of REINDEX.

> > GiST - index of choice for PostGIS, TSearch2, in need of optimization
> 
> I'm working on adding page-level locking and WAL safety, although this
> is a pretty difficult project. 

Difficult, yes. I'm glad you're stepping up to the plate for the WAL
safety.

Two index types are sufficient, and ISTM that should therefore be the maximum.
When you've finished tuning GiST, I wager that you will agree :)

-- 
Best Regards, Simon Riggs


---(end of broadcast)---
TIP 8: explain analyze is your friend


[HACKERS] German-style quotes in the source file

2004-11-14 Thread Serguei Mokhov
Hello Peter,

I was about to update initdb translation, but noticed
that newly introduced error messages  in the code have 
German-style quotes. These propagated to the .po files now... 
It happened in this commit:

http://developer.postgresql.org/cvsweb.cgi/pgsql/src/bin/initdb/initdb.c.diff?r1=1.65;r2=1.66

--
Serguei A. Mokhov

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] postmaster segfaults with HUGE table

2004-11-14 Thread Tom Lane
Neil Conway <[EMAIL PROTECTED]> writes:
> This specific assertion is triggered because we represent attribute 
> numbers throughout the code base as a (signed) int16 -- the assertion 
> failure has occurred because an int16 has wrapped around due to 
> overflow. A fix would be to add a check to DefineRelation() (or even 
> earlier) to reject CREATE TABLEs with more than "MaxHeapAttributeNumber" 
> columns.

Good analysis.  We can't check earlier than DefineRelation AFAICS,
because earlier stages don't know about inherited columns.

On reflection I suspect there are similar issues with SELECTs that have
more than 64K output columns.  This probably has to be guarded against
in parser/analyze.c.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] code question: storing INTO relation

2004-11-14 Thread Neil Conway
On Sun, 2004-11-14 at 11:06 +, Simon Riggs wrote:
> HASH - works OK, but a pain to administer, no huge benefit in using

At least in theory, I think this could offer better performance for
equality searches than b+-tree. Given how common those kinds of queries
are, I still think hash indexes are worth putting some time into. My
guess is that their relatively poor performance at present (relative to
b+-trees) is just a reflection of how much more tuning and design work
has gone into the b+-tree code than the hash code.
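For reference, the two variants being compared (bar and foo as used elsewhere in these threads; purely illustrative):

```sql
CREATE INDEX bar_foo_hash  ON bar USING hash  (foo);   -- hash: equality searches only
CREATE INDEX bar_foo_btree ON bar USING btree (foo);   -- b-tree: the default
```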

> R-TREE - slightly broken in places, limited in usability

I agree. I hope that when we have a good GiST infrastructure,
implementing rtree via GiST will offer performance that is as good as or
better than the builtin rtree.

> GiST - index of choice for PostGIS, TSearch2, in need of optimization

I'm working on adding page-level locking and WAL safety, although this
is a pretty difficult project. Gavin and I are also looking at
algorithms for bulk loading GiST indexes, although I'm not yet sure how
possible that will be.

-Neil



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites

2004-11-14 Thread Jan Wieck
On 11/10/2004 11:57 PM, Mark Kirkwood wrote:
> Your example and ones like:
>
> SELECT max(foo), count(foo) FROM bar
> SELECT max(a.foo1), max(b.foo2) FROM bar1 AS a NATURAL JOIN bar2 AS b
>
> have made me realize that the scope of "what should be optimized" is
> somewhat subtle.
>
> I am inclined to keep it simple (i.e. rather limited) for a first cut,
> and if that works well, then look at extending to more complex rewrites.
>
> What do you think?
The problem is that
select min(foo) from bar where foo > 100;
is still solvable with an index scan, assuming there is an index on foo.
But
select min(foo) from bar where baz = 'IT';
is only doable with an index scan if you have a compound index on (foo,baz).
Both cases can be expressed with order by + limit queries, that would 
indeed utilize those indexes. But what's been discussed so far does not 
cover any of them.

Jan

> Jim C. Nasby wrote:
> > On Thu, Nov 11, 2004 at 11:48:49AM +1300, Mark Kirkwood wrote:
> > > I am looking at implementing this TODO item. e.g. (max case):
> > > rewrite
> > > SELECT max(foo) FROM bar
> > > as
> > > SELECT foo FROM bar ORDER BY foo DESC LIMIT 1
> > > if there is an index on bar(foo)
> >
> > Out of curiosity, will you be doing this in such a way that
> >
> > SELECT min(foo), max(foo) FROM bar
> >
> > will end up as
> >
> > SELECT (SELECT foo FROM bar ORDER BY foo ASC LIMIT 1), (SELECT ... DESC
> > LIMIT 1)
> > ?


--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [HACKERS] psql \e broken again

2004-11-14 Thread John Hansen
> > 1. Quote only on Windows.
> >
> > 2. Expect user to put quotes in the EDITOR value if it contains a 
> > space-containing path.


As far I I'm aware, the options on windows are very much like those on
unix:

"path containing spaces" or
path\ containing\ spaces
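A quick sketch of why the distinction matters (the path here is made up; this just shows the shell's word splitting):

```shell
# An unquoted variable holding a space-containing path is split into
# several arguments; a quoted one arrives as a single argument.
path="/tmp/path containing spaces/editor"

set -- $path          # unquoted expansion: split on whitespace
echo "$#"             # -> 3

set -- "$path"        # quoted expansion: preserved intact
echo "$#"             # -> 1
```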

Kind Regards

John Hansen

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] A modest proposal: get rid of GUC's USERLIMIT

2004-11-14 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> OK, Tom please go ahead with the patch.

Done.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] psql \e broken again

2004-11-14 Thread Tom Lane
Peter Eisentraut <[EMAIL PROTECTED]> writes:
> The EDITOR variable seems to have a fairly standard meaning on Unix 
> systems.  I've been using that EDITOR value for years without problems;
> it's only when I use psql's \e once in a while that it breaks.  I don't
> think we should deviate from what seems to be a standard practice.

Agreed, no quotes on Unix.  I'm just wondering what to do on Windows.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] psql \e broken again

2004-11-14 Thread Peter Eisentraut
Tom Lane wrote:
> I think the rationale was to allow paths containing spaces, which is
> a pretty serious problem on Windows.  Seems like we have two basic
> options:
>
> 1. Quote only on Windows.
>
> 2. Expect user to put quotes in the EDITOR value if it contains a
> space-containing path.

The EDITOR variable seems to have a fairly standard meaning on Unix 
systems.  I've been using that EDITOR value for years without problems; 
it's only when I use psql's \e once in a while that it breaks.  I don't 
think we should deviate from what seems to be a standard practice.

I wonder whether a similar convention exists on Windows.  I could 
certainly live with quoting only on Windows if that is what the 
convention is there.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] psql \e broken again

2004-11-14 Thread Tom Lane
Peter Eisentraut <[EMAIL PROTECTED]> writes:
> I remember specifically that I somewhat recently fixed psql to accept 
> editors with arguments, say EDITOR="pico -t".  This was apparently 
> broken again during some Windows-related reshuffling.  It now takes the 
> editor as one quoted string rather than possibly several shell tokens.  
> Could this please be fixed?

I think the rationale was to allow paths containing spaces, which is
a pretty serious problem on Windows.  Seems like we have two basic
options:

1. Quote only on Windows.

2. Expect user to put quotes in the EDITOR value if it contains a
space-containing path.

I don't much care for either of these :-(.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


[HACKERS] psql \e broken again

2004-11-14 Thread Peter Eisentraut
I remember specifically that I somewhat recently fixed psql to accept 
editors with arguments, say EDITOR="pico -t".  This was apparently 
broken again during some Windows-related reshuffling.  It now takes the 
editor as one quoted string rather than possibly several shell tokens.  
Could this please be fixed?

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] pgxs regression

2004-11-14 Thread Thomas Hallgren
Joe Conway wrote:
> I'm not sure exactly what has changed, nor at the moment how to fix it,
> but I'm finding that pgxs no longer works for PL/R or dblink. Error as
> follows:
>
> make: *** No rule to make target
> `/usr/local/pgsql-dev/lib/pgxs/src/makefiles/../../src/port/pg_config_paths.h',
> needed by `all-static-lib'.  Stop.
>
> The problem is related specifically to Makefiles using MODULE_big. I
> tested a few contribs that use MODULES and they seem to work fine under
> pgxs.
>
> Joe
Seems you have the same issue that I have with PL/Java. I had to 
explicitly change my target from:

   all: all_lib
to:
   all: $(shlib)
The thread "Problems with pgxs" started in hackers on 10/31 may be of 
some interest.

Regards,
Thomas Hallgren
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [HACKERS] code question: storing INTO relation

2004-11-14 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Fri, 2004-11-12 at 23:13, Tom Lane wrote:
>> Greg is correct --- at least for btree build, which is the only index
>> type we have WAL-ified at all :-(

> Is there a place (or a single best place) to document this behaviour?

If you're talking about the lack of WAL backup for non-btree indexes,
it is documented (at the end of the PITR section IIRC).

If you're talking about the optimization of not logging index builds,
I don't see a need to document that per se.  Ordinary users shouldn't
need to care, mainly because they can't affect it one way or the other.
Anyone who does care can look at the code and see how it's done.  (Open
source has a big advantage over closed source in that regard, and I
think it's reasonable to have different documentation practices than
closed-source products would use.)

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Update TIP 9 please

2004-11-14 Thread Markus Bertheau
Who maintains the tips?

On Sun, 14/11/2004 at 11:31, Simon Riggs wrote:
> When 8.0 is released, TIP 9 should change from
> 
> TIP 9: the planner will ignore your desire to choose an index scan if
> your joining column's datatypes do not match
> 
> to 
> 
> TIP 9: the planner will ignore your desire to choose an index scan if
> your joining column's datatypes do not match (upgrade to 8.0!)
> 
> We could change this now...
> 
> ...and add another TIP to encourage people to upgrade...?
> 
-- 
Markus Bertheau <[EMAIL PROTECTED]>


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


[HACKERS] Update TIP 9 please

2004-11-14 Thread Simon Riggs
When 8.0 is released, TIP 9 should change from

TIP 9: the planner will ignore your desire to choose an index scan if
your joining column's datatypes do not match

to 

TIP 9: the planner will ignore your desire to choose an index scan if
your joining column's datatypes do not match (upgrade to 8.0!)

We could change this now...

...and add another TIP to encourage people to upgrade...?

-- 
Best Regards, Simon Riggs


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] code question: storing INTO relation

2004-11-14 Thread Simon Riggs
On Fri, 2004-11-12 at 23:13, Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > Greg Stark wrote:
> >> I think that's already done for CREATE INDEX/REINDEX.
> 
> > I don't think so.  Can someone confirm?
> 
> Greg is correct --- at least for btree build, which is the only index
> type we have WAL-ified at all :-(
> 

Is there a place (or a single best place) to document this behaviour?
- with each command?
- in the backup section?
- in runtime?

Seems a shame to optimize and not tell anyone.

-- 
Best Regards, Simon Riggs


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] postmaster segfaults with HUGE table

2004-11-14 Thread Simon Riggs
On Sun, 2004-11-14 at 10:05, Neil Conway wrote:
> Joachim Wieland wrote:
> > this query makes postmaster (beta4) die with signal 11:
> > 
> > (echo "CREATE TABLE footest(";
> > for i in `seq 0 66000`; do
> > echo "col$i int NOT NULL,";
> > done;
> > echo "PRIMARY KEY(col0));") | psql test
> > 
> > 
> > ERROR:  tables can have at most 1600 columns
> > LOG:  server process (PID 2140) was terminated by signal 11
> > LOG:  terminating any other active server processes
> > LOG:  all server processes terminated; reinitializing
> 
> At best you're going to get the error message above: "tables can have at 
> most 1600 columns". But this is definitely a bug: we end up triggering 
> this assertion:
> 
> TRAP: BadArgument("!(attributeNumber >= 1)", File: "tupdesc.c", Line: 405)
> 
> This specific assertion is triggered because we represent attribute 
> numbers throughout the code base as a (signed) int16 -- the assertion 
> failure has occurred because an int16 has wrapped around due to 
> overflow. A fix would be to add a check to DefineRelation() (or even 
> earlier) to reject CREATE TABLEs with more than "MaxHeapAttributeNumber" 
> columns. We eventually do perform this check in 
> heap_create_with_catalog(), but by that point it is too late: various 
> functions have been invoked that assume we're dealing with a sane number 
> of columns.
> 
> BTW, I noticed that there's an O(n^2) algorithm to check for duplicate 
> column names in DefineRelation() -- with 60,000 column names that takes 
> minutes to execute, but an inconsequential amount of time with 1500 
> column names. So I don't think there's any point rewriting the algorithm 
> to be faster -- provided we move the check for MaxHeapAttributeNumber 
> previous to this loop, we should be fine.
> 

This seems too obvious a problem not to have been hit before...presumably
it has been there for a while? Does this mean that we do not have
regression tests for each maximum setting, i.e. are we missing a
whole class of tests in the regression tests?

-- 
Best Regards, Simon Riggs


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] code question: storing INTO relation

2004-11-14 Thread Simon Riggs
On Fri, 2004-11-12 at 23:13, Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > Greg Stark wrote:
> >> I think that's already done for CREATE INDEX/REINDEX.
> 
> > I don't think so.  Can someone confirm?
> 
> Greg is correct --- at least for btree build, which is the only index
> type we have WAL-ified at all :-(

[well...at least they're optimized then...  :) ]

With regard to the other index types, my opinion was:
HASH - works OK, but a pain to administer, no huge benefit in using
R-TREE - slightly broken in places, limited in usability
GiST - index of choice for PostGIS, TSearch2, in need of optimization

Following recent optimization work on GiST, it now seems worth the
trouble to add WAL logging to it. ISTM that the other two aren't widely
used enough to make it worthwhile to spend time on, evidence for which
is also that no one ever has, up 'til now.

Time-management seems to be the key to making progress in the most
important areas...

-- 
Best Regards, Simon Riggs


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Increasing the length of

2004-11-14 Thread Simon Riggs
On Wed, 2004-11-10 at 22:51, Andrew Sullivan wrote: 
> On Wed, Nov 10, 2004 at 09:52:17PM +, Simon Riggs wrote:
> > On Wed, 2004-11-10 at 21:48, Richard Huxton wrote:
> > > 
> > > Isn't that:
> > > log_min_duration_statement (integer)
> > 
> > That gets written when a statement completes, not during execution.
> 
> I've been following this thread, and I was thinking the same thing. 
> I wonder how much work it'd be to have another log setting -- say
> log_statement_after_min_duration (integer) -- which did what Simon
> wants.  That'd more than satisfy my need, for sure.  Might the cost
> of that be too high, though?

I think this is a great idea.

...Rather than invent a new GUC, I think this is the solution:

log_min_duration_statement writes to log at end of execution, if
execution time is greater than that threshold. Let's move that piece of
code so it is executed as the query progresses. That way, you get
notified that a problem query is occurring NOW, rather than "it has
occurred".

The code already has such a timer check, for statement_timeout, in
backend/storage/lmgr/proc.c. We could reuse this timer to go off at
log_min_duration_statement and then log query if still executing. (If
log_min_duration_statement >= statement_timeout, we would skip that
step.) We would then reset the timer so that it then goes off at where
it does now, at statement_timeout. So, same piece of code, used twice...

That way you can set up 2 limits, with three bands of actions:

Between 0 and log_min_duration_statement 
- logs nothing

Between log_min_duration_statement and statement_timeout
- statement written to log, though execution continues...

At statement_timeout
- statement cancelled

We'd just need a small piece of code to set timer correctly first, then
another piece to record state change and reset timer again. Lift and
drop the existing code from end-of-execution.

This then:
- solves the *problem query* diagnosis issue, as originally raised by
Sean and seconded by myself and Greg
- makes the answer exactly what Tom proposed - look in the logs
- doesn't make any changes to the technical innards of UDP and pgstats.c
- no administrative interface changes, just slightly changed behaviour -
existing users mostly wouldn't even notice we'd done it...

Thoughts?

Easy enough change to be included as a hot fix for 8.0: no new system
code, no new interface code, just same behaviour at different time.


-- 
Best Regards, Simon Riggs


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] postmaster segfaults with HUGE table

2004-11-14 Thread Neil Conway
Joachim Wieland wrote:
> this query makes postmaster (beta4) die with signal 11:
>
> (echo "CREATE TABLE footest(";
> for i in `seq 0 66000`; do
> echo "col$i int NOT NULL,";
> done;
> echo "PRIMARY KEY(col0));") | psql test
>
> ERROR:  tables can have at most 1600 columns
> LOG:  server process (PID 2140) was terminated by signal 11
> LOG:  terminating any other active server processes
> LOG:  all server processes terminated; reinitializing
At best you're going to get the error message above: "tables can have at 
most 1600 columns". But this is definitely a bug: we end up triggering 
this assertion:

TRAP: BadArgument("!(attributeNumber >= 1)", File: "tupdesc.c", Line: 405)
This specific assertion is triggered because we represent attribute 
numbers throughout the code base as a (signed) int16 -- the assertion 
failure has occurred because an int16 has wrapped around due to 
overflow. A fix would be to add a check to DefineRelation() (or even 
earlier) to reject CREATE TABLEs with more than "MaxHeapAttributeNumber" 
columns. We eventually do perform this check in 
heap_create_with_catalog(), but by that point it is too late: various 
functions have been invoked that assume we're dealing with a sane number 
of columns.
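
To see the wraparound concretely, here is a toy Python model of C signed
int16 arithmetic (AttrNumber is just such an int16; this is an
illustration only, not PostgreSQL code):

```python
def as_int16(n):
    # Reduce n modulo 2^16 into the signed range [-32768, 32767],
    # mimicking two's-complement overflow of a C int16.
    return ((n + 32768) % 65536) - 32768

# Attribute numbers past 32767 wrap around and go negative, which is
# exactly what trips the "attributeNumber >= 1" assertion:
assert as_int16(32767) == 32767     # last representable attribute number
assert as_int16(32768) == -32768    # first wrapped value
assert as_int16(66000) == 464       # where a 66000th column ends up
```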

BTW, I noticed that there's an O(n^2) algorithm to check for duplicate 
column names in DefineRelation() -- with 60,000 column names that takes 
minutes to execute, but an inconsequential amount of time with 1500 
column names. So I don't think there's any point rewriting the algorithm 
to be faster -- provided we move the check for MaxHeapAttributeNumber 
before this loop, we should be fine.
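
For reference, the shape of the two approaches, as a hypothetical sketch
rather than the actual DefineRelation() loop:

```python
def has_duplicate_quadratic(names):
    # O(n^2): compare each name against all earlier ones, as the
    # current nested loop effectively does.
    for i in range(len(names)):
        for j in range(i):
            if names[i] == names[j]:
                return True
    return False

def has_duplicate_hashed(names):
    # O(n): a hash set makes the same check linear -- though once the
    # MaxHeapAttributeNumber cap (1600) is enforced first, n is small
    # enough that the quadratic version is perfectly adequate.
    seen = set()
    for name in names:
        if name in seen:
            return True
        seen.add(name)
    return False
```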

Thanks for the report.
-Neil


[HACKERS] postmaster segfaults with HUGE table

2004-11-14 Thread Joachim Wieland
Hi,

this query makes postmaster (beta4) die with signal 11:

(echo "CREATE TABLE footest(";
for i in `seq 0 66000`; do
echo "col$i int NOT NULL,";
done;
echo "PRIMARY KEY(col0));") | psql test


ERROR:  tables can have at most 1600 columns
LOG:  server process (PID 2140) was terminated by signal 11
LOG:  terminating any other active server processes
LOG:  all server processes terminated; reinitializing

Program received signal SIGSEGV, Segmentation fault.
0x4015d43c in mallopt () from /lib/tls/libc.so.6
(gdb) bt
#0  0x4015d43c in mallopt () from /lib/tls/libc.so.6
#1  0x00021680 in ?? ()
[...]
#16 0x0001 in ?? ()
#17 0x0821bc9b in AllocSetDelete ()
Previous frame inner to this frame (corrupt stack?)

Furthermore, the backend doesn't react to query cancel requests from psql
during execution of the query.


Joachim






[HACKERS] GiST: PickSplit and multi-attr indexes

2004-11-14 Thread Neil Conway
Oleg & Teodor,
If I understand the code correctly, GiST will only pass the first 
attribute of each index tuple to the user-defined PickSplit method when 
it wants to split a node. (see circa line 1269 of gist.c)

Is this a wise design decision? Granted, in many situations the first 
attribute in the index is sufficient to make a reasonable decision about 
how to divide the node into two halves, but I don't think that is 
universally true. For example, consider a two column index whose first 
attribute has a small number of distinct values. It could well be that 
all the first attribute values in a node-to-be-split would be the same. 
Only passing the first attribute to PickSplit would result in an 
essentially random distribution of tuples among the split nodes, rather 
than allowing the GiST extension to make use of the second attribute 
to partition the nodes. That's an extreme example, but it is easy to 
construct more realistic scenarios (basically, any situation in which 
the cardinality of the first index attribute is low -- a reasonably 
common occurrence with a multi-column index, I believe).
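
A toy Python illustration of that extreme case (this mimics only the
idea; the function names are made up and none of this resembles the real
PickSplit interface):

```python
def picksplit_first_attr_only(tuples):
    """Split seeing only attribute 1, as gist.c currently arranges.

    When every first attribute is identical there is nothing to
    distinguish the tuples, so the split degenerates into an
    arbitrary half-and-half division.
    """
    if len({t[0] for t in tuples}) < 2:
        mid = len(tuples) // 2
        return tuples[:mid], tuples[mid:]   # essentially random split
    pivot = sorted(t[0] for t in tuples)[len(tuples) // 2]
    return ([t for t in tuples if t[0] < pivot],
            [t for t in tuples if t[0] >= pivot])

def picksplit_all_attrs(tuples):
    # If PickSplit saw whole tuples it could fall back to attribute 2,
    # yielding two halves with tight bounding ranges.
    tuples = sorted(tuples)                 # orders by (attr1, attr2)
    mid = len(tuples) // 2
    return tuples[:mid], tuples[mid:]
```

With input [(1, 9), (1, 1), (1, 8), (1, 2)], the first function just
keeps arrival order and mixes small and large second-attribute values in
both halves, while the second cleanly separates {1, 2} from {8, 9}.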

I'm not sure whether this would be a problem in practice. Speculation: 
repeated invocations of PickSplit are one of the main factors in 
deriving the ultimate shape of the GiST tree. Poor distribution of keys 
resulting from PickSplit would eventually result in unnecessarily loose 
bounding predicates in internal nodes, which would hurt performance.

Comments welcome,
Neil


[HACKERS] Big Database

2004-11-14 Thread Alexander Antonakakis
I would like to ask the more experienced Postgres users a couple of 
questions about a database I manage with a lot of data. A lot of data 
means something like 15,000,000 rows in a table. I will try to 
describe the tables and what I will have to do with them  :)
There is a table that has product data in the form of
Table product:
product_id varchar(8),
product_name text

and
Table product_actions:
product_id varchar(8),
flow char(1),
who int,
where int,
value float.
I will have to run SQL queries of the form "select value from 
product_actions where who='someone' and where='somewhere'" and maybe 
perform some calculations on the results. I have already created some 
indexes on these tables and a view that joins the two of them, but I 
would like to ask whether anyone else is using such a big database and 
how I can speed things up as much as possible. These product_actions 
tables exist for each year from 1988 to 2003, so this means a lot of 
data...

Thanks in Advance