Re: [HACKERS] Proposal - libpq Type System beta-0.8a (was PGparam)
On Tue, Jan 08, 2008 at 05:33:51PM -0500, Merlin Moncure wrote: Here is a short example which demonstrates some of the major features. There are many other examples and discussions of minutiae in the documentation. I haven't looked at the source but FWIW I think it's an awesome idea. Have a nice day, -- Martijn van Oosterhout [EMAIL PROTECTED] http://svana.org/kleptog/ Those who make peaceful revolution impossible will make violent revolution inevitable. -- John F Kennedy
Re: [HACKERS] [BUGS] BUG #3852: Could not create complex aggregate
Tom Lane wrote: Joe Conway [EMAIL PROTECTED] writes: Did you want me to work on this? I could probably put some time into it this coming weekend.

I'll try to get to it before that --- if no serious bugs come up this week, core is thinking of wrapping 8.3.0 at the end of the week, so it'd be nice to have this dealt with sooner than that.

regards, tom lane

    CREATE AGGREGATE array_concat(anyarray) ( SFUNC = array_cat, STYPE = anyarray );
    CREATE AGGREGATE array_build(anyelement) ( SFUNC = array_append, STYPE = anyarray );
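For context, here is what these aggregates do once created; the table and data are hypothetical, shown only to illustrate the behavior (note that element order inside the result arrays depends on scan order):

    -- Hypothetical usage: collect each group's values into an array.
    CREATE TABLE t (grp int, val int);
    INSERT INTO t VALUES (1, 10), (1, 20), (2, 30);

    SELECT grp, array_build(val) FROM t GROUP BY grp;
    -- grp | array_build
    -- ----+-------------
    --   1 | {10,20}
    --   2 | {30}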
[HACKERS] timestamp refactor effort
So... in the vein of my last mail, I have tried to create another patch for refactoring out some of the HAVE_INT64_TIMESTAMP ifdefs in the code in timestamp.c. I have attached the patch. Please let me know if this patch is acceptable and what I can do to continue this effort. Thanks, wt

From 77db4f84999161c0c7ba8ded78636512cc719878 Mon Sep 17 00:00:00 2001
From: Warren Turkal [EMAIL PROTECTED]
Date: Wed, 9 Jan 2008 00:19:46 -0800
Subject: [PATCH] Add PackedTime typedef.

The PackedTime type is meant to be a type that holds the hour, minute, second, and fractional-seconds parts of the time. The actual primitive type that the PackedTime type uses is determined by the HAVE_INT64_TIMESTAMP define. I have also changed some of the instances of variable declarations for times to use the PackedTime type instead of the primitive types.
---
 src/backend/utils/adt/timestamp.c |   21 ++---
 src/include/utils/timestamp.h     |    9 +++--
 2 files changed, 9 insertions(+), 21 deletions(-)

diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c
index 2883caf..1a4c247 100644
--- a/src/backend/utils/adt/timestamp.c
+++ b/src/backend/utils/adt/timestamp.c
@@ -44,11 +44,7 @@
 TimestampTz PgStartTime;
 
-#ifdef HAVE_INT64_TIMESTAMP
-static int64 time2t(const int hour, const int min, const int sec, const fsec_t fsec);
-#else
-static double time2t(const int hour, const int min, const int sec, const fsec_t fsec);
-#endif
+static PackedTime time2t(const int hour, const int min, const int sec, const fsec_t fsec);
 static int EncodeSpecialTimestamp(Timestamp dt, char *str);
 static Timestamp dt2local(Timestamp dt, int timezone);
 static void AdjustTimestampForTypmod(Timestamp *time, int32 typmod);
@@ -1539,11 +1535,10 @@ tm2timestamp(struct pg_tm * tm, fsec_t fsec, int *tzp, Timestamp *result)
 {
 #ifdef HAVE_INT64_TIMESTAMP
 	int			date;
-	int64		time;
 #else
-	double		date,
-				time;
+	double		date;
 #endif
+	PackedTime	time;
 
 	/* Julian day routines are not correct for negative Julian days */
 	if (!IS_VALID_JULIAN(tm->tm_year, tm->tm_mon, tm->tm_mday))
@@ -1648,19 +1643,15 @@ tm2interval(struct pg_tm * tm, fsec_t fsec, Interval *span)
 	return 0;
 }
 
-#ifdef HAVE_INT64_TIMESTAMP
-static int64
+static PackedTime
 time2t(const int hour, const int min, const int sec, const fsec_t fsec)
 {
+#ifdef HAVE_INT64_TIMESTAMP
 	return (((((hour * MINS_PER_HOUR) + min) * SECS_PER_MINUTE) + sec) * USECS_PER_SEC) + fsec;
-}	/* time2t() */
 #else
-static double
-time2t(const int hour, const int min, const int sec, const fsec_t fsec)
-{
 	return (((hour * MINS_PER_HOUR) + min) * SECS_PER_MINUTE) + sec + fsec;
-}	/* time2t() */
 #endif
+}	/* time2t() */
 
 static Timestamp
 dt2local(Timestamp dt, int tz)
diff --git a/src/include/utils/timestamp.h b/src/include/utils/timestamp.h
index 6eec76d..945b970 100644
--- a/src/include/utils/timestamp.h
+++ b/src/include/utils/timestamp.h
@@ -36,20 +36,17 @@
 #ifdef HAVE_INT64_TIMESTAMP
 typedef int64 Timestamp;
 typedef int64 TimestampTz;
+typedef int64 PackedTime;
 #else
 typedef double Timestamp;
 typedef double TimestampTz;
+typedef double PackedTime;
 #endif
 
 typedef struct
 {
-#ifdef HAVE_INT64_TIMESTAMP
-	int64		time;			/* all time units other than days, months and
-								 * years */
-#else
-	double		time;			/* all time units other than days, months and
+	PackedTime	time;			/* all time units other than days, months and
 								 * years */
-#endif
 	int32		day;			/* days, after time for alignment */
 	int32		month;			/* months and years, after time for alignment */
 } Interval;
-- 
1.5.3.7
Re: [HACKERS] [BUGS] BUG #3852: Could not create complex aggregate
Sorry for my previous message having no comments. Just a remark: these aggregates are created successfully in both 8.2 and 8.3beta4:

    CREATE AGGREGATE array_concat(anyarray) ( SFUNC = array_cat, STYPE = anyarray );
    CREATE AGGREGATE array_build(anyelement) ( SFUNC = array_append, STYPE = anyarray );

But the aggregate from my first letter does not:

    CREATE AGGREGATE build_group(anyelement, int4) ( SFUNC = add_group, STYPE = anyarray );

Excuse the noise and my bad English.
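The add_group transition function itself is not shown in the thread; for reference, a two-argument aggregate with STYPE = anyarray requires a transition function taking the state type first, then the aggregate's arguments. A hypothetical declaration with that shape (the body here is illustrative only):

    -- Hypothetical sketch: state array first, then the two aggregate arguments.
    CREATE FUNCTION add_group(anyarray, anyelement, int4) RETURNS anyarray AS $$
        SELECT CASE WHEN $3 > 0 THEN array_append($1, $2) ELSE $1 END;
    $$ LANGUAGE SQL IMMUTABLE;

It is this combination -- a polymorphic state type alongside a non-polymorphic extra argument -- that BUG #3852 is about.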
[HACKERS] operator suggest interval / interval = numeric
I suggest one more standard date/time operator, to divide one interval by another with a numeric (or float, for example) result, i.e. something like this:

    database=# SELECT '5400 seconds'::interval / '1 hour'::interval;
     ?column?
    ----------
          1.5
    (1 row)

Ilya A. Kovalenko
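No such operator is built in, but one can be defined today on top of epoch extraction. A sketch, assuming the epoch-based length of an interval is an acceptable definition of division (see downthread for the caveat about incommensurate intervals such as months versus days):

    -- Sketch: divide two intervals by comparing their lengths in seconds.
    -- STRICT skips NULLs; a zero-length divisor still raises division by zero.
    CREATE FUNCTION interval_div(interval, interval) RETURNS double precision AS $$
        SELECT extract(epoch FROM $1) / extract(epoch FROM $2);
    $$ LANGUAGE SQL IMMUTABLE STRICT;

    CREATE OPERATOR / (
        LEFTARG = interval,
        RIGHTARG = interval,
        PROCEDURE = interval_div
    );

    -- SELECT '5400 seconds'::interval / '1 hour'::interval;  --> 1.5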
Re: [PATCHES] [HACKERS] Archiver behavior at shutdown
On Sat, 2008-01-05 at 12:09, Simon Riggs wrote: On Fri, 2008-01-04 at 17:28 +0900, Fujii Masao wrote: Simon Riggs wrote: My original one line change described on bug 3843 seems like the best solution for 8.3. +1 Is this change in time for RC1? Patch attached.

Not sure why this hasn't been applied yet for 8.3. We have a small problem, a fix and a user voting for the fix. Can we discuss, briefly?

-- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
[HACKERS] Some ideas about Vacuum
Hi,

Maybe I am reposting something that has already been discussed to its end in this forum; I searched the archives and couldn't find anything immediately. With my relatively small experience in performance testing and tuning, one of the rules of thumb for getting performance is: don't do it if you don't need to do it.

When we look at clearing the older versions of tuples from our tables in PostgreSQL, we can't stop thinking about how it is done in other databases. When we compare the Oracle undo-log approach with the PostgreSQL vacuum approach, the pattern looks very similar to that of C++ memory de-allocation versus Java garbage collection. As you may all know, the thing which worries us about Vacuum is that it goes to places it need not go to. That's when we start thinking about a dead space map. This dead space map, if implemented correctly, would guide Vacuum to look only at places where there was some activity of delete/update/insert after the last Vacuum. This is accomplished at the cost of some very small overhead on inserts/deletes/updates.

A dead space map is like an undo log, if we think its role is to get rid of the older versions of data. Instead of moving the tuples to a separate location, it guides the Vacuum process in its cleanup task. Maybe we can even think of something like a dead space log, which need not be a bitmap. In this log, transactions would enter their transaction ids and ctids, which can be scanned by the Vacuum process. While this might take more space, it comes with less contention compared to a dead space map. To me, as far as I can think of, the only advantage of a dead space map over a dead space log is the disk space.

It just strikes me that the WAL is already doing just that. I think you can follow my line of thought. If we can ask the Vacuum process to scan the WAL log, it can get all the relevant details on where it needs to go. One optimization that can be placed here is to somehow make the archiver do a double job of helping Vacuum while doing the archiving. For people who have switched off archiving, this might not be a benefit.

One main restriction it places on the WAL Logs is that the WAL Log needs to be archived only after all the transactions in it complete. In other words, WAL logs need to be given enough space to survive the longest transaction of the database. It is possible to avoid this situation by asking the Vacuum process to take the necessary information out of the WAL log, store it somewhere, and wait for the long-running transaction to complete. The information of interest in the WAL is only the table inserts/updates/deletes. So if everyone accepts that this is a good idea up to this point, there is a point in reading further.

Ultimately, what has been achieved so far is that we have turned the sequential scans made by the Vacuum process on each table into a few random I/Os. Of course there are optimizations possible to group the random I/Os and find some sequential I/O out of them. But we still need to do a full index scan for all those indexes out there. HOT might have saved some work over there, but I am pessimistic here and wondering how it could be improved. So it strikes me that we can do the same thing we did with the tables: convert a seq scan of the entire table into a random scan of a few blocks. We can read the necessary tuple information from the tuples, group them, and hit the index in just those blocks to clean it up.

I can already hear people saying that it is not always possible to go back to the index from the table. There is this culprit called unstable function-based indexes. Their structure stops us from going back to the index from the table. So currently we should restrict the above approach to normal indexes only (not the function-based ones). I hope it would still give a good benefit. Of course Vacuum can convert the few random scans into a seq scan, if required, by referring to table statistics.

Thoughts on the idea?

Thanks,
Gokul.

P.S.: Let the objections/opposing views have a subtle reduction in their harshness.
Re: [HACKERS] VACUUM FULL out of memory
Thanks for the explanation on the ulimits; I can see how that could turn out to be a problem in some cases. Following Tom's suggestion, here is the startup script I used:

    #!/bin/sh
    ulimit -a > $PGHOST/server.ulimit
    pg_ctl start -l $PGHOST/server.log

The ulimits seem to be the same, though:

    $ cat server.ulimit
    core file size (blocks, -c) 1
    data seg size (kbytes, -d) unlimited
    max nice (-e) 0
    file size (blocks, -f) unlimited
    pending signals (-i) unlimited
    max locked memory (kbytes, -l) unlimited
    max memory size (kbytes, -m) unlimited
    open files (-n) 1024
    pipe size (512 bytes, -p) 8
    POSIX message queues (bytes, -q) unlimited
    max rt priority (-r) 0
    stack size (kbytes, -s) 8192
    cpu time (seconds, -t) unlimited
    max user processes (-u) unlimited
    virtual memory (kbytes, -v) unlimited
    file locks (-x) unlimited

Regards, Michael A.

Tom Lane wrote: Andrew Sullivan [EMAIL PROTECTED] writes: On Tue, Jan 08, 2008 at 05:27:16PM +0100, Michael Akinde wrote: Those are the ulimits of the db_admin account (i.e., the user that set up and runs the DB processes). Is Postgres limited by other settings? On one system I used many years ago, /bin/sh wasn't what I thought it was, and so the ulimit that I got when logged in was not what the postmaster was starting under. It took me many days to figure out what was up. The only thing I find convincing is to insert "ulimit -a" someplace into the script that starts the postmaster, adjacent to where it does so, and then reboot. There are too many systems on which daemons are launched under settings different from what interactive shells use (a policy that's often a good one, too). regards, tom lane
Re: [HACKERS] Some ideas about Vacuum
Hi,

Gokulakannan Somasundaram wrote: If we can ask the Vacuum process to scan the WAL log, it can get all the relevant details on where it needs to go.

You seem to be assuming that only few tuples have changed between vacuums, so that the WAL could quickly guide the VACUUM processes to the areas where cleaning is necessary. Let's drop that assumption, because by default autovacuum_scale_factor is 20%, so a VACUUM process normally kicks in after 20% of the tuples have changed (disk space is cheap, I/O isn't). Additionally, there's a default nap time of one minute - and VACUUM is forced to take at least that much of a nap. So it's easily possible to have more dead tuples than live ones. In such cases, scanning the WAL can easily take *longer* than scanning the table, because the amount of WAL to read would be bigger.

One main restriction it places on the WAL Logs is that the WAL Log needs to be archived only after all the transactions in it complete. In other words, WAL logs need to be given enough space to survive the longest transaction of the database. It is possible to avoid this situation by asking the Vacuum process to take the necessary information out of the WAL log, store it somewhere, and wait for the long-running transaction to complete.

That would result in even more I/O...

The information of interest in the WAL is only the table inserts/updates/deletes. So if everyone accepts that this is a good idea up to this point, there is a point in reading further.

Well, that's the information of interest; the question is where to store that information. Maintaining a dead space map looks a lot cheaper to me than relying on the WAL to store it.

Ultimately, what has been achieved so far is that we have turned the sequential scans made by the Vacuum process on each table into a few random I/Os. Of course there are optimizations possible to group the random I/Os and find some sequential I/O out of them. But we still need to do a full index scan for all those indexes out there. HOT might have saved some work over there, but I am pessimistic here and wondering how it could be improved. So it strikes me that we can do the same thing we did with the tables: convert a seq scan of the entire table into a random scan of a few blocks. We can read the necessary tuple information from the tuples, group them, and hit the index in just those blocks to clean it up.

Sorry, I don't quite get what you are talking about here. What do indexes have to do with dead space? Why not just keep acting at the block level?

I can already hear people saying that it is not always possible to go back to the index from the table. There is this culprit called unstable function-based indexes.

No, there's no such thing. Citing [1]: "All functions and operators used in an index definition must be immutable, that is, their results must depend only on their arguments and never on any outside influence." Of course, you can mark any function IMMUTABLE and get unstable function-based indexes, but that turns into a giant foot gun very quickly.

P.S.: Let the objections/opposing views have a subtle reduction in their harshness.

I'm just pointing at things that are in conflict with my knowledge, assumptions and beliefs, all of which might be erroneous, plain wrong or completely mad. ;-)

Regards

Markus

[1]: the Very Fine Postgres Manual on CREATE INDEX: http://www.postgresql.org/docs/8.3/static/sql-createindex.html
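The foot gun Markus mentions is easy to build; a hypothetical illustration (all names invented) of how a mislabeled IMMUTABLE function yields an "unstable" functional index:

    -- Hypothetical: a function whose result depends on table contents,
    -- falsely declared IMMUTABLE so it can be used in an index.
    CREATE TABLE rates (rate numeric);
    INSERT INTO rates VALUES (1.5);

    CREATE FUNCTION to_usd(amount numeric) RETURNS numeric AS $$
        SELECT amount * rate FROM rates LIMIT 1;  -- outside influence!
    $$ LANGUAGE SQL IMMUTABLE;                    -- the lie

    CREATE TABLE prices (amount numeric);
    CREATE INDEX prices_usd_idx ON prices (to_usd(amount));

    -- After UPDATE rates SET rate = 1.7, existing index entries no longer
    -- match what to_usd() now computes, so index scans can return wrong
    -- answers: the "unstable function-based index" problem.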
Re: [HACKERS] Dynamic Partitioning using Segment Visibility Maps
On Sat, 2008-01-05 at 16:30 -0500, Robert Treat wrote: I'm not following this. If we can work out a scheme, I see no reason not to allow a single table to span multiple tablespaces.

That seems to be something we might want anyway, so yes.

The difference is that, if I currently have a table split by month, I can re-partition it into weekly segments, and only shuffle one month's data at a time to minimize impact on the system while I shuffle it. This can even be used to do dynamic management, where data from the current month is archived by day, data from the past year by week, and data beyond that done monthly.

Understood.

On many other databases, if you change the partition scheme, it requires exclusive locks and a shuffling of all of the data, even data whose partitions aren't being redefined. Even worse are systems like mysql, where you need to rewrite the indexes as well. To me, these requirements always seem like show stoppers; I generally can't afford to lock a table while the database rewrites a billion rows of data.

Agreed.

In any case, my thinking is if we had the segment exclusion technique, I could convert that partitioned table into a regular table again, use segment exclusion to handle what is currently handled by partitions, and create a global index across all the other data for that other, currently killer, query.

Yes, that's what I have in mind. Can I ask that you produce a gap analysis between what you have now and what you would have in the future, so we can see what omissions or errors there are in the segex proposal?

If we had indexes that spanned partitions, would we find that some of the queries that were producing seq scans will now produce better join and index plans, do you think?

-- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
[HACKERS] LD_LIBRARY_PATH not honored on Debian unstable
Hi,

I'm trying to run 'make check' on a 64bit Debian unstable. That aborts after 60 seconds due to not being able to connect to the postmaster. I figured that there's nothing wrong with the postmaster; rather, psql can't start up, because it gets linked against an older libpq.so.5. It looks like for some reason it doesn't respect the LD_LIBRARY_PATH env variable, see ldd output:

    [EMAIL PROTECTED]:/home/markus/projects/pgsql/sources/current/pgsql/src/test/regress# LD_LIBRARY_PATH=/home/markus/projects/pgsql/sources/current/pgsql/src/test/regress/./tmp_check/install//usr/lib ldd -r -v tmp_check/install/usr/bin/psql
    linux-vdso.so.1 => (0x7fffc2bfe000)
    libpq.so.5 => /usr/lib/libpq.so.5 (0x2ac8e81ba000)
    libz.so.1 => /usr/lib/libz.so.1 (0x2ac8e83db000)
    libreadline.so.5 => /lib/libreadline.so.5 (0x2ac8e8606000)
    libcrypt.so.1 => /lib/libcrypt.so.1 (0x2ac8e8846000)
    libdl.so.2 => /lib/libdl.so.2 (0x2ac8e8a7e000)
    libm.so.6 => /lib/libm.so.6 (0x2ac8e8c82000)
    libc.so.6 => /lib/libc.so.6 (0x2ac8e8f04000)
    libssl.so.0.9.8 => /usr/lib/libssl.so.0.9.8 (0x2ac8e9262000)
    libcrypto.so.0.9.8 => /usr/lib/libcrypto.so.0.9.8 (0x2ac8e94ae000)
    libkrb5.so.3 => /usr/lib/libkrb5.so.3 (0x2ac8e983c000)
    libcom_err.so.2 => /lib/libcom_err.so.2 (0x2ac8e9ad7000)
    libpthread.so.0 => /lib/libpthread.so.0 (0x2ac8e9cd9000)
    libncurses.so.5 => /usr/lib/libncurses.so.5 (0x2ac8e9ef5000)
    /lib64/ld-linux-x86-64.so.2 (0x2ac8e7f9c000)
    libk5crypto.so.3 => /usr/lib/libk5crypto.so.3 (0x2ac8ea132000)
    libkrb5support.so.0 => /usr/lib/libkrb5support.so.0 (0x2ac8ea357000)
    libkeyutils.so.1 => /lib/libkeyutils.so.1 (0x2ac8ea55f000)
    libresolv.so.2 => /lib/libresolv.so.2 (0x2ac8ea761000)
    undefined symbol: pg_valid_server_encoding_id (tmp_check/install/usr/bin/psql)
    undefined symbol: PQconnectionNeedsPassword (tmp_check/install/usr/bin/psql)

Giving it an additional LD_PRELOAD for the newish libpq.5.1 helps:

    [EMAIL PROTECTED]:/home/markus/projects/pgsql/sources/current/pgsql/src/test/regress# LD_PRELOAD=/home/markus/projects/pgsql/sources/current/pgsql/src/test/regress/tmp_check/install/usr/lib/libpq.so.5.1 LD_LIBRARY_PATH=/home/markus/projects/pgsql/sources/current/pgsql/src/test/regress/./tmp_check/install//usr/lib ldd -r -v tmp_check/install/usr/bin/psql
    linux-vdso.so.1 => (0x7fffe97fe000)
    /home/markus/projects/pgsql/sources/current/pgsql/src/test/regress/tmp_check/install/usr/lib/libpq.so.5.1 (0x2b69c1547000)
    libz.so.1 => /usr/lib/libz.so.1 (0x2b69c1765000)
    libreadline.so.5 => /lib/libreadline.so.5 (0x2b69c199)
    libcrypt.so.1 => /lib/libcrypt.so.1 (0x2b69c1bd)
    libdl.so.2 => /lib/libdl.so.2 (0x2b69c1e08000)
    libm.so.6 => /lib/libm.so.6 (0x2b69c200c000)
    libc.so.6 => /lib/libc.so.6 (0x2b69c228e000)
    libncurses.so.5 => /usr/lib/libncurses.so.5 (0x2b69c25ec000)
    /lib64/ld-linux-x86-64.so.2 (0x2b69c1329000)

Does somebody have an idea what's wrong here? Thanks.

Regards Markus
Re: [HACKERS] LD_LIBRARY_PATH not honored on Debian unstable
Markus Schiltknecht wrote: Hi, I'm trying to run 'make check' on a 64bit Debian unstable. That aborts after 60 seconds due to not being able to connect to the postmaster. I figured that there's nothing wrong with the postmaster, rather psql can't start up, because it gets linked against an older libpq.so.5. It looks like for some reason, it doesn't respect the LD_LIBRARY_PATH env variable, see ldd output: [EMAIL PROTECTED]:/home/markus/projects/pgsql/sources/current/pgsql/src/test/regress# LD_LIBRARY_PATH=/home/markus/projects/pgsql/sources/current/pgsql/src/test/regress/./tmp_check/install//usr/lib ldd -r -v tmp_check/install/usr/bin/psql [snip] undefined symbol: pg_valid_server_encoding_id (tmp_check/install/usr/bin/psql) undefined symbol: PQconnectionNeedsPassword (tmp_check/install/usr/bin/psql) Giving it an additional LD_PRELOAD for the newish libpq.5.1 helps: [EMAIL PROTECTED]:/home/markus/projects/pgsql/sources/current/pgsql/src/test/regress# LD_PRELOAD=/home/markus/projects/pgsql/sources/current/pgsql/src/test/regress/tmp_check/install/usr/lib/libpq.so.5.1 LD_LIBRARY_PATH=/home/markus/projects/pgsql/sources/current/pgsql/src/test/regress/./tmp_check/install//usr/lib ldd -r -v tmp_check/install/usr/bin/psql [snip] Somebody have an idea on what's wrong here? Thanks.

Smells suspiciously like an rpath problem to me. What are your configure settings?

cheers andrew
Re: [HACKERS] operator suggest interval / interval = numeric
On Wed, 09.01.2008, at 17:33:00 +0700, Ilya A. Kovalenko wrote: I suggest one more standard date/time operator, to divide one interval by another with numeric (or float, for example) result. I.e. something like that: database=# SELECT '5400 seconds'::interval / '1 hour'::interval; ?column? 1.5 (1 row)

    test=# SELECT extract(epoch from '5400 seconds'::interval) / extract(epoch from '1 hour'::interval);
     ?column?
    ----------
          1.5

Andreas
-- Andreas Kretschmer Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (more: see header) GnuPG-ID: 0x3FFF606C, private 0x7F4584DA http://wwwkeys.de.pgp.net
Re: [HACKERS] LD_LIBRARY_PATH not honored on Debian unstable
Andrew Dunstan wrote: Smells suspiciously like an rpath problem to me. What are your configure settings?

Ah, yeah, I see. Using something other than --prefix=/usr helped. Thanks for the hint!

Regards Markus
Re: [HACKERS] Dynamic Partitioning using Segment Visibility Maps
On Sun, 2008-01-06 at 11:39 +0100, Markus Schiltknecht wrote: I think this has to do with SE not being of much use for index scans.

Hmmm. I think it fits rather neatly with BitmapIndexScans. It would be easy to apply the index condition and/or filters to see which segments are excluded and then turn off bits in the bitmap appropriately. Not fully sure about IndexScans yet. I don't think it would be worth trying to apply SE until we estimated we would return say 100 rows. It needs to be able to work without slowing down the common path.

Or put it another way: SE is an optimization for sequential scans. For tables where it works well, it could possibly replace the index entirely.

True.

Without the index, you would rely on SE to always be able to exclude enough segments, so that the seq scan is less expensive than an index scan with the following table lookups.

It would have to be a very fat index scan for so large a table...

-- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Re: [HACKERS] Dynamic Partitioning using Segment Visibility Maps
On Mon, 2008-01-07 at 14:20 +0100, Markus Schiltknecht wrote: AFAIUI, Segment Exclusion combines perfectly well with clustering.

Yes, seems like it would be possible to have a segment-aware CLUSTER, so it was actually usable on large tables. Not planning that initially though.

-- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Re: [HACKERS] Some ideas about Vacuum
So it's easily possible to have more dead tuples than live ones. In such cases, scanning the WAL can easily take *longer* than scanning the table, because the amount of WAL to read would be bigger.

Yes... I made a wrong assumption there... so the idea is totally useless.

Thanks, Gokul.
Re: [HACKERS] Dynamic Partitioning using Segment Visibility Maps
Simon Riggs wrote: Hmmm. I think it fits rather neatly with BitmapIndexScans. It would be easy to apply the index condition and/or filters to see which segments are excluded and then turn off bits in the bitmap appropriately.

Yeah, good point.

Not fully sure about IndexScans yet. I don't think it would be worth trying to apply SE until we estimated we would return say 100 rows. It needs to be able to work without slowing down the common path.

Yup.

Or put it another way: SE is an optimization for sequential scans. For tables where it works well, it could possibly replace the index entirely.

True.

Without the index, you would rely on SE to always be able to exclude enough segments, so that the seq scan is less expensive than an index scan with the following table lookups.

It would have to be a very fat index scan for so large a table...

...for SE to be faster than an index scan, you mean? Yes.

Regards Markus
Re: [HACKERS] Dynamic Partitioning using Segment Visibility Maps
On Mon, 2008-01-07 at 12:14 +0100, Csaba Nagy wrote: On Wed, 2008-01-02 at 17:56, Simon Riggs wrote: Like it? Very cool :-)

Thanks. As ever, a distillation of various thoughts, not all mine.

One additional thought: what about a kind of segment fill factor? Meaning: each segment has some free space reserved for future updates/inserts of records in the same range of its partitioning constraint. And when inserting/updating you put the new record into the corresponding segment... just like a very coarse clustering. Then you could vacuum the segments separately to keep the free space from running out. For active segments you would then fix the partitioning constraint range once the fill factor is reached, to allow for keeping its constraint even when heavily updating (heavily vacuuming it too as a response to that), and create a new segment for the unbounded range for new inserts... this would work fine for tables where the constraint is based on ever-increasing keys and accidental inserts in old ranges (which do happen occasionally in real life).

Lots of thoughts there, so I'll try to separate them out and analyse. The way I originally described it is a very simple mechanism and we could tweak that some more. All ideas welcome.

If we had dynamic segment constraints when the segment was not yet read-only, that would lead to changes in the segment constraint for each INSERT when we have increasing keys. It seems better to set the constraints once only, when the segment is full, then prevent further INSERTs. The accidental inserts in old ranges seem like something that can be avoided if it is a real-world problem.

For UPDATEs on a segment with constraints we might choose to apply the constraints to see what to do. You might want to allow the UPDATE and have it stretch the constraints outwards, or you might want to prevent it and throw an error, or you might want to allow the UPDATE yet migrate the new tuple to the appropriate segment. Dynamic partitioning works in multiple dimensions, so there isn't just one single valid location for any row. I.e. if we update a row with OrderId, OrderDate, RequiredByDate, ShipDate and LastModifiedDate on it, we'll probably expand the constraints on at least one of those. If we were lucky enough to have only changed one of those, it might turn out there was *in that case* a single more sensible location for the new tuple, but that probably isn't the common case. So the likely best behaviour for UPDATEs is to try to keep the new row version in the same segment, then change the constraints.

The segment constraints concept and the read-only concept were linked. You're right that we could separate them, though that turns out not to be that desirable. When we do a DELETE or an UPDATE we don't know whether the deleted row version was the last tuple with that particular boundary value. So we don't know whether the DELETE or UPDATE changes the constraints or not, and however we try to avoid it, we'll probably need to recalculate the constraints in some circumstance. So updating the constraints dynamically isn't anywhere near as easy as it sounds and ultimately probably isn't worth the effort. So that's why constraints and read-only go together.

HOT allows us to keep the new row version within the segment, in many cases. What might be worth doing is marking the FSM space for that segment as update-only to exclude inserts from using it, then forcing UPDATEs to stay within the segment if possible by providing the current block number in each call to the FSM.
That would also mean that every UPDATE that wants to do a multi-block update on a currently read-only segment would need to call the FSM. Sounds good, but that would only work for the first UPDATE on a segment after it is marked read only, which isn't much use, or we would do it for *every* block-spanning UPDATE, which would cause contention in other use cases. So although I'm willing to listen and tweak, that hasn't yet resulted in any additional design points, unless I missed something above? When the change rate of old segments get down, the segments could be reorganized to have a smaller fill factor, so that you still allow for accidental updates but keep space usage efficient. This would be some similar action as a clustering, but hopefully not blocking (which might be a hard thing to do)... and later again you could mark some of the really old things as read only and put them in special segments with no wasted space. Yes, hopefully marking read only and compressed segments is a possible future. One problem would be when the segment's free space runs out, so you must put records from the same constraint range in multiple segments - but that could still work, you just would have multiple segments covering the same range, but if the segment fill factor is chosen properly it should not be the case... you could normally maintain a set of non-overlapping segments in terms of the
Re: [HACKERS] Some notes about the index-functions security vulnerability
On Wed, 2008-01-09 at 00:22 -0500, Tom Lane wrote: pgsql-core wasted quite a lot of time

Core's efforts are appreciated by all, so not time wasted.

-- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Re: [HACKERS] Dynamic Partitioning using Segment Visibility Maps
On Sat, 2008-01-05 at 16:42 +0100, Markus Schiltknecht wrote: Simon Riggs wrote: On Fri, 2008-01-04 at 22:26 +0100, Markus Schiltknecht wrote: I'm still puzzled about how a DBA is expected to figure out which segments to mark. Simon, are you assuming we are going to pass on segment numbers to the DBA one day? No way! Ah, I'm glad ;-)

But now you want names on partitions?

-- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Re: [HACKERS] Named vs Unnamed Partitions
On Wed, 2008-01-09 at 02:25, Gregory Stark wrote: Markus Schiltknecht [EMAIL PROTECTED] writes: There are two very distinct ways to handle partitioning. For now, I'm calling them named and unnamed partitioning.

The naming is precisely the useful part in that it is how the DBA associates the properties with chunks of data.

Why does giving something a name help partition exclusion?

Without naming the DBA would have to specify the same ranges every time he wants to change the properties. He might do a SET read_only WHERE created_on < '2000-01-01' one day, then another SET tablespace tsslow WHERE created_on < '2000-01-01', and then later again do SET offline WHERE created_on < '2000-01-01'. I have to admit I always found it kludgy to have objects named invoices_2000_JAN and invoices_2000_FEB and so on. It's kind of a meta denormalization. But so is specifying where clauses repeatedly.

The idea for using the WHERE clauses was to specifically avoid naming. In most cases the table is divided into old read-only and newer data, so there is one split point that makes it easy to use a simple WHERE clause. If you guys really want names, we can have names, but I think I want to see a case where the storage characteristics of the table are so complex we can only make sense of it by naming particular chunks.

-- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Re: [HACKERS] Dynamic Partitioning using Segment Visibility Maps
Gavin and all,

This is quite a long reply, so apologies for that.

On Wed, 2008-01-09 at 07:28 +0100, Gavin Sherry wrote: On Wed, Jan 02, 2008 at 05:56:14PM, Simon Riggs wrote: This technique would be useful for any table with historical data keyed by date or timestamp. It would also be useful for data where a time-of-insert component is implicit, such as many major entity tables where the object ids are assigned by a sequence, e.g. an Orders table with an OrderId as PK. Once all orders placed in a period have been shipped/resolved/closed then the segments will be marked read-only.

A novel approach to the problem. For me, this proposal addresses some of the other problems in postgres (visibility in the heap vs. index) rather than the problems with partitioning itself.

I think you're right that it isn't attempting to address the problems with partitioning directly; it is attempting to address the core use case itself. I think we have an opportunity to bypass the legacy-of-thought that Oracle has left us and implement something more usable. There are some low-level technical issues associated with declarative partitioning that I'll address last, which are in many ways the big reason to want to go to non-declarative partitioning.

It might seem otherwise, but for me partitioning is a tool for managing large volumes of data. It allows the user to encode what they know about the nature of their data, and that's often about management. In this way, your proposal actually makes partitioning too smart: the user wants to tell the system how to organise the data, as opposed to the other way around. At Greenplum, we've been discussing this in depth. Interestingly, we also felt that the storage layer could be much smarter with large tables with predictable layouts and predictable data patterns. But the thing is, people with large tables like partitioning, putting different partitions on different storage; they like the ability to merge and split partitions; they like truncating old partitions. In a word, they seem to like the manageability partitions give them -- as well as the performance that comes with this.

Do people really like running all that DDL? (An illustration of that DDL burden follows below.) There is significant manpower cost in implementing and maintaining a partitioning scheme, plus significant costs in getting it wrong. If people with large tables like partitioning, why is Oracle moving towards automated partitioning in 11g? Automated partitioning was one of the major goals for this next set of Postgres partitioning functionality also, whether or not we have declarative partitioning. My thinking is: let's go for the best ideas and skip over the stuff that will be (is) obsolete before it's left the design stage.

I see many more systems in the Terabyte range now than I did 10 years ago, but I see about the same number of DBAs. We'll always need experts, but I feel we should be using our expertise to simplify the standard cases, not just maintain the same level of difficulty in managing them. One of the big benefits of the dynamic partitioning approach is that it needs no DDL. So it will work out-of-the-box, for anybody.

Deleting older data would be optimised under the proposed scheme, so that's not really a problem. Loading data is actually slightly harder and slower with declarative partitioning (see below). Merging and splitting partitions are tools for fine-tuning a very complex partitioning scheme. They do also allow a non-linear segmentation scheme, which might aid performance in some cases.
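For readers unfamiliar with the DDL burden being discussed, here is a sketch of what today's inheritance-based declarative partitioning requires for every partition (table, index and rule names are illustrative, following the 8.x constraint-exclusion idiom):

    -- One child table per month, each with its own CHECK constraint...
    CREATE TABLE orders (orderid bigint, orderdate date NOT NULL);
    CREATE TABLE orders_2008_01 (
        CHECK (orderdate >= DATE '2008-01-01' AND orderdate < DATE '2008-02-01')
    ) INHERITS (orders);
    -- ...plus per-partition indexes...
    CREATE INDEX orders_2008_01_orderdate ON orders_2008_01 (orderdate);
    -- ...plus a rule (or trigger) to route INSERTs to the right child,
    -- all repeated for orders_2008_02, orders_2008_03, and so on.
    CREATE RULE orders_insert_2008_01 AS
        ON INSERT TO orders
        WHERE orderdate >= DATE '2008-01-01' AND orderdate < DATE '2008-02-01'
        DO INSTEAD INSERT INTO orders_2008_01 VALUES (NEW.*);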
(On a different note, I'm strongly in favour of a declarative approach to other optimizer information to allow the DBA to say what they know about how a table will be used. But that's off-topic for now.)

One major advantage of the dynamic approach is that it can work on multiple dimensions simultaneously, which isn't possible with declarative partitioning. For example, if you have a table of Orders then you will be able to benefit from Segment Exclusion on all of these columns, rather than just one of them: OrderId, OrderDate, RequiredByDate, LastModifiedDate. This will result in some sloppiness in the partitioning, e.g. if we fill one partition a day of Orders, then the OrderId and OrderDate columns will start out perfectly arranged. Any particular RequiredByDate will probably be spread out over 7 partitions, but that's way better than being spread out over 365+ partitions. When we look at the data in the partition we can look at any number of columns. When we declaratively partition, you get only one connected set of columns, which is one of the reasons you want multi-dimensional partitioning in the first place.

To this end, we (well, Jeff Cohen) looked at the syntax and semantics of partitioning in leading databases (Oracle, Informix, DB2) and came up with a highly expressive grammar which takes the best of each, I think (I'll post details on the grammar in a separate thread). The idea is that range (for
Re: [HACKERS] Some ideas about Vacuum
Markus Schiltknecht [EMAIL PROTECTED] writes: Hi, Gokulakannan Somasundaram wrote: If we can ask the Vacuum process to scan the WAL log, it can get all the relevant details on where it needs to go.

That's an interesting thought. I think your caveats are right, but with some more work it might be possible to work it out. For example, a background process could process the WAL and accumulate an array of possibly-dead tuples to process in batch. It would wait whenever it sees an xid which isn't yet past globalxmin, and keep accumulating until it has enough to make it worthwhile doing a pass.

I think a bigger issue with this approach is that it ties all your tables together. You can't process one table frequently while some other table has some long-lived deleted tuples. I'm also not sure it really buys us anything over having a second dead-space-map data structure. The WAL is much larger and serves other purposes which would limit what we can do with it.

You seem to be assuming that only few tuples have changed between vacuums, so that the WAL could quickly guide the VACUUM processes to the areas where cleaning is necessary. Let's drop that assumption, because by default autovacuum_scale_factor is 20%, so a VACUUM process normally kicks in after 20% of the tuples have changed (disk space is cheap, I/O isn't). Additionally, there's a default nap time of one minute - and VACUUM is forced to take at least that much of a nap.

I think this is exactly backwards. The goal should be to improve vacuum, then adjust the autovacuum_scale_factor as low as we can. As vacuum gets cheaper the scale factor can go lower and lower. We shouldn't allow the existing autovacuum behaviour to control the way vacuum works.

As a side point, "disk space is cheap, I/O isn't" is a weird statement. The more disk you use, the more I/O you'll have to do to work with the data.

I still maintain the default autovacuum_scale_factor is *far* too liberal. If I had my druthers it would be 5%. But that's mostly informed by TPCC experience; in real life the actual value will vary depending on the width of your records and the relative length of your transactions versus transaction rate. The TPCC experience is with ~400-byte records and many short transactions.

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning
Re: [PATCHES] [HACKERS] Archiver behavior at shutdown
Simon Riggs [EMAIL PROTECTED] writes: Not sure why this hasn't been applied yet for 8.3.

Because it doesn't fix the problem ... which is that the postmaster kills the archiver (and the stats collector too) at what is now the wrong point in the shutdown sequence.

regards, tom lane
Re: [HACKERS] Named vs Unnamed Partitions
Simon Riggs wrote: I have to admit I always found it kludgy to have objects named invoices_2000_JAN and invoices_2000_FEB and so on. It's kind of a meta denormalization. But so is specifying where clauses repeatedly. The idea for using the WHERE clauses was to specifically avoid naming.

I understand, and I'm all for avoiding needless, kludgy names. As I pointed out, knowledge of split points might be important for the database system. Maybe we can store the split point without the need for names? Dunno.

If you guys really want names, we can have names, but I think I want to see a case where the storage characteristics of the table are so complex we can only make sense of it by naming particular chunks.

Well, assuming you only have to deal with one split point, that's certainly true. However, there are people using more than two tablespaces, thus obviously needing more split points. Can we name the split points, rather than the partitions?

Regards Markus
Re: [HACKERS] operator suggest interval / interval = numeric
Ilya A. Kovalenko [EMAIL PROTECTED] writes: I suggest one more standard date/time operator, to divide one interval by another with numeric (or float, for example) result.

You'd have to define exactly what that means, which seems a little tricky for incommensurate intervals. For instance, what is the result of '1 month' / '1 day'?

regards, tom lane
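To make the incommensurability concrete: under the epoch-based definition used in the workaround upthread, PostgreSQL converts months and days to seconds with fixed factors (30-day months, 24-hour days), so the division has an answer, just an arguable one. A small demonstration, not a position on what the right semantics would be:

    SELECT extract(epoch FROM '1 month'::interval);  -- 2592000  (assumes 30 days)
    SELECT extract(epoch FROM '1 day'::interval);    --   86400  (assumes 24 hours)
    SELECT extract(epoch FROM '1 month'::interval)
         / extract(epoch FROM '1 day'::interval);    -- 30, though a real month
                                                     -- has 28 to 31 days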
Re: [PATCHES] [HACKERS] Archiver behavior at shutdown
On Wed, 2008-01-09 at 10:15 -0500, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: Not sure why this hasn't been applied yet for 8.3. Because it doesn't fix the problem ... which is that the postmaster kills the archiver (and the stats collector too) at what is now the wrong point in the shutdown sequence.

The original bug report states the problem as being that the archiver stays for a noticeable period after postmaster shutdown. My patch fixes that very safely. It doesn't fix your request for redesign, which I accept is still pending, and I've explained why. I don't see any reason to leave the original problem hanging just because the fix isn't as wide as we might really like.

-- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Re: [HACKERS] Named vs Unnamed Partitions
On Wed, 2008-01-09 at 16:20 +0100, Markus Schiltknecht wrote: Simon Riggs wrote: I have to admit I always found it kludgy to have objects named invoices_2000_JAN and invoices_2000_FEB and so on. It's kind of a meta denormalization. But so is specifying where clauses repeatedly. The idea for using the WHERE clauses was to specifically avoid naming. I understand, and I'm all for avoiding needless, kludgy names. As I pointed out, knowledge of split points might be important for the database system. Maybe we can store the split point without the need for names? Dunno. If you guys really want names, we can have names, but I think I want to see a case where the storage characteristics of the table are so complex we can only make sense of it by naming particular chunks. Well, assuming you only have to deal with one split point, that's certainly true. However, there are people using more than two tablespaces, thus obviously needing more split points. Can we name the split points, rather than the partitions?

So far, I've been looking at partition exclusion as the most important feature for the VLDB use case. You seem to have moved straight on to what I've regarded as later features for partitioning. From my side, if I can't make SE work then most of the other features seem moot, even though I personally regard them as important also.

With that in mind, can I clarify what you're thinking, please?

1) the things you've been discussing are so important I should do them first, which would necessarily require named chunks of tables

2) the things you've been discussing are essential requirements of partitioning and we could never consider it complete until they are also included, and we must therefore talk about them now to check that it's all possible before we do anything on SE

3) doing SE first is right, I'm just thinking ahead

4) the topics aren't really linked and I'm suggesting doing development on them in parallel

or...

Sorry if that seems blunt, I'm just not clear where we're going. I have to think about implementability, planning and priorities if I'm to get it done.

-- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Re: [HACKERS] Named vs Unnamed Partitions
On Wed, 2008-01-09 at 15:53, Gregory Stark wrote: Simon Riggs [EMAIL PROTECTED] writes: Perhaps a good analogy is indexes. Index names are themselves kind of redundant and people usually use names which encode most of the information of the definition. But the reason you need names for indexes is so that you can refer to them later to drop them, rebuild them, change their properties such as tablespace, fill factor, etc. You could imagine imposing on users that they should restate the index definition every time they want to change the fill factor or tablespace, but I'm sure you could see the downsides with that approach.

Which is exactly what we do with DROP FUNCTION... You'd run these things about as often as you run ALTER TABLE SET tablespace, so it doesn't seem a problem.

When I delete all rows WHERE some_date < 'cut-off date' on a segment boundary value, that would delete all segments that met the criteria. The following VACUUM will then return those segments to read-write, where they can then be refilled with new incoming data. The only command we would have to run is the DELETE; everything else is automatic.

If we have named chunks, then you'd have to specifically reset the boundary conditions on the named chunk after deletion before the chunk could be reused. That all becomes DDL, which means additional code to be written, bugs to be fixed, as well as the table locking required. Seems like a lot just for some occasional convenience.

So I'm not convinced of the need for named sections of tables yet. It all seems like detail, rather than actually what we want for managing large tables.

-- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Re: [HACKERS] Some ideas about Vacuum
On Wed, 2008-01-09 at 15:10, Gregory Stark wrote: The goal should be to improve vacuum, then adjust the autovacuum_scale_factor as low as we can. As vacuum gets cheaper the scale factor can go lower and lower. We shouldn't allow the existing autovacuum behaviour to control the way vacuum works.

Very much agreed.

-- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Re: [HACKERS] Named vs Unnamed Partitions
Simon Riggs [EMAIL PROTECTED] writes: On Wed, 2008-01-09 at 02:25, Gregory Stark wrote: Without naming the DBA would have to specify the same ranges every time he wants to change the properties. He might do a SET read_only WHERE created_on < '2000-01-01' one day, then another SET tablespace tsslow WHERE created_on < '2000-01-01', and then later again do SET offline WHERE created_on < '2000-01-01'. I have to admit I always found it kludgy to have objects named invoices_2000_JAN and invoices_2000_FEB and so on. It's kind of a meta denormalization. But so is specifying where clauses repeatedly.

The idea for using the WHERE clauses was to specifically avoid naming. In most cases the table is divided into old read-only and newer data, so there is one split point that makes it easy to use a simple WHERE clause. If you guys really want names, we can have names, but I think I want to see a case where the storage characteristics of the table are so complex we can only make sense of it by naming particular chunks.

Perhaps a good analogy is indexes. Index names are themselves kind of redundant and people usually use names which encode most of the information of the definition. But the reason you need names for indexes is so that you can refer to them later to drop them, rebuild them, change their properties such as tablespace, fill factor, etc. You could imagine imposing on users that they should restate the index definition every time they want to change the fill factor or tablespace, but I'm sure you could see the downsides with that approach.

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning
Re: [HACKERS] Some ideas about Vacuum
Hi,

Gregory Stark wrote: That's an interesting thought. I think your caveats are right, but with some more work it might be possible to work it out. For example, a background process could process the WAL and accumulate an array of possibly-dead tuples to process in batch. It would wait whenever it sees an xid which isn't yet past globalxmin, and keep accumulating until it has enough to make it worthwhile doing a pass.

I don't understand why one would want to go via the WAL; that only creates needless I/O. Better to accumulate the data right away, during the inserts, updates and deletes. Spilling the accumulated data to disk, if absolutely required, would presumably still result in less I/O.

I think a bigger issue with this approach is that it ties all your tables together. You can't process one table frequently while some other table has some long-lived deleted tuples.

Don't use the WAL as the source of that information and that issue's gone.

I'm also not sure it really buys us anything over having a second dead-space-map data structure. The WAL is much larger and serves other purposes which would limit what we can do with it.

Exactly.

I think this is exactly backwards. The goal should be to improve vacuum, then adjust the autovacuum_scale_factor as low as we can. As vacuum gets cheaper the scale factor can go lower and lower.

But you can't lower it endlessly; it's still a compromise, because it also means reducing the amount of tuples being cleaned per scan, which is against the goal of minimizing the overall I/O cost of vacuuming.

We shouldn't allow the existing autovacuum behaviour to control the way vacuum works.

That's a point.

As a side point, "disk space is cheap, I/O isn't" is a weird statement. The more disk you use, the more I/O you'll have to do to work with the data.

That's only true as long as you need *all* your data to work with it.

I still maintain the default autovacuum_scale_factor is *far* too liberal. If I had my druthers it would be 5%. But that's mostly informed by TPCC experience; in real life the actual value will vary depending on the width of your records and the relative length of your transactions versus transaction rate. The TPCC experience is with ~400-byte records and many short transactions.

Hm... 5% vs 20% would mean 4x as many vacuum scans, but only a 15% growth in size (105% vs 120%), right? Granted, those 15% are also taken from memory and caches, resulting in additional I/O... Still, these numbers surprise me. Or am I missing something?

Regards Markus
Re: [HACKERS] OUTER JOIN performance regression remains in 8.3beta4
Alvaro Herrera [EMAIL PROTECTED] writes: Tom Lane wrote: Comparing the behavior of this to my patch for HEAD, I am coming to the conclusion that this is actually a *better* planning method than removing the redundant join conditions, even when they're truly redundant! The reason emerges as soon as you look at cases involving more than a single join. If we strip the join condition from just one of the joins, then we find that the planner insists on doing that join last, whether it's a good idea or not, because clauseful joins are always preferred to clauseless joins in the join search logic.

Would it be a good idea to keep removing redundant clauses and rethink the preference for clauseful joins, going forward?

I don't understand what's going on here. The planner is choosing one join order over another because one join has more join clauses than the other? Even though some of those joins are entirely redundant and have no selectivity? That seems like a fortuitous choice made on entirely meaningless data. Is there some other source of data we could use to make this decision instead of the number of clauses? I would suggest the selectivity, but from the sound of it that's not going to help at all.

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services!
Re: [HACKERS] Named vs Unnamed Partitions
Hi,

Simon Riggs wrote: With that in mind, can I clarify what you're thinking, please?

Sure, I can try to clarify:

2) the things you've been discussing are essential requirements of partitioning and we could never consider it complete until they are also included, and we must therefore talk about them now to check that it's all possible before we do anything on SE

I thought so, but am slowly dropping that point of view, in favor of something like: hey, if you manage to do it all automatically, cool, go for it!

3) doing SE first is right, I'm just thinking ahead

Yes, SE certainly has merit. Combine it with some sort of maintained CLUSTERing order and it's worth doing, IMO. I'm not convinced about dynamic partitioning being able to generally replace explicit partitioning anytime soon.

Sorry if that seems blunt, I'm just not clear where we're going.

Well, implicit or automatic partitioning is still a pretty new concept to me, but I'm slowly beginning to like it. Thank you for pointing me at it.

Regards Markus
Re: [HACKERS] OUTER JOIN performance regression remains in 8.3beta4
Gregory Stark [EMAIL PROTECTED] writes: Alvaro Herrera [EMAIL PROTECTED] writes: Would it be a good idea to keep removing redundant clauses and rethink the preference for clauseful joins, going forward? I don't understand what's going on here. The planner is choosing one join order over another because one join has more join clauses than the other? Not more join clauses, but any join clause at all. We will not explore join paths that don't have any join clause, unless forced to by lack of any other way to form the result. Even though some of those join clauses are entirely redundant and have no selectivity? You're confusing whether we explore a path (ie, cost it out) with whether we choose it. It's a necessary precondition, of course, but we won't pick the path unless it looks cheapest. Not exploring clauseless join paths is a heuristic that's needed to avoid exponential growth of the search space in large join problems. AFAIK every System-R-derived planner has done this. As an example, consider t1 join t2 on (...) join t3 on (...) ... join t8 on (...) and for simplicity suppose that each ON condition relates the new table to the immediately preceding table, and that we can't derive any additional join conditions through transitivity. In this situation there are going to be only seven ways to form a two-base-relation joinrel, as long as we allow only clauseful joins. But there are 8*7/2 = 28 distinct ways to form a join if we consider all possible join pairs whether they have a join clause or not. At the three-base-relation level there will be 12 joinrels if we only consider clauseful pairs, or 56 if we don't. It gets worse as you go up, and most if not all of those additional joinrels represent entirely useless variations on the theme of "let's stupidly compute a cartesian product and then winnow it sometime later". This is not to say that an early cartesian product can never be a useful part of a plan, but rejecting them is a darn good heuristic. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
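(To make the heuristic concrete at the SQL level, an illustrative sketch with invented table names; the actual plans will of course vary:

CREATE TABLE t1 (a int);
CREATE TABLE t2 (a int, b int);
CREATE TABLE t3 (b int);

-- Each adjacent pair is linked by a join clause, so the planner explores
-- clauseful orders such as (t1 t2) then t3, or (t2 t3) then t1:
EXPLAIN SELECT * FROM t1, t2, t3 WHERE t1.a = t2.a AND t2.b = t3.b;

-- t1 and t3 share no join clause; pairing them first would be a cartesian
-- product, which the join search declines to explore unless there is no
-- other way to form the result.

For the chain of eight tables above, the clauseful rule admits 7 two-relation joinrels instead of 8*7/2 = 28, which is the pruning being described.)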
Re: [HACKERS] Dynamic Partitioning using Segment Visibility Maps
[EMAIL PROTECTED] (Simon Riggs) writes: I think we have an opportunity to bypass the legacy-of-thought that Oracle has left us and implement something more usable. This seems like a *very* good thing to me, from a couple of perspectives. 1. I think you're right on in terms of the issue of the cost of running all that DDL in managing partitioning schemes. When I was working as a DBA, I was decidedly *NOT* interested in doing a lot of low-level partition management work, and those that are in that role now would, I'm quite sure, agree that they are not keen on spending a lot of their time trying to figure out what tablespace to shift a particular table into, or what tablespace filesystem to get sysadmins to set up. 2. Blindly following what Oracle does has always been a dangerous sort of thing to do. There are two typical risks: a) There's always the worry that they may have patented some part of how they implement things, and if you follow too closely, There Be Dragons... b) They have enough billion$ of development dollar$ and development re$ource$ that they can follow strategies that are too expensive for us to even try to follow. 3. If, rather than blindly following, we create something at least quasi-new, there is the chance of doing fundamentally better. This very thing happened when it was discovered that IBM had a patent on the ARC caching scheme; the clock system that emerged was a lot better than ARC ever was. One major advantage of the dynamic approach is that it can work on multiple dimensions simultaneously, which isn't possible with declarative partitioning. For example if you have a table of Orders then you will be able to benefit from Segment Exclusion on all of these columns, rather than just one of them: OrderId, OrderDate, RequiredByDate, LastModifiedDate. This will result in some sloppiness in the partitioning, e.g. if we fill 1 partition a day of Orders, then the OrderId and OrderDate columns will start out perfectly arranged. Any particular RequiredByDate will probably be spread out over 7 partitions, but that's way better than being spread out over 365+ partitions. I think it's worth observing both the advantages and demerits of this together. In effect, with the dynamic approach, Segment Exclusion provides its benefits as an emergent property of the patterns of how INSERTs get drawn into segments. The tendency will correspondingly be that Segment Exclusion will be able to provide useful constraints for those patterns that can naturally emerge from the INSERTs. We can therefore expect useful constraints for attributes that are assigned in some kind of more or less chronological order. Such attributes will include: - Object ID, if set by a sequence - Processing dates There may be a bit of sloppiness, but the constraints may still be useful enough to exclude enough segments to improve efficiency. _On The Other Hand_, there will be attributes that are *NOT* set in a more-or-less chronological order, and Segment Exclusion will be pretty useless for these attributes. In order to do any sort of Exclusion for non-chronological attributes, it will be necessary to use some mechanism other than the patterns that fall out of natural chronological insertions. If you want exclusion on such attributes, then there needs to be some sort of rule system to spread such items across additional partitions. Mind you, if you do such, that will weaken the usefulness of Segment Exclusion. For instance, suppose you have 4 regions, and scatter insertions by region.
In that case, there will be more segments that overlap any given chronological range. When we look at the data in the partition we can look at any number of columns. When we declaratively partition, you get only one connected set of columns, which is one of the reasons you want multi-dimensional partitioning in the first place. Upside: Yes, you get to exclude based on examining any number of columns. Downside: You only get the exclusions that are emergent properties of the data... The more I'm looking at the dynamic approach, the more I'm liking it... -- cbbrowne,@,cbbrowne.com http://linuxfinances.info/info/linuxxian.html Feel free to contribute build files. Or work on your motivational skills, and maybe someone somewhere will write them for you... -- Fredrik Lundh [EMAIL PROTECTED] ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
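(A sketch of the multi-dimensional point above, with invented names — plain DDL, nothing here is proposed syntax:

CREATE TABLE orders (
    orderid          bigserial,   -- assigned in insertion order
    orderdate        date,        -- tracks insertion order closely
    requiredbydate   date,        -- clustered within roughly a week of insertion
    lastmodifieddate date         -- mostly insertion order, some later updates
);

-- Under the proposed Segment Exclusion scheme, each segment would carry
-- implicit min/max bounds for *all* of these columns, so a scan such as
SELECT count(*) FROM orders
WHERE requiredbydate BETWEEN '2008-01-01' AND '2008-01-07';
-- could skip every segment whose requiredbydate range doesn't overlap,
-- with no declared partition key at all.
)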
Re: [HACKERS] Named vs Unnamed Partitions
[EMAIL PROTECTED] (Markus Schiltknecht) writes: Simon Riggs wrote: With that in mind, can I clarify what you're thinking, please? Sure, I can try to clarify: 2) the things you've been discussing are essential requirements of partitioning and we could never consider it complete until they are also included and we must therefore talk about them now to check that its all possible before we do anything on SE I thought so, but am slowly dropping that point of view. In favor of something like: hey, if you manage to do it all automatically, cool, go for it! 3) doing SE first is right, I'm just thinking ahead Yes, SE certainly has merit. Combine it with some sort of maintained CLUSTERing order and it's worth doing, IMO. My suspicion is that if this gets added in with maintained CLUSTER order, we *lose* all of the exclusions aside from the ones directly established by the CLUSTER order. That is, the CLUSTER ordering winds up preventing other natural patterns from emerging, with the result that SE winds up being of pretty limited usefulness. I'm not convinced about dynamic partitioning being able to generally replace explicit partitioning anytime soon. It also seems to me that explicit partitioning would make this form of dynamic partitioning less useful. Suppose there are 4 more or less uniformly used partitions; if you're splitting the data evenly across 4x the partitions, then that means that each segment will tend to have ranges ~4x as wide, which makes SE rather less of a win. (Relax the assumption of uniform distributions, and that just changes the weights...) -- output = reverse(ofni.sesabatadxunil @ enworbbc) http://cbbrowne.com/info/advocacy.html It seems that perfection is attained not when nothing is left to add, but when nothing is left to be taken away. -- Antoine de Saint-Exupery. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Named vs Unnamed Partitions
On Wed, 2008-01-09 at 17:30 +0100, Markus Schiltknecht wrote: Simon Riggs wrote: With that in mind, can I clarify what you're thinking, please? Sure, I can try to clarify: 2) the things you've been discussing are essential requirements of partitioning and we could never consider it complete until they are also included and we must therefore talk about them now to check that its all possible before we do anything on SE I thought so, but am slowly dropping that point of view. In favor of something like: hey, if you manage to do it all automatically, cool, go for it! 3) doing SE first is right, I'm just thinking ahead Yes, SE certainly has merit. Combine it with some sort of maintained CLUSTERing order and it's worth doing, IMO. I'm not convinced about dynamic partitioning being able to generally replace explicit partitioning anytime soon. In all cases, no. But do you think it would work well for the specific databases you've used partitioning on? Would it be possible to check? Sorry if that seems blunt, I'm just not clear where we're going. Well, implicit or automatic partitioning is still a pretty new concept to me, but I'm slowly beginning to like it. Thank you for pointing me at it. OK, thanks. I'll write up what I've learned in the last few days into a new version of the proposal and put it on the Wiki. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Named vs Unnamed Partitions
Hi, Simon Riggs wrote: When I delete all rows WHERE some_date < 'cut-off date' on a segment boundary value, that would delete all segments that met the criteria. The following VACUUM will then return those segments to be read-write, where they can then be refilled with new incoming data. The only command we would have to run is the DELETE, everything else is automatic. Agreed, that would be very nice. So not convinced of the need for named sections of tables yet. It all seems like detail, rather than actually what we want for managing large tables. What do you think about letting the database system know the split point vs it having to find optimal split points automatically? Read-write vs. read-only is a good start, but can that concept be expanded to automatically choosing hash partitioning between storage systems, for example? Or more generally: can the database system gather enough information about the storage systems to take a decision as good as or better than the DBA? Regards Markus ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Named vs Unnamed Partitions
On Wed, 2008-01-09 at 18:04 +0100, Markus Schiltknecht wrote: So not convinced of the need for named sections of tables yet. It all seems like detail, rather than actually what we want for managing large tables. What do you think about letting the database system know the split point vs it having to find optimal split points automatically? For me, managing the table's files can be separate from the chunking that allows partition exclusion. Managing the table's files must be a manual operation. We can't infer the presence of a new tablespace etc. Those files would need less than 10 zones or chunks, usually just one. The chunking to allow partition exclusion can still be automatic, allowing a much finer grain of partition. If we restrict the actions allowed to be just
- mark read-only
then for read-only segments (only)
- migrate tablespaces
- compress read-only segments
- mark as off-line (not fully convinced we need this yet)
then it seems straightforward to allow this to occur by a WHERE clause only, since the constraints on a segment will be fixed in place when it is read-only. This also allows those operations to take place without holding locks for any length of time, since they are all just metadata ops or copying read only data to another place. The WHERE clause approach might easily allow more than 2 chunks and they need not be logically contiguous. So the phrase split point doesn't really fit because it implies a one-dimensional viewpoint, but I'm happy for you to give it a name. If we want to perform manipulations on non-read-only chunks then we need named or numbered partitions, locking, DDL etc.. That seems like too much functionality for what we really need. I really am still open on that point, but I would like to see a good description of a use case that really needs it, rather than just saying of course we do. Which is exactly where *I* started, even as recently as 3 weeks ago now. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
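(To make the restricted action set above concrete: a purely hypothetical command sketch — this syntax does not exist and is not part of any proposal, it merely illustrates the WHERE-clause-driven style being discussed; table, column, and tablespace names are invented:

-- mark every segment wholly matching the predicate read-only (hypothetical):
ALTER TABLE orders SET READ ONLY WHERE orderdate < '2007-01-01';

-- then migrate or compress only already-read-only segments (equally hypothetical):
ALTER TABLE orders SET TABLESPACE archive WHERE orderdate < '2006-01-01';
)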
Re: [HACKERS] OUTER JOIN performance regression remains in 8.3beta4
Tom Lane [EMAIL PROTECTED] writes: As an example, consider t1 join t2 on (...) join t3 on (...) ... join t8 on (...) and for simplicity suppose that each ON condition relates the new table to the immediately preceding table, and that we can't derive any additional join conditions through transitivity. So the problem is that if we happen to have some x=const clause for any variable listed in those join clauses then we drop that clause entirely and end up delaying that join until the very end? So is the fact that the user provided a useless clause the only information we have that these tables might be related? So if I write (along with some other joins): t1 join t2 on (t1.x=t2.x) where t1.x=3 I'll get a different result than if I write t1, t2 where t1.x=3 and t2.x=3 ? Perhaps we could be going the other direction and trying to add redundant selectivity 1.0 clauses when we have multiple variables which come out to the same value? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Problem with CVS HEAD's handling of mergejoins
I wrote: A perhaps less invasive idea is to discard any proposed mergeclauses that are redundant in this sense. This would still require some reshuffling of responsibility between select_mergejoin_clauses and the code in pathkeys.c, since right now select_mergejoin_clauses takes no account of that. However, I'm worried that that might result in planner failure on some FULL JOIN cases that work today, since we require all the join clauses to be mergejoinable for a FULL JOIN. People seem to complain when the planner fails, even for really stupid queries ;-). I think this would only be acceptable if we can prove that ignoring clauses that are redundant in this sense doesn't change the result --- which might be the case, but I'm not sure. On further study, this doesn't seem nearly as bad as it looked. I had hastily misread some of the existing code as assuming one-for-one correspondence of mergeclauses and pathkeys, but actually it just assumes that the pathkeys list contains at least one pathkey matching each mergeclause. Redundancy between two pathkeys in a list is eliminated by allowing adjacent mergeclauses to share the same pathkey. So that part actually all works, and the only problem is when an equivalence class is redundant-for-sorting in itself --- that is, when one of the eclass members is a constant. So that explains why we'd not seen it before; except in very odd corner cases, the old code would have eliminated the mergejoinable clause anyway. If we simply make select_mergejoin_clauses() reject a clause as not-mergejoinable if either side is equated to a constant, then the problems go away. We'd have a new problem if this meant that we fail to do a FULL JOIN, but AFAICS that can't happen: a variable coming from a full join can't be equivalenced to a constant, except perhaps in a below_outer_join eclass, which isn't considered redundant for sorting anyway. Bottom line is that it just takes another half dozen lines of code to fix this; attached is the core of the fix (there are some uninteresting prototype changes and macro-moving as well). I think I can fix this in a day or so, but I now definitely feel that we'll need an RC2 :-( I no longer think that about this problem, but we've still got some nasty-looking issues in xml.c, plus Hannes Dorbath's unsolved reports of GIST/GIN problems ... regards, tom lane

+		/*
+		 * Insist that each side have a non-redundant eclass. This
+		 * restriction is needed because various bits of the planner expect
+		 * that each clause in a merge be associatable with some pathkey in a
+		 * canonical pathkey list, but redundant eclasses can't appear in
+		 * canonical sort orderings. (XXX it might be worth relaxing this,
+		 * but not enough time to address it for 8.3.)
+		 *
+		 * Note: it would be bad if this condition failed for an otherwise
+		 * mergejoinable FULL JOIN clause, since that would result in
+		 * undesirable planner failure. I believe that is not possible
+		 * however; a variable involved in a full join could only appear
+		 * in below_outer_join eclasses, which aren't considered redundant.
+		 *
+		 * This *can* happen for left/right join clauses, however: the
+		 * outer-side variable could be equated to a constant. (Because we
+		 * will propagate that constant across the join clause, the loss of
+		 * ability to do a mergejoin is not really all that big a deal, and
+		 * so it's not clear that improving this is important.)
+		 */
+		cache_mergeclause_eclasses(root, restrictinfo);
+
+		if (EC_MUST_BE_REDUNDANT(restrictinfo->left_ec) ||
+			EC_MUST_BE_REDUNDANT(restrictinfo->right_ec))
+		{
+			have_nonmergeable_joinclause = true;
+			continue;			/* can't handle redundant eclasses */
+		}
+
		result_list = lappend(result_list, restrictinfo);
	}
---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[HACKERS] tzdata issue on cross-compiled postgresql
Hackers; I've noticed a strangeness on our cross-compiled uclibc linked postgresql package that I was hoping to elicit some help with. This is probably best described by showing some queries with commentary, so on with that.

postgres=# select count(*) from pg_timezone_names where utc_offset != '00:00';
 count
-------
     0

postgres=# select count(*) from pg_timezone_names where utc_offset != '00:00';
 count
-------
   504

postgres=# select count(*) from pg_timezone_names where utc_offset != '00:00';
 count
-------
     0

postgres=# select * from pg_timezone_names where utc_offset != '00:00' limit 5;
 name | abbrev | utc_offset | is_dst
------+--------+------------+--------
(0 rows)

postgres=# select * from pg_timezone_names where utc_offset != '00:00' limit 5;
       name        | abbrev | utc_offset | is_dst
-------------------+--------+------------+--------
 Africa/Algiers    | CET    | 01:00:00   | f
 Africa/Luanda     | WAT    | 01:00:00   | f
 Africa/Porto-Novo | WAT    | 01:00:00   | f
 Africa/Gaborone   | CAT    | 02:00:00   | f
 Africa/Bujumbura  | CAT    | 02:00:00   | f
(5 rows)

As seen, the output is erratic, but when it is giving results, they seem to be correct.

# pg_config
BINDIR = /usr/bin
DOCDIR = /usr/doc/postgresql
INCLUDEDIR = /usr/include
PKGINCLUDEDIR = /usr/include/postgresql
INCLUDEDIR-SERVER = /usr/include/postgresql/server
LIBDIR = /usr/lib
PKGLIBDIR = /usr/lib/postgresql
LOCALEDIR =
MANDIR = /usr/share/man
SHAREDIR = /usr/share/postgresql
SYSCONFDIR = /etc/postgresql
PGXS = /usr/lib/postgresql/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--target=i386-linux' '--host=i386-linux' '--build=i486-linux-gnu' '--program-prefix=' '--program-suffix=' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--datadir=/usr/share' '--includedir=/usr/include' '--infodir=/usr/share/info' '--libdir=/usr/lib' '--libexecdir=/usr/lib' '--localstatedir=/var' '--mandir=/usr/share/man' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--disable-nls' '--enable-shared' '--enable-static' '--disable-rpath' '--without-java' '--without-krb4' '--without-krb5' '--without-openssl' '--without-pam' '--without-perl' '--without-python' '--without-rendezvous' '--without-tcl' '--without-tk' '--with-zlib=yes' '--enable-depend' 'CC=i386-linux-uclibc-gcc' 'CFLAGS=-O2 -pipe -march=i486 -funit-at-a-time' 'CPPFLAGS= -I/mnt/build/kamikaze/staging_dir_i386/usr/include -I/mnt/build/kamikaze/staging_dir_i386/include' 'LDFLAGS=-L/mnt/build/kamikaze/staging_dir_i386/usr/lib -L/mnt/build/kamikaze/staging_dir_i386/lib' 'build_alias=i486-linux-gnu' 'host_alias=i386-linux' 'target_alias=i386-linux'
CC = i386-linux-uclibc-gcc
CPPFLAGS = -I/mnt/build/kamikaze/staging_dir_i386/usr/include -I/mnt/build/kamikaze/staging_dir_i386/include -D_GNU_SOURCE
CFLAGS = -O2 -pipe -march=i486 -funit-at-a-time -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
CFLAGS_SL = -fpic
LDFLAGS = -L/mnt/build/kamikaze/staging_dir_i386/usr/lib -L/mnt/build/kamikaze/staging_dir_i386/lib
LDFLAGS_SL =
LIBS = -lpgport -lz -lreadline -lncurses -lcrypt -ldl -lm -lnotimpl
VERSION = PostgreSQL 8.2.5

uclibc version is 0.9.28-10 from our tree (-10 is our versioning). You can see the current public cross-compilation Makefile for postgresql at https://dev.openwrt.org/browser/packages/libs/postgresql/Makefile Note that in the current version of the Makefile that is uncommitted, I have the following changes:
* I've removed --disable-integer-datetimes (per the suggestion of someone in #postgresql on freenode)
* I've added a sed replacement (the posix zone from the tzdata file doesn't seem to exist anymore?)
* $(SED) '[EMAIL PROTECTED]/[EMAIL PROTECTED]/New_York@' $(PKG_BUILD_DIR)/src/timezone/Makefile

None of those have affected it at this point. I'd appreciate any help on this matter that you guys can provide. If you need any more information, please don't hesitate to ask. /tmy ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Dynamic Partitioning using Segment Visibility Maps
Chris Browne wrote: _On The Other Hand_, there will be attributes that are *NOT* set in a more-or-less chronological order, and Segment Exclusion will be pretty useless for these attributes. Really? I was hoping that it'd be useful for any data with long runs of the same value repeated - regardless of ordering. My biggest tables are clustered by zip/postal-code -- which means that while the City, State, Country attributes aren't monotonically increasing or decreasing, they are grouped tightly together. I'd expect all queries for San Francisco to be able to come from at most 2 segments; and all queries for Texas to be able to come from only a fraction of the whole. If the segment sizes are configurable - I imagine this would even be useful for other data - like a people table organized by last_name, first_name. Johns may be scattered throughout the table -- but at least the John Smiths would all be on one segment, while the Aaron-through-Jim Smith segments might get excluded. Or am I missing something? ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
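(For reference, the physical ordering Ron describes can be produced today by clustering on an index; names here are invented, and the pre-8.3 syntax is shown since 8.3 is not yet released at this point in the thread:

-- keeps same-zip (and hence same-city/state) rows physically adjacent,
-- so per-segment min/max bounds on those columns stay narrow:
CLUSTER addresses_zip_idx ON addresses;

Whether such runs survive later churn is exactly the question taken up downthread.)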
Re: [HACKERS] Dynamic Partitioning using Segment Visibility Maps
On Wed, Jan 09, 2008 at 11:47:31AM -0500, Chris Browne wrote: [EMAIL PROTECTED] (Simon Riggs) writes: I think we have an opportunity to bypass the legacy-of-thought that Oracle has left us and implement something more usable. This seems like a *very* good thing to me, from a couple of perspectives. [snip] 2. Blindly following what Oracle does has always been a dangerous sort of thing to do. There are two typical risks: a) There's always the worry that they may have patented some part of how they implement things, and if you follow too closely, There Be Dragons... I think that could be equally said of the dynamic partitioning approach. In fact, it might be more likely since declarative partitioning has been around for eons. b) They have enough billion$ of development dollar$ and development re$ource$ that they can follow strategies that are too expensive for us to even try to follow. I don't see that as an argument against the declarative approach. Reading the details on this thread so far, I think Simon's approach is probably more complex from an implementation POV. 3. If, rather than blindly following, we create something at least quasi-new, there is the chance of doing fundamentally better. This very thing happened when it was discovered that IBM had a patent on the ARC caching scheme; the clock system that emerged was a lot better than ARC ever was. Well, I don't think I'm proposing we /blindly follow/ others. I propose we choose a grammar which takes the best of what others have tried to do. Oracle's grammar is hideous, IBM's is too restrictive, for example. One major advantage of the dynamic approach is that it can work on multiple dimensions simultaneously, which isn't possible with declarative partitioning. For example if you have a table of Orders then you will be able to benefit from Segment Exclusion on all of these columns, rather than just one of them: OrderId, OrderDate, RequiredByDate, LastModifiedDate. This will result in some sloppiness in the partitioning, e.g. if we fill 1 partition a day of Orders, then the OrderId and OrderDate columns will start out perfectly arranged. Any particular RequiredByDate will probably be spread out over 7 partitions, but that's way better than being spread out over 365+ partitions. I think it's worth observing both the advantages and demerits of this together. In effect, with the dynamic approach, Segment Exclusion provides its benefits as an emergent property of the patterns of how INSERTs get drawn into segments. The tendency will correspondingly be that Segment Exclusion will be able to provide useful constraints for those patterns that can naturally emerge from the INSERTs. Many people, in my experience, doing the kind of data processing which benefits from partitioning are regularly loading data, rather than collecting it in an OLTP fashion. Let's take the easily understandable concept of processing web site traffic. If the amount of data is large enough to benefit from partitioning, they probably have multiple web servers and therefore almost certainly multiple log files. If these files are not sorted into a single file, the records will not have a naturally progressing chronology: with every file we go back to the beginning of the period of time the load covers. If you add parallelism to your load, things look even more different. This means you could end up with a bunch of partitions, under the dynamic model, which all cover the same time range.
Then there's the way that really big databases are used (say, up around Simon's upper bound of 16 TB). It is expensive to keep data online so people aren't. They're loading and unloading data all the time, to perform different forms of analysis. A common scenario in the example above might be to unload all but the current month's data and then load the same month from the previous year. The unload step needs to be close to cost-free (i.e., TRUNCATE). Then, there's no guarantee that what they're interested in is the date range at all. They may want to compare user agent (look at bot activity). In this case, the partitioning is across a small list of strings (well, numbers most likely). Here, the partitions would all have the same range. Partitioning would be useless. We can therefore expect useful constraints for attributes that are assigned in some kind of more or less chronological order. Such attributes will include: - Object ID, if set by a sequence - Processing dates There may be a bit of sloppiness, but the constraints may still be useful enough to exclude enough segments to improve efficiency. I really like the idea of the system being able to identify trends and patterns in the data (actually useful at the user level) but it's the uncertainty of how the partitioning will work for different kinds of data that I don't like. _On The Other Hand_, there will be
Re: [HACKERS] OUTER JOIN performance regression remains in 8.3beta4
Gregory Stark [EMAIL PROTECTED] writes: So if I write (along with some other joins): t1 join t2 on (t1.x=t2.x) where t1.x=3 I'll get a different result than if I write t1, t2 where t1.x=3 and t2.x=3 In 8.3 you won't, because those are in fact exactly equivalent (and the new EquivalenceClass machinery can prove it). The cases that are interesting are more like t1 LEFT join t2 on (t1.x=t2.x) where t1.x=3 which is not equivalent to the other construction, because t2.x is only sort-of-equal to 3. Hmm ... now that I look at this, it might be a good idea if have_relevant_eclass_joinclause() didn't skip ec_has_const EquivalenceClasses. That would give us the same behavior for simple inner-join cases that I'm advocating for outer joins, namely that we can consider an early join between two rels that are related in the fashion you show. We don't actually need to invent dummy join clauses to make that happen, because the join search code believes have_relevant_eclass_joinclause() even if it doesn't see a joinclause for itself ... regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
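(A minimal pair of queries illustrating the distinction, with invented tables:

-- Inner join: t2.x = 3 is provably implied, so the join clause really is
-- redundant once the constant has been propagated.
SELECT * FROM t1 JOIN t2 ON t1.x = t2.x WHERE t1.x = 3;

-- Outer join: t2.x is 3 on matched rows but NULL on unmatched ones, so
-- t2.x is only "sort-of" equal to the constant, as described above.
SELECT * FROM t1 LEFT JOIN t2 ON t1.x = t2.x WHERE t1.x = 3;
)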
Re: [HACKERS] Dynamic Partitioning using Segment Visibility Maps
On Wed, 2008-01-09 at 20:03 +0100, Gavin Sherry wrote: I think Simon's approach is probably more complex from an implementation POV. Much of the implementation is exactly the same, and I'm sure we agree on more than 50% of how this should work already. We just need to close in on the remainder. My current opinion on the SE approach is the opposite of the one above, though it gets us nowhere just to state it. I'm trying to avoid opinion and look at the details, which is the reason my viewpoint recently changed in favour of the dynamic approach as the main thrust for implementation. I've written a detailed email this morning to explain where and how the problems lie, which are nowhere near the syntax level. I haven't ruled out a declarative approach yet, but I need some detailed technical review of the issues. I hope you'll be replying to that? 3. If, rather than blindly following, we create something at least quasi-new, there is the chance of doing fundamentally better. This very thing happened when it was discovered that IBM had a patent on the ARC caching scheme; the clock system that emerged was a lot better than ARC ever was. Well, I don't think I'm proposing we /blindly follow/ others. I propose we choose a grammar which takes the best of what others have tried to do. Oracle's grammar is hideous, IBM's is too restrictive, for example. I assume the new grammar is good and if we do go that way, it sounds like the right starting place. One major advantage of the dynamic approach is that it can work on multiple dimensions simultaneously, which isn't possible with declarative partitioning. For example if you have a table of Orders then you will be able to benefit from Segment Exclusion on all of these columns, rather than just one of them: OrderId, OrderDate, RequiredByDate, LastModifiedDate. This will result in some sloppiness in the partitioning, e.g. if we fill 1 partition a day of Orders, then the OrderId and OrderDate columns will start out perfectly arranged. Any particular RequiredByDate will probably be spread out over 7 partitions, but that's way better than being spread out over 365+ partitions. I think it's worth observing both the advantages and demerits of this together. In effect, with the dynamic approach, Segment Exclusion provides its benefits as an emergent property of the patterns of how INSERTs get drawn into segments. The tendency will correspondingly be that Segment Exclusion will be able to provide useful constraints for those patterns that can naturally emerge from the INSERTs. Many people, in my experience, doing the kind of data processing which benefits from partitioning are regularly loading data, rather than collecting it in an OLTP fashion. Let's take the easily understandable concept of processing web site traffic. If the amount of data is large enough to benefit from partitioning, they probably have multiple web servers and therefore almost certainly multiple log files. If these files are not sorted into a single file, the records will not have a naturally progressing chronology: with every file we go back to the beginning of the period of time the load covers. If you add parallelism to your load, things look even more different. This means you could end up with a bunch of partitions, under the dynamic model, which all cover the same time range.
Then there's the way that really big databases are used (say, up around Simon's upper bound of 16 TB). It is expensive to keep data online so people aren't. They're loading and unloading data all the time, to perform different forms of analysis. That isn't my experience. That sounds very time consuming. The storage cost issue was the reason Andrew wanted offline segments, and why I have been talking about hierarchical storage. A common scenario in the example above might be to unload all but the current month's data and then load the same month from the previous year. The unload step needs to be costly (i.e., TRUNCATE). Then, there's no guarantee that what they're interested in is the date range at all. They may want to compare user agent (look at bot activity). In this case, the partitioning is across a small list of strings (well, numbers most likely). Here, the partitions would all have the same range. Partitioning would be useless. I take it you mean SE-based partitioning would be useless, but declarative partitioning would be useful? I would agree, assuming they run queries with a few of the small list of strings. Seems like a contrived case. Some of the biggest I've seen are GIS information or network information. Those are good examples of where a declarative approach would be the only way of getting
Re: [HACKERS] Dynamic Partitioning using Segment Visibility Maps
Ron Mayer [EMAIL PROTECTED] writes: Chris Browne wrote: _On The Other Hand_, there will be attributes that are *NOT* set in a more-or-less chronological order, and Segment Exclusion will be pretty useless for these attributes. Really? I was hoping that it'd be useful for any data with long runs of the same value repeated - regardless of ordering. My biggest tables are clustered by zip/postal-code -- which means that while the City, State, Country attributes aren't monotonically increasing or decreasing, they are grouped tightly together. I'd expect all queries for San Francisco to be able to come from at most 2 segments; and all queries for Texas to be able to come from only a fraction of the whole. If the segment sizes are configurable - I imagine this would even be useful for other data - like a people table organized by last_name, first_name. Johns may be scattered throughout the table -- but at least the John Smiths would all be on one segment, while the Aaron-through-Jim Smith segments might get excluded. Or am I missing something? Well, this can head in two directions... 1. Suppose we're not using an organize in CLUSTER order approach. If the data is getting added in roughly by order of insertion order, then there's no reason to expect San Francisco data to be clustered together. Ditto for John Smiths; we can expect them to be as scattered as their dates of creation. 2. Suppose we *are* using an organize in CLUSTER order approach. In that case, yes, indeed, you can expect segments to contain specific ranges of the data. However, in that case, the ONLY dimension under which Segment Exclusion may be expected to be useful is that of the first column of the index being used. Smith should be useful to SE, but not John, because, as a secondary criterion, the first name values will be scattered across all segments. -- (reverse (concatenate 'string ofni.sesabatadxunil @ enworbbc)) http://www3.sympatico.ca/cbbrowne/linuxdistributions.html PALINDROME spelled backwards is EMORDNILAP. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Named vs Unnamed Partitions
Hi, On Wednesday 09 January 2008 19:27:41, Simon Riggs wrote: The WHERE clause approach might easily allow more than 2 chunks and they need not be logically contiguous. So the phrase split point doesn't really fit because it implies a one-dimensional viewpoint, but I'm happy for you to give it a name. Maybe that's only me but I'm not yet clear, after reading this thread and the previous one, whether or not Segment Exclusion would allow for multi-level partitioning. I have a use case at the moment, where we load log-like data from several servers to a central one (batch loading), the central table having an extra server column to identify each tuple's origin. Will the SE technique be able to see that this table would be better partitioned by server AND date? That's what I would have done if it was easier to do with constraint exclusion (I did only date partitioning), as the reporting queries will always have some server (stats by services, each service being installed on 1 or more servers) and date restrictions. Please note I'd be happy to have this use case handled by explicitly specifying the partitioning system I want PostgreSQL to use, and more than happy if you tell me that automatic, transparent code is able to optimize the data on disk for my needs without me bothering about partitions, their names and split points... If we want to perform manipulations on non-read-only chunks then we need named or numbered partitions, locking, DDL etc.. That seems like too much functionality for what we really need. I really am still open on that point, but I would like to see a good description of a use case that really needs it, rather than just saying of course we do. Which is exactly where *I* started, even as recently as 3 weeks ago now. I like Markus's idea of having SE at work inside partitions or tables, partitions being another kind of relation holding data. Then the DBA who needs to explicitly manage partitions to save faster tablespace for live data is able to tell that to the system and benefit fully from it. Regards, -- dim
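(For context, the date-only scheme Dimitri describes is what inheritance plus constraint exclusion provides today; an illustrative sketch with invented names:

CREATE TABLE logs (server int, ts timestamptz, msg text);

CREATE TABLE logs_2008_01 (
    CHECK (ts >= '2008-01-01' AND ts < '2008-02-01')
) INHERITS (logs);

-- With constraint_exclusion = on, a query restricted on ts skips every
-- child whose CHECK range cannot match. Partitioning by server AND date
-- this way would mean one child table per (server, month) combination,
-- which is the DDL burden under discussion.
)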
Re: [HACKERS] Dynamic Partitioning using Segment Visibility Maps
On Tue, 2008-01-08 at 02:12 +, Gregory Stark wrote: I also don't understand how this proposal deals with the more common use case of unloading and loading data. Normally in partitioned tables we build the data in a side table until the data is all correct then load it as a partition. If you treat it as a lower-level object then I don't see that working. The layout of the new table won't often match the layout of the target partitioned table. We optimised for that in 8.2, but I would say that not many people noticed and that it isn't normal. The problem with that approach, and the reason many people don't use it, is that it requires all data for a partition to be available at the time you add the partition. That necessarily implies a time delay in the process of loading data, which is no longer acceptable in the world of straight-through processing, or whatever you call the need for zero processing delay in your industry. So people choose to load data directly into the main table, allowing it to be immediately available, though at the cost of some performance. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
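(The 8.2 optimisation presumably refers to ALTER TABLE ... INHERIT, which lets a separately built and validated table be published as a partition; a sketch with invented names and a hypothetical load file:

CREATE TABLE orders_2008_01 (LIKE orders);
COPY orders_2008_01 FROM '/tmp/orders_2008_01.csv';   -- hypothetical path
ALTER TABLE orders_2008_01
    ADD CHECK (orderdate >= '2008-01-01' AND orderdate < '2008-02-01');
ALTER TABLE orders_2008_01 INHERIT orders;            -- publish as a partition

Until that final step the rows are invisible to queries on orders, which is the availability delay Simon is objecting to.)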
Re: [HACKERS] Named vs Unnamed Partitions
On Wed, 2008-01-09 at 21:29 +0100, Dimitri Fontaine wrote: Le Wednesday 09 January 2008 19:27:41, Simon Riggs wrote: The WHERE clause approach might easily allow more than 2 chunks and they need not be logically contiguous. So the phrase split point doesn't really fit because it implies a one-dimensional viewpoint, but I'm happy for you to give it a name. Maybe that's only me but I'm not yet clear, after reading this thread and the previous one, whether or not Segment Exclusion would allow for multi-level partitioning. I have a use case at the moment, where we load log-like data from several servers to a central one (batch loading), the central table having an extra server column to identify each tuple's origin. Will the SE technique be able to see that this table would be better partitioned by server AND date? No, but it will be able to handle partitioning on other columns that provide a degree of differentiation that you possibly hadn't considered at design time. That's what I would have done if it was easier to do with constraint exclusion (I did only date partitioning), as the reporting queries will always have some server (stats by services, each service being installed on 1 or more servers) and date restrictions. Hmm, well if you found declaring the partitions a problem with constraint exclusion it's not going to be any easier using other declarative approaches. So it will work with what you currently use. You can always use constraint exclusion to separate out the servers and then segment exclusion to handle the date range. Please note I'd be happy to have this use case handled by explicitly specifying the partitioning system I want PostgreSQL to use, and more than happy if you tell me that automatic, transparent code is able to optimize the data on disk for my needs without me bothering about partitions, their names and split points... If we want to perform manipulations on non-read-only chunks then we need named or numbered partitions, locking, DDL etc.. That seems like too much functionality for what we really need. I really am still open on that point, but I would like to see a good description of a use case that really needs it, rather than just saying of course we do. Which is exactly where *I* started, even as recently as 3 weeks ago now. I like Markus's idea of having SE at work inside partitions or tables, partitions being another kind of relation holding data. Then the DBA who needs to explicitly manage partitions to save faster tablespace for live data is able to tell that to the system and benefit fully from it. OK, thanks, -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] tzdata issue on cross-compiled postgresql
Tim Yardley [EMAIL PROTECTED] writes: postgres=# select count(*) from pg_timezone_names where utc_offset != '00:00'; count --- 0 postgres=# select count(*) from pg_timezone_names where utc_offset != '00:00'; count --- 504 postgres=# select count(*) from pg_timezone_names where utc_offset != '00:00'; count --- 0 That's just bizarre ... IIRC the output of that view is computed by scanning the timezone directories, and you wouldn't expect that to change. Can you strace the backend while it's doing this and see if there's a difference in the series of kernel calls issued? regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Dynamic Partitioning using Segment Visibility Maps
On Wed, Jan 09, 2008 at 02:38:21PM -0500, Chris Browne wrote: Ron Mayer [EMAIL PROTECTED] writes: Or am I missing something? Well, this can head in two directions... 1. Suppose we're not using an organize in CLUSTER order approach. If the data is getting added in roughly by order of insertion order, then there's no reason to expect San Francisco data to be clustered together. Ditto for John Smiths; we can expect them to be as scattered as their dates of creation. 2. Suppose we *are* using an organize in CLUSTER order approach. In that case, yes, indeed, you can expect segments to contain specific ranges of the data. However, in that case, the ONLY dimension under which Segment Exclusion may be expected to be useful is that of the first column of the index being used. Smith should be useful to SE, but not John, because, as a secondary criterion, the first name values will be scattered across all segments. One of the reasons for using partitions is that the usefulness of indexes breaks down because of the low cardinality of the indexed column. Also, the random IO can kill your performance. In my opinion, CLUSTER (and vacuum, mentioned elsewhere) are not data management tools. Thanks, Gavin ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Dynamic Partitioning using Segment Visibility Maps
On Wed, Jan 09, 2008 at 08:17:41PM +, Simon Riggs wrote: On Wed, 2008-01-09 at 20:03 +0100, Gavin Sherry wrote: I think Simon's approach is probably more complex from an implementation POV. Much of the implementation is exactly the same, and I'm sure we agree on more than 50% of how this should work already. We just need to close in on the remainder. My current opinion on the SE approach is the opposite of the one above, though it gets us nowhere just to state it. I'm trying to avoid opinion and look at the details, which is the reason my viewpoint recently changed in favour of the dynamic approach as the main thrust for implementation. I've written a detailed email this morning to explain where and how the problems lie, which are nowhere near the syntax level. I haven't ruled out a declarative approach yet, but I need some detailed technical review of the issues. I hope you'll be replying to that? I'm responding to that email separately. Just a matter of addressing all the points in the detail it deserves. 3. If, rather than blindly following, we create something at least quasi-new, there is the chance of doing fundamentally better. This very thing happened when it was discovered that IBM had a patent on the ARC caching scheme; the clock system that emerged was a lot better than ARC ever was. Well, I don't think I'm proposing we /blindly follow/ others. I propose we choose a grammar which takes the best of what others have tried to do. Oracle's grammar is hideous, IBM's is too restrictive, for example. I assume the new grammar is good and if we do go that way, it sounds like the right starting place. One major advantage of the dynamic approach is that it can work on multiple dimensions simultaneously, which isn't possible with declarative partitioning. For example if you have a table of Orders then you will be able to benefit from Segment Exclusion on all of these columns, rather than just one of them: OrderId, OrderDate, RequiredByDate, LastModifiedDate. This will result in some sloppiness in the partitioning, e.g. if we fill 1 partition a day of Orders, then the OrderId and OrderDate columns will start out perfectly arranged. Any particular RequiredByDate will probably be spread out over 7 partitions, but that's way better than being spread out over 365+ partitions. I think it's worth observing both the advantages and demerits of this together. In effect, with the dynamic approach, Segment Exclusion provides its benefits as an emergent property of the patterns of how INSERTs get drawn into segments. The tendency will correspondingly be that Segment Exclusion will be able to provide useful constraints for those patterns that can naturally emerge from the INSERTs. Many people, in my experience, doing the kind of data processing which benefits from partitioning are regularly loading data, rather than collecting it in an OLTP fashion. Let's take the easily understandable concept of processing web site traffic. If the amount of data is large enough to benefit from partitioning, they probably have multiple web servers and therefore almost certainly multiple log files. If these files are not sorted into a single file, the records will not have a naturally progressing chronology: with every file we go back to the beginning of the period of time the load covers. If you add parallelism to your load, things look even more different. This means you could end up with a bunch of partitions, under the dynamic model, which all cover the same time range.
Depends how big we make the partitions and how sloppy this is as to whether that is a problem or not. We might still expect a 100x gain from using the SE approach depending upon the data volume. This comes back to my problem with the dynamic approach: the user might get a certain experience but with the declarative approach the expectation matches the result. Then there's the way that really big databases are used (say, up around Simon's upper bound of 16 TB). It is expensive to keep data online so people aren't. They're loading and unloading data all the time, to perform different forms of analysis. That isn't my experience. That sounds very time consuming. I cannot offer any evidence but it's what we see companies doing. The storage cost issue was the reason Andrew wanted offline segments, and why I have been talking about hierarchical storage. The thing is, people with lots of data usually have good hierarchical storage systems. Can we really do better? A common scenario in the example above might be to unload all but the current month's data and then load the same month from the previous year. The unload step needs to be close to cost-free (i.e., TRUNCATE). Then, there's no guarantee that what they're interested in is the date range at all. They may want to
Re: [HACKERS] Named vs Unnamed Partitions
On Wed, Jan 09, 2008 at 08:51:30PM +, Simon Riggs wrote: That's what I would have done if it was easier to do with constraint exclusion (did only date partitioning), as the reporting queries will always have some server (stats by services, each service being installed on 1 or more servers) and date restrictions. Hmm, well if you found declaring the partitions a problem with constraint exclusion it's not going to be any easier using other declarative approaches. I disagree (although it is unreasonable for me to do so without posting syntax -- it's coming). Proper grammar for partition support means running a single DDL command. The user does not have to line up table generation with rules (or triggers) and check constraints. As such, I believe it to be much much easier. Thanks, Gavin ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] operator suggest interval / interval = numeric
The year to month and day to second intervals should not overlap. The standard doesn't actually allow it IIRC. wt On Jan 9, 2008 7:17 AM, Tom Lane [EMAIL PROTECTED] wrote: Ilya A. Kovalenko [EMAIL PROTECTED] writes: I suggest one more standard date/time operator, to divide one interval by another with numeric (or float, for example) result. You'd have to define exactly what that means, which seems a little tricky for incommensurate intervals. For instance what is the result of '1 month' / '1 day' ? regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
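(For commensurate units a workaround already exists: reduce both intervals to seconds first. Illustrative only:

SELECT EXTRACT(EPOCH FROM interval '1 day')
     / EXTRACT(EPOCH FROM interval '1 hour') AS ratio;  -- 86400 / 3600 = 24

The problem case remains '1 month' / '1 day': a month has no fixed length in days, so no single numeric answer is correct.)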
Re: [HACKERS] Named vs Unnamed Partitions
On Wed, 9 Jan 2008 23:52:09 +0100 Gavin Sherry [EMAIL PROTECTED] wrote: ...date restrictions. Hmm, well if you found declaring the partitions a problem with constraint exclusion it's not going to be any easier using other declarative approaches. I disagree (although it is unreasonable for me to do so without posting syntax -- it's coming). Proper grammar for partition support means running a single DDL command. The user does not have to line up table generation with rules (or triggers) and check constraints. As such, I believe it to be much, much easier. +1 http://www.databasedesign-resource.com/oracle-partitions.html I am not saying I like Oracle's syntax (I don't) but: http://dev.mysql.com/tech-resources/articles/mysql_5.1_partitioning_with_dates.html and: http://www.databasejournal.com/features/mssql/article.php/3456991 They are worlds above us in usability. Sincerely, Joshua D. Drake -- The PostgreSQL Company: Since 1997, http://www.commandprompt.com/ Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240 Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD' ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] odd convert_from bug
Andrew Dunstan [EMAIL PROTECTED] writes: The case below has just been reported to me. It sure looks odd. I'm looking into it but any ideas would be welcome. The problem only occurs if we are updating more than one row. Pfree'ing something you didn't palloc is bad news... regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] odd convert_from bug
Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: The case below has just been reported to me. It sure looks odd. I'm looking into it but any ideas would be welcome. The problem only occurs if we are updating more than one row. Pfree'ing something you didn't palloc is bad news... Ah, yes, thanks, looks like I was a little too eager on the CP. I see you have fixed it. cheers andrew ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] odd convert_from bug
Andrew Dunstan wrote: Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: The case below has just been reported to me. It sure looks odd. I'm looking into it but any ideas would be welcome. The problem only occurs if we are updating more than one row. Pfree'ing something you didn't palloc is bad news... Ah, yes, thanks, looks like I was a little too eager on the CP. I see you have fixed it. BTW, if calling pfree() at all here is actually a bug, then we should probably fix it in the back branches. It looks more to me like the problem was that pg_convert_from was calling pfree() with the wrong argument - src_encoding_name instead of dest_encoding_name. But maybe the pfree in the back branches is unnecessary but harmless. cheers andrew ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Dynamic Partitioning using Segment Visibility Maps
Hi Simon, On Wed, Jan 02, 2008 at 05:56:14PM +, Simon Riggs wrote: Segment Exclusion - After we note that a segment is read-only we can scan the segment and record min/max values for all columns. These are then implicit constraints, which can then be used for segment exclusion in a similar manner as we do with the current constraint exclusion mechanism. What about columns which are varlen? The maximum value could be very long -- 1 GB in fact. Say we decide to not support those. Well, what about numeric? What about user defined types? This would also allow a Merge Join to utilise exclusion for the inner plan at execution time. Instead of scanning the whole inner table plan from the start, we would be able to start the scan from the appropriate segment. This would require us to pass the current value of the outer plan down into the inner plan. The typical executor nodes on the inner plan would be a SortNode and below that a SeqScan. In that case the executor would need to pass the outer value from the MergeJoinNode down thru the SortNode to the SeqScan node. The SeqScan node could then perform partition exclusion, reducing the time for that scan and also reducing the time for the resulting sort. This sounds like it might be worth doing in normal cases also. It might turn out that the potentially applicable cases are already excluded during planning, I haven't thought about that aspect in enough detail yet. I don't think that would work at all. Say you have skew of data across the partitions. You may end up doing a seq scan for every outer tuple. It would look like a really expensive nested loop. If we collect data for all columns then many of our implicit constraints would be useless. e.g. if a column only has a few values and these are present in all segments. Matching our predicate against all constraints would be expensive, so we must weed out poor constraints. We would do this by removing any constraint that overlapped more than 10% of other segments. Various heuristics would likely need to be discovered during development to make this work without resorting to manual commands. Note that all of this exclusion is happening in the executor, not the planner. That allows this form of exclusion to work with stable functions and parameters without problem. So, in that case, you've totally undercut the planner. All the steps up the tree would be based on the stats for scanning the entire table but you've only returned part of it. To solve that, you could put the min/max values in a catalog somewhere. For a 16 TB table, that's 16000 min/max values. That's pretty expensive. And how do we handle making things non-read-only? You'd have to wait for all current queries to finish... that could take a long time. How do we handle crashes during setting of min/max? I presume it needs WAL support. What happens if the free space map gives us a free block in a read-only segment? Then we need to look at concurrency and transactionality of min/max values, don't we? If changing it makes it not read-only (which seems to be the case) it would be trivial to totally degrade your partitioning scheme to a full seq scan. On a 16 TB table. And getting the performance back might involve a day or more of VACUUMing. Impossible. Visibility Map -- [snip] No dynamic shared memory cache is required because any concurrent changes to the table would be ignored by a Scan anyway, so it doesn't matter if an INSERT, UPDATE or DELETE occurs while we are scanning.
Any new scans that start will attempt to lock the table and then perform a rel cache check before continuing. So the visibility will be set correctly for *that* scan at least. Is that true in the presence of READ COMMITTED? In most cases the visibility map can be summarised as a single boolean to show whether any 100% visible segments exist. That makes accessing the map very cheap in the common, unset case. What do we do in failure scenarios? Does this go into WAL? Is it replicated via log shipping techniques? You mention Slony support below. I don't know that Slony's target is Very Big Tables (up to 16 TB) but even if it was being used for a 'small' system of a few hundred GB, when you fail over the system has degraded if you aren't vacuuming it. Moreover, the layout of data may be different and therefore the performance of the segment exclusion. That's a bit of a surprise. Setting the Visibility Map -- VACUUM would scan all READ_WRITE_ALLOWED segments and mark some of them as EFFECTIVE_READ_ONLY if 100% of the remaining rows are visible to all current backends. Marking the map will cause a rel cache invalidation. Argh. Remember, we're talking about 16 TB tables here. With the best of tuning, that takes all day. So, day 1, load data; day 2, VACUUM it? With the existing partitioning approach, there's no such thing. We would never mark the highest numbered
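For readers following along, a rough sketch of the analogy to today's constraint exclusion (all object names invented): declarative partitioning needs an explicit CHECK constraint per child table for the planner to prune, whereas the proposal would derive implicit min/max bounds per 1 GB segment once it is marked read-only.

create table orders (orderid bigint, orderdate date);
create table orders_2008_01 (
    check (orderdate >= date '2008-01-01' and orderdate < date '2008-02-01')
) inherits (orders);
set constraint_exclusion = on;
-- today: pruned at plan time via the explicit CHECK constraint;
-- under the proposal: pruned per segment via recorded min/max values
select * from orders where orderdate = date '2008-01-15';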
Re: [HACKERS] tzdata issue on cross-compiled postgresql
Tim Yardley [EMAIL PROTECTED] writes: Can you strace the backend while it's doing this and see if there's a difference in the series of kernel calls issued? See attached strace. Let me know if you see anything enlightening. Nope :-(. The strace output is *exactly* the same across all four instances, except for the number of bytes sent over to the client, which reflects the difference between 504 and 0. It's possible that something is lurking in the data that strace didn't show, but I think that's unlikely. Somehow the misfeasance is happening within Postgres proper. I think you'll have to get out a debugger and trace through it to find out where the rows are getting discarded ... regards, tom lane
Re: [HACKERS] odd convert_from bug
Andrew Dunstan [EMAIL PROTECTED] writes: BTW, if calling pfree() at all here is actually a bug, then we should probably fix it in the back branches. It looks more to me like the problem was that pg_convert_from was calling pfree() with the wrong argument - src_encoding_name instead of dest_encoding_name. Right. I just took it out because it wasn't really that useful; in general SQL-callable functions can expect that they're called in fairly short-lived contexts, and so retail pfree's aren't very interesting unless you're talking about large chunks. BTW, just for the record, the (void *) casts were poor style too IMHO --- DatumGetPointer() would be better. regards, tom lane
Re: [HACKERS] Dynamic Partitioning using Segment Visibility Maps
Hi Simon, On Wed, Jan 09, 2008 at 03:08:08PM +0000, Simon Riggs wrote: Do people really like running all that DDL? There is significant manpower cost in implementing and maintaining a partitioning scheme, plus significant costs in getting it wrong. Well... that's impossible for me to say. Most of the people who use lots of data like using MDX (a large and quite complex reporting and modelling language from MS, which is more complex than SQL IMHO) so... If people with large tables like partitioning, why is Oracle moving towards automated partitioning in 11g? Automated partitioning was one of Have you used Oracle's partitioning? I'm not surprised they're moving away ;-). More seriously, I didn't know that. Do you have a URL? the major goals for this next set of Postgres partitioning functionality also, whether or not we have declarative partitioning. My thinking is let's go for the best ideas and skip over the stuff that will be (is) obsolete before it's left the design stage. Well, you're assuming that because declarative partitioning has been around a while, it's obsolete. I don't think that's the case. Postgres has been around about as long... I see many more systems in the Terabyte range now than I did 10 years ago, but I see about the same number of DBAs. We'll always need experts, but I feel we should be using our expertise to simplify the standard cases, not just maintain the same level of difficulty in managing them. One of the big benefits of the dynamic partitioning approach is that it needs no DDL. So it will work out-of-the-box, for anybody. It's fine to say that, but people have already started talking about grammar extensions for dynamic partitions. People want the tools to manage it. There's difficulty in learning how and when to use the different VACUUM options that have been discussed. Deleting older data would be optimised under the proposed scheme, so that's not really a problem. Loading data is actually slightly harder and slower with declarative partitioning (see below). Merging and splitting partitions are tools for fine tuning a very complex partitioning scheme. They do also allow a non-linear segmentation scheme, which might aid performance in some cases. What about adding partitions into a set of partitions? Greg's post on that was dismissed and it might not be representative, but we have users doing that all the time. One major advantage of the dynamic approach is that it can work on multiple dimensions simultaneously, which isn't possible with declarative partitioning. For example if you have a table of Orders then you will be able to benefit from Segment Exclusion on all of these columns, rather than just one of them: OrderId, OrderDate, RequiredByDate, LastModifiedDate. This will result in some sloppiness in the partitioning, e.g. if we fill 1 partition a day of Orders, then the OrderId and OrderDate columns will start out perfectly arranged. Any particular RequiredByDate will probably be spread out over 7 partitions, but that's way better than being spread out over 365+ partitions. When we look at the data in the partition we can look at any number of columns. When we declaratively partition, you get only one connected set of columns, which is one of the reasons you want multi-dimensional partitioning in the first place.
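To make the multi-dimension point concrete, a hedged sketch, assuming an Orders table like the one described with a requiredbydate column: under a declarative scheme this query is only prunable if RequiredByDate happens to be the partition key, while segment exclusion could prune on it simply because its values correlate with load order.

-- with segment exclusion, only the ~7 segments whose recorded
-- min/max ranges overlap this week would need to be scanned
select * from orders
 where requiredbydate between date '2008-01-10' and date '2008-01-17';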
To this end, we (well, Jeff Cohen) looked at the syntax and semantics of partitioning in leading databases (Oracle, Informix, DB2) and came up with a highly expressive grammar which takes the best of each, I think (I'll post details on the grammar in a separate thread). The idea is that range (for example, a date range), list (a list of distinct values) and hash partitioning be supported on multiple columns. Partitions can be added, merged, truncated. Partitions can reside on different tablespaces. The existing issues with the rewriter, COPY support and the like go away by smartening up the backend. To explore the grammar and semantics, Jeff and I (to a small extent) have implemented the parsing of such a grammar in the backend. At the moment, this just results in the generation of a bunch of rules (i.e., it produces the current behaviour, as if someone had written rules themselves) and debugging output. The fully fledged approach will see partitioning rules stored in a new system catalog which can be interrogated by the planner or COPY to determine which partition a given tuple belongs in or which partition(s) a WHERE clause matches. When loading data we would need to examine each incoming tuple and distribute it to the correct place in the partitioned heap. That's a non-trivial overhead on data loading that I would like to avoid, since if we rely on the natural locality of loaded data then that overhead is zero. Yes, but it's a lot faster than what we have at the moment and the dynamic approach is not without its costs. I've pointed it out before, but I will
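Since the grammar itself hadn't been posted yet, the following is a purely hypothetical sketch of what such declarative syntax might look like, loosely modeled on the Oracle/DB2 style mentioned above; none of these keywords should be taken as the actual proposal.

-- hypothetical range partitioning on a date column
create table orders (orderid bigint, orderdate date, region text)
partition by range (orderdate)
( partition p200801 start (date '2008-01-01') end (date '2008-02-01'),
  partition p200802 start (date '2008-02-01') end (date '2008-03-01')
);

-- hypothetical list partitioning on a second table, same invented grammar
create table sales (saleid bigint, region text)
partition by list (region)
( partition apac values ('AU', 'JP'),
  partition emea values ('DE', 'UK')
);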
Re: [HACKERS] operator suggest interval / interval = numeric
Warren Turkal wrote: The year to month and day to second intervals should not overlap. The standard doesn't actually allow it IIRC. They do on Postgres anyway. Otherwise the type is not all that useful, is it? -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
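A quick demonstration of the overlap in question; the standard's separate year-month and day-time interval classes would forbid this mixed literal, but Postgres accepts it:

select interval '1 year 1 day';  -- accepted: mixes year-month and day-time fields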
Re: [HACKERS] operator suggest interval / interval = numeric
On Jan 10, 2008 2:17 AM, Tom Lane [EMAIL PROTECTED] wrote: You'd have to define exactly what that means, which seems a little tricky for incommensurate intervals. For instance what is the result of '1 month' / '1 day' ? Postgres has already made such definitions, to allow direct interval-interval comparison. 1 month is deemed equal to 30 days, 1 day is deemed equal to 24 hours (although for some reason we ignore the issue of years vs. days). I argued in a long-dead thread that we should disallow these kinds of comparisons altogether, but I didn't manage to generate much enthusiasm. The overall sentiment seemed to be that the slightly bogus results were more useful than no result at all. That being the case, if we're comfortable making these kinds of arbitrary definitions for comparison purposes, it doesn't seem like much of a stretch to allow multiplication and division of intervals using the same rules. Regards, BJ
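Those definitions are easy to verify directly:

select interval '1 month' = interval '30 days';   -- t
select interval '1 day'   = interval '24 hours';  -- t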
Re: [HACKERS] operator suggest interval / interval = numeric
On Jan 9, 2008 8:33 PM, Brendan Jurd [EMAIL PROTECTED] wrote: I argued in a long-dead thread that we should disallow these kinds of comparisons altogether, but I didn't manage to generate much enthusiasm. The overall sentiment seemed to be that the slightly bogus results were more useful than no result at all. I was wondering why PostgreSQL allowed these types of comparisons. It really shouldn't allow them. wt
[HACKERS] flex/bison output wrongly created in the source directory
I was wondering if there is a reason that the flex and bison and other generated source files end up in the source directory when doing an out-of-tree build. Would a patch that puts those files in the build trees be accepted? wt
Re: [HACKERS] operator suggest interval / interval = numeric
On Jan 10, 2008 3:33 PM, Brendan Jurd [EMAIL PROTECTED] wrote: 1 month is deemed equal to 30 days, 1 day is deemed equal to 24 hours (although for some reason we ignore the issue of years vs. days). Sorry, a correction. The issue of years vs. days isn't ignored. A year is just 12 months, which yields 12 * 30 = 360 days, which is actually a pretty significant error (1.4% on average).

# select interval '1 year' = interval '360 days';
 ?column?
----------
 t
(1 row)
Re: [HACKERS] operator suggest interval / interval = numeric
On Jan 9, 2008 9:29 PM, Brendan Jurd [EMAIL PROTECTED] wrote: Sorry, a correction. The issue of years vs. days isn't ignored. A year is just 12 months, which yields 12 * 30 = 360 days, which is actually a pretty significant error (1.4% on average). YEAR TO MONTH and DAY TO {HOUR,MINUTE,SECOND} intervals should not combine. PostgreSQL correctly doesn't allow {YEAR,MONTH} TO {DAY,HOUR,MINUTE,SECOND} intervals, and it shouldn't allow operating on invalid interval combinations either. One mistake that PG does make is that it allows a FULL RANGE interval. This weirdness is essentially a YEAR TO SECOND interval that isn't allowed when explicitly requested.

wt-time= select INTERVAL '1 year 1 month 1 day 1:1:1';
          interval
-----------------------------
 1 year 1 mon 1 day 01:01:01
(1 row)

wt-time= select INTERVAL '1 year 1 month 1 day 1:1:1' YEAR TO SECOND;
ERROR:  syntax error at or near "SECOND"
LINE 1: select INTERVAL '1 year 1 month 1 day 1:1:1' YEAR TO SECOND;
                                                             ^

This is inconsistent. I would like to ultimately not allow operations on interval combinations that are not allowed by the SQL standard. wt
Re: [HACKERS] flex/bison output wrongly created in the source directory
Warren Turkal [EMAIL PROTECTED] writes: I was wondering if there is a reason that the flex and bison and other generated source files end up in the source directory when doing an out-of-tree build. Would a patch that puts those files in the build trees be accepted? Probably not, since our intention is that those files be distributed as part of source tarballs. Also, since they are (or should be) architecture-independent, what's the point? Out-of-tree builds are intended to support building for multiple architectures in parallel; but there's no reason to force independent reconstructions of these common derived files. regards, tom lane
Re: [HACKERS] operator suggest interval / interval = numeric
Warren Turkal [EMAIL PROTECTED] writes: YEAR TO MONTH and DAY TO {HOUR,MINUTE,SECOND} intervals should not combine. PostgreSQL correctly doesn't allow {YEAR,MONTH} TO {DAY,HOUR,MINUTE,SECOND} intervals, Really? I think you've confused some unimplemented decorative syntax with what the underlying datatype will or won't do. This is inconsistent. I would like to ultimately not allow operations on interval combinations that are not allowed by the SQL standard. The spec's approach to datetime operations in general is almost totally brain-dead, and so you won't find a lot of support around here for hewing to the straight-and-narrow spec-compliance approach. If they have not even heard of daylight-savings time, how can anyone credit them with any meaningful contact with the real world? We'll cite the spec where it suits us, but in this area "the spec says you can't do that" carries very little weight. Or were you planning to lobby for removal of our DST support, too? regards, tom lane
Re: [HACKERS] operator suggest interval / interval = numeric
-Original Message- From: [EMAIL PROTECTED] [mailto:pgsql-hackers- [EMAIL PROTECTED] On Behalf Of Tom Lane Sent: Wednesday, January 09, 2008 10:00 PM To: Warren Turkal Cc: Brendan Jurd; Ilya A. Kovalenko; pgsql-hackers@postgresql.org Subject: Re: [HACKERS] operator suggest interval / interval = numeric [snip] We'll cite the spec where it suits us, but in this area "the spec says you can't do that" carries very little weight. Or were you planning to lobby for removal of our DST support, too? Don't forget indexes. The standard does not breathe a word about them.
Re: [HACKERS] operator suggest interval / interval = numeric
Brendan Jurd [EMAIL PROTECTED] writes: On Jan 10, 2008 2:17 AM, Tom Lane [EMAIL PROTECTED] wrote: You'd have to define exactly what that means, which seems a little tricky for incommensurate intervals. For instance what is the result of '1 month' / '1 day' ? Postgres has already made such definitions, to allow direct interval-interval comparison. Sure. I was just twitting the OP for having not considered these issues. Given that you can get at that behavior by dividing extract(epoch) results, I tend to think we should leave well enough alone. If someone did come up with a brilliant definition of what interval division should do, it would be pretty annoying to have already locked ourselves into a not-so-brilliant definition ... regards, tom lane
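The existing workaround Tom refers to, spelled out; the answer follows the same 1 month = 30 days convention discussed upthread:

select extract(epoch from interval '1 month')
     / extract(epoch from interval '1 day');  -- 30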
Re: [HACKERS] operator suggest interval / interval = numeric
On Jan 10, 2008 5:00 PM, Tom Lane [EMAIL PROTECTED] wrote: The spec's approach to datetime operations in general is almost totally brain-dead, and so you won't find a lot of support around here for hewing to the straight-and-narrow spec-compliance approach. If they have not even heard of daylight-savings time, how can anyone credit them with any meaningful contact with the real world? We'll cite the spec where it suits us, but in this area "the spec says you can't do that" carries very little weight. It's true that the spec fails to consider DST, in that it doesn't partition day and second intervals separately. But is that really a reason to reject the concept of interval partitioning altogether? It seems the spec has the right idea; it just doesn't take it far enough to cover all the bases. Whether the spec is braindead w.r.t. intervals or not, Postgres is clearly giving the wrong answer. A year interval is not 360 day intervals long. A month interval is not shorter than 31 day intervals. And, thanks to the geniuses who came up with DST, a day interval is not the same as 24 hour intervals anymore. None of these comparisons are sane. Regards, BJ
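Each of these can be checked directly; the fixed 30-day/24-hour conversion rules are what produce the counter-intuitive answers:

select interval '1 month' < interval '31 days';   -- t, yet January has 31 days
select interval '1 year'  < interval '365 days';  -- t, yet most years have 365 days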
Re: [HACKERS] operator suggest interval / interval = numeric
On Jan 9, 2008 10:44 PM, Brendan Jurd [EMAIL PROTECTED] wrote: On Jan 10, 2008 5:00 PM, Tom Lane [EMAIL PROTECTED] wrote: The spec's approach to datetime operations in general is almost totally brain-dead, and so you won't find a lot of support around here for hewing to the straight-and-narrow spec-compliance approach. If they have not even heard of daylight-savings time, how can anyone credit them with any meaningful contact with the real world? We'll cite the spec where it suits us, but in this area "the spec says you can't do that" carries very little weight. It's true that the spec fails to consider DST, in that it doesn't partition day and second intervals separately. Should the standard really do that? I mean, a day really is defined as some number of seconds. But is that really a reason to reject the concept of interval partitioning altogether? It seems the spec has the right idea, it just doesn't take it far enough to cover all the bases. I think the standard does a good job with the partitioning. Whether the spec is braindead w.r.t. intervals or not, Postgres is clearly giving the wrong answer. A year interval is not 360 day intervals long. A month interval is not shorter than 31 day intervals. And, thanks to the geniuses who came up with DST, a day interval is not the same as 24 hour intervals anymore. None of these comparisons are sane. DST has no bearing on the fact that a mean solar day is still 86400 seconds. There really is no partition between days and seconds. It just means that for timestamp operations the day where we spring forward is 23 hours long, and the day where we fall back is 1 day 1 hour. Having said all this, neither a month nor a year is a fixed number of days. The partitioning system used by the SQL standard seems to deal with this problem pretty well. wt
Re: [HACKERS] operator suggest interval / interval = numeric
Brendan Jurd [EMAIL PROTECTED] writes: On Jan 10, 2008 5:00 PM, Tom Lane [EMAIL PROTECTED] wrote: The spec's approach to datetime operations in general is almost totally brain-dead, ... It's true that the spec fails to consider DST, in that it doesn't partition day and second intervals separately. That's only one of the ways in which they ignore DST, and not even the most important one --- my vote for the spectacularly bad omission is that SET TIME ZONE only allows constant offsets from UTC. Whether the spec is braindead w.r.t. intervals or not, Postgres is clearly giving the wrong answer. Sure, but it's not clear that there *is* a right answer. As noted upthread, a useful approximate answer can be better than no answer at all. None of these comparisons are sane. You can always refrain from making such comparisons, if you think they are incapable of yielding useful answers. This whole area is pretty messy, and I don't think that there is or can be a simple uniform solution :-(. We need to tread carefully in introducing new behaviors that we might regret later. So I'm not in favor of inventing an interval division operator that just duplicates functionality that's already there in a more-cumbersome notation. We might want that operator back someday. Who even wants to argue that the result datatype should be numeric? Dividing a three-component quantity by another one doesn't sound to me like an operation that naturally yields a scalar result. regards, tom lane
Re: [HACKERS] operator suggest interval / interval = numeric
On Jan 9, 2008 11:06 PM, Tom Lane [EMAIL PROTECTED] wrote: Brendan Jurd [EMAIL PROTECTED] writes: On Jan 10, 2008 5:00 PM, Tom Lane [EMAIL PROTECTED] wrote: The spec's approach to datetime operations in general is almost totally brain-dead, ... It's true that the spec fails to consider DST, in that it doesn't partition day and second intervals separately. That's only one of the ways in which they ignore DST, and not even the most important one --- my vote for the spectacularly bad omission is that SET TIME ZONE only allows constant offsets from UTC. I assume you are advocating the use of timezone names that can indicate what happens over a DST change. I think it would be useful to be able to specify a timezone like PST8PDT. Whether the spec is braindead w.r.t. intervals or not, Postgres is clearly giving the wrong answer. Sure, but it's not clear that there *is* a right answer. As noted upthread, a useful approximate answer can be better than no answer at all. I am not sure that I agree with that. If you need to keep track of the days, you should probably be using intervals with day to second (or narrower) resolution. None of these comparisons are sane. You can always refrain from making such comparisons, if you think they are incapable of yielding useful answers. Maybe a way to enable strict compliance with the standard would be useful. This whole area is pretty messy, and I don't think that there is or can be a simple uniform solution :-(. We need to tread carefully in introducing new behaviors that we might regret later. So I'm not in favor of inventing an interval division operator that just duplicates functionality that's already there in a more-cumbersome notation. We might want that operator back someday. Who even wants to argue that the result datatype should be numeric? Dividing a three-component quantity by another one doesn't sound to me like an operation that naturally yields a scalar result. I think this is reasonable. wt
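Both forms already exist in Postgres, assuming I'm reading the SET documentation correctly; the second is the spec's constant-offset syntax that Tom is criticizing:

SET TIME ZONE 'PST8PDT';                         -- DST-aware zone name
SET TIME ZONE INTERVAL '-08:00' HOUR TO MINUTE;  -- fixed offset, no DST behaviour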
Re: [HACKERS] Dynamic Partitioning using Segment Visibility Maps
On Thu, 2008-01-10 at 03:06 +0100, Gavin Sherry wrote: If the exclusion is executor driven, the planner cannot help but create a seq scan plan. The planner will think you're returning 100X rows when really you end up returning X rows. After that, all decisions made by the planner are totally bogus. One of the most important queries on large tables is handling WHERE Eventdate = CURRENT_DATE; We cannot perform partition exclusion using this type of WHERE clause at planning time because the CURRENT_DATE function is STABLE. We can decide that some form of partition exclusion is possible, but the actual partition we access can *only* be decided within the executor. That necessarily affects the selectivity estimates. The planner can see we want some data, but it can't tell which data, so it doesn't know whether we will hit the day with the most data or the day with the least data. You've mentioned this a few times, as if it's a problem with dynamic partitioning, yet it's clearly an issue for any form of partitioning. So it seems clear that we need to make partition exclusion work at executor time, whatever else we do. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
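The query shape at issue, for concreteness (table name invented); CURRENT_DATE is STABLE rather than IMMUTABLE, so its value is unknown at plan time and any exclusion must happen in the executor:

select * from events where eventdate = current_date;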
Re: [HACKERS] operator suggest interval / interval = numeric
On Jan 9, 2008 10:00 PM, Tom Lane [EMAIL PROTECTED] wrote: Really? I think you've confused some unimplemented decorative syntax with what the underlying datatype will or won't do. Fair enough. The underlying type certainly will do it, since it works without the opt_interval. This is inconsistent. I would like to ultimately not allow operations on interval combinations that are not allowed by the SQL standard. The spec's approach to datetime operations in general is almost totally brain-dead, and so you won't find a lot of support around here for hewing to the straight-and-narrow spec-compliance approach. If they have not even heard of daylight-savings time, how can anyone credit them with any meaningful contact with the real world? We'll cite the spec where it suits us, but in this area "the spec says you can't do that" carries very little weight. DST in the sense of doing arithmetic on timestamps? I was not aware that the standard defined the result in such a way that precluded allowing for DST and leap seconds and whatever other time warps you wanted to allow in your database. In fact, looking over the draft of the 2003 standard, it looks like it takes DST into consideration just fine. It just doesn't allow the use of a non-constant timezone identifier, which admittedly would be useful. Or were you planning to lobby for removal of our DST support, too? No. The DST support makes sense. wt
Re: [HACKERS] flex/bison output wrongly created in the source directory
On Jan 9, 2008 9:51 PM, Tom Lane [EMAIL PROTECTED] wrote: Warren Turkal [EMAIL PROTECTED] writes: I was wondering if there is a reason that the flex and bison and other generated source files end up in the source directory when doing an out-of-tree build. Would a patch that puts those files in the build trees be accepted? Probably not, since our intention is that those files be distributed as part of source tarballs. That makes sense. Also, since they are (or should be) architecture-independent, what's the point? Out-of-tree builds are intended to support building for multiple architectures in parallel; but there's no reason to force independent reconstructions of these common derived files. I was not running multiple builds in parallel, as I thought the generated files might get fouled by the different builds. Since they do not appear to change, I think they are just fine. Thanks for the help! wt