[HACKERS] timestamp datatype cleanup

2008-03-09 Thread Warren Turkal
PostgreSQL hackers,

Here's an initial bit of my attempt at cleaning up the timestamp datatype.
I have gone through the backend and made a couple of small changes to stop using
the HAVE_INT64_TIMESTAMP define to select a type in code, by creating typedefs
in a header and using those typedefs in the code. I think this small bit is
ready for inclusion, but I have a couple of questions for further
work.

1) Is there a reason that header information is duplicated between the normal
postgresql includes and the ecpg includes, instead of defining the info in one
place and #including it into the files that need it?

2) Would it be reasonable to change timestamp.h into a file that includes other
files that define the specific parts depending on HAVE_INT64_TIMESTAMP instead
of testing for HAVE_INT64_TIMESTAMP many times throughout timestamp.h? I think
this might more cleanly separate the logic for the different timestamp types.

Thanks,
wt

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH] Clean up date/time variable type selection

2008-03-09 Thread Warren Turkal
I have removed all instances of using HAVE_INT64_TIMESTAMP to determine the
type of a variable in files depending on timestamp.h.
---
 src/backend/utils/adt/nabstime.c |6 +-
 1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/src/backend/utils/adt/nabstime.c b/src/backend/utils/adt/nabstime.c
index 992f7ab..5a662f3 100644
--- a/src/backend/utils/adt/nabstime.c
+++ b/src/backend/utils/adt/nabstime.c
@@ -832,11 +832,7 @@ interval_reltime(PG_FUNCTION_ARGS)
month,
day;
 
-#ifdef HAVE_INT64_TIMESTAMP
-   int64   span;
-#else
-   double  span;
-#endif
+   TimeOffset  span;
 
year = interval->month / MONTHS_PER_YEAR;
month = interval->month % MONTHS_PER_YEAR;
-- 
1.5.2.5




[HACKERS] [PATCH] Add TimeOffset and DateOffset typedefs

2008-03-09 Thread Warren Turkal
I added TimeOffset and DateOffset typedefs to get rid of the remaining uses of
the HAVE_INT64_TIMESTAMP define to determine the types of variables and
functions in timestamp.c.
---
 src/backend/utils/adt/timestamp.c |   77 +++-
 src/include/utils/timestamp.h |4 ++
 2 files changed, 19 insertions(+), 62 deletions(-)

diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c
index 9b90873..4759e22 100644
--- a/src/backend/utils/adt/timestamp.c
+++ b/src/backend/utils/adt/timestamp.c
@@ -44,11 +44,7 @@
 TimestampTz PgStartTime;
 
 
-#ifdef HAVE_INT64_TIMESTAMP
-static int64 time2t(const int hour, const int min, const int sec, const fsec_t fsec);
-#else
-static double time2t(const int hour, const int min, const int sec, const fsec_t fsec);
-#endif
+static TimeOffset time2t(const int hour, const int min, const int sec, const fsec_t fsec);
 static int EncodeSpecialTimestamp(Timestamp dt, char *str);
 static Timestamp dt2local(Timestamp dt, int timezone);
 static void AdjustTimestampForTypmod(Timestamp *time, int32 typmod);
@@ -977,11 +973,7 @@ AdjustIntervalForTypmod(Interval *interval, int32 typmod)
}
else if (range == INTERVAL_MASK(MINUTE))
{
-#ifdef HAVE_INT64_TIMESTAMP
-   int64   hour;
-#else
-   double  hour;
-#endif
+   TimeOffset  hour;
 
interval->month = 0;
interval->day = 0;
@@ -998,11 +990,7 @@ AdjustIntervalForTypmod(Interval *interval, int32 typmod)
}
else if (range == INTERVAL_MASK(SECOND))
{
-#ifdef HAVE_INT64_TIMESTAMP
-   int64   minute;
-#else
-   double  minute;
-#endif
+   TimeOffset  minute;
 
interval->month = 0;
interval->day = 0;
@@ -1076,11 +1064,7 @@ AdjustIntervalForTypmod(Interval *interval, int32 typmod)
else if (range == (INTERVAL_MASK(MINUTE) |
   INTERVAL_MASK(SECOND)))
{
-#ifdef HAVE_INT64_TIMESTAMP
-   int64   hour;
-#else
-   double  hour;
-#endif
+   TimeOffset  hour;
 
interval->month = 0;
interval->day = 0;
@@ -1342,11 +1326,7 @@ timestamptz_to_str(TimestampTz t)
 void
 dt2time(Timestamp jd, int *hour, int *min, int *sec, fsec_t *fsec)
 {
-#ifdef HAVE_INT64_TIMESTAMP
-   int64   time;
-#else
-   double  time;
-#endif
+   TimeOffset  time;
 
time = jd;
 
@@ -1547,13 +1527,8 @@ recalc_t:
 int
 tm2timestamp(struct pg_tm * tm, fsec_t fsec, int *tzp, Timestamp *result)
 {
-#ifdef HAVE_INT64_TIMESTAMP
-   int date;
-   int64   time;
-#else
-   double  date,
-   time;
-#endif
+   DateOffset  date;
+   TimeOffset  time;
 
/* Julian day routines are not correct for negative Julian days */
if (!IS_VALID_JULIAN(tm->tm_year, tm->tm_mon, tm->tm_mday))
@@ -1596,13 +1571,8 @@ tm2timestamp(struct pg_tm * tm, fsec_t fsec, int *tzp, Timestamp *result)
 int
 interval2tm(Interval span, struct pg_tm * tm, fsec_t *fsec)
 {
-#ifdef HAVE_INT64_TIMESTAMP
-   int64   time;
-   int64   tfrac;
-#else
-   double  time;
-   double  tfrac;
-#endif
+   TimeOffset  time;
+   TimeOffset  tfrac;
 
tm->tm_year = span.month / MONTHS_PER_YEAR;
tm->tm_mon = span.month % MONTHS_PER_YEAR;
@@ -1658,19 +1628,15 @@ tm2interval(struct pg_tm * tm, fsec_t fsec, Interval *span)
return 0;
 }
 
-#ifdef HAVE_INT64_TIMESTAMP
-static int64
+static TimeOffset
 time2t(const int hour, const int min, const int sec, const fsec_t fsec)
 {
+#ifdef HAVE_INT64_TIMESTAMP
	return (((((hour * MINS_PER_HOUR) + min) * SECS_PER_MINUTE) + sec) * USECS_PER_SEC) + fsec;
-}  /* time2t() */
 #else
-static double
-time2t(const int hour, const int min, const int sec, const fsec_t fsec)
-{
return (((hour * MINS_PER_HOUR) + min) * SECS_PER_MINUTE) + sec + fsec;
-}  /* time2t() */
 #endif
+}  /* time2t() */
 
 static Timestamp
 dt2local(Timestamp dt, int tz)
@@ -2042,13 +2008,8 @@ timestamptz_cmp_timestamp(PG_FUNCTION_ARGS)
 static int
 interval_cmp_internal(Interval *interval1, Interval *interval2)
 {
-#ifdef HAVE_INT64_TIMESTAMP
-   int64   span1,
+   TimeOffset  span1,
span2;
-#else
-   double  span1,
-   span2;
-#endif
 
span1 = interval1->time;
span2 = interval2->time;
@@ -2388,11 +2349,7 @@ interval_justify_interval(PG_FUNCTION_ARGS)
Interval   *span = 



Re: [HACKERS] timestamp datatype cleanup

2008-03-09 Thread Michael Meskes
On Sun, Mar 09, 2008 at 12:32:20AM -0800, Warren Turkal wrote:
> 1) Is there a reason that header information is duplicated between normal
> postgresql include and ecpg includes instead of defining the info in one place
> and #including it into the files that need it?

As long as both implementations are kept in sync I don't see a reason.

Michael
-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED]
Go VfL Borussia! Go SF 49ers! Use Debian GNU/Linux! Use PostgreSQL!



[HACKERS] Lazy constraints / defaults

2008-03-09 Thread Michał Zaborowski
Hello,
 I would like to be able to add a CONSTRAINT and/or DEFAULT without
affecting old rows. Yes, it sounds strange, but... Let's say I have a
big table, and I want to add a new column with a DEFAULT and NOT NULL.
Normally that means a long exclusive lock. So right now I'm adding a plain
new column, then the DEFAULT, then an UPDATE on all rows in chunks, then NOT
NULL... Can it be a little simpler?

-- 
Regards,
  Michał Zaborowski (TeXXaS)



Re: [HACKERS] timestamp datatype cleanup

2008-03-09 Thread Andrew Chernow

Michael Meskes wrote:
> On Sun, Mar 09, 2008 at 12:32:20AM -0800, Warren Turkal wrote:
>> 1) Is there a reason that header information is duplicated between normal
>> postgresql include and ecpg includes instead of defining the info in one
>> place and #including it into the files that need it?


Merlin and I ran into this question while working on the libpq type system, 
found here:


http://archives.postgresql.org/pgsql-patches/2008-03/msg00057.php

It opens up the binary parameterized interface in libpq by providing data type 
converters, to and from external binary format and C data types, when performing 
queries or getting results.  This required copying and pasting some code from 
the backend for a couple of types, like datetime and numeric.  We had to work 
around several backend compile-time checks by using run-time checks, since 
servers are compiled differently.


The most modular approach would be to abstract the backend/utils/adt API so the 
backend and client stuff can share the functionality.  We did mention this 
during one of our patch submissions, but didn't push it as it is a large change 
outside the scope of our patch.


> As long as both implementations are kept in sync I don't see a reason.

Sharing the backend data type converters with client stuff is an obvious win, 
but it's a tedious process probably lacking any motivation.  We did look at it 
though; it is possible.


--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/



Re: [HACKERS] Idea: Comments on system catalogs?

2008-03-09 Thread Dawid Kuroczko
On Fri, Mar 7, 2008 at 3:51 PM, Bruce Momjian [EMAIL PROTECTED] wrote:
> Jim C. Nasby wrote:
  On Wed, Jul 04, 2007 at 01:03:20PM +0200, Dawid Kuroczko wrote:
Hello.
   
I think it could be a nice idea to put descriptions from
http://www.postgresql.org/docs/8.2/static/catalogs.html
into system catalogs itself.  I.e., make a bunch of
   
COMMENT ON COLUMN pg_class.relname
   IS 'Name of the table, index, view, etc.';
...
COMMENT ON COLUMN pg_class.relkind
   IS 'r = ordinary table, i = index, S = sequence, v = view, c =
composite type, t = TOAST table';
   
   Actually, this does exist for some things in the catalog; I suspect it
   just wasn't done in the past (perhaps Postgres didn't originally have
   comments). I think it would be a useful addition. But I think it'd need
   to be more than just a .sql file (initdb would probably need to be
   modified). Ideally, we'd be able to suck the info out of the appropriate
   .sgml files.

  Added to TODO:

  * Add comments on system tables/columns using the information in
   catalogs.sgml

   Ideally the information would be pulled from the SGML file
   automatically.

Since I have raised the issue back then, I volunteer for doing this TODO.

As I am lazy by nature, I am sure I will try to use the SGML files instead of
the tedious copy-paste routine. ;-)

   Regards,
  Dawid



Re: [HACKERS] Lazy constraints / defaults

2008-03-09 Thread Tom Lane
Michał Zaborowski [EMAIL PROTECTED] writes:
> I would like to be able to add CONSTRAINT and/or DEFAULT without
> affecting old rows.

You mean without actually checking that the old rows satisfy the
constraint?  There's approximately zero chance that that proposal
will be accepted.

> Yes, it sounds strange, but... Let's say I have
> big table, I want to add new column, with DEFAULT and NOT NULL.
> Normally it means long exclusive lock. So - right now I'm adding plain
> new column, then DEFAULT, then UPDATE on all rows in chunks, then NOT
> NULL... Can it be little simpler?

Just do it all in one ALTER command.

alter table tab add column col integer not null default 42 check (col > 0);

regards, tom lane



Re: [HACKERS] timestamp datatype cleanup

2008-03-09 Thread Michael Meskes
>> As long as both implementations are kept in sync I don't see a reason.
>
> Sharing the backend data type converters with client stuff is an obvious
> win, but it's a tedious process probably lacking any motivation.  We did
> look at it though, it is possible.

I thought we were just talking about some definitions. If we were to
move the whole logic into one place we might be facing a
performance penalty inside the backend, maybe not a large one though.
Others around here are better suited to judge this.

Michael
-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED]
Go VfL Borussia! Go SF 49ers! Use Debian GNU/Linux! Use PostgreSQL!



Re: [HACKERS] timestamp datatype cleanup

2008-03-09 Thread Andrew Chernow

> I thought we were just talking about some definitions.

I was expanding on your #1 question, which was directly talking about shared 
headers rather than just cleaning out HAVE_INT64_TIMESTAMP.  I had the same 
experience but also ran into the need for shared library code, which BTW ecpg 
could benefit from as well.


> performance penalty inside the backend

The idea requires reorganizing, not reimplementing.  There shouldn't be a change 
in performance in either direction.


--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/



Re: [HACKERS] Maximum statistics target

2008-03-09 Thread Stephen Denne
Tom Lane wrote:
> Martijn van Oosterhout [EMAIL PROTECTED] writes:
>> On Fri, Mar 07, 2008 at 07:25:25PM +0100, Peter Eisentraut wrote:
>>> What's the problem with setting it to ten million if I have ten million
>>> values in the table and I am prepared to spend the resources to maintain
>>> those statistics?
>>
>> That it'll probably take 10 million seconds to calculate the plans
>> using it? I think Tom pointed out there are a few places that are O(n^2)
>> in the number of entries...
>
> I'm not wedded to the number 1000 in particular --- obviously that's
> just a round number.  But it would be good to see some performance tests
> with larger settings before deciding that we don't need a limit.

I recently encountered a situation where I would have liked to be able to try a 
larger limit (amongst other ideas for improving my situation):

I have a field whose distribution of frequencies of values is roughly 
geometric, rather than flat.
Total rows = 36 million
relpages=504864
Distinct field values in use = 169
10 values account for 50% of the rows.
41 values account for 90% of the rows.

After setting statistics target to 1000 for that field, and analyzing the 
table, the statistics row for that field had 75 most frequent values and a 
histogram with 76 entries in it. Estimating 151 values in total.

For this situation using a larger statistics target should result in more pages 
being read, and a more accurate record of statistics. It shouldn't result in 
significantly more work for the planner.

It wouldn't solve my problem though, which is frequent over-estimation of rows 
when restricting by this field with values not known at plan time.

Regards,
Stephen Denne.

Disclaimer:
At the Datamail Group we value team commitment, respect, achievement, customer 
focus, and courage. This email with any attachments is confidential and may be 
subject to legal privilege.  If it is not intended for you please advise by 
reply immediately, destroy it and do not copy, disclose or use it in any way.

__
  This email has been scanned by the DMZGlobal Business Quality 
  Electronic Messaging Suite.
Please see http://www.dmzglobal.com/services/bqem.htm for details.
__





Re: [HACKERS] Lazy constraints / defaults

2008-03-09 Thread Dawid Kuroczko
On Sun, Mar 9, 2008 at 7:25 PM, Tom Lane [EMAIL PROTECTED] wrote:
> Michał Zaborowski [EMAIL PROTECTED] writes:
>> I would like to be able to add CONSTRAINT and/or DEFAULT without
>> affecting old rows.
>
> You mean without actually checking that the old rows satisfy the
> constraint?  There's approximately zero chance that that proposal
> will be accepted.

I think the problem here is to minimize the time the table is held under an
exclusive lock.  Something similar to CREATE INDEX CONCURRENTLY (i.e. hold the
exclusive lock for a jiffy, then do the actual work for the old tuples).

So, the proposal would read as to add the ability to perform:

  ALTER TABLE CONCURRENTLY ALTER COLUMN foo SET NOT NULL
...where exclusive lock would be held to place the constraint (so all new
tuples would satisfy it), lock would be released and the old tuples would
be checked to make sure the constraint is valid.

Should a NULL value be found or should the backend die, the constraint
should disappear or be marked invalid.

>> Yes, it sounds strange, but... Let's say I have
>> big table, I want to add new column, with DEFAULT and NOT NULL.
>> Normally it means long exclusive lock. So - right now I'm adding plain
>> new column, then DEFAULT, then UPDATE on all rows in chunks, then NOT
>> NULL... Can it be little simpler?
>
> Just do it all in one ALTER command.
>
> alter table tab add column col integer not null default 42 check (col > 0);

I think this will not solve the OP's problem.  He wants to minimize the time
a table is under exclusive lock, and this ALTER command will effectively
rewrite the whole table (to add new not null column).

Probably a workable solution would be to play with inheritance:
-- Add the NULL col colum:
  ALTER TABLE tab ADD COLUMN col integer;
-- Create a table which will have col NOT NULL
  CREATE TABLE tab_new (LIKE tab INCLUDING DEFAULTS INCLUDING
CONSTRAINTS INCLUDING INDEXES ) INHERITS (tab);
  ALTER TABLE tab_new ALTER COLUMN col SET NOT NULL;
-- Make the new values go to tab_new, if simple enough same might be
done for UPDATEs
  CREATE RULE insert_new AS ON INSERT TO tab DO INSTEAD INSERT INTO
tab_new VALUES (NEW.*);

-- Now, make a job which will do something like this:
  START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  UPDATE ONLY tab SET col = 42 WHERE id BETWEEN n AND n + 1000;
  INSERT INTO tab_new SELECT * FROM ONLY tab WHERE id BETWEEN n AND n + 1000;
  -- or better:
  -- INSERT INTO tab_new SELECT a,b,c,42 AS col FROM ONLY tab WHERE id
BETWEEN n AND n + 1000 FOR UPDATE;
  DELETE FROM ONLY tab WHERE id BETWEEN n AND n + 1000;
  COMMIT;

-- Finally, exhange parti^W^W get rid of old tab:
  SELECT count(*) FROM ONLY tab; -- should be zero
  ALTER TABLE tab RENAME TO tab_old;
  ALTER TABLE tab_new RENAME TO tab;
  ALTER TABLE tab NO INHERIT tab_old;

Of course each step should be done in transaction, probably starting
with explicit LOCK.  And extra care should be taken
with respect to the UNIQUE constraints.  In short: unless you are 100%
sure what you are doing, don't. :-)

   Regards,
  Dawid



[HACKERS] Estimating geometric distributions

2008-03-09 Thread Stephen Denne
I wrote:
> I have a field whose distribution of frequencies of values is
> roughly geometric, rather than flat.
> Total rows = 36 million
> relpages=504864
> Distinct field values in use = 169
> 10 values account for 50% of the rows.
> 41 values account for 90% of the rows.
>
> After setting statistics target to 1000 for that field, and
> analyzing the table, the statistics row for that field had 75
> most frequent values and a histogram with 77 entries in it.
> Estimating 152 values in total.

public;mytable;myfield;0;4;152;{202,179,8,181,173,207,6,118,107,205,182,4,54,247,168,77,169,53,120,159,149,174,167,156,148,150,56,108,66,119,5,99,96,175,97,208,1,130,10,102,228,101,121,50,11,152,32,12,78,221,55,244,241,252,203,116,103,184,154,153,238,65,49,220,83,98,111,85,139,242,240,260,7,109,114};{0.0836433,0.0781667,0.0738367,0.0598533,0.04629,0.04447,0.0359833,0.0314267,0.0278333,0.0268,0.0251433,0.0244867,0.02438,0.0223433,0.0207567,0.0189667,0.0168833,0.01582,0.0150267,0.0141767,0.0130467,0.0128933,0.0125767,0.0123567,0.0116567,0.0114967,0.01048,0.01037,0.00994667,0.00987667,0.00977667,0.00965333,0.00916333,0.00828667,0.00732667,0.00712,0.00629,0.00624,0.00576667,0.00558667,0.00477667,0.00475333,0.00410333,0.00405667,0.00371667,0.00334667,0.00334,0.00312667,0.00312667,0.00302,0.00300333,0.00295,0.00287333,0.00271,0.00267,0.00240667,0.00224,0.00221333,0.00215333,0.0021,0.00205667,0.00202667,0.00197333,0.00197333,0.00168667,0.00166,0.00159333,0.00159,0.00154667,0.00150333,0.00149,0.0013,0.00132,0.00112667,0.00104};{2,9,9,9,67,76,84,84,86,87,87,88,95,100,100,100,104,105,105,110,112,112,128,137,137,138,143,144,144,144,151,155,155,155,157,157,158,171,171,183,185,185,185,185,187,194,199,199,200,200,201,204,204,209,209,214,214,214,214,215,217,225,225,225,229,239,239,249,250,250,253,253,255,257,261,262,266};0.449246

My problem is frequent over-estimation of rows when restricting by this field
with values not known at plan time.

examples:
select * from mytable where myfield = ?;
select * from mytable where myfield in (subquery);

My arithmetic mean of the frequencies is 214200
My geometric mean is 13444

However analyze didn't find all my values, and thinks that there are only 152 
of them, so it uses a mean of 238046


When the subquery is estimated to return three myfield values, the query 
estimates 714138 rows, and chooses a sequential scan over mytable (myfield is 
indexed).

explain select * from mytable where myfield in (values (1),(2),(3));

Hash IN Join  (cost=0.08..1009521.37 rows=714138 width=86)
  Hash Cond: (mytable.myfield = *VALUES*.column1)
  ->  Seq Scan on mytable  (cost=0.00..866693.76 rows=36182976 width=86)
  ->  Hash  (cost=0.04..0.04 rows=3 width=4)
        ->  Values Scan on *VALUES*  (cost=0.00..0.04 rows=3 width=4)

I think this query is much more likely to return around 4 rows, and a 
Bitmap Index Scan should be used.

explain select * from mytable where myfield in (values (1),(2));

Nested Loop  (cost=4445.11..931383.93 rows=476092 width=86)
  ->  Unique  (cost=0.04..0.04 rows=2 width=4)
        ->  Sort  (cost=0.04..0.04 rows=2 width=4)
              Sort Key: *VALUES*.column1
              ->  Values Scan on *VALUES*  (cost=0.00..0.03 rows=2 width=4)
  ->  Bitmap Heap Scan on mytable  (cost=4445.08..462716.37 rows=238046 width=86)
        Recheck Cond: (mytable.myfield = *VALUES*.column1)
        ->  Bitmap Index Scan on myindex  (cost=0.00..4385.56 rows=238046 width=0)
              Index Cond: (mytable.myfield = *VALUES*.column1)

The expected number of loops (2 here, 3 above) through the Bitmap Heap Scan,
times 462716.37, is less than 1009521.37 here, but the cost estimate is far too
high in the general case. It should be closer to 26000 per loop if adjusted for
my expectation of the number of rows, being 13444 per loop. As such, you should
need to expect close to 40 myfield values being returned by the subquery before
choosing a sequential scan.

Is there any facility already in PostgreSQL to help me here?

Hopefully an index type that I don't know about yet? (Geometric distributions 
are similar to those found in word count distributions).

If not... is there any merit in this idea:

During the analyze process, the geometric mean of sampled rows was calculated, 
and if determined to be significantly different from the arithmetic mean, 
stored in a new stats column. When estimating the number of rows that will be 
returned by queries of the form shown above, if there is a geometric mean 
stored, use it instead of the arithmetic mean.

Regards,
Stephen Denne.


[HACKERS] Cleaning up the commit fest to-do list

2008-03-09 Thread Tom Lane
I've gone through the patch queue and marked all or at least most of
the threads that don't contain any commit-fest material.  (The remaining
threads contain actual patches, or at least design discussions that seem
worth getting more eyeballs on at the present time.)  Could you pull out
all that stuff into a separate TODO mailbox?  It should not go into
pgpatches-hold because it's not going to be material for future
commit-fests, either, until someone does some actual work.  I don't want
to have to repeat this sorting exercise for the next commit fest.

regards, tom lane



Re: [HACKERS] timestamp datatype cleanup

2008-03-09 Thread Tom Lane
Michael Meskes [EMAIL PROTECTED] writes:
>>> As long as both implementations are kept in sync I don't see a reason.
>>
>> Sharing the backend data type converters with client stuff is an obvious
>> win, but it's a tedious process probably lacking any motivation.  We did
>> look at it though, it is possible.
>
> I thought we were just talking about some definitions. If we were to
> move the whole logic into one place we might be facing a
> performance penalty inside the backend, maybe not a large one though.
> Others around here are better suited to judge this.

It's already a huge maintenance problem that there are two copies of the
datetime code (and no, they are *not* being kept in sync; most patches
to the core code have not gone into ecpg).  The prospect of extending
that mess to even more datatypes fills me with dread.

I don't know what we could do about it though.  We can't really
replicate the conveniences of the backend coding environment in a
frontend application, and yet I surely don't want to tighten up the
expectations on backend datatypes to try to make them work in a
frontend-ish environment.  The chances of that happening reliably
are about nil.

My feeling is that we should resist the temptation to provide any such
functionality on the frontend side (and yes, that includes ripping out
ecpg's datetime stuff).  It'll be a black hole sucking infinite numbers
of maintenance man-hours, with not nearly enough benefit in return.

regards, tom lane



Re: [HACKERS] timestamp datatype cleanup

2008-03-09 Thread Andrew Chernow

> conveniences of the backend coding environment in a frontend application

I don't think this is required.  I'm thinking about light-weight wrappers around 
A-to-B style converters, with no ties to the backend or client.  A free-standing 
util library.  The idea doesn't require a backend environment.


> want to tighten up the expectations on backend datatypes to try
> to make them work in a frontend-ish environment.

Making timestamp2tm or tm2timestamp (for example) work in the backend as well 
as the client seems rather straightforward, unless I am missing something.  I am 
not talking about making the data type functional, like add, divide, or 
multiply functions.  I am only referring to data type converters.


> The chances of that happening reliably are about nil.

Why?  Again, I could be missing something.

--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/



Re: [HACKERS] pg_dump additional options for performance

2008-03-09 Thread Gregory Stark
Bruce Momjian [EMAIL PROTECTED] writes:

> Decibel! wrote:
>> On Feb 26, 2008, at 4:36 PM, Tom Lane wrote:
>>> I think a sane way to think about what Simon would like to accomplish
>>> is not turn psql into a parallel job scheduler
>>
>> My $0.02: I often find myself wishing I could perform parallel
>> operations in psql. There was a proposal for that that came up during
>> 8.3 development; whatever happened to it?
>
> The concurrent psql patch was never updated to an acceptable state for
> it to be reviewed.

Well, I got tied in knots trying to fix up the SIGINT handling working
properly. I want to try rewriting it from scratch to try to produce a cleaner
version.

But that doesn't mean nobody should look at the patch and give comments. It
would be silly for me to rewrite it keeping things basically the way they are
now just cleaned up -- only to then find out that people disagree with the
basic approach and don't accept it anyways. This may be a trivial example but
some pretty major patches which required a lot of work have been submitted
before where nobody read any of the WIP patches and then basic questions were
raised long after a lot of work had been committed.

We seem to often think of review as equivalent to commit. But patch
authors often want to get some confirmation they're on the right track before
they move on to the next step. As a result many patches kind of get stuck in a
catch-22 where they're not ready for review and not ready for development
either.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!



Re: [HACKERS] Cleaning up the commit fest to-do list

2008-03-09 Thread Bruce Momjian
Tom Lane wrote:
> I've gone through the patch queue and marked all or at least most of
> the threads that don't contain any commit-fest material.  (The remaining
> threads contain actual patches, or at least design discussions that seem
> worth getting more eyeballs on at the present time.)  Could you pull out
> all that stuff into a separate TODO mailbox?  It should not go into
> pgpatches-hold because it's not going to be material for future
> commit-fests, either, until someone does some actual work.  I don't want
> to have to repeat this sorting exercise for the next commit fest.

I will deal with them, add them to the TODO list, and remove them.  I was
hoping more people would add comments before I made them TODO's, but
perhaps not.

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Include Lists for Text Search

2008-03-09 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
> I've coded a small patch to allow CaseSensitive synonyms.

Applied with corrections (it'd be good if you at least pretended to test
stuff before submitting it).

Would a similar parameter be useful for any of the other dictionary
types?

regards, tom lane



Re: [HACKERS] Negative LIMIT and OFFSET?

2008-03-09 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
> On Thu, 2007-12-13 at 22:23 -0800, Neil Conway wrote:
>> If LIMIT n means emit at most n tuples, then a query that produces 0
>> rows with n < 0 is arguably violating its spec, since it has produced
>> more tuples than the LIMIT specified (0 > n). Interpreted this way, no
>> result set can be consistent with a negative limit, so I'd vote for
>> throwing an error.
>
> I even found an existing, unused error message called
> ERRCODE_INVALID_LIMIT_VALUE
>
> so here's a patch.

Applied, but using just ERRCODE_INVALID_PARAMETER_VALUE rather than
guessing what the SQL committee intended with that SQLSTATE.

regards, tom lane



[HACKERS] Need -fwrapv or -fno-strict-overflow for gcc-4.3

2008-03-09 Thread Kris Jurka


A while back Tom Lane presented the potential problem with gcc signed 
overflow handling, but now it's not just a theoretical problem.


http://archives.postgresql.org/pgsql-hackers/2005-12/msg00635.php

Gcc 4.3 has started to perform optimizations based on the denial of the 
existence of signed overflow. Building CVS HEAD with gcc 4.3rc2 I get the 
following warnings:


localtime.c:1223: warning: assuming signed overflow does not occur when assuming that (X + c) < X is always false
localtime.c:1227: warning: assuming signed overflow does not occur when assuming that (X - c) > X is always false
array_userfuncs.c:100: warning: assuming signed overflow does not occur when assuming that (X - c) > X is always false
float.c:2696: warning: assuming signed overflow does not occur when assuming that (X + c) < X is always false
float.c:2712: warning: assuming signed overflow does not occur when assuming that (X + c) < X is always false
oracle_compat.c:1479: warning: assuming signed overflow does not occur when assuming that (X + c) < X is always false


I don't understand the difference between -fwrapv and 
-fno-strict-overflow, but it seems we need at least one of them.


http://www.airs.com/blog/archives/120

Kris Jurka
