Re: [HACKERS] optimizing constant quals within outer joins

2006-07-01 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > I note that the rowcount is not altered by the one-time filter.  Is this
> > an issue?  I imagine the problem is not being able to estimate the
> > number of rows that pass the filter.
> 
> That's intentional.  The filter is either going to pass all or none of
> the rows, not some fraction of them.  It clearly isn't very reasonable
> to guess that it will pass none of them (except if the qual is actually
> constant FALSE).
> 
> > I also wonder whether it wouldn't be better in this case to apply each
> > filter to each arm of the merge join.
> 
> Uh, why?  For the most part, I'd think the higher you can put the filter
> in the plan tree, the better.

Huh, sorry, I had misunderstood the meaning of a _one_-time filter :-)

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] optimizing constant quals within outer joins

2006-07-01 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> I note that the rowcount is not altered by the one-time filter.  Is this
> an issue?  I imagine the problem is not being able to estimate the
> number of rows that pass the filter.

That's intentional.  The filter is either going to pass all or none of
the rows, not some fraction of them.  It clearly isn't very reasonable
to guess that it will pass none of them (except if the qual is actually
constant FALSE).

> I also wonder whether it wouldn't be better in this case to apply each
> filter to each arm of the merge join.

Uh, why?  For the most part, I'd think the higher you can put the filter
in the plan tree, the better.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] optimizing constant quals within outer joins

2006-07-01 Thread Alvaro Herrera
Tom Lane wrote:

> regression=# explain select * from (select * from onek a where expensive(0)) 
> ss1 join (select * from onek b  where expensive(1)) ss2 using(unique1);
>   QUERY PLAN  
>  
> ---
>  Result  (cost=543.30..849.05 rows=19721 width=484)
>One-Time Filter: (expensive(0) AND expensive(1))
>->  Merge Join  (cost=543.30..849.05 rows=19721 width=484)
>  Merge Cond: (a.unique1 = b.unique1)

I note that the rowcount is not altered by the one-time filter.  Is this
an issue?  I imagine the problem is not being able to estimate the
number of rows that pass the filter.

>  ->  Sort  (cost=271.65..276.61 rows=1986 width=244)
>Sort Key: a.unique1
>->  Seq Scan on onek a  (cost=0.00..162.86 rows=1986 width=244)
>  ->  Sort  (cost=271.65..276.61 rows=1986 width=244)
>Sort Key: b.unique1
>->  Seq Scan on onek b  (cost=0.00..162.86 rows=1986 width=244)
> (10 rows)

I also wonder whether it wouldn't be better in this case to apply each
filter to each arm of the merge join.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Interval aggregate regression failure (expected seems

2006-07-01 Thread Tom Lane
Michael Glaesemann <[EMAIL PROTECTED]> writes:
> ... I think this just confirms that there is some kind of rounding (or  
> lack of) in interval_div. Kind of frustrating that it's not visible  
> in the result.

I think the fundamental problem is that the float8 results of division
are inaccurate, and yet we're assuming that we can (for instance) coerce
them to integer and get exactly the right answer.  For instance, in the
'41 months'/10 example, I get month_remainder_days being computed as

(gdb) p month_remainder
$19 = 0.099645
(gdb) s
2575result->day += (int32) month_remainder_days;
(gdb) p month_remainder_days
$20 = 2.9893

The only way we can really fix this is to be willing to round off
the numbers, and I think the only principled way to do that is to
settle on a specific target accuracy, probably 1 microsecond.
Then the thing to do would be to scale up all the intermediate
float results to microseconds and apply rint().  Something like
(untested)

month_remainder = rint(span->month * USECS_PER_MONTH / factor);
day_remainder = rint(span->day * USECS_PER_DAY / factor);
result->month = (int32) (month_remainder / USECS_PER_MONTH);
result->day = (int32) (day_remainder / USECS_PER_DAY);
month_remainder -= result->month * USECS_PER_MONTH;
day_remainder -= result->day * USECS_PER_DAY;

/*
 * Handle any fractional parts the same way as in interval_mul.
 */

/* fractional months full days into days */
month_remainder_days = month_remainder * DAYS_PER_MONTH;
extra_days = (int32) (month_remainder_days / USECS_PER_DAY);
result->day += extra_days;
/* fractional months partial days into time */
day_remainder += month_remainder_days - extra_days * USECS_PER_DAY;

#ifdef HAVE_INT64_TIMESTAMP
result->time = rint(span->time / factor + day_remainder);
#else
result->time = rint(span->time * 1.0e6 / factor + day_remainder) / 
1.0e6;
#endif

This might need a few more rint() calls --- I'm assuming that float ops
with exact integral inputs will be OK, which is an assumption used
pretty widely in the datetime code, but ...

Per the comment, if we do this here we probably want to make
interval_mul work similarly.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] optimizing constant quals within outer joins

2006-07-01 Thread Tom Lane
Phil Frost <[EMAIL PROTECTED]> writes:
> I have an optimization I'd like to see which I think should be pretty
> easy for someone familiar with the planner code to implement.

I've done something about this for 8.2.  It could possibly be improved
on, in that it's not terribly smart about where to place the gating
Result nodes, but at least it uses them correctly ...

regression=# explain select * from (select * from onek a where expensive(0)) 
ss1 join (select * from onek b  where expensive(1)) ss2 using(unique1);
  QUERY PLAN   
---
 Result  (cost=543.30..849.05 rows=19721 width=484)
   One-Time Filter: (expensive(0) AND expensive(1))
   ->  Merge Join  (cost=543.30..849.05 rows=19721 width=484)
 Merge Cond: (a.unique1 = b.unique1)
 ->  Sort  (cost=271.65..276.61 rows=1986 width=244)
   Sort Key: a.unique1
   ->  Seq Scan on onek a  (cost=0.00..162.86 rows=1986 width=244)
 ->  Sort  (cost=271.65..276.61 rows=1986 width=244)
   Sort Key: b.unique1
   ->  Seq Scan on onek b  (cost=0.00..162.86 rows=1986 width=244)
(10 rows)

regression=# explain select * from (select * from onek a where expensive(0)) 
ss1 left join (select * from onek b  where expensive(1)) ss2 using(unique1);
 QUERY PLAN 
 
-
 Result  (cost=543.30..849.05 rows=19721 width=484)
   One-Time Filter: expensive(0)
   ->  Merge Left Join  (cost=543.30..849.05 rows=19721 width=484)
 Merge Cond: (a.unique1 = b.unique1)
 ->  Sort  (cost=271.65..276.61 rows=1986 width=244)
   Sort Key: a.unique1
   ->  Seq Scan on onek a  (cost=0.00..162.86 rows=1986 width=244)
 ->  Sort  (cost=271.65..276.62 rows=1986 width=244)
   Sort Key: b.unique1
   ->  Result  (cost=0.00..162.86 rows=1986 width=244)
 One-Time Filter: expensive(1)
 ->  Seq Scan on onek b  (cost=0.00..162.86 rows=1986 
width=244)
(12 rows)


regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Fixed length datatypes. WAS [GENERAL] UUID's as

2006-07-01 Thread Tom Lane
Martijn van Oosterhout  writes:
> Something that has been discussed in the past is allowing non-strict
> type input/output functions to be evaluated at query execution time,
> rather than during parse time. This would give the type input/output
> functions the Expr node they need to extract this info.

We could make that happen for literals used in queries (see comment in
coerce_type()), but it's not appealing to expect all of the ad-hoc I/O
function calls in the whole system to supply dummy expression trees.
That would be adding overhead to all cases that's only useful in a few.
I think requiring the functions that need this info to do extra work
is probably the right answer.  (It's already possible to cache whatever
lookups you have to do, cf array_in or record_in, so the overhead isn't
*that* daunting.)

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Fixed length datatypes. WAS [GENERAL] UUID's as

2006-07-01 Thread Tom Lane
Martijn van Oosterhout  writes:
> On Sat, Jul 01, 2006 at 10:37:59AM -0400, Tom Lane wrote:
>> The output function, however, would be quite a trick.
>> It's not going to receive anything except the Datum itself.

> Hmm, you're right. With the taggedtypes module I made it work by
> cloning the output function with a new OID each time and setting the
> arg type so that procLookupArgType() would work. Similarly, the input
> function would use procLookupRettype() to find the desired type.

Oh, I see, you relied on flinfo->fn_oid and then did two cat lookups.
That would work as long as nothing tried to call the function with
DirectFunctionCall ... which is a pretty safe assumption I guess.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Fixed length datatypes. WAS [GENERAL] UUID's as

2006-07-01 Thread Martijn van Oosterhout
On Sat, Jul 01, 2006 at 10:58:05AM -0400, Andrew Dunstan wrote:
> Tom Lane wrote:
> 
> >The output function, however, would be quite a trick.
> >It's not going to receive anything except the Datum itself.
> 
> I understand the reasons for this limitation of output functions, but I 
> have now seen it bite several times. Maybe we need a little out of the 
> box thinking on this. I have spent a while taxing my meagre brain on it 
> over the last few months, without much success ;-(

The thing is, in a lot of other contexts it can work easily because
fcinfo->flinfo->fn_expr points the expression node for this function
call, which means you can extract the relevent data out of that. This
field is simply not filled in for type input/output functions.

Something that has been discussed in the past is allowing non-strict
type input/output functions to be evaluated at query execution time,
rather than during parse time. This would give the type input/output
functions the Expr node they need to extract this info.

I have no idea how easy/hard this would be.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] Fixed length datatypes. WAS [GENERAL] UUID's as

2006-07-01 Thread Andrew Dunstan



Tom Lane wrote:


The output function, however, would be quite a trick.
It's not going to receive anything except the Datum itself.


 



I understand the reasons for this limitation of output functions, but I 
have now seen it bite several times. Maybe we need a little out of the 
box thinking on this. I have spent a while taxing my meagre brain on it 
over the last few months, without much success ;-(


cheers

andrew

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Fixed length datatypes. WAS [GENERAL] UUID's as

2006-07-01 Thread Martijn van Oosterhout
On Sat, Jul 01, 2006 at 10:37:59AM -0400, Tom Lane wrote:
> > Where these input/output functions would work for any given length, so
> > the 16 could be replaced by any number, or even -1 to make a variable
> > length type...
> 
> I believe you could make an input function that would support that,
> though it would have to do a catalog lookup to find out the desired
> type length.  The output function, however, would be quite a trick.
> It's not going to receive anything except the Datum itself.

Hmm, you're right. With the taggedtypes module I made it work by
cloning the output function with a new OID each time and setting the
arg type so that procLookupArgType() would work. Similarly, the input
function would use procLookupRettype() to find the desired type.

So the procedure would be slightly more complicated:

CREATE FUNCTION uuid_in(cstring) RETURNS uuid AS 'hex_raw_in' LANGUAGE internal;
CREATE FUNCTION uuid_out(uuid) RETURNS cstring AS 'hex_raw_out' LANGUAGE 
internal;
CREATE TYPE uuid (
  input = uuid_in,
  output = uuid_out,
  internallength = 16
);

The cat lookups are irritating, but that's what syscache is for, I
guess :)

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] Fixed length datatypes. WAS [GENERAL] UUID's as

2006-07-01 Thread Tom Lane
Martijn van Oosterhout  writes:
> It would also be possible to provide two functions called hex_raw_in()
> and hex_raw_out() that people could use like so:

> CREATE TYPE uuid (
>   input = hex_raw_in,
>   output = hex_raw_out,
>   INTERNALLENGTH = 16
> );

> Where these input/output functions would work for any given length, so
> the 16 could be replaced by any number, or even -1 to make a variable
> length type...

I believe you could make an input function that would support that,
though it would have to do a catalog lookup to find out the desired
type length.  The output function, however, would be quite a trick.
It's not going to receive anything except the Datum itself.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Fixed length datatypes. WAS [GENERAL] UUID's as

2006-07-01 Thread Martijn van Oosterhout
On Fri, Jun 30, 2006 at 12:45:13PM -0500, Jim C. Nasby wrote:
> > If people agree to a generic 16-byte type, or a hex type with defined
> > fixed length with a set of standard functions and index operators that
> > it should work for, but nobody more qualified wants to make the patch
> > - I'll step up.
> 
> I think it'd be extremely useful to have a means of defining
> fixed-length hex types, such as UUIDs and hashes (ie: SHA1). I usually
> only see people trying to do the same thing for CHAR in poorly-designed
> systems, but I suspect anyone dealing with legacy stuff might welcome
> that ability as well.

It would also be possible to provide two functions called hex_raw_in()
and hex_raw_out() that people could use like so:

CREATE TYPE uuid (
  input = hex_raw_in,
  output = hex_raw_out,
  INTERNALLENGTH = 16
);

Where these input/output functions would work for any given length, so
the 16 could be replaced by any number, or even -1 to make a variable
length type...

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature