Re: [HACKERS] optimizing constant quals within outer joins
Tom Lane wrote: > Alvaro Herrera <[EMAIL PROTECTED]> writes: > > I note that the rowcount is not altered by the one-time filter. Is this > > an issue? I imagine the problem is not being able to estimate the > > number of rows that pass the filter. > > That's intentional. The filter is either going to pass all or none of > the rows, not some fraction of them. It clearly isn't very reasonable > to guess that it will pass none of them (except if the qual is actually > constant FALSE). > > > I also wonder whether it wouldn't be better in this case to apply each > > filter to each arm of the merge join. > > Uh, why? For the most part, I'd think the higher you can put the filter > in the plan tree, the better. Huh, sorry, I had misunderstood the meaning of a _one_-time filter :-) -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] optimizing constant quals within outer joins
Alvaro Herrera <[EMAIL PROTECTED]> writes: > I note that the rowcount is not altered by the one-time filter. Is this > an issue? I imagine the problem is not being able to estimate the > number of rows that pass the filter. That's intentional. The filter is either going to pass all or none of the rows, not some fraction of them. It clearly isn't very reasonable to guess that it will pass none of them (except if the qual is actually constant FALSE). > I also wonder whether it wouldn't be better in this case to apply each > filter to each arm of the merge join. Uh, why? For the most part, I'd think the higher you can put the filter in the plan tree, the better. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] optimizing constant quals within outer joins
Tom Lane wrote: > regression=# explain select * from (select * from onek a where expensive(0)) > ss1 join (select * from onek b where expensive(1)) ss2 using(unique1); > QUERY PLAN > > --- > Result (cost=543.30..849.05 rows=19721 width=484) >One-Time Filter: (expensive(0) AND expensive(1)) >-> Merge Join (cost=543.30..849.05 rows=19721 width=484) > Merge Cond: (a.unique1 = b.unique1) I note that the rowcount is not altered by the one-time filter. Is this an issue? I imagine the problem is not being able to estimate the number of rows that pass the filter. > -> Sort (cost=271.65..276.61 rows=1986 width=244) >Sort Key: a.unique1 >-> Seq Scan on onek a (cost=0.00..162.86 rows=1986 width=244) > -> Sort (cost=271.65..276.61 rows=1986 width=244) >Sort Key: b.unique1 >-> Seq Scan on onek b (cost=0.00..162.86 rows=1986 width=244) > (10 rows) I also wonder whether it wouldn't be better in this case to apply each filter to each arm of the merge join. -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Interval aggregate regression failure (expected seems
Michael Glaesemann <[EMAIL PROTECTED]> writes: > ... I think this just confirms that there is some kind of rounding (or > lack of) in interval_div. Kind of frustrating that it's not visible > in the result. I think the fundamental problem is that the float8 results of division are inaccurate, and yet we're assuming that we can (for instance) coerce them to integer and get exactly the right answer. For instance, in the '41 months'/10 example, I get month_remainder_days being computed as (gdb) p month_remainder $19 = 0.099645 (gdb) s 2575result->day += (int32) month_remainder_days; (gdb) p month_remainder_days $20 = 2.9893 The only way we can really fix this is to be willing to round off the numbers, and I think the only principled way to do that is to settle on a specific target accuracy, probably 1 microsecond. Then the thing to do would be to scale up all the intermediate float results to microseconds and apply rint(). Something like (untested) month_remainder = rint(span->month * USECS_PER_MONTH / factor); day_remainder = rint(span->day * USECS_PER_DAY / factor); result->month = (int32) (month_remainder / USECS_PER_MONTH); result->day = (int32) (day_remainder / USECS_PER_DAY); month_remainder -= result->month * USECS_PER_MONTH; day_remainder -= result->day * USECS_PER_DAY; /* * Handle any fractional parts the same way as in interval_mul. */ /* fractional months full days into days */ month_remainder_days = month_remainder * DAYS_PER_MONTH; extra_days = (int32) (month_remainder_days / USECS_PER_DAY); result->day += extra_days; /* fractional months partial days into time */ day_remainder += month_remainder_days - extra_days * USECS_PER_DAY; #ifdef HAVE_INT64_TIMESTAMP result->time = rint(span->time / factor + day_remainder); #else result->time = rint(span->time * 1.0e6 / factor + day_remainder) / 1.0e6; #endif This might need a few more rint() calls --- I'm assuming that float ops with exact integral inputs will be OK, which is an assumption used pretty widely in the datetime code, but ... Per the comment, if we do this here we probably want to make interval_mul work similarly. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] optimizing constant quals within outer joins
Phil Frost <[EMAIL PROTECTED]> writes: > I have an optimization I'd like to see which I think should be pretty > easy for someone familiar with the planner code to implement. I've done something about this for 8.2. It could possibly be improved on, in that it's not terribly smart about where to place the gating Result nodes, but at least it uses them correctly ... regression=# explain select * from (select * from onek a where expensive(0)) ss1 join (select * from onek b where expensive(1)) ss2 using(unique1); QUERY PLAN --- Result (cost=543.30..849.05 rows=19721 width=484) One-Time Filter: (expensive(0) AND expensive(1)) -> Merge Join (cost=543.30..849.05 rows=19721 width=484) Merge Cond: (a.unique1 = b.unique1) -> Sort (cost=271.65..276.61 rows=1986 width=244) Sort Key: a.unique1 -> Seq Scan on onek a (cost=0.00..162.86 rows=1986 width=244) -> Sort (cost=271.65..276.61 rows=1986 width=244) Sort Key: b.unique1 -> Seq Scan on onek b (cost=0.00..162.86 rows=1986 width=244) (10 rows) regression=# explain select * from (select * from onek a where expensive(0)) ss1 left join (select * from onek b where expensive(1)) ss2 using(unique1); QUERY PLAN - Result (cost=543.30..849.05 rows=19721 width=484) One-Time Filter: expensive(0) -> Merge Left Join (cost=543.30..849.05 rows=19721 width=484) Merge Cond: (a.unique1 = b.unique1) -> Sort (cost=271.65..276.61 rows=1986 width=244) Sort Key: a.unique1 -> Seq Scan on onek a (cost=0.00..162.86 rows=1986 width=244) -> Sort (cost=271.65..276.62 rows=1986 width=244) Sort Key: b.unique1 -> Result (cost=0.00..162.86 rows=1986 width=244) One-Time Filter: expensive(1) -> Seq Scan on onek b (cost=0.00..162.86 rows=1986 width=244) (12 rows) regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Fixed length datatypes. WAS [GENERAL] UUID's as
Martijn van Oosterhout writes: > Something that has been discussed in the past is allowing non-strict > type input/output functions to be evaluated at query execution time, > rather than during parse time. This would give the type input/output > functions the Expr node they need to extract this info. We could make that happen for literals used in queries (see comment in coerce_type()), but it's not appealing to expect all of the ad-hoc I/O function calls in the whole system to supply dummy expression trees. That would be adding overhead to all cases that's only useful in a few. I think requiring the functions that need this info to do extra work is probably the right answer. (It's already possible to cache whatever lookups you have to do, cf array_in or record_in, so the overhead isn't *that* daunting.) regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Fixed length datatypes. WAS [GENERAL] UUID's as
Martijn van Oosterhout writes: > On Sat, Jul 01, 2006 at 10:37:59AM -0400, Tom Lane wrote: >> The output function, however, would be quite a trick. >> It's not going to receive anything except the Datum itself. > Hmm, you're right. With the taggedtypes module I made it work by > cloning the output function with a new OID each time and setting the > arg type so that procLookupArgType() would work. Similarly, the input > function would use procLookupRettype() to find the desired type. Oh, I see, you relied on flinfo->fn_oid and then did two cat lookups. That would work as long as nothing tried to call the function with DirectFunctionCall ... which is a pretty safe assumption I guess. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Fixed length datatypes. WAS [GENERAL] UUID's as
On Sat, Jul 01, 2006 at 10:58:05AM -0400, Andrew Dunstan wrote: > Tom Lane wrote: > > >The output function, however, would be quite a trick. > >It's not going to receive anything except the Datum itself. > > I understand the reasons for this limitation of output functions, but I > have now seen it bite several times. Maybe we need a little out of the > box thinking on this. I have spent a while taxing my meagre brain on it > over the last few months, without much success ;-( The thing is, in a lot of other contexts it can work easily because fcinfo->flinfo->fn_expr points the expression node for this function call, which means you can extract the relevent data out of that. This field is simply not filled in for type input/output functions. Something that has been discussed in the past is allowing non-strict type input/output functions to be evaluated at query execution time, rather than during parse time. This would give the type input/output functions the Expr node they need to extract this info. I have no idea how easy/hard this would be. Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to > litigate. signature.asc Description: Digital signature
Re: [HACKERS] Fixed length datatypes. WAS [GENERAL] UUID's as
Tom Lane wrote: The output function, however, would be quite a trick. It's not going to receive anything except the Datum itself. I understand the reasons for this limitation of output functions, but I have now seen it bite several times. Maybe we need a little out of the box thinking on this. I have spent a while taxing my meagre brain on it over the last few months, without much success ;-( cheers andrew ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Fixed length datatypes. WAS [GENERAL] UUID's as
On Sat, Jul 01, 2006 at 10:37:59AM -0400, Tom Lane wrote: > > Where these input/output functions would work for any given length, so > > the 16 could be replaced by any number, or even -1 to make a variable > > length type... > > I believe you could make an input function that would support that, > though it would have to do a catalog lookup to find out the desired > type length. The output function, however, would be quite a trick. > It's not going to receive anything except the Datum itself. Hmm, you're right. With the taggedtypes module I made it work by cloning the output function with a new OID each time and setting the arg type so that procLookupArgType() would work. Similarly, the input function would use procLookupRettype() to find the desired type. So the procedure would be slightly more complicated: CREATE FUNCTION uuid_in(cstring) RETURNS uuid AS 'hex_raw_in' LANGUAGE internal; CREATE FUNCTION uuid_out(uuid) RETURNS cstring AS 'hex_raw_out' LANGUAGE internal; CREATE TYPE uuid ( input = uuid_in, output = uuid_out, internallength = 16 ); The cat lookups are irritating, but that's what syscache is for, I guess :) Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to > litigate. signature.asc Description: Digital signature
Re: [HACKERS] Fixed length datatypes. WAS [GENERAL] UUID's as
Martijn van Oosterhout writes: > It would also be possible to provide two functions called hex_raw_in() > and hex_raw_out() that people could use like so: > CREATE TYPE uuid ( > input = hex_raw_in, > output = hex_raw_out, > INTERNALLENGTH = 16 > ); > Where these input/output functions would work for any given length, so > the 16 could be replaced by any number, or even -1 to make a variable > length type... I believe you could make an input function that would support that, though it would have to do a catalog lookup to find out the desired type length. The output function, however, would be quite a trick. It's not going to receive anything except the Datum itself. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Fixed length datatypes. WAS [GENERAL] UUID's as
On Fri, Jun 30, 2006 at 12:45:13PM -0500, Jim C. Nasby wrote: > > If people agree to a generic 16-byte type, or a hex type with defined > > fixed length with a set of standard functions and index operators that > > it should work for, but nobody more qualified wants to make the patch > > - I'll step up. > > I think it'd be extremely useful to have a means of defining > fixed-length hex types, such as UUIDs and hashes (ie: SHA1). I usually > only see people trying to do the same thing for CHAR in poorly-designed > systems, but I suspect anyone dealing with legacy stuff might welcome > that ability as well. It would also be possible to provide two functions called hex_raw_in() and hex_raw_out() that people could use like so: CREATE TYPE uuid ( input = hex_raw_in, output = hex_raw_out, INTERNALLENGTH = 16 ); Where these input/output functions would work for any given length, so the 16 could be replaced by any number, or even -1 to make a variable length type... Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to > litigate. signature.asc Description: Digital signature