Re: [HACKERS] [GENERAL] Queries joining views

2006-08-23 Thread Tom Lane
Alban Hertroys <[EMAIL PROTECTED]> writes:
> Mm_object is always larger than any other table in the database, as 
> every table joins with (different) records in it to determine it's otype 
> and owner. So I don't understand how a fraction of any of those tables 
> could be larger than mm_object as a whole...

No, I said a larger fraction, not a larger absolute number of tuples.
The problem is that because mm_product contains only very small values
of "number", a mergejoin looks like a great way to join it to mm_object:
only the first 5% of mm_object will need to be scanned.  The bug
consists in applying that 5% number to mm_insrel, for which it's not
correct.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [GENERAL] Queries joining views

2006-08-23 Thread Alban Hertroys

Tom Lane wrote:

Alban Hertroys <[EMAIL PROTECTED]> writes:

I'm confused too.  Would it be possible for you to send me a dump of
your database?


Attached is a cleaned out database, the full schema is included, but 
only the relevant tables contain any data.


Thanks.  After digging through it a bit, I understand what's happening,
but I'm not seeing any simple fix.  The problem is basically that
you've got

create or replace view mm_product as
 SELECT ... FROM mm_product_table JOIN mm_object mmop USING (number);

and then the problem query has WHERE mm_product.number = insrel.snumber
which causes the planner to conclude that mm_product_table.number,
mm_object.number, and mm_insrel_table.snumber are all basically
interchangeable.  In particular it ends up performing the join between
mm_product_table.number and mm_object.number as though
mm_product_table.number were being joined to mm_insrel_table.snumber.


It's even worse, I guess, as the mm_insrel view joins mm_insrel_table 
with mm_object again. So basically the query performs a self-join on 
mm_object with a detour through mm_insrel_table and mm_product_table...



Which is fine, except that it's thinking that the statistics for
mm_object.number are applicable in this context, and they're completely
misleading.  After the join to mm_insrel_table, the statistics of the
variable are really like mm_insrel_table.number --- in particular the
fraction of the table that has to be visited is much larger than it
would've been for mm_object as a whole.


I don't entirely understand what you're saying here.

Mm_object is always larger than any other table in the database, as 
every table joins with (different) records in it to determine it's otype 
and owner. So I don't understand how a fraction of any of those tables 
could be larger than mm_object as a whole...


In fact, originally the schema used inheritance; every table inherited 
(directly or indirectly) from mm_object. As this resulted in unions, 
which caused much more performance problems than the current 
view-approach, I implemented the current approach.

In fact, this approach was lent from what MMBase uses for the MSSQL layer.

Well, as I implemented the way the views are defined, there is room for 
changes in that area. Suggestions are welcome.



This is a problem we've understood in a generic form for awhile:
a join or selection might change the statistics of a variable,
and so the info stored in the catalogs ought to be modified somehow
to predict what will happen at upper join levels.  We've not seen
it in this particular form before, though.

I'm not sure if there's a whole lot you can do about it in the near term
other than refactor your schema to avoid having different tables joining
to different subranges of mm_object.number.  (You don't necessarily have
to get rid of mm_object --- just try assigning its keys from a serial,
or something, so that there's no correlation to the ranges of keys in
other tables.)


Unfortunately the number key is required to correlate to the number keys 
in other tables. That's the whole point of that table. It's also already 
generated from a sequence...


I am looking at a view options at the moment:

1.) Cluster mm_object on an index over otype - I'm not sure how that 
would influence the statistics; if it doesn't then this wouldn't change 
much.


2.) Change mm_object into a view over the tables that now join with it. 
I'll have to devise some way to get the otype and owner columns into the 
other tables.


3.) An extension to option 2; Creating seperate tables, only containing 
the relevant sections from mm_object, combining them into a view-version 
of mm_object. Like this:


CREATE TABLE mm_product_object (
number integer PRIMARY KEY,
otype integer,
owner text
);
CREATE TABLE mm_insrel_object (
number integer PRIMARY KEY,
otype integer,
owner text
);

(I recall seeing an inheritance-like statement that makes copies of 
table definitions - seems useful in this case)


CREATE OR REPLACE VIEW mm_object AS
SELECT * FROM mm_product_object
UNION ALL
SELECT * FROM mm_insrel_object;

It remains to be seen that MMBase can handle mm_object being a view, but 
 (if not) it probably will work if it's an updatable view.


I'm leaning to option 3, but if option 1 helps it saves me a lot of trouble.


We might be able to do something about actually solving the statistical
problem in 8.3, but I fear it's too late to think about it for 8.2.


Well, I had hoped for a suitable workaround, and I believe I may have a 
few options now. Waiting for the next PostgreSQL release never really 
was an option for us (deadline is somewhere next week). So it doesn't 
really matter to us that there won't be a solution until 8.3, or maybe 
even later.


Thanks for the help so far, glad to be able to point out an actual problem.

Regards,
--
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: +

Re: [HACKERS] [GENERAL] Queries joining views

2006-08-22 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> We might be able to do something about actually solving the statistical
>> problem in 8.3, but I fear it's too late to think about it for 8.2.

> I take it you mean you already have a very concrete idea on how to solve
> it.  Come on, illuminate us poor dumb souls.

No, I don't :-( ... that was intended to suggest that we might think of
a solution given months to work on it rather than days.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [GENERAL] Queries joining views

2006-08-22 Thread Alvaro Herrera
Tom Lane wrote:

> We might be able to do something about actually solving the statistical
> problem in 8.3, but I fear it's too late to think about it for 8.2.

I take it you mean you already have a very concrete idea on how to solve
it.  Come on, illuminate us poor dumb souls.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [GENERAL] Queries joining views

2006-08-22 Thread Tom Lane
I wrote:
> Thanks.  After digging through it a bit, I understand what's happening,
> but I'm not seeing any simple fix.

I forgot to mention that although I could reproduce your bad plan in
8.1, CVS HEAD doesn't fall into the trap.  I don't believe we've done
anything to fix the fundamental problem however --- it may just be a
side effect of the changes in the indexscan cost model that cause it
to not go for the bogus plan.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [GENERAL] Queries joining views

2006-08-22 Thread Tom Lane
Alban Hertroys <[EMAIL PROTECTED]> writes:
>> I'm confused too.  Would it be possible for you to send me a dump of
>> your database?

> Attached is a cleaned out database, the full schema is included, but 
> only the relevant tables contain any data.

Thanks.  After digging through it a bit, I understand what's happening,
but I'm not seeing any simple fix.  The problem is basically that
you've got

create or replace view mm_product as
 SELECT ... FROM mm_product_table JOIN mm_object mmop USING (number);

and then the problem query has WHERE mm_product.number = insrel.snumber
which causes the planner to conclude that mm_product_table.number,
mm_object.number, and mm_insrel_table.snumber are all basically
interchangeable.  In particular it ends up performing the join between
mm_product_table.number and mm_object.number as though
mm_product_table.number were being joined to mm_insrel_table.snumber.
Which is fine, except that it's thinking that the statistics for
mm_object.number are applicable in this context, and they're completely
misleading.  After the join to mm_insrel_table, the statistics of the
variable are really like mm_insrel_table.number --- in particular the
fraction of the table that has to be visited is much larger than it
would've been for mm_object as a whole.

This is a problem we've understood in a generic form for awhile:
a join or selection might change the statistics of a variable,
and so the info stored in the catalogs ought to be modified somehow
to predict what will happen at upper join levels.  We've not seen
it in this particular form before, though.

I'm not sure if there's a whole lot you can do about it in the near term
other than refactor your schema to avoid having different tables joining
to different subranges of mm_object.number.  (You don't necessarily have
to get rid of mm_object --- just try assigning its keys from a serial,
or something, so that there's no correlation to the ranges of keys in
other tables.)

We might be able to do something about actually solving the statistical
problem in 8.3, but I fear it's too late to think about it for 8.2.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend