Re: [GENERAL] Questions about horizontal partitioning
However, if the primary key is entirely within those six columns, there will have to be an index on it in both tables to enforce the primary key constraint. In that case, an inner join could be performed with an index lookup or an index scan plus hash join, for a query that didn't use any other columns. Whether that translates into a significant I/O reduction depends on how wide and how frequently non-NULL those other columns are. ... if someone is feeling pedagogical (and the answer isn't that complicated), could they explain why a simple index on the desired columns wouldn't be the best solution? Cheers Antoine ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Questions about horizontal partitioning
Ron Johnson wrote: On 01/08/07 20:39, Tom Lane wrote: John Sales mailto:[EMAIL PROTECTED] [EMAIL PROTECTED] writes: By doing this, I'm hoping that the query optimizer is smart enough to see that if a query comes in and requests only the six columns (that are in the narrower table) that PostgreSQL won't have to load the wider table into the buffer pool, and thereby actually have to only access about 10% the amount of disk that it presently does. No. It still has to touch the second table to confirm the existence of rows to join to. But if a query /requests *only* the six columns (that are in the narrower table)/, why will the optimizer care about the other 224 columns? It would. A query that uses an inner join implies that a matching entry must exist in both tables - so the join must occur, otherwise you could be returning rows that don't satisfy the join condition. However, if the primary key is entirely within those six columns, there will have to be an index on it in both tables to enforce the primary key constraint. In that case, an inner join could be performed with an index lookup or an index scan plus hash join, for a query that didn't use any other columns. Whether that translates into a significant I/O reduction depends on how wide and how frequently non-NULL those other columns are.
Re: [GENERAL] Questions about horizontal partitioning
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 01/08/07 20:39, Tom Lane wrote: John Sales [EMAIL PROTECTED] writes: By doing this, I'm hoping that the query optimizer is smart enough to see that if a query comes in and requests only the six columns (that are in the narrower table) that PostgreSQL won't have to load the wider table into the buffer pool, and thereby actually have to only access about 10% the amount of disk that it presently does. Is this a sound theory? No. It still has to touch the second table to confirm the existence of rows to join to. But if a query /requests *only* the six columns (that are in the narrower table)/, why will the optimizer care about the other 224 columns? -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFFo5ZfS9HxQb37XmcRAtDRAJ41kKEN1Dv1iKXosTjy6IvMZKGccACfcZc9 e4pV+u0uLFisHcLu/gyuCvE= =q44l -END PGP SIGNATURE- ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] Questions about horizontal partitioning
Ron Johnson wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 01/08/07 20:39, Tom Lane wrote: John Sales [EMAIL PROTECTED] writes: By doing this, I'm hoping that the query optimizer is smart enough to see that if a query comes in and requests only the six columns (that are in the narrower table) that PostgreSQL won't have to load the wider table into the buffer pool, and thereby actually have to only access about 10% the amount of disk that it presently does. Is this a sound theory? No. It still has to touch the second table to confirm the existence of rows to join to. But if a query /requests *only* the six columns (that are in the narrower table)/, why will the optimizer care about the other 224 columns? It would. A query that uses an inner join implies that a matching entry must exist in both tables - so the join must occur, otherwise you could be returning rows that don't satisfy the join condition. -- Chander Ganesan Open Technology Group, Inc. One Copley Parkway, Suite 210 Morrisville, NC 27560 Phone: 877-258-8987/919-463-0999 http://www.otg-nc.com Expert PostgreSQL Training: http://www.otg-nc.com/training-courses/category.php?cat_id=8 -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFFo5ZfS9HxQb37XmcRAtDRAJ41kKEN1Dv1iKXosTjy6IvMZKGccACfcZc9 e4pV+u0uLFisHcLu/gyuCvE= =q44l -END PGP SIGNATURE- ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] Questions about horizontal partitioning
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 01/09/07 07:28, Chander Ganesan wrote: Ron Johnson wrote: On 01/08/07 20:39, Tom Lane wrote: John Sales [EMAIL PROTECTED] writes: By doing this, I'm hoping that the query optimizer is smart enough to see that if a query comes in and requests only the six columns (that are in the narrower table) that PostgreSQL won't have to load the wider table into the buffer pool, and thereby actually have to only access about 10% the amount of disk that it presently does. Is this a sound theory? No. It still has to touch the second table to confirm the existence of rows to join to. But if a query /requests *only* the six columns (that are in the narrower table)/, why will the optimizer care about the other 224 columns? It would. A query that uses an inner join implies that a matching entry must exist in both tables - so the join must occur, otherwise you could be returning rows that don't satisfy the join condition. Sure, if you were selecting those 6 columns from the inner join view. pause Ah, now that I reread the OP, I see that that's what he seems to mean. -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFFo5tdS9HxQb37XmcRApwEAKDiqD86q3sh5eePFrgH3+o4LbTAYwCg1Oys 3/WT7eJvbxfE4RDY3E99NAo= =ix6x -END PGP SIGNATURE- ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Questions about horizontal partitioning
On 09/01/07, Ron Johnson [EMAIL PROTECTED] wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 01/08/07 20:39, Tom Lane wrote: John Sales [EMAIL PROTECTED] writes: By doing this, I'm hoping that the query optimizer is smart enough to see that if a query comes in and requests only the six columns (that are in the narrower table) that PostgreSQL won't have to load the wider table into the buffer pool, and thereby actually have to only access about 10% the amount of disk that it presently does. Is this a sound theory? No. It still has to touch the second table to confirm the existence of rows to join to. But if a query /requests *only* the six columns (that are in the narrower table)/, why will the optimizer care about the other 224 columns? If you are doing an inner join (read normal join) the column has to exist in both tables to be in the final result. If your doing an outer join it depends upon its type (left, right or full) and then postgres may not optimise it out. Peter Childs ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Questions about horizontal partitioning
Ron Johnson wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 01/09/07 07:28, Chander Ganesan wrote: Ron Johnson wrote: On 01/08/07 20:39, Tom Lane wrote: John Sales [EMAIL PROTECTED] writes: By doing this, I'm hoping that the query optimizer is smart enough to see that if a query comes in and requests only the six columns (that are in the narrower table) that PostgreSQL won't have to load the wider table into the buffer pool, and thereby actually have to only access about 10% the amount of disk that it presently does. Is this a sound theory? No. It still has to touch the second table to confirm the existence of rows to join to. But if a query /requests *only* the six columns (that are in the narrower table)/, why will the optimizer care about the other 224 columns? It would. A query that uses an inner join implies that a matching entry must exist in both tables - so the join must occur, otherwise you could be returning rows that don't satisfy the join condition. Sure, if you were selecting those 6 columns from the inner join view. pause Ah, now that I reread the OP, I see that that's what he seems to mean. In theory, if the table with 6 columns was the child of the table with 200+ columns, and a PK-FK relationship existed, then the optimizer wouldn't need to check for the existence of the rows, since the PK-FK relationship would indicate that the rows did/do exist. However, I don't *think* that the optimizer takes that into account (though with PostgreSQL you never know, it's pretty smart stuff ;-) ) . Of course, that might make insert operations difficult, but if you are using a view to perform inserts you could probably handle that fairly easily in the on insert rule... -- Chander Ganesan Open Technology Group, Inc. One Copley Parkway, Suite 210 Morrisville, NC 27560 Phone: 877-258-8987/919-463-0999 http://www.otg-nc.com Expert PostgreSQL training: http://www.otg-nc.com/training-courses/category.php?cat_id=8 -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFFo5tdS9HxQb37XmcRApwEAKDiqD86q3sh5eePFrgH3+o4LbTAYwCg1Oys 3/WT7eJvbxfE4RDY3E99NAo= =ix6x -END PGP SIGNATURE- ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Questions about horizontal partitioning
Chander Ganesan [EMAIL PROTECTED] writes: In theory, if the table with 6 columns was the child of the table with 200+ columns, and a PK-FK relationship existed, then the optimizer wouldn't need to check for the existence of the rows, since the PK-FK relationship would indicate that the rows did/do exist. No, that's still not right. With a LEFT JOIN you know that each row of the narrow table will produce at least one row in the join view. What you don't know is whether the row could produce more than one join row --- ie, is there more than one wide-table row that joins to it? To optimize away the join, the planner would have to find a unique constraint on the wide table's join column(s). This is certainly doable in principle, though I find it questionable whether the planner should spend cycles on every join query checking for something that won't be true in the vast majority of real-world queries. The main reason we have not considered it to date is that the correctness of the plan would then depend on a constraint that could get dropped --- but the plan would not fail when run, as indeed it wouldn't be touching that table at all. We really need some plan-invalidation infrastructure to force re-planning whenever a table's schema changes, and only then will it be safe for the planner to start depending on constraints for correctness- related decisions. (This is why constraint exclusion is off by default at the moment.) I've been wanting plan invalidation for a long time ... maybe it will finally happen in 8.3. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Questions about horizontal partitioning
On Tue, Jan 09, 2007 at 08:28:29 -0500, Chander Ganesan [EMAIL PROTECTED] wrote: It would. A query that uses an inner join implies that a matching entry must exist in both tables - so the join must occur, otherwise you could be returning rows that don't satisfy the join condition. While this might not be worth doing, a foreign key constraint could be used to establish the existance of the matching row, so that you wouldn't actually need to look in the other table to verify that if you didn't need any of the other columns in the other table. There will be some tricky cases for this, such as when there are deferred constraints or when updating the foreign key field. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Questions about horizontal partitioning
On Tue, Jan 09, 2007 at 10:33:52 -0500, Tom Lane [EMAIL PROTECTED] wrote: No, that's still not right. With a LEFT JOIN you know that each row of the narrow table will produce at least one row in the join view. What you don't know is whether the row could produce more than one join row --- ie, is there more than one wide-table row that joins to it? Thanks for pointing that out. I only thought of half of the problem. To optimize away the join, the planner would have to find a unique constraint on the wide table's join column(s). This is certainly doable in principle, though I find it questionable whether the planner should spend cycles on every join query checking for something that won't be true in the vast majority of real-world queries. The main reason we In this case the test would only be applied when no columns were being used in a table being joined to. Since that is also an unusual case, if that case could be quickly checked for, then it might conceivably be worth doing the more expensive test for the proper not null foreign key relation and unique constraint. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Questions about horizontal partitioning
Bruno Wolff III [EMAIL PROTECTED] writes: In this case the test would only be applied when no columns were being used in a table being joined to. Since that is also an unusual case, if that case could be quickly checked for, then it might conceivably be worth doing the more expensive test for the proper not null foreign key relation and unique constraint. Yeah. Right offhand I think that would be fairly expensive too :-(. The problem is that the query *DOES* have a reference to at least one column of the wide table, namely its join key. So you'd have to distinguish whether any references appear above the join. In the current planner structure I think that this information is computed, but not until it's far too late to be removing joins from the tree. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[GENERAL] Questions about horizontal partitioning
Suppose I have a table - lets say it was developed by someone with little or no understanding of database design - and it has 230 columns. Now, it turns out that 99% of the time only about 8 colums are required, but all 230 columns are populated. However, legacy applications (which are run nightly for batch processing, but not during the day, when heavy volume occurs) require the old table design. New applications only select a limited number of columns, but require the old design names. I want to do the following: 1. Split the table into two tables (one with 6 columns, the other with 224 columns), using the primary key to establish a 1-1 relationship between them. 2. Create a view that displays the join between the two tables. By doing this, I'm hoping that the query optimizer is smart enough to see that if a query comes in and requests only the six columns (that are in the narrower table) that PostgreSQL won't have to load the wider table into the buffer pool, and thereby actually have to only access about 10% the amount of disk that it presently does. Is this a sound theory? Is the query optimizer smart enough ignore part of a join when a portion of the join will have no effect on the end result set? thanks __ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
Re: [GENERAL] Questions about horizontal partitioning
John Sales [EMAIL PROTECTED] writes: By doing this, I'm hoping that the query optimizer is smart enough to see that if a query comes in and requests only the six columns (that are in the narrower table) that PostgreSQL won't have to load the wider table into the buffer pool, and thereby actually have to only access about 10% the amount of disk that it presently does. Is this a sound theory? No. It still has to touch the second table to confirm the existence of rows to join to. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Questions about horizontal partitioning
Tom Lane [EMAIL PROTECTED] wrote: John Sales writes: By doing this, I'm hoping that the query optimizer is smart enough to see that if a query comes in and requests only the six columns (that are in the narrower table) that PostgreSQL won't have to load the wider table into the buffer pool, and thereby actually have to only access about 10% the amount of disk that it presently does. Is this a sound theory? No. It still has to touch the second table to confirm the existence of rows to join to. regards, tom lane Good point.. What if I made the second table optional using an outer join, so as not to imply the existance of rows in the wide table? thanks __ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com