Re: [GENERAL] Questions about horizontal partitioning

2007-01-28 Thread Anton Melser

However,  if the primary key is entirely within those six columns,  there
will have to be an index on it in both tables to enforce the primary key
constraint.  In that case,  an inner join could be performed with an index
lookup or an index scan plus hash join,  for a query that didn't use any
other columns.  Whether that translates into a significant I/O reduction
depends on how wide and how frequently non-NULL those other columns are.




... if someone is feeling pedagogical (and the answer isn't that
complicated), could they explain why a simple index on the desired
columns wouldn't be the best solution?
Cheers
Antoine

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] Questions about horizontal partitioning

2007-01-24 Thread David Lee Lambert
 

Ron Johnson wrote: 

 
On 01/08/07 20:39, Tom Lane wrote:
  

John Sales  mailto:[EMAIL PROTECTED] [EMAIL PROTECTED] writes:


By doing this, I'm hoping that the query optimizer is smart
enough to see that if a query comes in and requests only the
six columns (that are in the narrower table) that PostgreSQL
won't have to load the wider table into the buffer pool, and
thereby actually have to only access about 10% the amount of
disk that it presently does.

  

No.  It still has to touch the second table to confirm the
existence of rows to join to.


 
But if a query /requests *only* the six columns (that are in the
narrower table)/, why will the optimizer care about the other 224
columns?
  

It would.  A query that uses an inner join implies that a matching entry
must exist in both tables - so the join must occur, otherwise you could be
returning rows that don't satisfy the join condition.



However,  if the primary key is entirely within those six columns,  there
will have to be an index on it in both tables to enforce the primary key
constraint.  In that case,  an inner join could be performed with an index
lookup or an index scan plus hash join,  for a query that didn't use any
other columns.  Whether that translates into a significant I/O reduction
depends on how wide and how frequently non-NULL those other columns are.

 



Re: [GENERAL] Questions about horizontal partitioning

2007-01-09 Thread Ron Johnson
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 01/08/07 20:39, Tom Lane wrote:
 John Sales [EMAIL PROTECTED] writes:
 By doing this, I'm hoping that the query optimizer is smart
 enough to see that if a query comes in and requests only the
 six columns (that are in the narrower table) that PostgreSQL
 won't have to load the wider table into the buffer pool, and
 thereby actually have to only access about 10% the amount of
 disk that it presently does.
 
 Is this a sound theory?
 
 No.  It still has to touch the second table to confirm the
 existence of rows to join to.

But if a query /requests *only* the six columns (that are in the
narrower table)/, why will the optimizer care about the other 224
columns?



-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFFo5ZfS9HxQb37XmcRAtDRAJ41kKEN1Dv1iKXosTjy6IvMZKGccACfcZc9
e4pV+u0uLFisHcLu/gyuCvE=
=q44l
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] Questions about horizontal partitioning

2007-01-09 Thread Chander Ganesan

Ron Johnson wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 01/08/07 20:39, Tom Lane wrote:
  

John Sales [EMAIL PROTECTED] writes:


By doing this, I'm hoping that the query optimizer is smart
enough to see that if a query comes in and requests only the
six columns (that are in the narrower table) that PostgreSQL
won't have to load the wider table into the buffer pool, and
thereby actually have to only access about 10% the amount of
disk that it presently does.
  
Is this a sound theory?
  

No.  It still has to touch the second table to confirm the
existence of rows to join to.



But if a query /requests *only* the six columns (that are in the
narrower table)/, why will the optimizer care about the other 224
columns?
  
It would.  A query that uses an inner join implies that a matching entry 
must exist in both tables - so the join must occur, otherwise you could 
be returning rows that don't satisfy the join condition.


--
Chander Ganesan
Open Technology Group, Inc.
One Copley Parkway, Suite 210
Morrisville, NC  27560
Phone: 877-258-8987/919-463-0999
http://www.otg-nc.com
Expert PostgreSQL Training: 
http://www.otg-nc.com/training-courses/category.php?cat_id=8





-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFFo5ZfS9HxQb37XmcRAtDRAJ41kKEN1Dv1iKXosTjy6IvMZKGccACfcZc9
e4pV+u0uLFisHcLu/gyuCvE=
=q44l
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/
  





Re: [GENERAL] Questions about horizontal partitioning

2007-01-09 Thread Ron Johnson
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 01/09/07 07:28, Chander Ganesan wrote:
 Ron Johnson wrote:

 On 01/08/07 20:39, Tom Lane wrote:
  
 John Sales [EMAIL PROTECTED] writes:

 By doing this, I'm hoping that the query optimizer is smart
 enough to see that if a query comes in and requests only the
 six columns (that are in the narrower table) that PostgreSQL
 won't have to load the wider table into the buffer pool, and
 thereby actually have to only access about 10% the amount of
 disk that it presently does.
   Is this a sound theory?
   
 No.  It still has to touch the second table to confirm the
 existence of rows to join to.
 

 But if a query /requests *only* the six columns (that are in the
 narrower table)/, why will the optimizer care about the other 224
 columns?
   
 It would.  A query that uses an inner join implies that a matching entry
 must exist in both tables - so the join must occur, otherwise you could
 be returning rows that don't satisfy the join condition.

Sure, if you were selecting those 6 columns from the inner join
view.  pause  Ah, now that I reread the OP, I see that that's
what he seems to mean.



-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFFo5tdS9HxQb37XmcRApwEAKDiqD86q3sh5eePFrgH3+o4LbTAYwCg1Oys
3/WT7eJvbxfE4RDY3E99NAo=
=ix6x
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Questions about horizontal partitioning

2007-01-09 Thread Peter Childs

On 09/01/07, Ron Johnson [EMAIL PROTECTED] wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 01/08/07 20:39, Tom Lane wrote:
 John Sales [EMAIL PROTECTED] writes:
 By doing this, I'm hoping that the query optimizer is smart
 enough to see that if a query comes in and requests only the
 six columns (that are in the narrower table) that PostgreSQL
 won't have to load the wider table into the buffer pool, and
 thereby actually have to only access about 10% the amount of
 disk that it presently does.

 Is this a sound theory?

 No.  It still has to touch the second table to confirm the
 existence of rows to join to.

But if a query /requests *only* the six columns (that are in the
narrower table)/, why will the optimizer care about the other 224
columns?



If you are doing an inner join (read normal join) the column has to
exist in both tables to be in the final result. If your doing an outer
join it depends upon its type (left, right or full) and then postgres
may not optimise it out.


Peter Childs

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Questions about horizontal partitioning

2007-01-09 Thread Chander Ganesan

Ron Johnson wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 01/09/07 07:28, Chander Ganesan wrote:
  

Ron Johnson wrote:


On 01/08/07 20:39, Tom Lane wrote:
 
  

John Sales [EMAIL PROTECTED] writes:
   


By doing this, I'm hoping that the query optimizer is smart
enough to see that if a query comes in and requests only the
six columns (that are in the narrower table) that PostgreSQL
won't have to load the wider table into the buffer pool, and
thereby actually have to only access about 10% the amount of
disk that it presently does.
  Is this a sound theory?
  
  

No.  It still has to touch the second table to confirm the
existence of rows to join to.



But if a query /requests *only* the six columns (that are in the
narrower table)/, why will the optimizer care about the other 224
columns?
  
  

It would.  A query that uses an inner join implies that a matching entry
must exist in both tables - so the join must occur, otherwise you could
be returning rows that don't satisfy the join condition.



Sure, if you were selecting those 6 columns from the inner join
view.  pause  Ah, now that I reread the OP, I see that that's
what he seems to mean.
  
In theory, if the table with 6 columns was the child of the table with 
200+ columns, and a PK-FK relationship existed, then the optimizer 
wouldn't need to check for the existence of the rows, since the PK-FK 
relationship would indicate that the rows did/do exist.  However, I 
don't *think* that the optimizer takes that into account (though with 
PostgreSQL you never know, it's pretty smart stuff ;-) ) .  Of course, 
that might make insert operations difficult, but if you are using a view 
to perform inserts you could probably handle that fairly easily in the 
on insert rule...


--
Chander Ganesan
Open Technology Group, Inc.
One Copley Parkway, Suite 210
Morrisville, NC  27560
Phone: 877-258-8987/919-463-0999
http://www.otg-nc.com
Expert PostgreSQL training: 
http://www.otg-nc.com/training-courses/category.php?cat_id=8






-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFFo5tdS9HxQb37XmcRApwEAKDiqD86q3sh5eePFrgH3+o4LbTAYwCg1Oys
3/WT7eJvbxfE4RDY3E99NAo=
=ix6x
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly
  




Re: [GENERAL] Questions about horizontal partitioning

2007-01-09 Thread Tom Lane
Chander Ganesan [EMAIL PROTECTED] writes:
 In theory, if the table with 6 columns was the child of the table with 
 200+ columns, and a PK-FK relationship existed, then the optimizer 
 wouldn't need to check for the existence of the rows, since the PK-FK 
 relationship would indicate that the rows did/do exist.

No, that's still not right.  With a LEFT JOIN you know that each row of
the narrow table will produce at least one row in the join view.  What
you don't know is whether the row could produce more than one join row
--- ie, is there more than one wide-table row that joins to it?

To optimize away the join, the planner would have to find a unique
constraint on the wide table's join column(s).  This is certainly doable
in principle, though I find it questionable whether the planner should
spend cycles on every join query checking for something that won't be
true in the vast majority of real-world queries.  The main reason we
have not considered it to date is that the correctness of the plan would
then depend on a constraint that could get dropped --- but the plan
would not fail when run, as indeed it wouldn't be touching that table
at all.  We really need some plan-invalidation infrastructure to force
re-planning whenever a table's schema changes, and only then will it be
safe for the planner to start depending on constraints for correctness-
related decisions.  (This is why constraint exclusion is off by default
at the moment.)

I've been wanting plan invalidation for a long time ... maybe it will
finally happen in 8.3.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Questions about horizontal partitioning

2007-01-09 Thread Bruno Wolff III
On Tue, Jan 09, 2007 at 08:28:29 -0500,
  Chander Ganesan [EMAIL PROTECTED] wrote:
 It would.  A query that uses an inner join implies that a matching entry 
 must exist in both tables - so the join must occur, otherwise you could 
 be returning rows that don't satisfy the join condition.

While this might not be worth doing, a foreign key constraint could be used
to establish the existance of the matching row, so that you wouldn't actually
need to look in the other table to verify that if you didn't need any of the
other columns in the other table. There will be some tricky cases for this,
such as when there are deferred constraints or when updating the foreign key
field.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Questions about horizontal partitioning

2007-01-09 Thread Bruno Wolff III
On Tue, Jan 09, 2007 at 10:33:52 -0500,
  Tom Lane [EMAIL PROTECTED] wrote:
 
 No, that's still not right.  With a LEFT JOIN you know that each row of
 the narrow table will produce at least one row in the join view.  What
 you don't know is whether the row could produce more than one join row
 --- ie, is there more than one wide-table row that joins to it?

Thanks for pointing that out. I only thought of half of the problem.

 To optimize away the join, the planner would have to find a unique
 constraint on the wide table's join column(s).  This is certainly doable
 in principle, though I find it questionable whether the planner should
 spend cycles on every join query checking for something that won't be
 true in the vast majority of real-world queries.  The main reason we

In this case the test would only be applied when no columns were being used
in a table being joined to. Since that is also an unusual case, if that case
could be quickly checked for, then it might conceivably be worth doing the
more expensive test for the proper not null foreign key relation and unique
constraint.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Questions about horizontal partitioning

2007-01-09 Thread Tom Lane
Bruno Wolff III [EMAIL PROTECTED] writes:
 In this case the test would only be applied when no columns were being used
 in a table being joined to. Since that is also an unusual case, if that case
 could be quickly checked for, then it might conceivably be worth doing the
 more expensive test for the proper not null foreign key relation and unique
 constraint.

Yeah.  Right offhand I think that would be fairly expensive too :-(.
The problem is that the query *DOES* have a reference to at least one
column of the wide table, namely its join key.  So you'd have to
distinguish whether any references appear above the join.  In the
current planner structure I think that this information is computed,
but not until it's far too late to be removing joins from the tree.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[GENERAL] Questions about horizontal partitioning

2007-01-08 Thread John Sales
Suppose I have a table - lets say it was developed by someone with little or no 
understanding of database design - and it has 230 columns.

Now, it turns out that 99% of the time only about 8 colums are required, but 
all 230 columns are populated.  However, legacy applications (which are run 
nightly for batch processing, but not during the day, when heavy volume occurs) 
require the old table design.  New applications only select a limited number of 
columns, but require the old design names.

I want to do the following:

1.  Split the table into two tables (one with 6 columns, the other with 224 
columns), using the primary key to establish a 1-1 relationship between them.
2.  Create a view that displays the join between the two tables.

By doing this, I'm hoping that the query optimizer is smart enough to see that 
if a query comes in and requests only the six columns (that are in the narrower 
table) that PostgreSQL won't have to load the wider table into the buffer pool, 
and thereby actually have to only access about 10% the amount of disk that it 
presently does.

Is this a sound theory?  Is the query optimizer smart enough ignore part of a 
join when a portion of the join will have no effect on the end result set?

thanks


 __
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 

Re: [GENERAL] Questions about horizontal partitioning

2007-01-08 Thread Tom Lane
John Sales [EMAIL PROTECTED] writes:
 By doing this, I'm hoping that the query optimizer is smart enough to see 
 that if a query comes in and requests only the six columns (that are in the 
 narrower table) that PostgreSQL won't have to load the wider table into the 
 buffer pool, and thereby actually have to only access about 10% the amount of 
 disk that it presently does.

 Is this a sound theory?

No.  It still has to touch the second table to confirm the existence of
rows to join to.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Questions about horizontal partitioning

2007-01-08 Thread John Sales
Tom Lane [EMAIL PROTECTED] wrote: John Sales  writes:
 By doing this, I'm hoping that the query optimizer is smart enough to see 
 that if a query comes in and requests only the six columns (that are in the 
 narrower table) that PostgreSQL won't have to load the wider table into the 
 buffer pool, and thereby actually have to only access about 10% the amount of 
 disk that it presently does.

 Is this a sound theory?

No.  It still has to touch the second table to confirm the existence of
rows to join to.

   regards, tom lane
Good point..

What if I made the second table optional using an outer join, so as not to 
imply the existance of rows in the wide table?

thanks

 __
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com