[GENERAL] Optimising full outer join where for muti-row to multi-column view

2006-12-28 Thread Phil Endecott
Dear PostgreSQL experts,

I have a database that records the EXIF data for a collection of 
photos.  (EXIF is a method for embedding arbitary name-value data 
in a JPEG, and digital cameras typically use it to record things 
like exposure information.)  My exif table looks something like 
this:

 photo_id |   tag|  value 
--+--+--
 1001 | DateTime | 2006:10:26 11:19:29
 1001 | Orientation  | top - left
 1001 | PixelXDimension  | 3888
 1001 | PixelYDimension  | 2592
 1002 | DateTimeOriginal | 2006:10:26 13:34:06
 1002 | Orientation  | left - bottom
 1002 | PixelXDimension  | 3888
 1002 | PixelYDimension  | 2592

photo_id and tag together form the primary key.  The data also 
includes many tags that I'm not currently interested in.  From 
this I create a view containing only the tags of interest:

 photo_id |  orientation  |  datetime   | xsize | ysize 
--+---+-+---+---
 1001 | top - left| 2006:10:30 11:19:29 | 3888  | 2592
 1002 | left - bottom | 2006:10:26 13:34:06 | 3888  | 2592

My first attempt did this in the obvious way by joining the 
exif table with itself once per column:

create view photo_info_v as
select photo_id,
   e1.value as orientation, 
   e2.value as datetime, 
   e3.value as xsize,
   e4.value as ysize
  from exif e1
  join exif e2 using(photo_id)
  join exif e3 using(photo_id)
  join exif e4 using(photo_id)
  where e1.tag='Orientation' 
and e2.tag='DateTime' 
and e3.tag='PixelXDimension' 
and e4.tag='PixelYDimension';

This works well for one important query, where I find one photo's 
information from the view:

photos= explain analyse select * from photo_info_v where photo_id=1201;
  QUERY PLAN
  
--
 Nested Loop  (cost=0.00..12.09 rows=1 width=60) (actual time=1.198..1.290 
rows=1 loops=1)
   -  Nested Loop  (cost=0.00..9.07 rows=1 width=46) (actual time=0.953..1.009 
rows=1 loops=1)
 -  Nested Loop  (cost=0.00..6.04 rows=1 width=32) (actual 
time=0.693..0.731 rows=1 loops=1)
   -  Index Scan using exif_pkey on exif e1  (cost=0.00..3.02 
rows=1 width=18) (actual time=0.384..0.394 rows=1 loops=1)
 Index Cond: ((photo_id = 1201) AND (tag = 
'Orientation'::text))
   -  Index Scan using exif_pkey on exif e4  (cost=0.00..3.02 
rows=1 width=18) (actual time=0.189..0.205 rows=1 loops=1)
 Index Cond: ((1201 = photo_id) AND (tag = 
'PixelYDimension'::text))
 -  Index Scan using exif_pkey on exif e3  (cost=0.00..3.02 rows=1 
width=18) (actual time=0.186..0.194 rows=1 loops=1)
   Index Cond: ((1201 = photo_id) AND (tag = 
'PixelXDimension'::text))
   -  Index Scan using exif_pkey on exif e2  (cost=0.00..3.02 rows=1 width=18) 
(actual time=0.171..0.195 rows=1 loops=1)
 Index Cond: ((1201 = photo_id) AND (tag = 'DateTime'::text))
 Total runtime: 3.064 ms

However, I might just want one column from the view:

photos= explain analyse select orientation from photo_info_v where 
photo_id=1201;
  QUERY PLAN
  
--
 Nested Loop  (cost=0.00..12.09 rows=1 width=14) (actual time=1.266..1.353 
rows=1 loops=1)
   -  Nested Loop  (cost=0.00..9.07 rows=1 width=18) (actual time=1.009..1.062 
rows=1 loops=1)
 -  Nested Loop  (cost=0.00..6.04 rows=1 width=18) (actual 
time=0.752..0.787 rows=1 loops=1)
   -  Index Scan using exif_pkey on exif e1  (cost=0.00..3.02 
rows=1 width=18) (actual time=0.410..0.418 rows=1 loops=1)
 Index Cond: ((photo_id = 1201) AND (tag = 
'Orientation'::text))
   -  Index Scan using exif_pkey on exif e4  (cost=0.00..3.02 
rows=1 width=4) (actual time=0.183..0.199 rows=1 loops=1)
 Index Cond: ((1201 = photo_id) AND (tag = 
'PixelYDimension'::text))
 -  Index Scan using exif_pkey on exif e3  (cost=0.00..3.02 rows=1 
width=4) (actual time=0.168..0.176 rows=1 loops=1)
   Index Cond: ((1201 = photo_id) AND (tag = 
'PixelXDimension'::text))
   -  Index Scan using exif_pkey on exif e2  (cost=0.00..3.02 rows=1 width=4) 
(actual time=0.168..0.191 rows=1 loops=1)
 Index Cond: ((1201 = photo_id) AND (tag = 'DateTime'::text))
 Total runtime: 3.123 ms

I only wanted the orientation information, which 

Re: [GENERAL] Optimising full outer join where for muti-row to multi-column view

2006-12-28 Thread Martijn van Oosterhout
On Thu, Dec 28, 2006 at 08:29:56PM +, Phil Endecott wrote:
 To try and avoid this, I tried using a full outer join in the view 
 definition.  In this case the row would always be present in the view, 
 even if the data for the other columns were not present.  I hoped that 
 the query would then be optimised to look up only the orientation 
 information:

snip

You really want a left outer join, not a full outer (is the full outer
join even producing the answer you expect?). A full outer join can
never be optimised away.

That said, I don't know if the logic exists to optimise away a left
join either. It would only be possible if the join were on the primary,
thus you could be assured of exactly one row.

The only other alternative would be to make each column a subselect
instead. If you don't take that column as output, maybe it optimises
the subquery out entirely?

Like so:

create view view photo_info_v as
select photo_id,
  (select orientation from e2 where e2.tag='Orientation' and e2.photo_id = 
e1.photo_id), 
etc

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature