Am Montag, 30. Januar 2006 17:42 schrieb Tom Lane:
> Philipp Reisner <[EMAIL PROTECTED]> writes:
> > Version: 8.0.2
>
> I don't think so ... neither bitmap scans nor slot_getattr exist in 8.0.
>

Sorry, a typo, 8.1.2 of course.

> > Is this bug-report of any use so far ?
>
> Not a lot.  You need to find out what the difference is between the
> cases where the query runs quickly and those where it doesn't.  I'm
> betting that the planner is choosing a very bad plan in the slow cases,
> but there's not a lot of evidence here to show what or why.
>
> The explain analyze output shows two levels of hash join underneath
> four levels of nestloop join, whereas the stack trace looks like there
> are five levels of nestloop and only one hash.  So this is some evidence
> that a different plan is being used in the slow cases.  The stack trace
> doesn't provide nearly enough info about what that plan is, though.
>

What reasons are there for the planer to choose a different plan ? 

> Do you have geqo_threshold set to less than its default value?  Do you
> sometimes execute the query with different sets of parameters?  Either
> of these might lead to changes of plan.

Since we had very bad experience with the generic query optimizer we
completely disabled it. The non geqo plans perform multitudes better 
on our data-set and work-load. -- And the time to plan the query is
always negligible compared to the execution time.

We have the endless loop about 10 times a day (but execute about 1.5 million 
queries a day) :

Program received signal SIGINT, Interrupt.
0x00002aaaaba687b4 in wcsrtombs () from /lib/libc.so.6
(gdb) bt
#0  0x00002aaaaba687b4 in wcsrtombs () from /lib/libc.so.6
#1  0x00002aaaaba163c8 in wcstombs () from /lib/libc.so.6
#2  0x000000000058dcc4 in text_oid ()
#3  0x000000000058deb9 in upper ()
#4  0x00000000004e216e in ExecMakeFunctionResult ()
#5  0x00000000004e210d in ExecMakeFunctionResult ()
#6  0x00000000004e5063 in ExecQual ()
#7  0x00000000004e554e in ExecScan ()
#8  0x00000000004e0cc8 in ExecProcNode ()
#9  0x00000000004ebeb6 in ExecEndHashJoin ()
#10 0x00000000004eb7fa in ExecHashJoin ()
#11 0x00000000004e0d50 in ExecProcNode ()
#12 0x00000000004ee29d in ExecNestLoop ()
#13 0x00000000004e0d36 in ExecProcNode ()
#14 0x00000000004ee261 in ExecNestLoop () <<-----------
#15 0x00000000004e0d36 in ExecProcNode ()
#16 0x00000000004ee261 in ExecNestLoop ()
#17 0x00000000004e0d36 in ExecProcNode ()
#18 0x00000000004ee261 in ExecNestLoop ()
#19 0x00000000004e0d36 in ExecProcNode ()
#20 0x00000000004ee261 in ExecNestLoop ()
#21 0x00000000004e0d36 in ExecProcNode ()
#22 0x00000000004ef57c in ExecSort ()
#23 0x00000000004e0d6a in ExecProcNode ()
#24 0x00000000004df73f in ExecEndPlan ()
#25 0x00000000004dec09 in ExecutorRun ()
#26 0x000000000055f87a in PortalRun ()
#27 0x000000000055f5f9 in PortalRun ()
#28 0x000000000055b561 in pg_plan_queries ()
#29 0x000000000055df8c in PostgresMain ()
#30 0x0000000000539b01 in ClosePostmasterPorts ()
#31 0x00000000005395d0 in ClosePostmasterPorts ()
#32 0x0000000000537b4d in PostmasterMain ()
#33 0x0000000000536e59 in PostmasterMain ()
#34 0x00000000004fda0c in main ()


After detaching with gdb, I killed it. I configured postgres to dump 
the statement to the logfile with "log_min_error_statement = error".
So I have the exact query that caused the problem. 

This time the plainer says, only explain:

 Sort  (cost=16257.75..16257.76 rows=1 width=262)
   Sort Key: con.shortname
   ->  Nested Loop  (cost=1.18..16257.74 rows=1 width=262)
         ->  Nested Loop  (cost=1.18..16254.57 rows=1 width=130)
               ->  Nested Loop  (cost=1.18..16251.50 rows=1 width=125)
                     ->  Nested Loop  (cost=1.18..16248.45 rows=1 width=111)
                           ->  Nested Loop  (cost=1.18..16245.31 rows=1 
width=86)
                                 Join Filter: ("outer".cel_objid = 
"inner".objid)
                                 ->  Seq Scan on devices dev  
(cost=0.00..11984.25 rows=1 width=63)
                                       Filter: ((upper((COALESCE(isactiv, 
'Y'::character varying))::text) = 'Y'::text) AND 
(upper((COALESCE(iscommittedsp, 'Y'::character varying))::text) = 'Y'::text) 
AND (upper((COALESCE(iscommittedcust, 'Y'::character varying))::text) = 
'Y'::text))
                                 ->  Hash Left Join  (cost=1.18..4256.22 
rows=387 width=27)
                                       Hash Cond: ("outer".ser_objid = 
"inner".objid)
                                       ->  Seq Scan on contractelements cel  
(cost=0.00..4253.10 rows=387 width=21)
                                             Filter: (upper((isactiv)::text) = 
'Y'::text)
                                       ->  Hash  (cost=1.14..1.14 rows=14 
width=14)
                                             ->  Seq Scan on servicetypes ser  
(cost=0.00..1.14 rows=14 width=14)
                           ->  Index Scan using contracts_pkey on contracts con 
 (cost=0.00..3.13 rows=1 width=33)
                                 Index Cond: ("outer".con_objid = con.objid)
                                 Filter: ((ccu_objid = 3837149) OR (ccu_objid = 
3837320) OR (ccu_objid = 3837375) OR (ccu_objid = 3974872) OR (ccu_objid = 
5032031) OR (ccu_objid = 7220313) OR (ccu_objid = 15684899) OR (ccu_objid = 
16052168) OR (ccu_objid = 19196573) OR (ccu_objid = 20095712) OR (ccu_objid = 
22399813) OR (ccu_objid = 42092332) OR (ccu_objid = 42092454) OR (ccu_objid = 
42092574) OR (ccu_objid = 42886947) OR (ccu_objid = 43813234))
                     ->  Index Scan using devicetypes_pkey on devicetypes dty  
(cost=0.00..3.04 rows=1 width=18)
                           Index Cond: (dty.objid = "outer".dty_objid)
               ->  Index Scan using bpartners_pkey on bpartners man  
(cost=0.00..3.06 rows=1 width=13)
                     Index Cond: (man.objid = "outer".bpa_objid)
         ->  Index Scan using locations_pkey on locations loc  (cost=0.00..3.12 
rows=1 width=136)
               Index Cond: (loc.objid = "outer".loc_objid)
               Filter: (upper((shortname)::text) = '5195'::text)
(26 rows)

Explain and execute does not terminate... even after hours.

This time I _CAN_ _REPRODUCE_ the issue in our staging system !!  Would it help
to run the database server with debug symbols ? That would give more information
in the stack traceback ...

Tom, could you give us an advise for what we should look for ? How we should
continue in tracing the bug down ?

In case you care, here is the query as well:

select cel.objid,
             cel.shortname, ser.shortname,
             con.isInventoryFlag,
             dev.objid, dty.objid, loc.objid,
             dev.shortname, loc.shortname, loc.name, man.shortName,
             dty.type, dev.serNr, dev.invNr, dev.component,
             loc.zip, loc.city, con.shortname, con.name,
            CASE
       WHEN CURRENT_DATE between loc.locationstateFrom  and loc.locationstateTo 
 THEN loc.locationstateText
       WHEN CURRENT_DATE between loc.locationstate2From and 
loc.locationstate2To THEN loc.locationstate2Text
       ELSE null
     END
     ,      CASE
       WHEN CURRENT_DATE between loc.locationstateFrom  and loc.locationstateTo 
 THEN loc.locationstateColorCode
       WHEN CURRENT_DATE between loc.locationstate2From and 
loc.locationstate2To THEN loc.locationstate2ColorCode
       ELSE null
     END
     ,
            loc.locationcategory, loc.province
       from contracts con, contractelements cel
            left outer join servicetypes ser on ser.objid = cel.ser_objid,
            devices dev, locations loc, devicetypes dty, bpartners man
     where con.ccu_objid in
  
(3837149,3837320,3837375,3974872,5032031,7220313,15684899,16052168,19196573,20095712,22399813,42092332,42092454,42092574,42886947,43813234)
  and cel.con_objid = con.objid
       and dev.cel_objid = cel.objid
       and loc.objid = dev.loc_objid
       and dty.objid = dev.dty_objid
       and man.objid = dty.bpa_objid
       and upper(cel.isActiv) = 'Y'
       and upper(coalesce(dev.isActiv,'Y')) = 'Y'
       and upper(coalesce(dev.IsCommittedSP,'Y')) = 'Y'
       and upper(coalesce(dev.IsCommittedCust,'Y')) = 'Y'
      and upper(loc.shortName)  = '5195'
       order by 18 ASC

-philipp
-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria    http://www.linbit.com :

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your
       message can get through to the mailing list cleanly

Reply via email to