Tom, et al,

Yes, thanks.

Another thing that we noticed is that when LIKE is used on an
index with a constant value like 'WILLIAMS%', a full table
scan occurs when the constant is 'W%', but the index is used
if the like string is 'WI%' or longer.

It seems to me that the selectivity of the string would vary 
with the length of the string, perhaps as a fraction of the 
length of the field.  In other words, I would have expected
the selectivity to vary something like this:

         W% -> 0.1
        WI% -> 0.01
       WIL% -> 0.001
      WILL% -> 0.0001
     WILLI% -> 0.00001
    WILLIA% -> 0.000001
   WILLIAM% -> 0.0000001
  WILLIAMS% -> 0.00000001

In other words, if I only give one letter, then I might expect
to get about 1/10 of the table, and a full scan might make sense.
But the cost should continue to decline as I give longer and longer
strings, up to the length of the field.  

Would this be a reasonable improvement to the optimizer?

Ray

[Charset iso-8859-1 unsupported, filtering to ASCII...]
> Tom-
> 
> Thanks for the info-
> 
> Based on your response plus some local issues, we're going to work around
> this by simply creating another column containing the results of the
> function & then index the column. That gets the results we want without
> tweaking something we may regret later.
> 
> Stats for function indexes would be nice, so add our vote for it to wherever
> such things are tallied to come up with priorities.
> 
> Regards,
> 
> -Nick
> 
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED]]On Behalf Of Tom Lane
> > Sent: Wednesday, June 26, 2002 10:37 PM
> > To: [EMAIL PROTECTED]
> > Cc: pgsql-admin
> > Subject: Re: [ADMIN] Are statistics gathered on function indexes?
> >
> >
> > "Nick Fankhauser" <[EMAIL PROTECTED]> writes:
> > > [see subject]
> >
> > Nope, they ain't.  I agree they should be.
> >
> > > Can someone tell me how the cost is estimated for retrieving a
> > column based
> > > on a function that is indexed?
> >
> > It falls back to a default selectivity estimate, which is something
> > like 1% or 0.5% (depending on which version you are running).
> >
> > > Also, even with 2168 rows to gather, my experience based on cases where
> > > several thousand rows really are returned indicates that the index would
> > > still be a good choice. Is there a way to make the planner
> > favor index scans
> > > a bit more? (Other than the drastic set enable_seqscan to off.)
> >
> > I'd suggest reducing random_page_cost; we've seen a number of anecdotal
> > reports that the default of 4.0 is too high, though nothing systematic
> > enough to refute the experiments I did to get that number awhile back.
> > (IMHO anyway.  Others may differ.)
> >
> >                     regards, tom lane
> >
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 3: if posting/reading through Usenet, please send an appropriate
> > subscribe-nomail command to [EMAIL PROTECTED] so that your
> > message can get through to the mailing list cleanly
> >
> >
> 
> 
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
> 
> http://archives.postgresql.org
> 
> 

----------------------------------------------------------------------
Ray Ontko   [EMAIL PROTECTED]   Phone 1.765.935.4283   Fax 1.765.962.9788
Ray Ontko & Co.   Software Consulting Services   http://www.ontko.com/



---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html


Reply via email to