Re: [GENERAL] How to inject knowledge into a Postgres database

2005-10-13 Thread han . holl
On Tuesday 11 October 2005 01:50, Jim C. Nasby wrote:
 What's the datatype on naamvrouw? Notice that it's being casted to text,
 which means an index on that column won't be used.

Datatype is text. My problem is that I want it to use the date of birth index.

I've increased the statistics to 1000, and only occasionally, because of the 
very uneven distribution of surnames, the planner does the wrong thing.
(Thinking that only 14 rows will match 'blaa' instead of 767.)

I've tried to set an even higher limit, but 1000 seems to be the maximum.
Is this hardcoded, or can I set a higher maximum somewhere ?

If the only penalty is slower analyzing, I don't care: we analyze at night 
when these system are idle.

Cheers,

Han Holl

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] How to inject knowledge into a Postgres database

2005-10-13 Thread Tom Lane
[EMAIL PROTECTED] writes:
 I've increased the statistics to 1000, and only occasionally, because of the 
 very uneven distribution of surnames, the planner does the wrong thing.
 I've tried to set an even higher limit, but 1000 seems to be the maximum.
 Is this hardcoded, or can I set a higher maximum somewhere ?

You'd have to change the source code, but it's a simple tweak in the
ALTER SET STATISTICS code.

 If the only penalty is slower analyzing, I don't care: we analyze at night 
 when these system are idle.

You'd be wrong about that --- the planner operations that use the data
would necessarily be slower, too.  I don't have any concrete information
about how much slower, but I'd be hesitant to raise the figure much
beyond 1000 ...

However, if you can show you have a real-world case that benefits, I'd
be willing to think about raising the wired-in limit to 1 or so.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] How to inject knowledge into a Postgres database

2005-10-13 Thread han . holl
On Thursday 13 October 2005 16:06, Tom Lane wrote:

 You'd have to change the source code, but it's a simple tweak in the
 ALTER SET STATISTICS code.

I don't think I'd want to do that.

  If the only penalty is slower analyzing, I don't care: we analyze at
  night when these system are idle.

 You'd be wrong about that --- the planner operations that use the data
 would necessarily be slower, too.  I don't have any concrete information
 about how much slower, but I'd be hesitant to raise the figure much
 beyond 1000 ...

 However, if you can show you have a real-world case that benefits, I'd
 be willing to think about raising the wired-in limit to 1 or so.

The example I gave earlier in the thread, date_of_birth = 'some-date' and 
surname like 'blaa%', was a real life example, but I had to pull it from a 
logfile that logs queries longer than 500 ms. It happens two or three times a 
day in a laboratory with 50 people querying the database all day.

Estimates for date_of_birth number of rows are quite good (even at the default 
stats of 10) but surnames are just too unevenly distributed.

But in 99% of all cases the guess is right, and by making it a nested query I 
could improve 1% and worsen 99%.

Cheers,

Han Holl


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] How to inject knowledge into a Postgres database

2005-10-10 Thread han . holl

Tom, Oleg, Yonathan,

thanks for the suggestions.
Indeed, upping the statistics from 10 to 100 helped.

But order by did not:
palga=# explain analyze select rapnaam from udps where geboortedatum = 
'1966-01-01' and naamvrouw like 'vos%' order by geboortedatum;
QUERY PLAN
--
 Sort  (cost=18.07..18.08 rows=1 width=18) (actual time=774.941..774.941 
rows=0 loops=1)
   Sort Key: main.geboortedatum
   -  Index Scan using nv on main  (cost=0.00..18.06 rows=1 width=18) (actual 
time=746.121..746.121 rows=0 loops=1)
 Index Cond: (((naamvrouw)::text = 'vos'::character varying) AND 
((naamvrouw)::text  'vot'::character varying))
 Filter: ((geboortedatum = '1966-01-01'::date) AND ((naamvrouw)::text 
~~ 'vos%'::text))
 Total runtime: 775.068 ms
(6 rows)

I got a similar problem with a functional index, but I guess my only option is 
to create a real column with the results of the function, and replace the 
functional index with a real one.

Thanks to all,

Han Holl

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] How to inject knowledge into a Postgres database

2005-10-10 Thread Jim C. Nasby
What's the datatype on naamvrouw? Notice that it's being casted to text,
which means an index on that column won't be used.

On Mon, Oct 10, 2005 at 11:29:38AM +0200, [EMAIL PROTECTED] wrote:
 
 Tom, Oleg, Yonathan,
 
 thanks for the suggestions.
 Indeed, upping the statistics from 10 to 100 helped.
 
 But order by did not:
 palga=# explain analyze select rapnaam from udps where geboortedatum = 
 '1966-01-01' and naamvrouw like 'vos%' order by geboortedatum;
 QUERY PLAN
 --
  Sort  (cost=18.07..18.08 rows=1 width=18) (actual time=774.941..774.941 
 rows=0 loops=1)
Sort Key: main.geboortedatum
-  Index Scan using nv on main  (cost=0.00..18.06 rows=1 width=18) 
 (actual 
 time=746.121..746.121 rows=0 loops=1)
  Index Cond: (((naamvrouw)::text = 'vos'::character varying) AND 
 ((naamvrouw)::text  'vot'::character varying))
  Filter: ((geboortedatum = '1966-01-01'::date) AND ((naamvrouw)::text 
 ~~ 'vos%'::text))
  Total runtime: 775.068 ms
 (6 rows)
 
 I got a similar problem with a functional index, but I guess my only option 
 is 
 to create a real column with the results of the function, and replace the 
 functional index with a real one.
 
 Thanks to all,
 
 Han Holl
 
 ---(end of broadcast)---
 TIP 4: Have you searched our list archives?
 
http://archives.postgresql.org
 

-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] How to inject knowledge into a Postgres database

2005-10-09 Thread Yonatan Ben-Nes

Oleg Bartunov wrote:

You could increase statistics or try contrib/tsearch2

Oleg
On Fri, 7 Oct 2005, [EMAIL PROTECTED] wrote:



Hello,

We have a table of people with a date-of-birth and a surname, both 
indexed.

We have queries like this:
select report from table where dateofbirth = '1966-12-12' and surname 
like

'boer%'

The planner estimates that in a 1.5M record table 'boer%' will have 
one record

and 1966-12-12 about 40.
Unfortunately, names are not evenly distributed, and in some 
combinations it

will have to process many thousands of records. On some older and slower
systems, this hurts.

Is there a way to tell the planner to always prefer the dateofbirth 
index ?
Alternatively, to inform it about the wildly uneven distribution of 
surnames

(this must be even worse in China).

A third possibility would be to rewrite such a query as a nested 
query: is

there a rewrite query - hook (like apaches mod_rewrite) available ? Or is
there a somewhere a proxy-server that could do this ?

Cheers,

Han Holl

---(end of broadcast)---
TIP 6: explain analyze is your friend



Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


You can also force it to use the index using ORDER BY dateofbirth.

Cheers,
  Ben-Nes Yonatan

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[GENERAL] How to inject knowledge into a Postgres database

2005-10-07 Thread han . holl

Hello,

We have a table of people with a date-of-birth and a surname, both indexed.
We have queries like this:
select report from table where dateofbirth = '1966-12-12' and surname like 
'boer%'

The planner estimates that in a 1.5M record table 'boer%' will have one record 
and 1966-12-12 about 40.
Unfortunately, names are not evenly distributed, and in some combinations it 
will have to process many thousands of records. On some older and slower 
systems, this hurts.

Is there a way to tell the planner to always prefer the dateofbirth index ?
Alternatively, to inform it about the wildly uneven distribution of surnames 
(this must be even worse in China).

A third possibility would be to rewrite such a query as a nested query: is 
there a rewrite query - hook (like apaches mod_rewrite) available ? Or is 
there a somewhere a proxy-server that could do this ?

Cheers,

Han Holl

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] How to inject knowledge into a Postgres database

2005-10-07 Thread Tom Lane
[EMAIL PROTECTED] writes:
 Is there a way to tell the planner to always prefer the dateofbirth index ?
 Alternatively, to inform it about the wildly uneven distribution of surnames 
 (this must be even worse in China).

Increase the statistics target for that column, and re-ANALYZE the
table.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] How to inject knowledge into a Postgres database

2005-10-07 Thread Oleg Bartunov

You could increase statistics or try contrib/tsearch2

Oleg
On Fri, 7 Oct 2005, [EMAIL PROTECTED] wrote:



Hello,

We have a table of people with a date-of-birth and a surname, both indexed.
We have queries like this:
select report from table where dateofbirth = '1966-12-12' and surname like
'boer%'

The planner estimates that in a 1.5M record table 'boer%' will have one record
and 1966-12-12 about 40.
Unfortunately, names are not evenly distributed, and in some combinations it
will have to process many thousands of records. On some older and slower
systems, this hurts.

Is there a way to tell the planner to always prefer the dateofbirth index ?
Alternatively, to inform it about the wildly uneven distribution of surnames
(this must be even worse in China).

A third possibility would be to rewrite such a query as a nested query: is
there a rewrite query - hook (like apaches mod_rewrite) available ? Or is
there a somewhere a proxy-server that could do this ?

Cheers,

Han Holl

---(end of broadcast)---
TIP 6: explain analyze is your friend



Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match