Jan,

On Sun, 25 Jul 2010, Jan Urbaski wrote:

On 02/07/10 14:33, Teodor Sigaev wrote:
Patch implements much more accuracy estimation of cost for GIN index
scan than generic cost estimation function.

Hi,

I'm reviewing this patch, and to begin with it I tried to reproduce the
problem that originally came up on -performance in
http://archives.postgresql.org/pgsql-performance/2009-10/msg00393.php

I attached scripts


The links from that mail are now dead, so I set up my own test environment:
* one table testfts(id serial, body text, body_fts tsvector)
* 50000 rows, each with 1000 random words taken from
/usr/share/dict/british-english-insane (the wbritish-insane Debian
package) separated by a single space
* each row also had the word "commonterm" at the end, 80% had
commonterm80, 60% had commonterm60 etc (using the same methodology as
Jesper, that commonterm60 can appear only if commonterm80 is in the row)
* a GIN index on the tsvectors

I was able to reproduce his issue, that is: select id from ftstest where
body_fts @@ to_tsquery('commonterm80'); was choosing a sequential scan,
which was resulting in much longer execution than the bitmap index plan
that I got after disabling seqscans.

I then applied the patch, recompiled PG and tried again... and nothing
changed. I first tried running ANALYSE and then dropping and recreating
the GIN index, but the planner still chooses the seq scan.

read thread
http://archives.postgresql.org/pgsql-hackers/2010-04/msg01407.php
There is always a fuzz factor, as Tom said, about 1% in path cost comparisons.
You may compare plans for 'commonterm60', 'commonterm40'.


Full explains below (the NOTICE is a debugging aid from the patch, which
I temporarily enabled to see if it's picking up the code).

from this debug you can see that cost estimation now are much accurate
than before.


I'll continue reading the code and trying to understand what it does,
but in the meantime: am I doing something wrong that I don't see the
planner switching to the bitmap index plan? I see that the difference in
costs is small, so maybe I just need to tweak the planner knobs a bit?
Is the output below expected?

I think Tom explained this
http://archives.postgresql.org/pgsql-hackers/2010-04/msg01426.php



Cheers,
Jan


wulczer=# explain analyse select id from ftstest where body_fts @@
to_tsquery('commonterm80');
NOTICE:  GIN stats: nEntryPages: 49297.000000 nDataPages: 16951.000000
nPendingPages :0.000000 nEntries: 277521.000000
                                                   QUERY PLAN

------------------------------------------------------------------------------------------------------------------
Seq Scan on ftstest  (cost=0.00..1567.00 rows=39890 width=4) (actual
time=221.893..33179.794 rows=39923 loops=1)
  Filter: (body_fts @@ to_tsquery('commonterm80'::text))
Total runtime: 33256.661 ms
(3 rows)

wulczer=# set enable_seqscan to false;
SET
Time: 0.257 ms
wulczer=# explain analyse select id from ftstest where body_fts @@
to_tsquery('commonterm80');
NOTICE:  GIN stats: nEntryPages: 49297.000000 nDataPages: 16951.000000
nPendingPages :0.000000 nEntries: 277521.000000
                                                            QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on ftstest  (cost=449.15..1864.50 rows=39890 width=4)
(actual time=107.421..181.284 rows=39923 loops=1)
  Recheck Cond: (body_fts @@ to_tsquery('commonterm80'::text))
  ->  Bitmap Index Scan on ftstest_gin_idx  (cost=0.00..439.18
rows=39890 width=0) (actual time=97.057..97.057 rows=39923 loops=1)
        Index Cond: (body_fts @@ to_tsquery('commonterm80'::text))
Total runtime: 237.218 ms
(5 rows)

Time: 237.999 ms



        Regards,
                Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

a
able
about
account
acid
across
act
addition
adjustment
advertisement
after
again
against
agreement
air
all
almost
among
amount
amusement
and
angle
angry
animal
answer
ant
any
apparatus
apple
approval
arch
argument
arm
army
art
as
at
attack
attempt
attention
attraction
authority
automatic
awake
baby
back
bad
bag
balance
ball
band
base
basin
basket
bath
be
beautiful
because
bed
bee
before
behaviour
belief
bell
bent
berry
between
bird
birth
bit
bite
bitter
black
blade
blood
blow
blue
board
boat
body
boiling
bone
book
boot
bottle
box
boy
brain
brake
branch
brass
bread
breath
brick
bridge
bright
broken
brother
brown
brush
bucket
building
bulb
burn
burst
business
but
butter
button
by
cake
camera
canvas
card
care
carriage
cart
cat
cause
certain
chain
chalk
chance
change
cheap
cheese
chemical
chest
chief
chin
church
circle
clean
clear
clock
cloth
cloud
coal
coat
cold
collar
colour
comb
come
comfort
committee
common
company
comparison
competition
complete
complex
condition
connection
conscious
control
cook
copper
copy
cord
cork
cotton
cough
country
cover
cow
crack
credit
crime
cruel
crush
cry
cup
cup
current
curtain
curve
cushion
damage
danger
dark
daughter
day
dead
dear
death
debt
decision
deep
degree
delicate
dependent
design
desire
destruction
detail
development
different
digestion
direction
dirty
discovery
discussion
disease
disgust
distance
distribution
division
do
dog
door
doubt
down
drain
drawer
dress
drink
driving
drop
dry
dust
ear
early
earth
east
edge
education
effect
egg
elastic
electric
end
engine
enough
equal
error
even
event
ever
every
example
exchange
existence
expansion
experience
expert
eye
face
fact
fall
false
family
far
farm
fat
father
fear
feather
feeble
feeling
female
fertile
fiction
field
fight
finger
fire
first
fish
fixed
flag
flame
flat
flight
floor
flower
fly
fold
food
foolish
foot
for
force
fork
form
forward
fowl
frame
free
frequent
friend
from
front
fruit
full
future
garden
general
get
girl
give
glass
glove
go
goat
gold
good
government
grain
grass
great
green
grey
grip
group
growth
guide
gun
hair
hammer
hand
hanging
happy
harbour
hard
harmony
hat
hate
have
he
head
healthy
hear
hearing
heart
heat
help
high
history
hole
hollow
hook
hope
horn
horse
hospital
hour
house
how
humour
I
ice
idea
if
ill
important
impulse
in
increase
industry
ink
insect
instrument
insurance
interest
invention
iron
island
jelly
jewel
join
journey
judge
jump
keep
kettle
key
kick
kind
kiss
knee
knife
knot
knowledge
land
language
last
late
laugh
law
lead
leaf
learning
leather
left
leg
let
letter
level
library
lift
light
like
limit
line
linen
lip
liquid
list
little
living
lock
long
look
loose
loss
loud
love
low
machine
make
male
man
manager
map
mark
market
married
mass
match
material
may
meal
measure
meat
medical
meeting
memory
metal
middle
military
milk
mind
mine
minute
mist
mixed
money
monkey
month
moon
morning
mother
motion
mountain
mouth
move
much
muscle
music
nail
name
narrow
nation
natural
near
necessary
neck
need
needle
nerve
net
new
news
night
no
noise
normal
north
nose
not
note
now
number
nut
observation
of
off
offer
office
oil
old
on
only
open
operation
opinion
opposite
or
orange
order
organization
ornament
other
out
oven
over
owner
page
pain
paint
paper
parallel
parcel
part
past
paste
payment
peace
pen
pencil
person
physical
picture
pig
pin
pipe
place
plane
plant
plate
play
please
pleasure
plough
pocket
point
poison
polish
political
poor
porter
position
possible
pot
potato
powder
power
present
price
print
prison
private
probable
process
produce
profit
property
prose
protest
public
pull
pump
punishment
purpose
push
put
quality
question
quick
quiet
quite
rail
rain
range
rat
rate
ray
reaction
reading
ready
reason
receipt
record
red
regret
regular
relation
religion
representative
request
respect
responsible
rest
reward
rhythm
rice
right
ring
river
road
rod
roll
roof
room
root
rough
round
rub
rule
run
sad
safe
sail
salt
same
sand
say
scale
school
science
scissors
screw
sea
seat
second
secret
secretary
see
seed
seem
selection
self
send
sense
separate
serious
servant
sex
shade
shake
shame
sharp
sheep
shelf
ship
shirt
shock
shoe
short
shut
side
sign
silk
silver
simple
sister
size
skin

skirt
sky
sleep
slip
slope
slow
small
smash
smell
smile
smoke
smooth
snake
sneeze
snow
so
soap
society
sock
soft
solid
some

son
song
sort
sound
soup
south
space
spade
special
sponge
spoon
spring
square
stage
stamp
star
start
statement
station
steam
steel
stem
step
stick
sticky
stiff
still
stitch
stocking
stomach
stone
stop
store
story
straight
strange
street
stretch
strong
structure
substance
such
sudden
sugar
suggestion
summer
sun
support
surprise
sweet
swim
system
table
tail
take
talk
tall
taste
tax
teaching
tendency
test
than
that
the
then
theory
there
thick
thin
thing
this
thought
thread
throat
through
through
thumb
thunder
ticket
tight
till
time
tin
tired
to
toe
together
tomorrow
tongue
tooth
top
touch
town
trade
train
transport
tray
tree
trick
trouble
trousers
true
turn
twist
umbrella
under
unit
up
use
value
verse
very
vessel
view
violent
voice
waiting
walk
wall
war
warm
wash
waste
watch
water
wave
wax
way
weather
week
weight
well
west
wet
wheel
when
where
while
whip
whistle
white
who
why
wide
will
wind
window
wine
wing
winter
wire
wise
with
woman
wood
wool
word
work
worm
wound
writing
wrong
year
yellow
yes
yesterday
you
young
#!/usr/bin/perl

use strict;
use warnings;
use DBI;
use File::Slurp;

my @words = map { chomp; $_; } read_file("words.txt");
my $words = scalar(@words);

my $avg = 200;
my $spread = 50;
my $docs = 50000;

my $dbh = DBI->connect("dbi:Pg:dbname=ftstest","jk","jk",{autocommit => 1});

my $create = <<END;
drop table if exists ftstest;
create table ftstest (
    id SERIAL UNIQUE, 
    body TEXT not null,
    body_fts tsvector
);

create index ftstest_gin_idx on ftstest using GIN (body_fts);



END

$dbh->do($create);

my $sth = $dbh->prepare("insert into ftstest(body,body_fts) 
values(?,to_tsvector('english',?))");

for(my $i= 0; $i < $docs; $i++){
    my @doc = ();
    my $commonpos = int(rand($spread/2))+$avg;
    for(my $j = 0; $j < $spread/2 + $avg; $j++){
        my $word = int(rand($words));
        push @doc,$words[$word];
        push @doc,"commonterm" if $commonpos == $j; # This should only match 
once. 
        my $rand = rand();
        push @doc,"commonterm80" if $commonpos == $j && $rand < 0.8;
        push @doc,"commonterm60" if $commonpos == $j && $rand < 0.6;
        push @doc,"commonterm40" if $commonpos == $j && $rand < 0.4;
        push @doc,"rareterm30" if $commonpos == $j && $rand < 0.3;
        push @doc,"rareterm20" if $commonpos == $j && $rand < 0.2;
        push @doc,"rareterm10" if $commonpos == $j && $rand < 0.1;
        push @doc,"rareterm5" if $commonpos == $j && $rand < 0.05;
        push @doc,"rareterm1" if $commonpos == $j && $rand < 0.01;
    }
    my $doc = join(" ",@doc);
    $sth->execute($doc,$doc);
}

$dbh->do("analyze");

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to