Re: [HACKERS] workaround for expensive KNN?

PostgreSQL - Hans-Jürgen Schönig Mon, 11 Apr 2011 11:47:48 -0700

price has a problem :(.
"iphone" can be a 20 cents bag or a sticker or a 900 euro thing signed by 
whoever ...
so, words and the sort-number / price are not related in anyway. price is in 
this case no way to narrow down the problem (e.g. evaluate first or so).


        many thanks,

                hans


On Apr 8, 2011, at 5:25 PM, Oleg Bartunov wrote:

> Hans,
> 
> what if you create index (price,title) ?
> 
> 
> On Fri, 8 Apr 2011, PostgreSQL - Hans-J?rgen Sch?nig wrote:
> 
>> hello ...
>> 
>> i got that one ...
>> 
>>   "idx_product_t_product_titleprice" gist (to_tsvector('german'::regconfig, 
>> title), int_price)
>> 
>> so, i have a combined index on text + number.
>> to me the plan seems fine ... it looks like a prober KNN traversal.
>> the difference between my plan and your plan seems to be the fact that i 
>> have, say, 1 mio rows which have "handy" or so in it (1 mio out of 11 mio or 
>> so). you are moving out from one specific place.
>> 
>> my maths is like that:
>>      11 mio in total
>>      1 mio matching "iphone"
>>      cheapest / most expensive 10 out of this mio needed.
>> 
>> operator classes are all nice and in place:
>> 
>> SELECT 10 <-> 4 as distance;
>> distance
>> ----------
>>       6
>> (1 row)
>> 
>> what does "buffers true" in your case say?
>> 
>> many thanks,
>> 
>>      hans
>> 
>> 
>> On Apr 8, 2011, at 3:22 PM, Oleg Bartunov wrote:
>> 
>>> Probably, you miss two-columnt index. From my early post:
>>> http://www.sai.msu.su/~megera/wiki/knngist
>>> 
>>> =# CREATE INDEX spots_idx ON spots USING knngist (coordinates, 
>>> to_tsvector('french',address));
>>> =# SELECT id, address,  (coordinates <-> 
>>> '(2.29470491409302,48.858263472125)'::point) AS dist FROM spots WHERE 
>>> coordinates >< '(2.29470491409302,48.858263472125)'::point AND 
>>> to_tsvector('french',address) @@ to_tsquery('french','mars')  LIMIT 10;
>>>  id    |                           address                           |      
>>>    dist 
>>> ---------+-------------------------------------------------------------+---------------------
>>> 366096 | 1st Floor Tour Eiffel | Champs de Mars, Paris 75007, France | 
>>> 2.32488941293945e-05
>>> 4356328 | r Champ de Mars 75007 PARIS                                 |  
>>> 0.00421854756964406
>>> 5200167 | Champ De Mars 75007 Paris                                   |  
>>> 0.00453564562587288
>>> 9301676 | Champ de Mars, 75007 Paris,                                 |  
>>> 0.00453564562587288
>>> 2152213 | 16, ave Rapp, Champ de Mars, Tour Eiffel, Paris, France     |  
>>> 0.00624152097590896
>>> 1923818 | Champ de Mars Paris, France                                 |  
>>> 0.00838214733539654
>>> 5165953 | 39 Rue Champ De Mars Paris, France                          |  
>>> 0.00874410234569529
>>> 7395870 | 39 Rue Champ De Mars Paris, France                          |  
>>> 0.00874410234569529
>>> 4358671 | 32 Rue Champ De Mars Paris, France                          |  
>>> 0.00876089659276339
>>> 1923742 | 12 rue du Champ de Mars Paris, France                       |  
>>> 0.00876764731845995
>>> (10 rows)
>>> 
>>> Time: 7.859 ms
>>> 
>>> =# EXPLAIN (COSTS OFF) SELECT id, address FROM spots WHERE coordinates >< 
>>> '(2.29470491409302,48.858263472125)'::point
>>> AND to_tsvector('french',address) @@ to_tsquery('french','mars')  LIMIT 10;
>>> 
>>>                           QUERY PLAN
>>> --------------------------------------------------------------------------------------------------------------------------------------
>>> Limit
>>>  ->  Index Scan using spots_idx on spots
>>>        Index Cond: ((coordinates >< 
>>> '(2.29470491409302,48.858263472125)'::point) AND 
>>> (to_tsvector('french'::regconfig, address) @@ '''mar'''::tsquery))
>>> (3 rows)
>>> 
>>> 
>>> On Fri, 8 Apr 2011, PostgreSQL - Hans-J?rgen Sch?nig wrote:
>>> 
>>>> hello all ...
>>>> 
>>>> given oleg's posting before i also wanted to fire up some KNN related 
>>>> question.
>>>> let us consider a simple example. i got some million lines and i want all 
>>>> rows matching a tsquery sorted by price.
>>>> i did some tests:
>>>> 
>>>> test=# explain (analyze true, buffers true, costs true) SELECT id FROM 
>>>> product.t_product WHERE to_tsvector('german', title) @@ 
>>>> to_tsquery('german', 'iphone') ORDER BY int_price <-> 0 LIMIT 10;
>>>>                                                                          
>>>> QUERY PLAN
>>>> 
>>>> -----------------------------------------------------------------------------------------------------------------------------
>>>> --------------------------------------
>>>> Limit  (cost=0.00..41.11 rows=10 width=16) (actual 
>>>> time=36391.717..45542.590 rows=10 loops=1)
>>>> Buffers: shared hit=9 read=5004
>>>> ->  Index Scan using idx_product_t_product_titleprice on t_product  
>>>> (cost=0.00..13251.91 rows=3224 width=16) (actual time=
>>>> 36391.715..45542.573 rows=10 loops=1)
>>>>       Index Cond: (to_tsvector('german'::regconfig, title) @@ 
>>>> '''iphon'''::tsquery)
>>>>       Order By: (int_price <-> 0::bigint)
>>>>       Buffers: shared hit=9 read=5004
>>>> Total runtime: 45542.676 ms
>>>> (7 rows)
>>>> 
>>>> test=# explain (analyze true, buffers true, costs true) SELECT id FROM 
>>>> product.t_product WHERE to_tsvector('german', title) @@ 
>>>> to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 10;
>>>>                                                                          
>>>> QUERY PLAN
>>>> 
>>>> -----------------------------------------------------------------------------------------------------------------------------
>>>> -------------------------------------
>>>> Limit  (cost=0.00..41.03 rows=10 width=16) (actual 
>>>> time=7243.526..10935.227 rows=10 loops=1)
>>>> Buffers: shared hit=3 read=2316
>>>> ->  Index Scan using idx_product_t_product_titleprice on t_product  
>>>> (cost=0.00..29762.61 rows=7255 width=16) (actual time=
>>>> 7243.524..10935.217 rows=10 loops=1)
>>>>       Index Cond: (to_tsvector('german'::regconfig, title) @@ 
>>>> '''handy'''::tsquery)
>>>>       Order By: (int_price <-> 0::bigint)
>>>>       Buffers: shared hit=3 read=2316
>>>> Total runtime: 10935.265 ms
>>>> (7 rows)
>>>> 
>>>> test=# explain (analyze true, buffers true, costs true) SELECT id FROM 
>>>> product.t_product WHERE to_tsvector('german', title) @@ 
>>>> to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 1;
>>>>                                                                       
>>>> QUERY PLAN
>>>> 
>>>> -----------------------------------------------------------------------------------------------------------------------------
>>>> -------------------------------
>>>> Limit  (cost=0.00..4.10 rows=1 width=16) (actual time=28.527..28.528 
>>>> rows=1 loops=1)
>>>> Buffers: shared hit=1 read=1577
>>>> ->  Index Scan using idx_product_t_product_titleprice on t_product  
>>>> (cost=0.00..29762.61 rows=7255 width=16) (actual time=
>>>> 28.525..28.525 rows=1 loops=1)
>>>>       Index Cond: (to_tsvector('german'::regconfig, title) @@ 
>>>> '''handy'''::tsquery)
>>>>       Order By: (int_price <-> 0::bigint)
>>>>       Buffers: shared hit=1 read=1577
>>>> Total runtime: 28.558 ms
>>>> (7 rows)
>>>> 
>>>> 
>>>> under any circumstances - there is no way to reduce the number of buffers 
>>>> needed for a query like that.
>>>> if everything is cached this is still ok but as soon as you have to take a 
>>>> single block from disk you will die a painful random I/O death.
>>>> is there any alternative which does not simply die when i try to achieve 
>>>> what i want?
>>>> 
>>>> the use case is quite simple: all products with a certain word (10 
>>>> cheapest or so).
>>>> 
>>>> is there any alternative approach to this?
>>>> i was putting some hope into KNN but it seems it needs too much random I/O 
>>>> :(.
>>>> 
>>>>    many thanks,
>>>> 
>>>>            hans
>>>> 
>>>> --
>>>> Cybertec Sch?nig & Sch?nig GmbH
>>>> Gr?hrm?hlgasse 26
>>>> A-2700 Wiener Neustadt, Austria
>>>> Web: http://www.postgresql-support.de
>>>> 
>>>> 
>>>> 
>>> 
>>>     Regards,
>>>             Oleg
>>> _____________________________________________________________
>>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>>> Sternberg Astronomical Institute, Moscow University, Russia
>>> Internet: [email protected], http://www.sai.msu.su/~megera/
>>> phone: +007(495)939-16-83, +007(495)939-23-83
>>> 
>>> --
>>> Sent via pgsql-hackers mailing list ([email protected])
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-hackers
>>> 
>> 
>> --
>> Cybertec Sch?nig & Sch?nig GmbH
>> Gr?hrm?hlgasse 26
>> A-2700 Wiener Neustadt, Austria
>> Web: http://www.postgresql-support.de
>> 
> 
>       Regards,
>               Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: [email protected], http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83
> 
> -- 
> Sent via pgsql-hackers mailing list ([email protected])
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
> 

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] workaround for expensive KNN?

Reply via email to