On Tue, Jun 25, 2013 at 7:31 PM, Heikki Linnakangas <hlinnakan...@vmware.com
> wrote:

> On 25.06.2013 01:24, Alexander Korotkov wrote:
>
>> On Wed, Jun 19, 2013 at 1:21 AM, Alexander Korotkov<aekorot...@gmail.com>
>> **wrote:
>>
>>  On Mon, Jun 17, 2013 at 10:27 PM, Heikki Linnakangas<
>>> hlinnakan...@vmware.com>  wrote:
>>>
>>> That has some obvious limitations. First of all, you can run out of
>>>
>>>> memory.
>>>>
>>>
>>> Yes, it is so. qsort should be replaced with tuplesort.
>>>
>>
>> In attached patch qsort is replaced with tuplesort. As expected, it leads
>> to some performance drawback, but it's not dramatic.
>> Also, some doc is added for new distance method of GIN.
>>
>
> Thanks. But the fact remains that with this patch, you fetch all the
> tuples and then you sort them, which is exactly the same thing you do
> without the patch. The way it happens without the patch just seems to be
> slower. Time to break out the profiler..
>

Actually, it's no exactly so. gingettuple doesn't touch any heap tuple. It
gets all required information to calculate relevance directly from index
itself. It's possible with respect to additional information which is word
positions for fts. So, patch is doing similar but with significant
difference: source of data for ranking changes from heap to index. The
information required for ranking in index is much more well-localized than
it is in heap.


> I downloaded what I believe to be the same DBLP titles dataset that you
> tested with. Without the patch:
>
> postgres=# explain analyze select * from dblp_titles where tsvector @@
> to_tsquery('english', 'statistics') order by  ts_rank(tsvector,
> to_tsquery('english', 'statistics')) limit 10;
>                                                                 QUERY PLAN
>
> ------------------------------**------------------------------**
> ---------------------
> ------------------------------**---------------------------
>  Limit  (cost=42935.81..42935.84 rows=10 width=64) (actual
> time=57.945..57.948 ro
> ws=10 loops=1)
>    ->  Sort  (cost=42935.81..42980.86 rows=18020 width=64) (actual
> time=57.943..5
> 7.944 rows=10 loops=1)
>          Sort Key: (ts_rank(tsvector, '''statist'''::tsquery))
>
>          Sort Method: top-N heapsort  Memory: 28kB
>          ->  Bitmap Heap Scan on dblp_titles  (cost=211.66..42546.41
> rows=18020 w
> idth=64) (actual time=13.061..46.358 rows=15142 loops=1)
>                Recheck Cond: (tsvector @@ '''statist'''::tsquery)
>                ->  Bitmap Index Scan on gin_title  (cost=0.00..207.15
> rows=18020
> width=0) (actual time=8.339..8.339 rows=15142 loops=1)
>
>                      Index Cond: (tsvector @@ '''statist'''::tsquery)
>  Total runtime: 57.999 ms
> (9 rows)
>
> And the profile looks like this:
>
> 6,94%  postgres            tbm_iterate
> 6,12%  postgres            hash_search_with_hash_value
> 4,40%  postgres            tbm_comparator
> 3,79%  libc-2.17.so        __memcpy_ssse3_back
> 3,68%  postgres            heap_hot_search_buffer
> 2,62%  postgres            slot_deform_tuple
> 2,47%  postgres            nocachegetattr
> 2,37%  postgres            heap_page_prune_opt
> 2,27%  libc-2.17.so        __memcmp_sse4_1
> 2,21%  postgres            heap_fill_tuple
> 2,18%  postgres            pg_qsort
> 1,96%  postgres            tas
> 1,88%  postgres            palloc0
> 1,83%  postgres            calc_rank_or
>
> Drilling into that, tbm_iterate, tbm_comparator and pq_sort calls come
> from the Tidbitmap code, as well as about 1/3 of the
> hash_search_with_hash_value calls. There seems to be a fair amount of
> overhead in building and iterating the tid bitmap. Is that what's killing
> us?
>
> For comparison, this is the same with your patch:
>
> postgres=# explain analyze select * from dblp_titles where tsvector @@
>
> to_tsquery('english', 'statistics') order by tsvector ><
> to_tsquery('english', 'statistics') limit 10;
>                                                                QUERY PLAN
>
> ------------------------------**------------------------------**
> ---------------------
> ------------------------------**--------------------------
>  Limit  (cost=16.00..52.81 rows=10 width=136) (actual time=9.957..9.980
> rows=10 l
> oops=1)
>    ->  Index Scan using gin_title on dblp_titles  (cost=16.00..52198.94
> rows=1417
> 5 width=136) (actual time=9.955..9.977 rows=10 loops=1)
>
>          Index Cond: (tsvector @@ '''statist'''::tsquery)
>          Order By: (tsvector >< '''statist'''::tsquery)
>  Total runtime: 10.084 ms
> (5 rows)
>
> 9,57%  postgres            scanGetItemFast
> 7,02%  postgres            calc_rank_or
> 5,71%  postgres            FunctionCall10Coll
> 5,59%  postgres            entryGetNextItem
> 5,19%  postgres            keyGetOrdering
> 5,13%  postgres            ginDataPageLeafReadItemPointer
> 4,89%  postgres            entryShift
> 4,85%  postgres            ginCompareItemPointers
> 3,44%  postgres            ginDataPageLeafRead
> 3,28%  postgres            AllocSetAlloc
> 3,27%  postgres            insertScanItem
> 3,18%  postgres            gin_tsquery_distance
> 2,38%  postgres            puttuple_common
> 2,26%  postgres            checkcondition_gin
> 2,20%  postgres            cmpEntries
> 2,17%  postgres            AllocSetFreeIndex
> 2,11%  postgres            calc_rank
>
> Unsurprisingly, the tidbitmap overhead is not visible in the profile with
> your patch.
>
> To see how much of the difference is caused by the tidbitmap overhead, I
> wrote the attached quick & dirty patch (it will crash and burn with queries
> that require tidbitmap unions or intersects etc.). When there are fewer
> than 10 items on a page, the tidbitmap keeps the offsets of those items in
> an ordered array of offsets, instead of setting the bits in the bitmap. In
> the test query, most pages have only 1, maybe 2 hits, so that's a lot
> cheaper. Here's what the results look like with that patch:
>
> postgres=# explain analyze select * from dblp_titles where tsvector @@
> to_tsquery('english', 'statistics') order by  ts_rank(tsvector,
> to_tsquery('english', 'statistics')) limit 10;
>                                                                QUERY PLAN
>
> ------------------------------**------------------------------**
> ---------------------
> ------------------------------**-------------------------
>  Limit  (cost=36396.80..36396.82 rows=10 width=136) (actual
> time=19.532..19.532 r
> ows=0 loops=1)
>    ->  Sort  (cost=36396.80..36432.23 rows=14175 width=136) (actual
> time=19.531..
> 19.531 rows=0 loops=1)
>          Sort Key: (ts_rank(tsvector, '''statist'''::tsquery))
>          Sort Method: quicksort  Memory: 25kB
>          ->  Bitmap Heap Scan on dblp_titles  (cost=169.86..36090.48
> rows=14175 w
> idth=136) (actual time=19.525..19.525 rows=0 loops=1)
>                Recheck Cond: (tsvector @@ '''statist'''::tsquery)
>                ->  Bitmap Index Scan on gin_title  (cost=0.00..166.31
> rows=14175
> width=0) (actual time=8.310..8.310 rows=15142 loops=1)
>
>                      Index Cond: (tsvector @@ '''statist'''::tsquery)
>  Total runtime: 19.583 ms
> (9 rows)
>
> So, that's about 3x faster than with no patch, but your patch is still 2x
> faster than this. However, I believe there's still room for improvement
> with this approach. The profile with this quick & dirty patch looks like
> this:
>
> 13,01%  postgres            hash_search_with_hash_value
> 11,03%  postgres            tbm_comparator
>  7,10%  postgres            heap_page_prune_opt
>  6,60%  postgres            pg_qsort
>  4,47%  postgres            expand_table
>  4,06%  postgres            scanGetItemFast
>  3,82%  postgres            tas
>  3,40%  postgres            tbm_iterate
>  2,70%  postgres            PinBuffer
>  2,30%  postgres            hash_any
>  2,16%  postgres            hash_seq_search
>  2,07%  postgres            tbm_get_pageentry
>  1,86%  postgres            calc_bucket
>
> So, a lot of time is still spent in tidbitmap. The overhead of tbm_iterate
> is gone, but much overhead remains elsewhere. The crux of the overhead is
> that when you have only 1-2 hits per page, the tidbitmap is really
> expensive, as it allocates an PagetableEntry struct for each page. I
> believe it would be much cheaper to just accumulate the ItemPointers into a
> simple array in tbm_add_tuples, sort it once, and return results.
>
> In summary: The test case you presented as motivation for this patch is a
> bit of a worst-case scenario for the current tidbitmap implementation. The
> speedup from your patch comes from avoiding the tidbitmap. However, it
> would be fairly easy to optimize the tidbitmap to handle this scenario
> better, which would benefit all kinds of queries that use bitmap scans.
> There is really no reason to complicate the GIN API for this. Let's just
> optimize tidbitmap.
>
> I'm not sure if I fullly understand your patch, though. Is there some
> other test scenario where it performs significantly better, which can not
> be attributed to a tidbitmap overhead? I'm assuming 'no' for now, and
> marking this patch as rejected in the commitfest app, but feel free to
> reopen if there is.


So, it's likely I've positioned this patch wrong from the begging, because
my examples were focused on CPU time improvement. But initial purpose of
this patch was to decrease IO. Actually, CPU time improvement just came as
some kind of bonus which you have unmasked now. On the simple example you
can see how dramatically differs the number of pages used for query
execution.

Without patch it's 8002.

postgres=# explain (analyze, buffers) select * from dblp_titles2 where ts
@@ to_tsquery('english', 'statistics') order by ts_rank(ts,
to_tsquery('english', 'statistics')) limit 10;
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=28651.81..28651.84 rows=10 width=134) (actual
time=18.133..18.135 rows=10 loops=1)
   Buffers: shared hit=*8002*
   ->  Sort  (cost=28651.81..28681.13 rows=11728 width=134) (actual
time=18.132..18.134 rows=10 loops=1)
         Sort Key: (ts_rank(ts, '''statist'''::tsquery))
         Sort Method: top-N heapsort  Memory: 27kB
         Buffers: shared hit=8002
         ->  Bitmap Heap Scan on dblp_titles2  (cost=138.89..28398.37
rows=11728 width=134) (actual time=4.005..14.866 rows=10298 loops=1)
               Recheck Cond: (ts @@ '''statist'''::tsquery)
               Buffers: shared hit=8002
               ->  Bitmap Index Scan on dblp_titles2_idx
 (cost=0.00..135.96 rows=11728 width=0) (actual time=2.546..2.546
rows=10298 loops=1)
                     Index Cond: (ts @@ '''statist'''::tsquery)
                     Buffers: shared hit=13
 Total runtime: 18.177 ms
(13 rows)

And with patch it's 34.

postgres=# explain (analyze, buffers) select * from dblp_titles2 where ts
@@ to_tsquery('english', 'statistics') order by ts >< to_tsquery('english',
'statistics') limit 10;
                                                                   QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=12.00..48.80 rows=10 width=136) (actual time=3.238..3.254
rows=10 loops=1)
   Buffers: shared hit=*34*
   ->  Index Scan using dblp_titles2_idx on dblp_titles2
 (cost=12.00..36929.74 rows=10033 width=136) (actual time=3.237..3.252
rows=10 loops=1)
         Index Cond: (ts @@ '''statist'''::tsquery)
         Order By: (ts >< '''statist'''::tsquery)
         Buffers: shared hit=34
 Total runtime: 3.448 ms
(7 rows)

So, we need in 234 times less pages to execute the query. DBLP titles is
small dataset, but on large dataset ratio of used pages will be similar. On
large databases which isn't entirely in cache the difference is dramatic. I
don't have my experiments on large database now on hand (will provide them
a bit later), but I think you see the point. I return patch to "need
review" because IO aspect worth reviewing.

------
With best regards,
Alexander Korotkov.

Reply via email to