On Sat, Dec 14, 2013 at 7:04 PM, Andres Freund <and...@2ndquadrant.com>wrote:

> Hi,
>
> > > >  Limit  (cost=69214.06..69214.08 rows=10 width=16) (actual
> > > > time=0.097..0.099 rows=10 loops=1)
> > > >    ->  Sort  (cost=69214.06..71714.06 rows=1000000 width=16) (actual
> > > > time=0.096..0.097 rows=10 loops=1)
> > > >          Sort Key: v1, v2
> > > >          Sort Method: top-N heapsort  Memory: 25kB
> > > >          ->  Index Scan using test_v1_idx on test
>  (cost=0.42..47604.42
> > > > rows=1000000 width=16) (actual time=0.017..0.066 rows=56 loops=1)
> > > >  Total runtime: 0.125 ms
> > > > (6 rows)
> > >
> > > Is that actually all that beneficial when sorting with a bog standard
> > > qsort() since that doesn't generally benefit from data being
> pre-sorted?
> > > I think we might need to switch to a different algorithm to really
> > > benefit from mostly pre-sorted input.
> > >
> >
> > In this patch I don't do full sort of dataset. For instance, index
> returns
> > data ordered by first column and we need to order them also by second
> > column.
>
> Ah, that makes sense.
>
> > But, I don't think we should expect pre-sorted values of second column
> > inside a group.
>
> Yes, if you do it that way, there doesn't seem to any need to assume
> that any more than we usually do.
>
> I think you should make the explain output reflect the fact that we're
> assuming v1 is presorted and just sorting v2. I'd be happy enough with:
> Sort Key: v1, v2
> Partial Sort: v2
> or even just
> "Partial Sort Key: [v1,] v2"
> but I am sure others disagree.
>

Sure, I just didn't change explain output yet. It should look like what you
propose.

> > > *partial-knn-1.patch*
>
> > > Rechecking from the heap means adding a sort node though, which I don't
> > > see here? Or am I misunderstanding something?
>
> > KNN-GiST contain RB-tree of scanned items. In this patch item is
> rechecked
> > inside GiST and reinserted into same RB-tree. It appears to be much
> easier
> > implementation for PoC and also looks very efficient. I'm not sure what
> is
> > actually right design for it. This is what I like to discuss.
>
> I don't have enough clue about gist to say wether it's the right design,
> but it doesn't look wrong to my eyes. It'd probably be useful to export
> the knowledge that we are rechecking and how often that happens to the
> outside.
> While I didn't really look into the patch, I noticed in passing that you
> pass a all_dead variable to heap_hot_search_buffer without using the
> result - just pass NULL instead, that performs a bit less work.


Useful notice, thanks.

------
With best regards,
Alexander Korotkov.

Reply via email to