i think there is one more thing which would be really good in GIN and which
would solve a ton of issues.
atm GIN entries are sorted by item pointer.
if we could sort them by a "column" it would fix a couple of real work issues
such as ...
SELECT ... FROM foo WHERE "tsearch_query" ORDER BY price DESC LIMIT 10
... or so.
it many cases you want to search for a, say, product and find the cheapest /
most expensive one.
if the tsearch_query yields a high number of rows (which it often does) the
subsequent sort will kill you.
many thanks,
hans
On Feb 6, 2014, at 12:36 PM, Heikki Linnakangas wrote:
> While hacking on the GIN patches, I've come up with a few different ideas for
> improving performance. It's too late for 9.4, but I'll list them here if
> someone wants to work on them later:
>
> * Represent ItemPointers as uint64's, to speed up comparisons.
> ginCompareItemPointers is inlined into only a few instructions, but it's
> still more expensive than a single-instruction 64-bit comparison.
> ginCompareItemPointers is called very heavily in a GIN scan, so even a small
> improvement there would make for a noticeable speedup. It might be an
> improvement in code clarity, too.
>
> * Keep the entry streams of a GinScanKey in a binary heap, to quickly find
> the minimum curItem among them.
>
> I did this in various versions of the fast scan patch, but then I realized
> that the straightforward way of doing it is wrong, because a single
> GinScanEntry can be part of multiple GinScanKeys. If an entry's curItem is
> updated as part of advancing one key, and the entry is in a heap of another
> key, updating the curItem can violate the heap property of the other entry's
> heap.
>
> * Build a truth table (or cache) of consistent-function's results, and use
> that instead of calling consistent for every item.
>
> * Deduce AND or OR logic from the consistent function. Or have the opclass
> provide a tree of AND/OR/NOT nodes directly, instead of a consistent
> function. For example, if the query is "foo & bar", we could avoid calling
> consistent function altogether, and only return items that match both.
>
> * Delay decoding segments during a scan. Currently, we decode all segments of
> a posting tree page into a single array at once. But with "fast scan", we
> might be able to skip over all entries in some of the segments. So it would
> be better to copy the segments into backend-private memory in compressed
> format, and decode them one segment at a time (or maybe even one item at a
> time), when needed. That would avoid the unnecessary decoding of segments
> that can be skipped over, and would also reduce memory usage of a scan.
>
> I'll add these to the TODO.
>
> - Heikki
>
>
> --
> Sent via pgsql-hackers mailing list ([email protected])
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers