On Mon, Feb 3, 2014 at 6:31 PM, Alexander Korotkov <aekorot...@gmail.com>wrote:
> On Mon, Jan 27, 2014 at 7:30 PM, Alexander Korotkov > <aekorot...@gmail.com>wrote: > >> On Mon, Jan 27, 2014 at 2:32 PM, Alexander Korotkov <aekorot...@gmail.com >> > wrote: >> >>> On Sun, Jan 26, 2014 at 8:14 PM, Heikki Linnakangas < >>> hlinnakan...@vmware.com> wrote: >>> >>>> On 01/26/2014 08:24 AM, Tomas Vondra wrote: >>>> >>>>> Hi! >>>>> >>>>> On 25.1.2014 22:21, Heikki Linnakangas wrote: >>>>> >>>>>> Attached is a new version of the patch set, with those bugs fixed. >>>>>> >>>>> >>>>> I've done a bunch of tests with all the 4 patches applied, and it seems >>>>> to work now. I've done tests with various conditions (AND/OR, number of >>>>> words, number of conditions) and I so far I did not get any crashes, >>>>> infinite loops or anything like that. >>>>> >>>>> I've also compared the results to 9.3 - by dumping the database and >>>>> running the same set of queries on both machines, and indeed I got 100% >>>>> match. >>>>> >>>>> I also did some performance tests, and that's when I started to worry. >>>>> >>>>> For example, I generated and ran 1000 queries that look like this: >>>>> >>>>> SELECT id FROM messages >>>>> WHERE body_tsvector @@ to_tsquery('english','(header & 53 & 32 & >>>>> useful & dropped)') >>>>> ORDER BY ts_rank(body_tsvector, to_tsquery('english','(header & 53 >>>>> & >>>>> 32 & useful & dropped)')) DESC; >>>>> >>>>> i.e. in this case the query always was 5 words connected by AND. This >>>>> query is a pretty common pattern for fulltext search - sort by a list >>>>> of >>>>> words and give me the best ranked results. >>>>> >>>>> On 9.3, the script was running for ~23 seconds, on patched HEAD it was >>>>> ~40. It's perfectly reproducible, I've repeated the test several times >>>>> with exactly the same results. The test is CPU bound, there's no I/O >>>>> activity at all. I got the same results with more queries (~100k). >>>>> >>>>> Attached is a simple chart with x-axis used for durations measured on >>>>> 9.3.2, y-axis used for durations measured on patched HEAD. It's obvious >>>>> a vast majority of queries is up to 2x slower - that's pretty obvious >>>>> from the chart. >>>>> >>>>> Only about 50 queries are faster on HEAD, and >700 queries are more >>>>> than >>>>> 50% slower on HEAD (i.e. if the query took 100ms on 9.3, it takes >>>>> >150ms >>>>> on HEAD). >>>>> >>>>> Typically, the EXPLAIN ANALYZE looks something like this (on 9.3): >>>>> >>>>> http://explain.depesz.com/s/5tv >>>>> >>>>> and on HEAD (same query): >>>>> >>>>> http://explain.depesz.com/s/1lI >>>>> >>>>> Clearly the main difference is in the "Bitmap Index Scan" which takes >>>>> 60ms on 9.3 and 120ms on HEAD. >>>>> >>>>> On 9.3 the "perf top" looks like this: >>>>> >>>>> 34.79% postgres [.] gingetbitmap >>>>> 28.96% postgres [.] ginCompareItemPointers >>>>> 9.36% postgres [.] TS_execute >>>>> 5.36% postgres [.] check_stack_depth >>>>> 3.57% postgres [.] FunctionCall8Coll >>>>> >>>>> while on 9.4 it looks like this: >>>>> >>>>> 28.20% postgres [.] gingetbitmap >>>>> 21.17% postgres [.] TS_execute >>>>> 8.08% postgres [.] check_stack_depth >>>>> 7.11% postgres [.] FunctionCall8Coll >>>>> 4.34% postgres [.] shimTriConsistentFn >>>>> >>>>> Not sure how to interpret that, though. For example where did the >>>>> ginCompareItemPointers go? I suspect it's thanks to inlining, and that >>>>> it might be related to the performance decrease. Or maybe not. >>>>> >>>> >>>> Yeah, inlining makes it disappear from the profile, and spreads that >>>> time to the functions calling it. >>>> >>>> The profile tells us that the consistent function is called a lot more >>>> than before. That is expected - with the fast scan feature, we're calling >>>> consistent not only for potential matches, but also to refute TIDs based on >>>> just a few entries matching. If that's effective, it allows us to skip many >>>> TIDs and avoid consistent calls, which compensates, but if it's not >>>> effective, it's just overhead. >>>> >>>> I would actually expect it to be fairly effective for that query, so >>>> that's a bit surprising. I added counters to see where the calls are coming >>>> from, and it seems that about 80% of the calls are actually coming from >>>> this little the feature I explained earlier: >>>> >>>> >>>> In addition to that, I'm using the ternary consistent function to check >>>>> if minItem is a match, even if we haven't loaded all the entries yet. >>>>> That's less important, but I think for something like "rare1 | (rare2 & >>>>> frequent)" it might be useful. It would allow us to skip fetching >>>>> 'frequent', when we already know that 'rare1' matches for the current >>>>> item. I'm not sure if that's worth the cycles, but it seemed like an >>>>> obvious thing to do, now that we have the ternary consistent function. >>>>> >>>> >>>> So, that clearly isn't worth the cycles :-). At least not with an >>>> expensive consistent function; it might be worthwhile if we pre-build the >>>> truth-table, or cache the results of the consistent function. >>>> >>>> Attached is a quick patch to remove that, on top of all the other >>>> patches, if you want to test the effect. >>> >>> >>> Every single change you did in fast scan seems to be reasonable, but >>> testing shows that something went wrong. Simple test with 3 words of >>> different selectivities. >>> >>> After applying your patches: >>> >>> # select count(*) from fts_test where fti @@ plainto_tsquery('english', >>> 'gin index select'); >>> count >>> ─────── >>> 627 >>> (1 row) >>> >>> Time: 21,252 ms >>> >>> In original fast-scan: >>> >>> # select count(*) from fts_test where fti @@ plainto_tsquery('english', >>> 'gin index select'); >>> count >>> ─────── >>> 627 >>> (1 row) >>> >>> Time: 3,382 ms >>> >>> I'm trying to get deeper into it. >>> >> >> I had two guesses about why it's become so slower than in my original >> fast-scan: >> 1) Not using native consistent function >> 2) Not sorting entries >> >> I attach two patches which rollback these two features (sorry for awful >> quality of second). Native consistent function accelerates thing >> significantly, as expected. Tt seems that sorting entries have almost no >> effect. However it's still not as fast as initial fast-scan: >> >> # select count(*) from fts_test where fti @@ plainto_tsquery('english', >> 'gin index select'); >> count >> ─────── >> 627 >> (1 row) >> >> Time: 5,381 ms >> >> Tomas, could you rerun your tests with first and both these patches >> applied against patches by Heikki? >> > > I found my patch "0005-Ternary-consistent-implementation.patch" to be > completely wrong. It introduces ternary consistent function to opclass, but > don't uses it, because I forgot to include ginlogic.c change into patch. > So, it shouldn't make any impact on performance. However, testing results > with that patch significantly differs. That makes me very uneasy. Can we > now reproduce exact same? > > Right version of these two patches in one against current head is > attached. I've rerun tests with it, results are > /mnt/sas-raid10/gin-testing/queries/9.4-fast-scan-10. Could you rerun > postprocessing including graph drawing? > > Sometimes test cases are not what we expect. For example: > > =# explain SELECT id FROM messages WHERE body_tsvector @@ > to_tsquery('english','(5alpha1-initdb''d)'); > QUERY PLAN > > > ───────────────────────────────────────────────────────────────────────────────────────────────────────── > Bitmap Heap Scan on messages (cost=84.00..88.01 rows=1 width=4) > Recheck Cond: (body_tsvector @@ '''5alpha1-initdb'' & ''5alpha1'' & > ''initdb'' & ''d'''::tsquery) > -> Bitmap Index Scan on messages_body_tsvector_idx (cost=0.00..84.00 > rows=1 width=0) > Index Cond: (body_tsvector @@ '''5alpha1-initdb'' & ''5alpha1'' & > ''initdb'' & ''d'''::tsquery) > Planning time: 0.257 ms > (5 rows) > > 5alpha1-initdb'd is 3 gin entries with different frequencies. > > Also, these patches are not intended to change relevance ordering speed. > When number of results are high, most of time is relevance calculating and > sorting. I propose to remove ORDER BY clause from test cases to see scan > speed more clear. > > I've dump of postgresql.org search queries from Magnus. We can add them > to our test case. > It turns out that version 10 contained bug in ternary consistent function implementation for tsvector. Fixed in attached version. ------ With best regards, Alexander Korotkov.
gin-fast-scan.11.patch.gz
Description: GNU Zip compressed data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers