Hi. Ok, ok thanks.
My problem is to shorten time of searching full text stored in text field. The table definition is like following: CREATE TABLE xxx ( ... title character varying, ... fts1body text, ... ) If user requests keywords, we use a kind of "stristr" that is targeting Japanese text encoded in UTF-8. "aaa bbb ccc" [Click here to search!] SELECT * FROM xxx WHERE TRUE AND (ddstrike(title,'aaa') OR ddstrike(fts1body,'aaa') OR ...) AND (ddstrike(title,'bbb') OR ddstrike(fts1body,'bbb') OR ...) AND ... As you can imagine easily, yes, it is very slow! So I need trial and error for speeding up. My trial is "Insert a light weight filter done by integer key, before text searching!" For example, filter('A') -> 1 filter('B') -> 2 filter('C') -> 4 filter('AAABBC') -> 7 or {1,2,4} It may fit to inverse index like GIN! So I began to study GIN. I'm sorry to say. Today I found I could apply int array GIN support at contrib/_int.sql. I made GIN index. CREATE INDEX xxx_idx_filter ON xxx USING GIN (filter(fts1body) gist__int_ops); The following sample query is very very fast! 11065 hits in 22 milli secs (total 215,278 records). SELECT COUNT(*) FROM xxx WHERE filter(fts1body) @> filter('ABC'); However the following query is very slow! 9,400ms. It uses "Seq Scan" lol. SELECT * FROM xxx WHERE TRUE AND (ddstrike(title,'ABC') OR (filter(fts1body) @> filter('AAA') AND ddstrike(fts1body,'AAA'))) Apply filter to "title" column too. The best query result costs 3,700ms. 18 hits. It surely uses expected query plan: two "Bitmap index scan" -> "Bitmap Or" -> "Bitmap Heap Scan". SELECT * FROM xxx WHERE TRUE AND (filter(title) @> filter('ABC') OR filter(fts1body) @> filter('ABC')) AND (ddstrike(title,'ABC') OR ddstrike(fts1body,'ABC')) The pure query costs 3,800ms. 18 hits. Single "Seq Scan". SELECT * FROM xxx WHERE TRUE AND (ddstrike(title,'ABC') OR ddstrike(fts1body,'ABC')) Finally I noticed I had spent useless time, and need to find another good one. Sorry. However, I may think good idea which uses inverted index. So I want to know... - the actual work of extractQuery and consistant. - the detail interface of extractValue/extractQuery/consistant. It may help understanding. I looked at contrib/_int.sql of PG8.2.22 There are definitions of int[] GIN support. --- CREATE OPERATOR CLASS gin__int_ops FOR TYPE _int4 USING gin AS OPERATOR 3 &&, OPERATOR 6 = (anyarray, anyarray) RECHECK, OPERATOR 7 @>, OPERATOR 8 <@ RECHECK, OPERATOR 13 @, OPERATOR 14 ~ RECHECK, OPERATOR 20 @@ (_int4, query_int), FUNCTION 1 btint4cmp (int4, int4), FUNCTION 2 ginarrayextract (anyarray, internal), FUNCTION 3 ginint4_queryextract (internal, internal, int2), FUNCTION 4 ginint4_consistent (internal, int2, internal), STORAGE int4; --- I checked the PG8.2.22 source code. Both ginint4_queryextract and ginint4_consistent assume that "query" argument is a PGARRAY (ArrayType *). Where is it decided? Is it array of STORAGE type? Both extractQuery(ginint4_queryextract) and extractValue(ginarrayextract) seem to return similar value type. They return Datum array of int4. Is it array of STORAGE type? I want to understand the overview of GIN extension. Thanks kenji uno -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers