On Mon, November 26, 2012 20:49, Alexander Korotkov wrote:
> trgm-regexp-0.6.patch.gz
I ran the simple-minded tests against generated data (similar to the ones I did
in January 2012).
The problems of that older version seem pretty much all removed. (although I
didn't do much work
on it -- just reran these tests).
I used two 2 instances, 'HEAD' and 'trgm_regex', which were both compiled with
'--enable-depend' '--with-openssl' '--with-perl' '--with-libxml'
Tables used:
rowcount size tablesize index (trgm)
azjunk4 10^4 rows 1,171,456 | 9,781,248
azjunk5 10^5 rows 11,706,368 | 65,093,632
azjunk6 10^6 rows117,030,912 | 726,310,912
azjunk7 10^7 rows 1,170,292,736 | 4,976,189,440
(See my previous emails for a generating script)
Tables contain random generated text:
table azjunk7 limit 5;
txt
--
i kzzhv ssaa zv x xlepzxsgbdkxev v wn dmvqkuwb qxkyvgab gpidaosaqbewqimmai
jxj
bvwn zbevtzyhibbn hoctxurutn pvlatjjyxf f runa owpltbcunrbq ux peoook
rxwoscbytz
bbjlbbhhkivjivklgbh tvapzogh rj ky ahvgkvvlfudotvqapznludohdoyqrp
kvothyclbckbxu
hvic gomewbp izsjifqggyqgzcghdat lb kud ltfqaxqxjjom qkw wqggikgvph yi
sftmbjt
edbjfl vtcasudjpgfgjaf swooxygsse flnqd pxzsdmesqhqbzgirswysote muakq agk p w
uq
(5 rows)
with index on column 'txt':
create index az7_idx on azjunk7 using gin (txt gin_trgm_ops);
Queries were of the form:
explain analyze select txt from azjunkXX where txt ~ '$REGEX';
The main problem with the January version was that it chose to use the trgm
index even when it
could take a long time (hours). This has been resolved as far as I can see,
and the results are
now very attractive.
(There does seem to be a very slight regression on the seqscan, but it's so
small that I'm not yet
sure it's not noise)
Hardware: AMD FX-8120 with Linux 2.6.32-279.14.1.el6.x86_64 x86_64 GNU/Linux
PostgreSQL
9.3devel-trgm_regex-20121127_2353-e78d288c895bd296e3cb1ca29c7fe2431eef3fcd on
x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.7.2, 64-bit
port instancetableregex rows method
expl.analyze timing
6543 HEADazjunk4 x[ae]q 46 Seq Scan
12.962 ms
6554 trgm_regex azjunk4 x[ae]q 46 Bitmap Heap Scan
0.800 ms
6543 HEADazjunk4 x[ae]{1}q 46 Seq Scan
12.487 ms
6554 trgm_regex azjunk4 x[ae]{1}q 46 Bitmap Heap Scan
0.209 ms
6543 HEADazjunk4 x[ae]{1,1}q 46 Seq Scan
12.266 ms
6554 trgm_regex azjunk4 x[ae]{1,1}q 46 Bitmap Heap Scan
0.210 ms
6543 HEADazjunk4 x[ae]{,2}q 0 Seq Scan
14.322 ms
6554 trgm_regex azjunk4 x[ae]{,2}q 0 Bitmap Heap Scan
0.610 ms
6543 HEADazjunk4 x[ae]{,10}q 0 Seq Scan
20.503 ms
6554 trgm_regex azjunk4 x[ae]{,10}q 0 Bitmap Heap Scan
0.511 ms
6543 HEADazjunk4 x[ae]{1,2}q 49 Seq Scan
13.060 ms
6554 trgm_regex azjunk4 x[ae]{1,2}q 49 Bitmap Heap Scan
0.429 ms
6543 HEADazjunk4 x[aei]q 81 Seq Scan
12.487 ms
6554 trgm_regex azjunk4 x[aei]q 81 Bitmap Heap Scan
0.367 ms
6543 HEADazjunk4 x[aei]{1}q 81 Seq Scan
12.132 ms
6554 trgm_regex azjunk4 x[aei]{1}q 81 Bitmap Heap Scan
0.336 ms
6543 HEADazjunk4 x[aei]{1,1}q81 Seq Scan
12.168 ms
6554 trgm_regex azjunk4 x[aei]{1,1}q81 Bitmap Heap Scan
0.319 ms
6543 HEADazjunk4 x[aei]{,2}q 0 Seq Scan
14.586 ms
6554 trgm_regex azjunk4 x[aei]{,2}q 0 Bitmap Heap Scan
0.621 ms
6543 HEADazjunk4 x[aei]{,10}q 0 Seq Scan
20.134 ms
6554 trgm_regex azjunk4 x[aei]{,10}q 0 Bitmap Heap Scan
0.552 ms
6543 HEADazjunk4 x[aei]{1,2}q89 Seq Scan
12.553 ms
6554 trgm_regex azjunk4 x[aei]{1,2}q89 Bitmap Heap Scan
0.916 ms
6543 HEADazjunk4 x[aei]{1,3}q89 Seq Scan
13.055 ms
6554 trgm_regex azjunk4 x[aei]{1,3}q89 Seq Scan
13.064 ms
6543 HEADazjunk4 x[aei]q 81 Seq Scan
11.856 ms
6554 trgm_regex azjunk4 x[aei]q 81 Bitmap Heap Scan
0.398 ms
6543 HEADazjunk4 x[aei]{1}q 81 Seq Scan
11.951 ms
6554 trgm_regex azjunk4 x[aei]{1}q 81 Bitmap Heap Scan
0.369 ms
6543 HEADazjunk4 x[aei]{1,1}q81 Seq Scan
12.750 ms
6554 trgm_regex azjunk4 x[aei