OK, I reran my trials with the same data set on 7.2.3 --with-multibyte and got the same brutal performance hit, so it is definitely a multibyte-specific problem.

As for the distribution of the data in the table, I used the following: all g-words in /usr/share/dict with different processing applied: no processing, init caps, word || row_id, etc...
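
A rough sketch of how a data set like this could be generated, for illustration only. The original script is not shown in this message, so everything here is an assumption: that "g-words" means words starting with g, that each word yields one row per variant, and that row ids count up from 1. Only the three variants named above are covered, and `build_rows` is a made-up helper name.

```python
def build_rows(words):
    """Return (row_id, text) rows for every word starting with 'g'.

    Assumed variants per word (only the three named in the message):
    the word unchanged, the word with an initial capital, and the
    word with its row id appended (word || row_id).
    """
    rows = []
    for word in words:
        # Assumption: "g-words" means words whose first letter is g or G.
        if not word.lower().startswith("g"):
            continue
        for variant in ("plain", "initcap", "with_id"):
            row_id = len(rows) + 1  # hypothetical 1-based row id
            if variant == "plain":
                text = word
            elif variant == "initcap":
                text = word[:1].upper() + word[1:]
            else:
                text = word + str(row_id)
            rows.append((row_id, text))
    return rows


if __name__ == "__main__":
    for row in build_rows(["apple", "gnu", "Gale"]):
        print(row)
```

Under these assumptions, a dictionary word that is already capitalized produces the same text for its plain and init-caps rows.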
There are only about 1000 words that appear more than once (2 or 3 times) in 27k rows.

-Wade Klaver

At 11:08 PM 2/3/03 -0500, Tom Lane wrote:
>Next question: may I guess that you weren't using MULTIBYTE in 7.2?
>
>After still more digging, I'm coming round to the opinion that the
>problem is that MULTIBYTE is forced on in 7.3, and this imposes a
>factor-of-256 overhead in a bunch of the operations in regcomp.c.
>In particular, compiling a case-independent regex is now hugely
>more expensive than it used to be.
>
>The parties who wanted to force MULTIBYTE on promised that there
>would be no such penalties :-(
>
>			regards, tom lane