On Mon, Sep 29, 2014 at 11:48 AM, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:

> On 09/15/2014 06:28 PM, Alexander Korotkov wrote:
>
>> Hackers,
>>
>> some GIN opclasses use collation-aware comparisons even though they don't
>> actually need collation-aware comparison. Examples are the text[] and
>> hstore opclasses.
>>
>
> Hmm. It would be nice to use the index for inequality searches, at least
> on text[]. We don't support that currently, but it would require
> collation-awareness.
>
>> Depending on the collation, this may make them much slower.
>>
>> See example.
>>
>> # show lc_collate ;
>>   lc_collate
>> ─────────────
>>   ru_RU.UTF-8
>> (1 row)
>>
>> # create table test as (select array_agg(i::text) from
>> generate_series(1,1000000) i group by (i-1)/10);
>> SELECT 100000
>>
>> # create index test_idx on test using gin(array_agg);
>> CREATE INDEX
>> Time: *26930,423 ms*
>>
>> # create index test_idx2 on test using gin(array_agg collate "C");
>> CREATE INDEX
>> Time: *5143,682 ms*
>>
>> Index creation with collation "ru_RU.UTF-8" is 5 times slower, even though
>> the collation has absolutely no effect on index functionality.
>>
>
> It occurs to me that practically all of those comparisons happen when we
> populate the red-black tree during the index build. The purpose of the
> red-black tree is to collect identical keys together, but there is actually
> no requirement that the order of the red-black tree matches the order of
> the index. It also isn't strictly required that it recognizes equal keys as
> equal. The only requirement is that it doesn't incorrectly put two keys
> that are not equal according to the compare function into the same node.
>
>
Good point, Heikki. I've run into this problem several times, worked around
it with the C locale, and then forgot about it again. Now it's time to fix it!


> We could therefore use plain memcmp() to compare the Datums while building
> the red-black tree. Keys that are bitwise equal are surely considered
> equal by the compare function. That makes the index build a lot faster.
> With the attached quick patch:
>
> postgres=# create index test_idx on test using gin(array_agg );
> CREATE INDEX
> Time: 880.620 ms
>
> This is on my laptop. Without the patch, that takes about 4.7 seconds with
> the C locale, so this is much faster than even using the C locale.
>

Hmm, on my MBA I got (index build time, ms):

  collation       patched    unpatched
  ru_RU.UTF-8   17277.734    39151.562
  C              6131.929     6131.929

Not much :(


>
> - Heikki
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>
