Andrew, All,

> On 5/22/07, Andrew Dunstan <[EMAIL PROTECTED]> wrote:
> > But before I commit this I'd appreciate seeing some more testing, both
> > for correctness and performance.

I finally found some time to test this patch on our data. As our
production database is still using 8.1, I made my tests with 8.1.10
and 8.3devel. As I had very weird results, I tested also 8.2.5.

The patch seems to work as expected in my locale. I didn't notice
problems during the tests I made except for the performance problem I
describe below.

The box is a recent dual core box using CentOS 5. It's a test box
installed specifically to test PostgreSQL 8.3. Every version is
compiled with the same compiler. Locale is fr_FR.UTF-8 and database is
UTF-8 too.
The table used to make the tests fits entirely in RAM.

I tested a simple ILIKE query on our data with 8.3devel and it was far
slower than with 8.1.10 (2 times slower). It was obviously not the
expected result as it should have been faster considering your work.
So I decided to test also with 8.2.5 and it seems a performance
regression was introduced in 8.2 (and not in 8.3 which is in fact a
bit faster than 8.2).

I saw this item in 8.2 release notes:
Allow ILIKE to work for multi-byte encodings (Tom)
Internally, ILIKE now calls lower() and then uses LIKE.
Locale-specific regular expression patterns still do not work in these
encodings.

Could it be responsible of such a slow down?

I attached the results of my tests. If anyone needs more information,
I'll be glad to provide them.

Regards,

--
Guillaume
** Environment **

cityvox=# select version();
                                               version                          
                     
-----------------------------------------------------------------------------------------------------
 PostgreSQL 8.1.10 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.1 
20070105 (Red Hat 4.1.1-52)
(1 row)

cityvox=# show lc_collate;
 lc_collate  
-------------
 fr_FR.UTF-8
(1 row)

cityvox=# show lc_ctype; 
  lc_ctype   
-------------
 fr_FR.UTF-8
(1 row)
     
cityvox=# \l          
        List of databases
   Name    |  Owner   | Encoding 
-----------+----------+----------
 cityvox   | postgres | UTF8
 postgres  | postgres | UTF8
 template0 | postgres | UTF8
 template1 | postgres | UTF8
(4 rows)

cityvox=# show shared_buffers;
 shared_buffers 
----------------
 16384
(1 row)

cityvox=# show work_mem;
 work_mem 
----------
 32768
(1 row)

** Seqscan on the table **

cityvox=# select count(*) from evenement;
 count  
--------
 128780
(1 row)

Time: 57.335 ms
cityvox=# select count(*) from evenement;
 count  
--------
 128780
(1 row)

Time: 57.317 ms

** Query with LIKE **

cityvox=# SELECT e.numeve FROM evenement e WHERE e.libgeseve LIKE '%hocus 
pocus%' OR e.mots_cle LIKE '%hocus pocus%';
 numeve 
--------
(0 rows)

Time: 188.312 ms
cityvox=# SELECT e.numeve FROM evenement e WHERE e.libgeseve LIKE '%hocus 
pocus%' OR e.mots_cle LIKE '%hocus pocus%';
 numeve 
--------
(0 rows)

Time: 188.235 ms

** Query with ILIKE **

cityvox=# SELECT e.numeve FROM evenement e WHERE e.libgeseve ILIKE '%hocus 
pocus%' OR e.mots_cle ILIKE '%hocus pocus%';
  numeve   
-----------
 900024298
     87578
    161108
(3 rows)

Time: 227.048 ms
cityvox=# SELECT e.numeve FROM evenement e WHERE e.libgeseve ILIKE '%hocus 
pocus%' OR e.mots_cle ILIKE '%hocus pocus%';
  numeve   
-----------
 900024298
     87578
    161108
(3 rows)

Time: 226.586 ms

cityvox=# EXPLAIN ANALYZE SELECT e.numeve FROM evenement e WHERE e.libgeseve 
ILIKE '%hocus pocus%' OR e.mots_cle ILIKE '%hocus pocus%';
                                                QUERY PLAN                      
                           
-----------------------------------------------------------------------------------------------------------
 Seq Scan on evenement e  (cost=0.00..6743.01 rows=1 width=4) (actual 
time=45.907..226.702 rows=3 loops=1)
   Filter: (((libgeseve)::text ~~* '%hocus pocus%'::text) OR ((mots_cle)::text 
~~* '%hocus pocus%'::text))
 Total runtime: 226.736 ms
(3 rows)

Time: 227.216 ms

** Query with only one condition with ILIKE **

cityvox=# SELECT e.numeve FROM evenement e WHERE e.libgeseve ILIKE '%hocus 
pocus%';
  numeve   
-----------
 900024298
     87578
(2 rows)

Time: 177.318 ms

** Environment **

cityvox=# select version();
                                              version                           
                    
----------------------------------------------------------------------------------------------------
 PostgreSQL 8.2.5 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.1 
20070105 (Red Hat 4.1.1-52)
(1 row)

cityvox=# show lc_collate;
 lc_collate  
-------------
 fr_FR.UTF-8
(1 row)

cityvox=# show lc_ctype;
  lc_ctype   
-------------
 fr_FR.UTF-8
(1 row)

cityvox=# \l
        List of databases
   Name    |  Owner   | Encoding 
-----------+----------+----------
 cityvox   | postgres | UTF8
 postgres  | postgres | UTF8
 template0 | postgres | UTF8
 template1 | postgres | UTF8
(4 rows)

cityvox=# show shared_buffers ;
 shared_buffers 
----------------
 128MB
(1 row)

cityvox=# show work_mem;
 work_mem 
----------
 32MB
(1 row)

** Seqscan on the table **

cityvox=# select count(*) from evenement;
 count  
--------
 128780
(1 row)

Time: 34.524 ms
cityvox=# select count(*) from evenement;
 count  
--------
 128780
(1 row)

Time: 34.509 ms

** Query with LIKE **

cityvox=# SELECT e.numeve FROM evenement e WHERE e.libgeseve LIKE '%hocus 
pocus%' OR e.mots_cle LIKE '%hocus pocus%';
 numeve 
--------
(0 rows)

Time: 180.982 ms
cityvox=# SELECT e.numeve FROM evenement e WHERE e.libgeseve LIKE '%hocus 
pocus%' OR e.mots_cle LIKE '%hocus pocus%';
 numeve 
--------
(0 rows)

Time: 164.684 ms

** Query with ILIKE **

cityvox=# SELECT e.numeve FROM evenement e WHERE e.libgeseve ILIKE '%hocus 
pocus%' OR e.mots_cle ILIKE '%hocus pocus%';
  numeve   
-----------
 900024298
     87578
    161108
(3 rows)

Time: 617.654 ms
cityvox=# SELECT e.numeve FROM evenement e WHERE e.libgeseve ILIKE '%hocus 
pocus%' OR e.mots_cle ILIKE '%hocus pocus%';
  numeve   
-----------
 900024298
     87578
    161108
(3 rows)

Time: 617.568 ms

cityvox=# EXPLAIN ANALYZE SELECT e.numeve FROM evenement e WHERE e.libgeseve 
ILIKE '%hocus pocus%' OR e.mots_cle ILIKE '%hocus pocus%';
                                                 QUERY PLAN                     
                            
------------------------------------------------------------------------------------------------------------
 Seq Scan on evenement e  (cost=0.00..6743.70 rows=1 width=4) (actual 
time=125.531..614.073 rows=3 loops=1)
   Filter: (((libgeseve)::text ~~* '%hocus pocus%'::text) OR ((mots_cle)::text 
~~* '%hocus pocus%'::text))
 Total runtime: 614.105 ms
(3 rows)

** Query with only one condition with ILIKE **

cityvox=# SELECT e.numeve FROM evenement e WHERE e.libgeseve ILIKE '%hocus 
pocus%';
  numeve   
-----------
 900024298
     87578
(2 rows)

Time: 411.671 ms
** Environment **

cityvox=# select version(); 
                                                version                         
                       
-------------------------------------------------------------------------------------------------------
 PostgreSQL 8.3devel on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.1 
20070105 (Red Hat 4.1.1-52)
(1 row)

cityvox=# show lc_collate;
 lc_collate  
-------------
 fr_FR.UTF-8
(1 row)

cityvox=# show lc_ctype;
  lc_ctype   
-------------
 fr_FR.UTF-8
(1 row)

cityvox=# \l
        List of databases
   Name    |  Owner   | Encoding 
-----------+----------+----------
 cityvox   | postgres | UTF8
 postgres  | postgres | UTF8
 template0 | postgres | UTF8
 template1 | postgres | UTF8
(4 rows)

cityvox=# show shared_buffers ;
 shared_buffers 
----------------
 128MB
(1 row)

cityvox=# show work_mem ;      
 work_mem 
----------
 32MB
(1 row)

** Seqscan on the table **

cityvox=# select count(*) from evenement;
 count  
--------
 128780
(1 row)

Time: 33.181 ms
cityvox=# select count(*) from evenement;
 count  
--------
 128780
(1 row)

Time: 33.152 ms

** Query with LIKE **

cityvox=# SELECT e.numeve FROM evenement e WHERE e.libgeseve LIKE '%hocus 
pocus%' OR e.mots_cle LIKE '%hocus pocus%';
 numeve 
--------
(0 rows)

Time: 148.927 ms
cityvox=# SELECT e.numeve FROM evenement e WHERE e.libgeseve LIKE '%hocus 
pocus%' OR e.mots_cle LIKE '%hocus pocus%';
 numeve 
--------
(0 rows)

Time: 148.931 ms

** Query with ILIKE **

cityvox=# SELECT e.numeve FROM evenement e WHERE e.libgeseve ILIKE '%hocus 
pocus%' OR e.mots_cle ILIKE '%hocus pocus%';
  numeve   
-----------
 900024298
     87578
    161108
(3 rows)

Time: 597.008 ms
cityvox=# SELECT e.numeve FROM evenement e WHERE e.libgeseve ILIKE '%hocus 
pocus%' OR e.mots_cle ILIKE '%hocus pocus%';
  numeve   
-----------
 900024298
     87578
    161108
(3 rows)

Time: 597.340 ms

cityvox=# EXPLAIN ANALYZE SELECT e.numeve FROM evenement e WHERE e.libgeseve 
ILIKE '%hocus pocus%' OR e.mots_cle ILIKE '%hocus pocus%';
                                                 QUERY PLAN                     
                            
------------------------------------------------------------------------------------------------------------
 Seq Scan on evenement e  (cost=0.00..5821.35 rows=1 width=4) (actual 
time=122.567..598.229 rows=3 loops=1)
   Filter: (((libgeseve)::text ~~* '%hocus pocus%'::text) OR ((mots_cle)::text 
~~* '%hocus pocus%'::text))
 Total runtime: 598.263 ms
(3 rows)

Time: 598.899 ms

** Query with only one condition with ILIKE **

cityvox=# SELECT e.numeve FROM evenement e WHERE e.libgeseve ILIKE '%hocus 
pocus%';
  numeve   
-----------
 900024298
     87578
(2 rows)

Time: 399.534 ms
---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Reply via email to