Re: [GENERAL] best practise/pattern for large OR / LIKE searches

2009-08-30 Thread Pavel Stehule
Hello

> regex is compiled to a finite state machine and then the datanumber
> column is scanned in a single pass (for each row)
>
>> Searches are currently taking to long and we would like to optimize
>> them, but before we dive into our own solution we
>> where wondering if there already common solutions for this...
>
> try regex first if that's too slow you may need to write a
> dictionary function that splits datanuimber into it's components
> and use full text index/search. (this will slow down updates as they will do
> upto 20 inserts into the index)
>
> searches should then be optimally fast
>

I did some tests:

1) I fill test table
insert into test SELECT
array_to_string(array_agg(array_to_string(ARRAY(select
substring('01234567890' from (random()*10)::int + 1 for 1) from
generate_series(1,(random()*10+5)::int + i - i)),'')),',') as b from
generate_series(1,10) g(i) group by (random()*1000)::int;

2. I tested searching of 5 or 13 values. I did tests on 8.4 and 8.1

8.1
using like 190ms(440ms*)
using regexp 115ms(259ms*)

* for 13 values - so there regexp is faster than like

on 8.4
using like 80ms(151ms)
using regexp 131ms(267ms)

so like is faster then regexp on 8.4.

fulltext test (8.4)
420ms(470ms) -- without index
14ms(26ms) -- with GiST index
1ms(2ms) -- with Gin index

some samples of test queries:
select * from test where to_tsvector('simple',a) @@
to_tsquery('simple','296426496|7707431116555|98173598191|302598|53174827|02292064629|188631468777|4756243248|920473|16602317|76613513|78640|9176323');

select * from test where a ~
'296426496|7707431116555|98173598191|302598|53174827|02292064629|188631468777|4756243248|920473|16602317|76613513|78640|9176323';

select * from test where a like '%296426496%' or a like '%
7707431116555%' or a like '%98173598191%' or a like '%302598%' or a
like '%53174827%' or a like '%02292064629%' or a like '%188631468777%'
or a like '%4756243248%' or a like '%920473%' or a like '%16602317%'
or a like '%76613513%' or a like '%78640%' or a like '%9176323%';

regards
Pavel Stehule

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] best practise/pattern for large OR / LIKE searches

2009-08-30 Thread Jasen Betts
On 2009-08-26, Ries van Twisk  wrote:
>
> --Apple-Mail-1173-222712773
> Content-Type: text/plain;
>   charset=US-ASCII;
>   format=flowed;
>   delsp=yes
> Content-Transfer-Encoding: 7bit
>
> Hey All,
>
> I am wondering if there is a common pattern for these sort of queries :
>
> SELECT * FROM tbl WHERE datanumber LIKE '%12345%' OR LIKE '%54321%' OR  
> LIKE '%8766%' OR LIKE '%009%', ..

SELECT * FROM tbl WHERE datanumber LIKE ANY 
ARRAY('%12345%','%54321%','%8766%'...)

> The number of OR/LIKES are in the order of 50-100 items...
> the table tbl is a couple of million rows.

regex might perfrom better than LIKE ANY

SELECT * FROM tbl WHERE '12345|54321|8766|009' ~ datanumber;

regex is compiled to a finite state machine and then the datanumber
column is scanned in a single pass (for each row)

> Searches are currently taking to long and we would like to optimize  
> them, but before we dive into our own solution we
> where wondering if there already common solutions for this...

try regex first if that's too slow you may need to write a
dictionary function that splits datanuimber into it's components 
and use full text index/search. (this will slow down updates as they will do
upto 20 inserts into the index)

searches should then be optimally fast

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] best practise/pattern for large OR / LIKE searches

2009-08-26 Thread Christophe Pettus


SELECT * FROM tbl WHERE datanumber LIKE '%12345%' OR LIKE '%54321%'  
OR LIKE '%8766%' OR LIKE '%009%', ..


The number of OR/LIKES are in the order of 50-100 items...
the table tbl is a couple of million rows.


Are the fixed strings in the wildcards "words" (i.e., are they  
completely arbitrarily embedded in the text, or are they delimited in  
some regular way)? If they are "words," you might consider using the  
full text functionality to create an index of them, and searching  
using that.


--
-- Christophe Pettus
   x...@thebuild.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] best practise/pattern for large OR / LIKE searches

2009-08-26 Thread Ries van Twisk

The wildspeed function seems to be what I was looking for.

an dI remember that I have seen it before on the list... just I  
couldn't remember names or anything...


Ries


On Aug 26, 2009, at 7:28 AM, Pavel Stehule wrote:


2009/8/26  :

Hi Pavel,

can you provide some link or other directions to the proposal? I  
guess it

was posted to this list or somewhere else?


Please, ask to Oleg Bartunov

http://www.sai.msu.su/~megera/wiki/wildspeed

regards
Pavel Stehule



Tomas


Hello

one year ago there was proposal for index support for LIKE %some%.  
The

problem was extreme size of index size.

I thing so you can write own C function, that can check string  
faster

than repeated LIKE

some like

SELECT * FROM tbl WHERE contains(datanumber, '12345','54321',)

regards
Pavel Stehule

2009/8/26 Ries van Twisk :

Hey All,
I am wondering if there is a common pattern for these sort of  
queries :
SELECT * FROM tbl WHERE datanumber LIKE '%12345%' OR LIKE  
'%54321%' OR

LIKE
'%8766%' OR LIKE '%009%', ..
The number of OR/LIKES are in the order of 50-100 items...
the table tbl is a couple of million rows.
The datanumber is a string that are maximum 10 characters long, no
spaces
and can contain numbers and letters.
Apart from creating a couple of index table to make the LIKE left
anchored
something like this :
tbl <> tbl_4letters
tbl <> tbl_5letters
tbl <> tbl_3letters
or creating a functional index 'of some sort' are there any other
brilliant
ideas out there to solve such a problem (GIN/GIS???) ?
Searches are currently taking to long and we would like to optimize
them,
but before we dive into our own solution we
where wondering if there already common solutions for this...
Kind Regards,
Ries van Twisk







--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general








regards, Ries van Twisk

-
tags: Freelance TYPO3 Glassfish JasperReports JasperETL Flex Blaze-DS  
WebORB PostgreSQL DB-Architect
email: r...@vantwisk.nlweb:   http://www.rvantwisk.nl/ 
skype: callto://r.vantwisk
Phone: +1-810-476-4196Cell: +593 9901 7694   SIP:  
+1-747-690-5133









--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] best practise/pattern for large OR / LIKE searches

2009-08-26 Thread Pavel Stehule
2009/8/26  :
> Hi Pavel,
>
> can you provide some link or other directions to the proposal? I guess it
> was posted to this list or somewhere else?

Please, ask to Oleg Bartunov

http://www.sai.msu.su/~megera/wiki/wildspeed

regards
Pavel Stehule

>
> Tomas
>
>> Hello
>>
>> one year ago there was proposal for index support for LIKE %some%. The
>> problem was extreme size of index size.
>>
>> I thing so you can write own C function, that can check string faster
>> than repeated LIKE
>>
>> some like
>>
>> SELECT * FROM tbl WHERE contains(datanumber, '12345','54321',)
>>
>> regards
>> Pavel Stehule
>>
>> 2009/8/26 Ries van Twisk :
>>> Hey All,
>>> I am wondering if there is a common pattern for these sort of queries :
>>> SELECT * FROM tbl WHERE datanumber LIKE '%12345%' OR LIKE '%54321%' OR
>>> LIKE
>>> '%8766%' OR LIKE '%009%', ..
>>> The number of OR/LIKES are in the order of 50-100 items...
>>> the table tbl is a couple of million rows.
>>> The datanumber is a string that are maximum 10 characters long, no
>>> spaces
>>> and can contain numbers and letters.
>>> Apart from creating a couple of index table to make the LIKE left
>>> anchored
>>> something like this :
>>> tbl <> tbl_4letters
>>> tbl <> tbl_5letters
>>> tbl <> tbl_3letters
>>> or creating a functional index 'of some sort' are there any other
>>> brilliant
>>> ideas out there to solve such a problem (GIN/GIS???) ?
>>> Searches are currently taking to long and we would like to optimize
>>> them,
>>> but before we dive into our own solution we
>>> where wondering if there already common solutions for this...
>>> Kind Regards,
>>> Ries van Twisk
>>>
>>>
>>>
>>>
>>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
>
>
>

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] best practise/pattern for large OR / LIKE searches

2009-08-26 Thread tv
Hi Pavel,

can you provide some link or other directions to the proposal? I guess it
was posted to this list or somewhere else?

Tomas

> Hello
>
> one year ago there was proposal for index support for LIKE %some%. The
> problem was extreme size of index size.
>
> I thing so you can write own C function, that can check string faster
> than repeated LIKE
>
> some like
>
> SELECT * FROM tbl WHERE contains(datanumber, '12345','54321',)
>
> regards
> Pavel Stehule
>
> 2009/8/26 Ries van Twisk :
>> Hey All,
>> I am wondering if there is a common pattern for these sort of queries :
>> SELECT * FROM tbl WHERE datanumber LIKE '%12345%' OR LIKE '%54321%' OR
>> LIKE
>> '%8766%' OR LIKE '%009%', ..
>> The number of OR/LIKES are in the order of 50-100 items...
>> the table tbl is a couple of million rows.
>> The datanumber is a string that are maximum 10 characters long, no
>> spaces
>> and can contain numbers and letters.
>> Apart from creating a couple of index table to make the LIKE left
>> anchored
>> something like this :
>> tbl <> tbl_4letters
>> tbl <> tbl_5letters
>> tbl <> tbl_3letters
>> or creating a functional index 'of some sort' are there any other
>> brilliant
>> ideas out there to solve such a problem (GIN/GIS???) ?
>> Searches are currently taking to long and we would like to optimize
>> them,
>> but before we dive into our own solution we
>> where wondering if there already common solutions for this...
>> Kind Regards,
>> Ries van Twisk
>>
>>
>>
>>
>>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] best practise/pattern for large OR / LIKE searches

2009-08-25 Thread Pavel Stehule
Hello

one year ago there was proposal for index support for LIKE %some%. The
problem was extreme size of index size.

I thing so you can write own C function, that can check string faster
than repeated LIKE

some like

SELECT * FROM tbl WHERE contains(datanumber, '12345','54321',)

regards
Pavel Stehule

2009/8/26 Ries van Twisk :
> Hey All,
> I am wondering if there is a common pattern for these sort of queries :
> SELECT * FROM tbl WHERE datanumber LIKE '%12345%' OR LIKE '%54321%' OR LIKE
> '%8766%' OR LIKE '%009%', ..
> The number of OR/LIKES are in the order of 50-100 items...
> the table tbl is a couple of million rows.
> The datanumber is a string that are maximum 10 characters long, no spaces
> and can contain numbers and letters.
> Apart from creating a couple of index table to make the LIKE left anchored
> something like this :
> tbl <> tbl_4letters
> tbl <> tbl_5letters
> tbl <> tbl_3letters
> or creating a functional index 'of some sort' are there any other brilliant
> ideas out there to solve such a problem (GIN/GIS???) ?
> Searches are currently taking to long and we would like to optimize them,
> but before we dive into our own solution we
> where wondering if there already common solutions for this...
> Kind Regards,
> Ries van Twisk
>
>
>
>
>

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] best practise/pattern for large OR / LIKE searches

2009-08-25 Thread Ries van Twisk

Hey All,

I am wondering if there is a common pattern for these sort of queries :

SELECT * FROM tbl WHERE datanumber LIKE '%12345%' OR LIKE '%54321%' OR  
LIKE '%8766%' OR LIKE '%009%', ..


The number of OR/LIKES are in the order of 50-100 items...
the table tbl is a couple of million rows.

The datanumber is a string that are maximum 10 characters long, no  
spaces and can contain numbers and letters.


Apart from creating a couple of index table to make the LIKE left  
anchored something like this :


tbl <> tbl_4letters
tbl <> tbl_5letters
tbl <> tbl_3letters

or creating a functional index 'of some sort' are there any other  
brilliant ideas out there to solve such a problem (GIN/GIS???) ?


Searches are currently taking to long and we would like to optimize  
them, but before we dive into our own solution we

where wondering if there already common solutions for this...

Kind Regards,
Ries van Twisk