[GENERAL] Regex Query Index question

2011-08-11 Thread Naoko Reeves
Hello,
I have query phone number in database as follows:

[123) 456-7890

(123) 456-7890

When I query like this:

SELECT * FROM phone

WHERE phone_number ~ ('^\[123' || '[-\s\)]{0,2}' || '456' || '[-\s\)]{0,2}'
|| '7890')

it use Index but if I query like this (notice first character is
open parenthesis instead of open square blacket ) :

SELECT phn_fk_key FROM phn WHERE

phn_fk_table = 14

AND llx_decrypt(phn_phone_enc) ~ ('^\(123' || '[-\s\)]{0,2}' || '456' ||
'[-\s\)]{0,2}' || '7890')

It doesn't use Index

co-worker suggested me to use chr(40) instead so I tried this:

SELECT phn_fk_key FROM phn WHERE

phn_fk_table = 14

AND llx_decrypt(phn_phone_enc) ~ ('^\' || chr(40) || '123' || '[-\s\)]{0,2}'
|| '456' || '[-\s\)]{0,2}' || '7890')

No success...

Also { and period doesn't seems to use index either but } ) [ ] $ #
works.

Could you guide me to right direction for me please?


Thank you very much for your time in advance.


Naoko Reeves


Re: [GENERAL] Regex Query Index question

2011-08-11 Thread Tom Lane
Naoko Reeves naokoree...@gmail.com writes:
 I have query phone number in database as follows:
 [123) 456-7890
 (123) 456-7890

 When I query like this:

 SELECT * FROM phone

 WHERE phone_number ~ ('^\[123' || '[-\s\)]{0,2}' || '456' || '[-\s\)]{0,2}'
 || '7890')

 it use Index but if I query like this (notice first character is
 open parenthesis instead of open square blacket ) :

 SELECT phn_fk_key FROM phn WHERE

 phn_fk_table = 14

 AND llx_decrypt(phn_phone_enc) ~ ('^\(123' || '[-\s\)]{0,2}' || '456' ||
 '[-\s\)]{0,2}' || '7890')

 It doesn't use Index

Probably it thinks the index isn't selective enough for that case.  How
many entries are there starting with (123?

(BTW, I assume you've got standard_conforming_strings turned on, else
there are some other issues with these backslashes ...)

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Regex Query Index question

2011-08-11 Thread Naoko Reeves
Tom,
Thank you for your quick reply. Data start with (123 only returns 28
records where as phone number start with[123 returns 1.
Changed the data so that both will return 1 row.

One with (999 query takes about 30 seconds (30983ms) without index.
One with [999 take about 28 ms with index.

Yes, standard_conforming_strings is ON.
Also forgot to mentioned the version:
select version()  PostgreSQL 8.4.6 on i386-apple-darwin, compiled by GCC
i686-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build 5370),
32-bit

Thank you very much for your time.

Naoko Reeves

On Thu, Aug 11, 2011 at 3:49 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Naoko Reeves naokoree...@gmail.com writes:
  I have query phone number in database as follows:
  [123) 456-7890
  (123) 456-7890

  When I query like this:

  SELECT * FROM phone

  WHERE phone_number ~ ('^\[123' || '[-\s\)]{0,2}' || '456' ||
 '[-\s\)]{0,2}'
  || '7890')

  it use Index but if I query like this (notice first character is
  open parenthesis instead of open square blacket ) :

  SELECT phn_fk_key FROM phn WHERE

  phn_fk_table = 14

  AND llx_decrypt(phn_phone_enc) ~ ('^\(123' || '[-\s\)]{0,2}' || '456' ||
  '[-\s\)]{0,2}' || '7890')

  It doesn't use Index

 Probably it thinks the index isn't selective enough for that case.  How
 many entries are there starting with (123?

 (BTW, I assume you've got standard_conforming_strings turned on, else
 there are some other issues with these backslashes ...)

regards, tom lane




-- 
Naoko Reeves
http://www.anypossibility.com/


Re: [GENERAL] Regex Query Index question

2011-08-11 Thread David Johnston


On Aug 11, 2011, at 18:26, Naoko Reeves naokoree...@gmail.com wrote:

 Hello,
 I have query phone number in database as follows:
 [123) 456-7890
 
 (123) 456-7890
 
Store phone numbers without formatting...the data is the numbers themselves the 
formatting is presentation.
 When I query like this:
 
 SELECT * FROM phone 
 
 WHERE phone_number ~ ('^\[123' || '[-\s\)]{0,2}' || '456' || '[-\s\)]{0,2}' 
 || '7890')
 
 it use Index but if I query like this (notice first character is open 
 parenthesis instead of open square blacket ) :
 
 SELECT phn_fk_key FROM phn WHERE 
 
 phn_fk_table = 14 
 
 AND llx_decrypt(phn_phone_enc) ~ ('^\(123' || '[-\s\)]{0,2}' || '456' || 
 '[-\s\)]{0,2}' || '7890')
 
 It doesn't use Index
 
The left side of the two where clauses are different fields/expressions.  Since 
you do not specify what your table and indexes look like your problem is 
impossible to solve but likely has nothing to do with RegEx.  Keep in mind, 
however, that an index can only be used if the pattern is fully anchored.  With 
alternation in the RegEx you want the^ outside of the part the part that uses 
| otherwise only the first expression ends up being anchored.  E.g,  '^(a|b)' 
!='^a|b'. The first one matches a string that stars with a or b whereas the 
second matches a string that starts with a or contains b anywhere in the 
string.  The second one cannot use the index since it is not guaranteed to be 
anchored at the start of a string.
 co-worker suggested me to use chr(40) instead so I tried this:
 
 SELECT phn_fk_key FROM phn WHERE 
 
 phn_fk_table = 14 
 
 AND llx_decrypt(phn_phone_enc) ~ ('^\' || chr(40) || '123' || '[-\s\)]{0,2}' 
 || '456' || '[-\s\)]{0,2}' || '7890')
 
 No success...
 
 Also { and period doesn't seems to use index either but } ) [ ] $ # 
 works.Could you guide me to right direction for me please?
 
Particularly with RegEx you want to tell people what you are trying to do and 
not just give the expressions themselves.

Not testing here but... and ignore whitespace

'^( [ \[\( ]? \s* \d{3} \s* [ -\s\]\) ] \d{3} [ -\s ] \d{4} )$'

The above should match both of your samples and use the index on the regular 
phone column. If you want to store encrypted and search the unencrypted you 
have to create a functional index.  See documentation for syntax and 
requirements.

In this case you can replace the \d{n} with your desired search strings.

It would be a lot simpler if you strip out the non-numbers, via functional 
index if needed, and perform an equality string search.  The question becomes, 
using the example data above, what happens if two people have the same phone 
number with only the format being different.  The answer is the difference 
between a unique index and a non-unique one...

example: create index name on table (clean_and_decrypt_phone(enc_phone))

Where clean_and_decrypt_phone(enc_phone) = clean_phone( search_string )

This can be done without changing columns, only indexes and queries.

David J.

Re: [GENERAL] Regex Query Index question

2011-08-11 Thread Tom Lane
Naoko Reeves naokoree...@gmail.com writes:
 Also forgot to mentioned the version:
 select version()  PostgreSQL 8.4.6 on i386-apple-darwin, compiled by GCC
 i686-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build 5370),

Oh --- there's your problem.  In 8.4 and earlier, we don't trust \( to be
a literal character in a regex pattern, because it's not a literal if
you have regex_flavor set to 'basic'.  9.0 and up removed that option,
so it works as-expected in newer versions.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Regex Query Index question

2011-08-11 Thread David Johnston
Now that I read more closely the alternation is actually concatenation.  My 
point still stands but your issue is that you have not created a functional 
index on the decryption result of the encrypted phone number.  PostgreSQL does 
not know that the decrypted phone number is equivalent to the unencrypted 
field.  It only can look at expressions to determine whether an index is usable 
- not values.  A table can have more than one index.

David J. 

On Aug 11, 2011, at 19:58, David Johnston pol...@yahoo.com wrote:

 
 
 Not testing here but... and ignore whitespace
 
 '^( [ \[\( ]? \s* \d{3} \s* [ -\s\]\) ] \d{3} [ -\s ] \d{4} )$'
 
 Some tweaks needed but seriously consider dropping RegEx and going the 
 functional index route.
 
 '^( [ \[\( ]? \s* \d{3} \s* [ -\s\]\) ] \s* \d{3} \s* [ -\s ] \s* \d{4} )$'
 
 Added some extra white-space checking but again not tested.
 
 You can probably find better/more flexible expressions online.
 
 David J.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Regex Query Index question

2011-08-11 Thread David Johnston

 
 Not testing here but... and ignore whitespace
 
 '^( [ \[\( ]? \s* \d{3} \s* [ -\s\]\) ] \d{3} [ -\s ] \d{4} )$'
 
Some tweaks needed but seriously consider dropping RegEx and going the 
functional index route.

 '^( [ \[\( ]? \s* \d{3} \s* [ -\s\]\) ] \s* \d{3} \s* [ -\s ] \s* \d{4} )$'

Added some extra white-space checking but again not tested.

You can probably find better/more flexible expressions online.

David J.
-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general