Re: [SQL] Funtion to clean up strings?

2009-02-12 Thread Dickson S. Guedes
2009/2/12 Andreas :
> Hi,
>
> I need a function that removes characters in strings that aren't in a given
> set of chars.
> e.g
> Input:12-34/ 56(8)
> I want just numbers so Output should in this case be:  1234568
>
> Is there a solution?

One is:

SELECT regexp_replace('12-34/ 56(8)',E'[^0-9]','','g');

-- 
Dickson S. Guedes
-
mail/xmpp: gue...@guedesoft.net - skype: guediz
http://guedesoft.net - http://planeta.postgresql.org.br

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Funtion to clean up strings?

2009-02-12 Thread Andreas

Wow, that was fast
THANKS   :)

now ... lets get more complicated.
Phone numbers are entered:
0123/4567-89  national number
0049/123/4567-89 the same number
+49/123/4567-89 still the same number

should come out as   0123456789  to search in this column.
"0049" and "+49"  --> 0

while international numbers
+33/123456789
0033/123456789

should come as
+33123456789

I fear the hard part is that international codes have 1-3 digits   :(


Regards
Andreas


ivan Stoykov wrote:

regexp_replace('120323423 23(fdf_)fd','[^[:digit:]]','','gi')

Andreas wrote:
  

Hi,

I need a function that removes characters in strings that aren't in a
given set of chars.
e.g
Input:12-34/ 56(8)
I want just numbers so Output should in this case be:  1234568

Is there a solution?


--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Funtion to clean up strings?

2009-02-12 Thread Raj Mathur
On Friday 13 Feb 2009, Andreas wrote:
> now ... lets get more complicated.
> Phone numbers are entered:
> 0123/4567-89  national number
> 0049/123/4567-89 the same number
> +49/123/4567-89 still the same number
>
> should come out as   0123456789  to search in this column.
> "0049" and "+49"  --> 0
>
> while international numbers
> +33/123456789
> 0033/123456789
>
> should come as
> +33123456789

TEST=> create table foo(p text);

TEST=> insert into foo (select regexp_split_to_table('0123/4567-89 
0049/123/4567-89 +49/123/4567-89 +33/123456789 0033/123456789',' '));

TEST=> select * from foo;
p
--
 0123/4567-89
 0049/123/4567-89
 +49/123/4567-89
 +33/123456789
 0033/123456789
(5 rows)

TEST=> select
(case
  when p ~ E'^(\\+|00)49'
then '0'||regexp_replace(regexp_replace(p, E'[^0-9+]', '', 'g'), 
E'^(?:\\+|00)49(.*)', E'\\1')
  when p ~ E'^(\\+|00)'
then '+'||regexp_replace(regexp_replace(p, E'[^0-9+]', '', 'g'), 
E'^(?:\\+||00)(.*)', E'\\1')
  else
regexp_replace(p, E'[^0-9]', '', 'g')
end)
from foo;
 regexp_replace

 0123456789
 0123456789
 0123456789
 +33123456789
 +33123456789
(5 rows)

That do what you want?  (Apologies for the wrapped lines.)

Regards,

-- Raju
-- 
Raj Mathurr...@kandalaya.org  http://kandalaya.org/
   GPG: 78D4 FC67 367F 40E2 0DD5  0FEF C968 D0EF CC68 D17F
PsyTrance & Chill: http://schizoid.in/   ||   It is the mind that moves

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Funtion to clean up strings?

2009-02-17 Thread Andreas

Hi
Thanks, that really works   :)

Now a last extension.
Some numbers were entered in a "110% perfect" way with an excessive (0).  
+49 (0) 123 / 456 789
I have to suspect the source liked to express that it's either +49  or  
0 if the +49 isn't applicable, but not both.
Both together are semantically wrong and your function results therefore 
to "00123456789".
Correct was "0123456789" or e.g. "+33123456789" if it were an 
international number.


This (0) should be silently dropped as long as the endresult has at 
least one 0 or + like in the allready covered cases.


I tried to use this RegEx magic myself as far as I could figure it out, 
yet and came up with replacing every p in your solution with another regex


case
 when regexp_replace(p, E'[^0-9+]', '', 'g') ~ E'^(\\+|00)49'
   then '0'||
  regexp_replace(
 regexp_replace(
regexp_replace(p, E'[^0-9+()]', '', 'g')
 , '\\(0\\)||\\(||\\)', '', 'g')
  , E'^(?:\\+|00)49(.*)', E'\\1')
 when regexp_replace(p, E'[^0-9+]', '', 'g') ~ E'^(\\+|00)'
   then '+'||
  regexp_replace(
regexp_replace(
   regexp_replace(p, E'[^0-9+()]', '', 'g')
, '\\(0\\)||\\(||\\)', '', 'g')
  , E'^(?:\\+||00)(.*)', E'\\1')
 else
   regexp_replace(p, E'[^0-9]', '', 'g')
end

That would catch the leading spaces in "  00 49 ( 0 ) 1 2 3 456 -0", too.
Creating a sql-function thows a WARNING:  nonstandard use of \\ in a 
string literal
but it still works. Do you know a better or more correct way to reach 
the same?


Perhaps one could find a way with less calls to regexp_replace ?

Regards
Andreas  :)


Raj Mathur wrote:

On Friday 13 Feb 2009, Andreas wrote:
  

now ... lets get more complicated.
Phone numbers are entered:
0123/4567-89  national number
0049/123/4567-89 the same number
+49/123/4567-89 still the same number

should come out as   0123456789  to search in this column.
"0049" and "+49"  --> 0

while international numbers
+33/123456789
0033/123456789

should come as
+33123456789



TEST=> create table foo(p text);

TEST=> insert into foo (select regexp_split_to_table('0123/4567-89 
0049/123/4567-89 +49/123/4567-89 +33/123456789 0033/123456789',' '));


TEST=> select * from foo;
p
--
 0123/4567-89
 0049/123/4567-89
 +49/123/4567-89
 +33/123456789
 0033/123456789
(5 rows)

TEST=> select
(case
  when p ~ E'^(\\+|00)49'
then '0'||regexp_replace(regexp_replace(p, E'[^0-9+]', '', 'g'), 
E'^(?:\\+|00)49(.*)', E'\\1')

  when p ~ E'^(\\+|00)'
then '+'||regexp_replace(regexp_replace(p, E'[^0-9+]', '', 'g'), 
E'^(?:\\+||00)(.*)', E'\\1')

  else
regexp_replace(p, E'[^0-9]', '', 'g')
end)
from foo;
 regexp_replace

 0123456789
 0123456789
 0123456789
 +33123456789
 +33123456789
(5 rows)

That do what you want?  (Apologies for the wrapped lines.)

Regards,

-- Raju
  



--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Funtion to clean up strings?

2009-02-17 Thread Raj Mathur
On Tuesday 17 Feb 2009, Andreas wrote:
> [snip]
> case
>   when regexp_replace(p, E'[^0-9+]', '', 'g') ~ E'^(\\+|00)49'
> then '0'||
>regexp_replace(
>   regexp_replace(
>  regexp_replace(p, E'[^0-9+()]', '', 'g')
>   , '\\(0\\)||\\(||\\)', '', 'g')
>, E'^(?:\\+|00)49(.*)', E'\\1')
>   when regexp_replace(p, E'[^0-9+]', '', 'g') ~ E'^(\\+|00)'
> then '+'||
>regexp_replace(
>  regexp_replace(
> regexp_replace(p, E'[^0-9+()]', '', 'g')
>  , '\\(0\\)||\\(||\\)', '', 'g')
>, E'^(?:\\+||00)(.*)', E'\\1')
>   else
> regexp_replace(p, E'[^0-9]', '', 'g')
> end
>
> That would catch the leading spaces in "  00 49 ( 0 ) 1 2 3 456 -0",
> too. Creating a sql-function thows a WARNING:  nonstandard use of \\
> in a string literal
> but it still works. Do you know a better or more correct way to reach
> the same?
>
> Perhaps one could find a way with less calls to regexp_replace ?

That is what I would have tried too :)  The only improvement I can think 
of is to replace one instance of regex_replace with a string replace, 
since the string (0) is fixed.

On the other hand, I'm not an expert at Pg functions by any means, so 
someone else may have a faster or more elegant solution.

Regards,

-- Raju
-- 
Raj Mathurr...@kandalaya.org  http://kandalaya.org/
   GPG: 78D4 FC67 367F 40E2 0DD5  0FEF C968 D0EF CC68 D17F
PsyTrance & Chill: http://schizoid.in/   ||   It is the mind that moves

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Funtion to clean up strings?

2009-02-17 Thread Andreas


string_replace would only replace one searchstring at a time.
In this case I need to replace 3 :   (0)  ( )
because there could be some braces not just as (0) since the innermost 
replace spares braces regardless where they are.


Could one express the following in one expression
1)  remove all spaces
2)  remove everything but  0-9+  or  the first occurance  (0)

Leading spaces had to be removed anyway because the distract the CASE 
... WHEN when it checks for "+49..." but gets " +49...".


Then I could use string_replace at the end.

Do you know the correct way to code the backslashes to avoid the Warning?


Regards
Andreas


Raj Mathur schrieb:

On Tuesday 17 Feb 2009, Andreas wrote:
  

[snip]
case
  when regexp_replace(p, E'[^0-9+]', '', 'g') ~ E'^(\\+|00)49'
then '0'||
   regexp_replace(
  regexp_replace(
 regexp_replace(p, E'[^0-9+()]', '', 'g')
  , '\\(0\\)||\\(||\\)', '', 'g')
   , E'^(?:\\+|00)49(.*)', E'\\1')
  when regexp_replace(p, E'[^0-9+]', '', 'g') ~ E'^(\\+|00)'
then '+'||
   regexp_replace(
 regexp_replace(
regexp_replace(p, E'[^0-9+()]', '', 'g')
 , '\\(0\\)||\\(||\\)', '', 'g')
   , E'^(?:\\+||00)(.*)', E'\\1')
  else
regexp_replace(p, E'[^0-9]', '', 'g')
end

That would catch the leading spaces in "  00 49 ( 0 ) 1 2 3 456 -0",
too. Creating a sql-function thows a WARNING:  nonstandard use of \\
in a string literal
but it still works. Do you know a better or more correct way to reach
the same?

Perhaps one could find a way with less calls to regexp_replace ?



That is what I would have tried too :)  The only improvement I can think 
of is to replace one instance of regex_replace with a string replace, 
since the string (0) is fixed.


On the other hand, I'm not an expert at Pg functions by any means, so 
someone else may have a faster or more elegant solution.


Regards,

-- Raju
  



--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Funtion to clean up strings?

2009-02-18 Thread Gregory Stark
Andreas  writes:

> Hi
> Thanks, that really works   :)
>
> Now a last extension.
> Some numbers were entered in a "110% perfect" way with an excessive (0).  +49
> (0) 123 / 456 789
> I have to suspect the source liked to express that it's either +49  or  0 if
> the +49 isn't applicable, but not both.

This is the standard format for phone numbers. Parenthesized digits -- as you
suspected -- represent digits which must only be dialled when using the number
locally and must be omitted from outside.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql