Re: [Tutor] Removing control characters

2009-02-19 Thread Kent Johnson
On Thu, Feb 19, 2009 at 10:14 AM, Dinesh B Vadhia
dineshbvad...@hotmail.com wrote:
> I want a regex to remove control characters (< chr(32) and > chr(126)) from
> strings ie.
>
> line = re.sub(r"[^a-z0-9-';.]", " ", line)   # replace all chars NOT A-Z,
> a-z, 0-9, [-';.] with " "
>
> 1.  What is the best way to include all the required chars rather than list
> them all within the r"" ?

You have to list either the chars you want, as you have done, or the
ones you don't want. You could use
r'[\x00-\x1f\x7f-\xff]' or
r'[^\x20-\x7e]'
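
As a rough sketch of how either character class might be passed to re.sub()
(Python 3 syntax; the sample string and the variable names are made up for
illustration and are not from the original post):

```python
import re

# Hypothetical input: \x01 is a control character, \xff is above chr(126).
line = "here is\x01my\xffstring"

# Option 1: list the unwanted characters explicitly.
unwanted = re.compile(r'[\x00-\x1f\x7f-\xff]')

# Option 2: negate the wanted range (space through tilde).
not_wanted = re.compile(r'[^\x20-\x7e]')

cleaned_a = unwanted.sub(' ', line)
cleaned_b = not_wanted.sub(' ', line)

print(cleaned_a)  # here is my string
print(cleaned_b)  # here is my string
```

On Python 3 str objects the two classes differ for characters above \xff:
only the negated form also replaces those.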

> 2.  How do you handle the inclusion of the quotation mark " ?

Use \", that works even in a raw string.
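
A small sketch of that escape, assuming Python 3 and a made-up sample string.
In a raw string the backslash stays in the pattern, and the regex engine then
reads \" as a literal double quote:

```python
import re

# The raw string keeps both the backslash and the quote: two characters.
assert len(r"\"") == 2

# Character class of characters to keep: a-z, 0-9, ", ', ;, ., space and -.
pattern = r"[^a-z0-9\"';. -]"

# Hypothetical input; only the tab falls outside the class.
line = 'say "hi"; it\'s 5\tpm.'

result = re.sub(pattern, " ", line)
print(result)  # say "hi"; it's 5 pm.
```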

By the way string.translate() is likely to be faster for this purpose
than re.sub(). This recipe might help:
http://code.activestate.com/recipes/303342/

Kent
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Removing control characters

2009-02-19 Thread Mark Tolonen
A regex isn't always the best solution:

>>> a=''.join(chr(n) for n in range(256))
>>> a
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
>>> b=''.join(n for n in a if ord(n) >= 32 and ord(n) <= 126)
>>> b
' !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~'

-Mark



Re: [Tutor] Removing control characters

2009-02-19 Thread Dinesh B Vadhia
At the bottom of the link http://code.activestate.com/recipes/303342/ there are 
list comprehensions for string manipulation ie.

import string

str = 'Chris Perkins : 224-7992'
set = '0123456789'
r = '$'

# 1) Keeping only a given set of characters.

print  ''.join([c for c in str if c in set])

 '2247992'

# 2) Deleting a given set of characters.

print  ''.join([c for c in str if c not in set])

 'Chris Perkins : -'

The missing one is

# 3) Replacing a set of characters with a single character ie.

for c in str:
    if c in set:
        string.replace (c, r)

to give

 'Chris Perkins : $$$-$$$$'

My solution is:

print ''.join[string.replace(c, r) for c in str if c in set]

But, this returns a syntax error.  Any idea why?

Ta!

Dinesh






Re: [Tutor] Removing control characters

2009-02-19 Thread Marc Tompkins
On Thu, Feb 19, 2009 at 11:25 AM, Dinesh B Vadhia dineshbvad...@hotmail.com
 wrote:

> My solution is:
>
> print ''.join[string.replace(c, r) for c in str if c in set]
>
> But, this returns a syntax error.  Any idea why?


Probably because you didn't use parentheses - join() is a function.
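
A minimal sketch of the difference, in Python 3 syntax. The names text and
digits are stand-ins (assumptions) for the str and set variables of the
original post:

```python
text = 'Chris Perkins : 224-7992'
digits = '0123456789'

# Square brackets directly after join try to subscript the method;
# with a comprehension inside, Python rejects that at compile time.
try:
    compile("''.join[c for c in text]", '<example>', 'eval')
    bracket_form_compiles = True
except SyntaxError:
    bracket_form_compiles = False

# Calling join() with parentheses around the list comprehension works.
kept = ''.join([c for c in text if c in digits])

print(bracket_form_compiles)  # False
print(kept)                   # 2247992
```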

-- 
www.fsrtechnologies.com


Re: [Tutor] Removing control characters

2009-02-19 Thread Kent Johnson
On Thu, Feb 19, 2009 at 2:25 PM, Dinesh B Vadhia
dineshbvad...@hotmail.com wrote:

> # 3) Replacing a set of characters with a single character ie.
>
> for c in str:
>     if c in set:
>         string.replace (c, r)
>
> to give
>
> 'Chris Perkins : $$$-$$$$'
> My solution is:
>
> print ''.join[string.replace(c, r) for c in str if c in set]

With the syntax corrected this will not do what you want; the "if c in
set" filters the characters in the result, so the result will contain
only the replacement characters. You would need something like
''.join([ (r if c in set else c) for c in str])

Note that both 'set' and 'str' are built-in names and therefore poor
choices for variable names.
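
To illustrate both points, here is a short sketch in Python 3 syntax. The
names text, digits and replacement are hypothetical substitutes for str, set
and r, chosen so that no built-in is shadowed:

```python
text = 'Chris Perkins : 224-7992'
digits = '0123456789'
replacement = '$'

# A trailing "if" is a filter: non-digits are dropped entirely,
# so only replacement characters survive.
filtered = ''.join([replacement for c in text if c in digits])

# A conditional expression keeps every character and swaps only the digits.
replaced = ''.join([(replacement if c in digits else c) for c in text])

print(filtered)  # $$$$$$$
print(replaced)  # Chris Perkins : $$$-$$$$
```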

Kent


Re: [Tutor] Removing control characters

2009-02-19 Thread Dinesh B Vadhia
Okay, here is a combination of Mark's suggestions and yours:

>>> # string of all chars
>>> a = ''.join([chr(n) for n in range(256)])
>>> a
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'

>>> # string of wanted chars
>>> b = ''.join([n for n in a if ord(n) >= 32 and ord(n) <= 126])
>>> b
' !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~'

>>> # string of unwanted chars (ord(n) < 32 or ord(n) > 126)
>>> c = ''.join([n for n in a if ord(n) < 32 or ord(n) > 126])
>>> c
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'

>>> # the string to process
>>> s = "Product Concepts\xe2\x80\x94Hard candy with an innovative twist, Internet Archive: Wayback Machine. [online] Mar. 25, 2004. Retrieved from the Internet URL: http://www.confectionery-innovations.com."

>>> # replace unwanted chars in string s with " "
>>> t = "".join([(" " if n in c else n) for n in s if n not in c])
>>> t
'Product ConceptsHard candy with an innovative twist, Internet Archive: Wayback Machine. [online] Mar. 25, 2004. Retrieved from the Internet URL: http://www.confectionery-innovations.com.'

This last bit doesn't work, i.e. replacing the unwanted chars with " " - e.g.
'ConceptsHard'.  What's missing?

Dinesh





Re: [Tutor] Removing control characters

2009-02-19 Thread Kent Johnson
On Thu, Feb 19, 2009 at 5:41 PM, Dinesh B Vadhia
dineshbvad...@hotmail.com wrote:
> Okay, here is a combination of Mark's suggestions and yours:
>
> # replace unwanted chars in string s with " "
> t = "".join([(" " if n in c else n) for n in s if n not in c])
> t
> 'Product ConceptsHard candy with an innovative twist, Internet Archive:
> Wayback Machine. [online] Mar. 25, 2004. Retrieved from the Internet URL:
> http://www.confectionery-innovations.com.'
>
> This last bit doesn't work ie. replacing the unwanted chars with " " - eg.
> 'ConceptsHard'.  What's missing?

The "if n not in c" at the end of the list comp rejects the unwanted
characters from the result immediately. What you wrote is the same as
t = "".join([n for n in s if n not in c])

because "n in c" will never be true in the first conditional.

BTW if you care about performance, this is the wrong approach. At
least use a set for c; better would be to use translate().
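
A sketch of the set-based variant, assuming Python 3 and a made-up sample
string (the names unwanted, removed and replaced are illustrative only). It
also shows the effect of the trailing filter described above:

```python
# Hypothetical input; \x01 and \xff stand in for unwanted characters.
s = 'here is\x01my\xffstring'

# A frozenset gives constant-time membership tests, unlike scanning a string.
unwanted = frozenset(chr(n) for n in range(256) if n < 32 or n > 126)

# With the trailing filter, unwanted characters are dropped before the
# conditional expression ever sees them.
removed = ''.join([(' ' if ch in unwanted else ch) for ch in s if ch not in unwanted])

# Without the filter, the conditional expression performs the replacement.
replaced = ''.join([(' ' if ch in unwanted else ch) for ch in s])

print(removed)   # here ismystring
print(replaced)  # here is my string
```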

Kent


Re: [Tutor] Removing control characters

2009-02-19 Thread Mark Tolonen


Kent Johnson ken...@tds.net wrote in message 
news:1c2a2c590902191500y71600feerff0b73a88fb49...@mail.gmail.com...

> On Thu, Feb 19, 2009 at 5:41 PM, Dinesh B Vadhia
> dineshbvad...@hotmail.com wrote:
>> Okay, here is a combination of Mark's suggestions and yours:
>>
>> # replace unwanted chars in string s with " "
>> t = "".join([(" " if n in c else n) for n in s if n not in c])
>> t
>> 'Product ConceptsHard candy with an innovative twist, Internet Archive:
>> Wayback Machine. [online] Mar. 25, 2004. Retrieved from the Internet URL:
>> http://www.confectionery-innovations.com.'
>>
>> This last bit doesn't work ie. replacing the unwanted chars with " " - eg.
>> 'ConceptsHard'.  What's missing?
>
> The "if n not in c" at the end of the list comp rejects the unwanted
> characters from the result immediately. What you wrote is the same as
> t = "".join([n for n in s if n not in c])
>
> because "n in c" will never be true in the first conditional.
>
> BTW if you care about performance, this is the wrong approach. At
> least use a set for c; better would be to use translate().


Sorry, I didn't catch the "replace with space" part.  Kent is right,
translate is what you want.  The join is still nice for making the
translation table:

>>> import string
>>> table = ''.join(' ' if n < 32 or n > 126 else chr(n) for n in xrange(256))
>>> string.translate('here is\x01my\xffstring', table)
'here is my string'
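
For readers on Python 3, where string.translate() and xrange() are gone, a
rough equivalent (an assumption about how one might port this, not part of
the original reply) uses str.translate() with a dict keyed by code point:

```python
# Map every code point below 32 or from 127 to 255 to a space;
# characters missing from the dict are left unchanged.
table = {n: ' ' for n in range(256) if n < 32 or n > 126}

cleaned = 'here is\x01my\xffstring'.translate(table)
print(cleaned)  # here is my string
```

Code points above 255 are not in this table, so they would pass through
untouched unless the table is extended.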

-Mark

