Need a Regular expression to remove a char for Unicode text

2006-10-13 Thread శ్రీనివాస
Hi friends,
Can anyone tell me how I can remove a character from Unicode text?
కల్‌హార is a Telugu word in Unicode. Here I want to
remove '' without replacing it with a zero width char. And one more thing:
if there is any whitespace before and after the '' char, the text should
be kept as it is. Please tell me how I can work this out with regular
expressions.

Thanks and regards
Srinivasa Raju Datla

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Need a Regular expression to remove a char for Unicode text

2006-10-13 Thread harvey . thomas

Don't know anything about Telugu, but is this the approach you want?

>>> import re
>>> x = u'\xfe\xff  \xfe\xff \xfe\xff\xfe\xff'
>>> noampre = re.compile(r'(?<!\s)(?!\s)', re.UNICODE).sub
>>> noampre('', x)
u'\xfe\xff  \xfe\xff \xfe\xff\xfe\xff'

The regular expression uses negative look-behind and look-ahead
assertions, so the '' character only matches when there is no whitespace
immediately before it and none immediately after it. Each match found is
then replaced with the empty string.
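The same look-behind/look-ahead idea in a minimal Python 3 sketch. It assumes the character in question is U+200C ZERO WIDTH NON-JOINER (the code point Sybren's reply mentions); the names `ZWNJ_UNLESS_SPACED` and `remove_unless_spaced` are hypothetical.

```python
import re

# Assumption: the character to strip is U+200C ZERO WIDTH NON-JOINER.
# (?<!\s) : negative look-behind, no whitespace immediately before
# (?!\s)  : negative look-ahead, no whitespace immediately after
# Python 3 str patterns are Unicode-aware by default, so re.UNICODE is implied.
ZWNJ_UNLESS_SPACED = re.compile(r"(?<!\s)\u200c(?!\s)")


def remove_unless_spaced(text: str) -> str:
    """Hypothetical helper: delete U+200C unless whitespace touches it on either side."""
    return ZWNJ_UNLESS_SPACED.sub("", text)


word = "\u0c15\u0c32\u0c4d\u200c\u0c39\u0c3e\u0c30"  # కల్ + U+200C + హార
print(remove_unless_spaced(word) == "\u0c15\u0c32\u0c4d\u0c39\u0c3e\u0c30")  # True
print(remove_unless_spaced("a \u200c b") == "a \u200c b")  # True: whitespace on both sides
print(remove_unless_spaced("a\u200c b") == "a\u200c b")    # True: whitespace after it
```

Like Harvey's pattern, this keeps the character if whitespace sits on either side of it; reading the request as "keep it only when whitespace is on both sides" would need a different pattern.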


Re: Need a Regular expression to remove a char for Unicode text

2006-10-13 Thread Sybren Stuvel
శ్రీనివాస enlightened us with:
 Can anyone tell me how I can remove a character from Unicode
 text?  కల్200cహార is a Telugu word in Unicode. Here I want to
 remove '' without replacing it with a zero width char. And one more
 thing: if there is any whitespace before and after the '' char, the
 text should be kept as it is.

So basically, you want to match 200c and replace it with 200c,
but only if it's not surrounded by whitespace, right?

r'(?<!\s)\u200c(?!\s)' should match. I'm sure you'll be able to take
it from there.

Sybren
-- 
Sybren Stüvel
Stüvel IT - http://www.stuvel.eu/
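Why the escape matters: in a Python `re` pattern, `\x` takes exactly two hex digits, so `\x200c` means a space followed by the literal text "0c", whereas `\u200c` (accepted in `str` patterns since Python 3.3) names U+200C. A small Python 3 sketch, assuming the target is U+200C ZERO WIDTH NON-JOINER; on the Python 2 of this thread, a `ur'...'` literal would have expanded `\u200c` before the regex engine saw it.

```python
import re

text = "\u0c15\u0c32\u0c4d\u200c\u0c39\u0c3e\u0c30"  # కల్ + U+200C + హార

# \x takes exactly two hex digits: this is "\x20" (a space) followed by "0c".
wrong = re.compile(r"(?<!\s)\x200c(?!\s)")
# \u takes four hex digits, so this matches U+200C itself.
right = re.compile(r"(?<!\s)\u200c(?!\s)")

print(wrong.search(text))                               # None: no " 0c" in the word
print(right.sub("", text) == text.replace("\u200c", ""))  # True
print(wrong.search("x 0cy") is not None)                # True: space + "0c" matched
```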


Re: Need a Regular expression to remove a char for Unicode text

2006-10-13 Thread Leo Kislov


On Oct 13, 4:44 am, [EMAIL PROTECTED] wrote:
 Don't know anything about Telugu, but is this the approach you want?

  x=u'\xfe\xff  \xfe\xff \xfe\xff\xfe\xff'
  noampre = re.compile(r'(?<!\s)(?!\s)', re.UNICODE).sub
  noampre('', x)

He wants to replace '' with a zero width joiner, so the last call should be
noampre(u'\u200D', x)
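What that substitution would do, as a minimal Python 3 sketch: the same conditional pattern, but passing U+200D ZERO WIDTH JOINER as the replacement instead of the empty string. It assumes the matched character is U+200C; `zwnj` and `ZWJ` are hypothetical names.

```python
import re

# Assumption: the matched character is U+200C ZERO WIDTH NON-JOINER.
zwnj = re.compile(r"(?<!\s)\u200c(?!\s)")
ZWJ = "\u200d"  # ZERO WIDTH JOINER

word = "\u0c15\u0c32\u0c4d\u200c\u0c39\u0c3e\u0c30"
joined = zwnj.sub(ZWJ, word)  # swap the non-joiner for a joiner instead of deleting it
print(joined == "\u0c15\u0c32\u0c4d\u200d\u0c39\u0c3e\u0c30")  # True
print(len(joined) == len(word))  # True: one code point replaced by one
```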


Re: Need a Regular expression to remove a char for Unicode text

2006-10-13 Thread Leo Kislov
On Oct 13, 4:55 am, Leo Kislov [EMAIL PROTECTED] wrote:
 He wants to replace '' with a zero width joiner, so the last call should be
 noampre(u'\u200D', x)

Pardon my poor reading comprehension: the OP doesn't want a zero width
joiner. Though I'm confused why he mentioned it at all.
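Since the joiner (U+200D) and non-joiner (U+200C) are easy to mix up, one way to check which invisible character a string actually contains is to list its format (category `Cf`) characters with the standard `unicodedata` module. A small sketch; `format_chars` is a hypothetical helper name, and the sample word assumes the Telugu text contains U+200C.

```python
import unicodedata


def format_chars(text: str) -> list:
    """Hypothetical helper: (code point, name) for each format (Cf) character."""
    return [
        (f"U+{ord(c):04X}", unicodedata.name(c))
        for c in text
        if unicodedata.category(c) == "Cf"
    ]


word = "\u0c15\u0c32\u0c4d\u200c\u0c39\u0c3e\u0c30"  # కల్ + U+200C + హార
print(format_chars(word))  # [('U+200C', 'ZERO WIDTH NON-JOINER')]
print(unicodedata.name("\u200d"))  # ZERO WIDTH JOINER
```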
