Need a Regular expression to remove a char for Unicode text
Hi friends,

Can anyone tell me how I can remove a character from Unicode text? కల్హార is a Telugu word in Unicode. Here I want to remove the U+200C character, but not replace it with a zero-width character. One more thing: if there is whitespace before and after the U+200C character, the text should be kept as it is.

Please tell me how I can work this out with regular expressions.

Thanks and regards,
Srinivasa Raju Datla

--
http://mail.python.org/mailman/listinfo/python-list
Re: Need a Regular expression to remove a char for Unicode text
శ్రీనివాస wrote:
> Hi friends,
>
> Can anyone tell me how I can remove a character from Unicode text? కల్హార is a Telugu word in Unicode. Here I want to remove the U+200C character, but not replace it with a zero-width character. One more thing: if there is whitespace before and after the U+200C character, the text should be kept as it is.
>
> Please tell me how I can work this out with regular expressions.
>
> Thanks and regards,
> Srinivasa Raju Datla

Don't know anything about Telugu, but is this the approach you want?

    >>> import re
    >>> x = u'\xfe\xff \xfe\xff \xfe\xff\xfe\xff'
    >>> noampre = re.compile(u'(?<!\s)\u200c(?!\s)', re.UNICODE).sub
    >>> noampre('', x)
    u'\xfe\xff \xfe\xff \xfe\xff\xfe\xff'

The regular expression uses a negative lookbehind and a negative lookahead assertion to check that there is no whitespace on either side of the U+200C character. Each match found is then replaced with the empty string.
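For anyone following along, here is a minimal sketch of the lookbehind/lookahead approach described above. It assumes Python 3 and assumes the character to remove is U+200C (zero width non-joiner); the helper name `strip_zwnj` and the sample strings are made up for illustration and are not from the thread.

```python
import re

# Assumption: the invisible character in question is U+200C (zero width non-joiner).
# (?<!\s) is a negative lookbehind, (?!\s) a negative lookahead: the character
# only matches when neither neighbour is whitespace.
pattern = re.compile(r"(?<!\s)\u200c(?!\s)")

def strip_zwnj(text):
    """Hypothetical helper: delete U+200C unless whitespace sits next to it."""
    return pattern.sub("", text)

# Telugu sample built from code points, with U+200C placed after the virama.
word = "\u0c15\u0c32\u0c4d\u200c\u0c39\u0c3e\u0c30"
cleaned = strip_zwnj(word)

# Whitespace on both sides: the text is left untouched.
spaced = "a \u200c b"
kept = strip_zwnj(spaced)

print(len(word), len(cleaned))  # 7 6
print(kept == spaced)           # True
```

Note that with this pattern the character is also kept when whitespace is on only one side of it; if that is not what you want, the assertions would need adjusting.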
Re: Need a Regular expression to remove a char for Unicode text
శ్రీనివాస enlightened us with:
> Can anyone tell me how I can remove a character from Unicode text? కల్[U+200C]హార is a Telugu word in Unicode. Here I want to remove the U+200C character, but not replace it with a zero-width character. One more thing: if there is whitespace before and after the U+200C character, the text should be kept as it is.

So basically, you want to match U+200C and replace it with nothing, but only if it's not surrounded by whitespace, right?

    r"(?<!\s)\u200c(?!\s)"

should match. I'm sure you'll be able to take it from there.

Sybren
--
Sybren Stüvel
Stüvel IT - http://www.stuvel.eu/
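One detail worth checking when typing such a pattern: in Python, `\x` takes exactly two hex digits, so `\x200c` means a space (`\x20`) followed by the literal text `0c`, while `\u200c` is the single code point U+200C. A small sketch, assuming Python 3 (whose `re` module understands `\uXXXX` in patterns); the sample strings are invented for illustration:

```python
import re

# Sample text: Telugu letter, virama, U+200C, Telugu letter (built from code points).
text = "\u0c32\u0c4d\u200c\u0c39"

# \x consumes two hex digits only: this pattern means "space, then '0c'".
wrong = re.compile(r"(?<!\s)\x200c(?!\s)")
# \u consumes four hex digits: this pattern means the code point U+200C.
right = re.compile(r"(?<!\s)\u200c(?!\s)")

wrong_hits = wrong.findall(text)        # finds nothing in the Telugu text
right_hits = right.findall(text)        # finds the one U+200C
literal_hits = wrong.findall("a 0cb")   # the \x form matches " 0c" instead

print(len(wrong_hits), len(right_hits), literal_hits)  # 0 1 [' 0c']
```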
Re: Need a Regular expression to remove a char for Unicode text
On Oct 13, 4:44 am, [EMAIL PROTECTED] wrote:
> శ్రీనివాస wrote:
> > Hi friends,
> >
> > Can anyone tell me how I can remove a character from Unicode text? కల్హార is a Telugu word in Unicode. Here I want to remove the U+200C character, but not replace it with a zero-width character. One more thing: if there is whitespace before and after the U+200C character, the text should be kept as it is.
> >
> > Please tell me how I can work this out with regular expressions.
> >
> > Thanks and regards,
> > Srinivasa Raju Datla
>
> Don't know anything about Telugu, but is this the approach you want?
>
>     x = u'\xfe\xff \xfe\xff \xfe\xff\xfe\xff'
>     noampre = re.compile(u'(?<!\s)\u200c(?!\s)', re.UNICODE).sub
>     noampre('', x)

He wants to replace it with a zero width joiner, so the last call should be

    noampre(u'\u200D', x)
Re: Need a Regular expression to remove a char for Unicode text
On Oct 13, 4:55 am, Leo Kislov [EMAIL PROTECTED] wrote:
> On Oct 13, 4:44 am, [EMAIL PROTECTED] wrote:
> > శ్రీనివాస wrote:
> > > Hi friends,
> > >
> > > Can anyone tell me how I can remove a character from Unicode text? కల్హార is a Telugu word in Unicode. Here I want to remove the U+200C character, but not replace it with a zero-width character. One more thing: if there is whitespace before and after the U+200C character, the text should be kept as it is.
> > >
> > > Please tell me how I can work this out with regular expressions.
> > >
> > > Thanks and regards,
> > > Srinivasa Raju Datla
> >
> > Don't know anything about Telugu, but is this the approach you want?
> >
> >     x = u'\xfe\xff \xfe\xff \xfe\xff\xfe\xff'
> >     noampre = re.compile(u'(?<!\s)\u200c(?!\s)', re.UNICODE).sub
> >     noampre('', x)
>
> He wants to replace it with a zero width joiner, so the last call should be
>
>     noampre(u'\u200D', x)

Pardon my poor reading comprehension: the OP doesn't want a zero width joiner. Though I'm confused why he mentioned it at all.