The regular expression "split" behaves slightly differently than string split:

>>> import re
>>> kresplit2 = re.compile(r'[^\w\&]+', re.UNICODE)

>>> kresplit2.split("   HELLO    THERE   ")
['', 'HELLO', 'THERE', '']

>>> kresplit2.split("VERISIGN INC.")
['VERISIGN', 'INC', '']

I'd thought that "split" would never produce an empty string, but
it does whenever the pattern matches at the very start or end of
the string.

The regular string split operation, called with no arguments, doesn't yield empty strings:

>>> "   HELLO   THERE ".split()
['HELLO', 'THERE']
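
For comparison, here is a small sketch of how str.split behaves once an explicit separator is passed. The variable names are only illustrative; the point is that the "no empty strings" behavior belongs to the no-argument form specifically.

```python
text = "  HELLO  THERE "

# No argument: a run of whitespace counts as one separator and
# leading/trailing whitespace is ignored.
no_sep = text.split()

# Explicit separator: every single space is a separator, so empty
# strings show up between adjacent spaces and at both ends.
with_sep = text.split(" ")

print(no_sep)
print(with_sep)
```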

If I try to get the functionality of string split with re:

>>> s2 = "   HELLO   THERE  "
>>> kresplit4 = re.compile(r'\W+', re.UNICODE)
>>> kresplit4.split(s2)
['', 'HELLO', 'THERE', '']

I still get empty strings.
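
One possible workaround, offered as a sketch rather than anything the documentation prescribes: either filter the empty strings out of the re.split result, or match the tokens themselves with findall. The helper names split_nonempty and tokens are made up for this example; the \W+ pattern is the same one used for kresplit4 above.

```python
import re

# Same pattern as kresplit4: runs of non-word characters are separators.
separator = re.compile(r'\W+', re.UNICODE)

def split_nonempty(text):
    """Split on the separator pattern, dropping the empty strings that
    re.split emits when a match touches the start or end of the text."""
    return [piece for piece in separator.split(text) if piece]

# Alternative: match the tokens instead of the separators.
word = re.compile(r'\w+', re.UNICODE)

def tokens(text):
    """Return every run of word characters, in order."""
    return word.findall(text)

print(split_nonempty("   HELLO   THERE  "))
print(tokens("VERISIGN INC."))
```

Both approaches give the same list for these inputs. The findall version avoids building the empty strings in the first place, but it only works when the tokens are as easy to describe as the separators.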

The documentation just describes re.split as "Split string by the occurrences of pattern", which is not too helpful.

                                        John Nagle