Serhiy Storchaka added the comment:

It is possible to change this behavior (see the example patch). With this patch:

>>> re.split(r'(?<=CA)(?=GCTG)', 'ACGTCAGCTGAAACCCCAGCTGACGTACGT')
['ACGTCA', 'GCTGAAACCCCA', 'GCTGACGTACGT']
>>> re.split(r'\b', "the quick, brown fox")
['', 'the', ' ', 'quick', ', ', 'brown', ' ', 'fox', '']

But unfortunately this is a backward-incompatible change and will likely break existing code (it also breaks tests). Consider the following example: re.split('(:*)', 'ab'). Currently the result is ['ab'], but with the patch it is ['', '', 'a', '', 'b', '', ''].

The third-party regex module [1] has the V1 flag, which switches on the incompatible behavior change.

>>> regex.split('(:*)', 'ab')
['ab']
>>> regex.split('(?V1)(:*)', 'ab')
['', '', 'a', '', 'b', '', '']
>>> regex.split(r'(?<=CA)(?=GCTG)', 'ACGTCAGCTGAAACCCCAGCTGACGTACGT')
['ACGTCAGCTGAAACCCCAGCTGACGTACGT']
>>> regex.split(r'(?V1)(?<=CA)(?=GCTG)', 'ACGTCAGCTGAAACCCCAGCTGACGTACGT')
['ACGTCA', 'GCTGAAACCCCA', 'GCTGACGTACGT']
>>> regex.split(r'\b', "the quick, brown fox")
['the quick, brown fox']
>>> regex.split(r'(?V1)\b', "the quick, brown fox")
['', 'the', ' ', 'quick', ', ', 'brown', ' ', 'fox', '']

I don't know how to solve this issue without introducing such a flag (or adding a special boolean argument to re.split()). As a workaround I suggest you use the regex module.

[1] https://pypi.python.org/pypi/regex

----------
keywords: +patch
Added file: http://bugs.python.org/file37147/re_split_zero_width.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue22817>
_______________________________________
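
For cases where installing the regex module is not an option, the zero-width split shown above can probably be approximated with re.finditer() alone. The following is only a minimal sketch: split_zero_width is a hypothetical helper name, it is not part of re, of regex, or of the attached patch, and it does not insert captured groups into the result the way re.split() does.

```python
import re


def split_zero_width(pattern, string):
    """Split string at every match of pattern, including empty matches.

    Hypothetical helper for illustration only. Unlike re.split(), the
    text of capturing groups is not added to the returned list.
    """
    pieces = []
    last = 0
    for match in re.finditer(pattern, string):
        # Keep the text between the previous match and this one.
        pieces.append(string[last:match.start()])
        last = match.end()
    # Keep whatever follows the final match.
    pieces.append(string[last:])
    return pieces


print(split_zero_width(r'(?<=CA)(?=GCTG)', 'ACGTCAGCTGAAACCCCAGCTGACGTACGT'))
# ['ACGTCA', 'GCTGAAACCCCA', 'GCTGACGTACGT']
print(split_zero_width(r'\b', "the quick, brown fox"))
# ['', 'the', ' ', 'quick', ', ', 'brown', ' ', 'fox', '']
```

Both patterns here can only match the empty string, so the result does not depend on how a given Python version treats an empty match adjacent to a non-empty one. For patterns such as '(:*)' that can match either way, the regex module with the V1 flag remains the safer choice.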