On 27/05/2006 9:51 AM, BJörn Lindqvist wrote:
>> how can i split a string that contains white spaces and '_'
>>
>> any clue?
>
> If the white spaces and the '_' should be applied equivalently on the
> input and you can enumerate all white space characters, you could do
> like this:

Yes, you could write out the whitespace characters for the 8-bit
encoding of your choice, or you could find them using Python (and get
some possibly surprising answers):

>>> mkws = lambda enc, sz=256: "".join([chr(i) for i in range(sz) if chr(i).decode(enc, 'ignore').isspace()])
>>> mkws('cp1252')
'\t\n\x0b\x0c\r\x1c\x1d\x1e\x1f \xa0'
>>> mkws('latin1')
'\t\n\x0b\x0c\r\x1c\x1d\x1e\x1f \x85\xa0'
>>> mkws('cp1251')
'\t\n\x0b\x0c\r\x1c\x1d\x1e\x1f \xa0'
>>> mkws('ascii', 128)
'\t\n\x0b\x0c\r\x1c\x1d\x1e\x1f '

and compare the last one with the result for the C locale:

>>> "".join([chr(i) for i in range(256) if chr(i).isspace()])
'\t\n\x0b\x0c\r '

>
> def split_helper(list, delims):
>     if not delims:
>         return list
>     ch = delims[0]
>     lst = []
>     for item in list:
>         lst += split_helper(item.split(ch), delims[1:])
>     return lst
>
> def split(str, delims):
>     return split_helper([str], delims)
>
>>>> split("foo_bar eh", "_ ")
> ['foo', 'bar', 'eh']
>
> Though I bet someone will post a one-line solution in the next 30
> minutes. :)

Two one-liners, depending on what the OP really wants:

>>> re.split(r"[\s_]", "foo_bar      zot plugh _ xyzzy")
['foo', 'bar', '', '', '', '', '', 'zot', 'plugh', '', '', 'xyzzy']

which is what your ever-so-slightly-baroque effort does :-)

or

>>> re.split(r"[\s_]+", "foo_bar      zot plugh _ xyzzy")
['foo', 'bar', 'zot', 'plugh', 'xyzzy']

Cheers,
John
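
For anyone trying this on Python 3, where chr(i) is already a Unicode
string and has no .decode() method, here is a rough sketch of the same
two ideas. It is not the code from the post above: mkws is rewritten as
an ordinary function, and the names SAMPLE, each and runs are made up
for illustration. The sample string assumes six spaces between "bar"
and "zot", which is what the five empty strings in the first result
imply.

```python
import re


def mkws(enc, size=256):
    """Return the characters that the first `size` byte values decode to
    under `enc` and that Python regards as whitespace."""
    chars = (bytes([i]).decode(enc, "ignore") for i in range(size))
    # Bytes the codec leaves undefined decode to "", and "".isspace() is False.
    return "".join(c for c in chars if c.isspace())


# Assumed sample input: six spaces after "bar".
SAMPLE = "foo_bar      zot plugh _ xyzzy"

# Every single delimiter splits, so a run of delimiters yields empty strings.
each = re.split(r"[\s_]", SAMPLE)

# A run of delimiters is treated as one separator.
runs = re.split(r"[\s_]+", SAMPLE)

if __name__ == "__main__":
    print(repr(mkws("latin1")))
    print(each)
    print(runs)
```

Note that even the "+" version returns an empty string at either end
if the input starts or ends with a delimiter, so filter those out if
they are unwanted.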