Re: split() and string.whitespace
MRAB:
> I also had the thought that the backtick (`), which is not used in
> Python 3, could be used to form character set literals (`aeiou` =>
> set("aeiou")), although that might only be worthwhile if character
> sets were introduced as a specialised form of set.

Python developers have removed it from the syntax mostly because a lot of keyboards (probably most in the world) don't have "`" on them.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: split() and string.whitespace
On Nov 4, 8:00 pm, [EMAIL PROTECTED] wrote:
> MRAB:
>
> > It's interesting, if you think about it, that here we have someone who
> > wants to split on a set of characters but 'split' splits on a string,
> > and others sometimes want to strip off a string but 'strip' strips on
> > a set of characters (passed as a string).
>
> That can be seen as a little inconsistency in the language. But with
> some practice you learn it.
>
> > You could imagine that if
> > Python had had (character) sets from the start then 'split' and
> > 'strip' could have accepted a string or a set depending on whether you
> > wanted to split on or strip off a string or a set.
>
> Too bad you didn't suggest this when they were designing Python
> 3 :-)
> This may be suggested for Python 3.1.

I might also add that str.startswith can accept a tuple of strings; shouldn't that have been a set? :-)

I also had the thought that the backtick (`), which is not used in Python 3, could be used to form character set literals (`aeiou` => set("aeiou")), although that might only be worthwhile if character sets were introduced as a specialised form of set.
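As a small illustration of the str.startswith remark above, here is a sketch that assumes Python 3 (the thread itself uses Python 2 syntax). The sample filename and prefixes are hypothetical. It shows that a tuple of prefixes is accepted while a set is rejected with TypeError, so a set has to be converted at the call site.

```python
# Python 3 sketch: str.startswith accepts a str or a tuple of str, not a set.
filename = "report_2008.txt"      # hypothetical sample value
prefixes = ("report", "summary")  # hypothetical prefixes

# A tuple means "starts with any of these".
matches_tuple = filename.startswith(prefixes)

# Passing a set raises TypeError.
try:
    filename.startswith({"report", "summary"})
    set_accepted = True
except TypeError:
    set_accepted = False

# A set can still be used by converting it to a tuple first.
matches_set = filename.startswith(tuple({"report", "summary"}))

print(matches_tuple, set_accepted, matches_set)
```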
Re: split() and string.whitespace
MRAB:
> It's interesting, if you think about it, that here we have someone who
> wants to split on a set of characters but 'split' splits on a string,
> and others sometimes want to strip off a string but 'strip' strips on
> a set of characters (passed as a string).

That can be seen as a little inconsistency in the language. But with some practice you learn it.

> You could imagine that if
> Python had had (character) sets from the start then 'split' and
> 'strip' could have accepted a string or a set depending on whether you
> wanted to split on or strip off a string or a set.

Too bad you didn't suggest this when they were designing Python 3 :-)
This may be suggested for Python 3.1.

Bye,
bearophile
Re: split() and string.whitespace
That is a very creative solution! Thank you Scott.

> Or, for faster per-repetition (blending in to your use-case):
>
> import string
> SEP = string.maketrans('abc \t', '     ')
> ...
> parts = 'whatever, abalone dudes'.translate(SEP).split()
> print parts
>
> ['wh', 'tever,', 'lone', 'dudes']
Re: split() and string.whitespace
Steven D'Aprano wrote:
> On Fri, 31 Oct 2008 12:18:32 -0700, Chaim Krause wrote:
>
> > I have arrived here while attempting to break down a larger problem. I
> > got to this question when attempting to split a line on any whitespace
> > character so that I could then add several other characters like ';' and
> > ':'. Ultimately splitting a line on any char in a union of
> > string.whitespace and some pre-designated chars.
> >
> > I am now beginning to think that I have outgrown split() and must move
> > up to regular expressions. If that is the case, I will go off and RTFM
> > on RegEx.
>
> Or just do this:
>
>     s = "the quickbrown\tdog\njumps over\r\n\t the lazy dog"
>     s = s.replace('\t', ' ').replace('\n', ' ').replace('\r', ' ')
>     s.split(' ')
>
> or even simpler:
>
>     s.split()

Or, for faster per-repetition (blending in to your use-case):

    import string
    # maketrans needs two arguments of equal length:
    # five characters are mapped to five spaces.
    SEP = string.maketrans('abc \t', '     ')
    ...
    parts = 'whatever, abalone dudes'.translate(SEP).split()
    print parts

['wh', 'tever,', 'lone', 'dudes']

--Scott David Daniels
[EMAIL PROTECTED]
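The translate-then-split idea above is written for Python 2, where string.maketrans exists. A rough Python 3 equivalent might look like the sketch below. It assumes Python 3 (str.maketrans), and EXTRA_SEPARATORS and split_on_separators are hypothetical names chosen for illustration, using the ';' and ':' characters mentioned earlier in the thread.

```python
# Python 3 sketch of translate-then-split (str.maketrans replaces the
# Python 2 string.maketrans used in the message above).
EXTRA_SEPARATORS = ";:"  # hypothetical extra separator characters

# Map every extra separator to a space; both arguments have equal length.
SEP = str.maketrans(EXTRA_SEPARATORS, " " * len(EXTRA_SEPARATORS))

def split_on_separators(line):
    """Split on any whitespace or any character in EXTRA_SEPARATORS."""
    # After translation only whitespace separates the fields, so a plain
    # split() with no arguments does the rest.
    return line.translate(SEP).split()

parts = split_on_separators("alpha;beta:gamma\tdelta  epsilon\n")
print(parts)
```

The table is built once and reused, which is where the per-repetition saving in the original suggestion comes from.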
Re: split() and string.whitespace
On Fri, 31 Oct 2008 12:18:32 -0700, Chaim Krause wrote:

> I have arrived here while attempting to break down a larger problem. I
> got to this question when attempting to split a line on any whitespace
> character so that I could then add several other characters like ';' and
> ':'. Ultimately splitting a line on any char in a union of
> string.whitespace and some pre-designated chars.
>
> I am now beginning to think that I have outgrown split() and must move
> up to regular expressions. If that is the case, I will go off and RTFM
> on RegEx.

Or just do this:

    s = "the quickbrown\tdog\njumps over\r\n\t the lazy dog"
    s = s.replace('\t', ' ').replace('\n', ' ').replace('\r', ' ')
    s.split(' ')

or even simpler:

    s.split()

--
Steven
Re: split() and string.whitespace
On Oct 31, 6:57 pm, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote:
> On Fri, 31 Oct 2008 11:53:30 -0700, Chaim Krause wrote:
> > I am unable to figure out why the first two statements work as I expect
> > them to and the next two do not. Namely, the first two split the sentence
> > into its component words, while the latter two return the whole sentence
> > intact.
>
> > import string
> > from string import whitespace
> > mytext = "The quick brown fox jumped over the lazy dog.\n"
>
> > print mytext.split()
> > print mytext.split(' ')
>
> This splits at the string ' '.
>
> > print mytext.split(whitespace)
>
> This splits at the string '\t\n\x0b\x0c\r ' which doesn't occur in
> `mytext`. The argument is a string not a set of characters.
>
> > print string.split(mytext, sep=whitespace)
>
> Same here.

It's interesting, if you think about it, that here we have someone who wants to split on a set of characters but 'split' splits on a string, and others sometimes want to strip off a string but 'strip' strips on a set of characters (passed as a string).

You could imagine that if Python had had (character) sets from the start then 'split' and 'strip' could have accepted a string or a set depending on whether you wanted to split on or strip off a string or a set.
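The asymmetry described above can be seen side by side in a short sketch. It is written for Python 3 as an assumption (the behaviour of these two methods is the same in Python 2), and the sample strings are made up for illustration.

```python
# strip() treats its argument as a set of characters: any mix of 'x' and
# 'y' is removed from both ends, in any order.
stripped = "xyxhello worldyx".strip("xy")

# split() treats its argument as one literal separator string: only the
# exact sequence "<>" separates fields, a lone "<" does not.
pieces = "one<>two<three".split("<>")

print(repr(stripped))
print(pieces)
```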
Re: split() and string.whitespace
On Oct 31, 2:12 pm, Chaim Krause <[EMAIL PROTECTED]> wrote:
> The documentation I am referencing states...
>
> The sep argument may consist of multiple characters (for example, "'1,
> 2, 3'.split(', ')" returns "['1', '2', '3']").
>
> So why don't the latter two split on *any* whitespace character, instead
> of looking for the sep string as a whole?

Now, rereading the documentation in light of the replies to my original posting, I see that I misinterpreted the example as using "comma OR space" when it was actually "comma followed by space".

I am now properly enlightened.

Thank you all for your help.
Re: split() and string.whitespace
I have arrived here while attempting to break down a larger problem. I got to this question when attempting to split a line on any whitespace character so that I could then add several other characters like ';' and ':'. Ultimately splitting a line on any char in a union of string.whitespace and some pre-designated chars.

I am now beginning to think that I have outgrown split() and must move up to regular expressions. If that is the case, I will go off and RTFM on RegEx.
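For the regular-expression route mentioned above, one possible sketch builds a character class from string.whitespace plus the extra characters. It assumes Python 3, and EXTRA_SEPARATORS, SEPARATOR_RE and split_any are hypothetical names, not anything proposed in the thread.

```python
import re
import string

EXTRA_SEPARATORS = ";:"  # hypothetical extra separator characters

# Character class containing every whitespace character plus the extras.
# re.escape keeps characters such as ']' or '^' safe if they are added later.
SEPARATOR_RE = re.compile(
    "[" + re.escape(string.whitespace + EXTRA_SEPARATORS) + "]+"
)

def split_any(line):
    """Split on runs of any character in the separator class."""
    # re.split yields empty strings at the ends when the line starts or
    # ends with a separator, so those are filtered out.
    return [part for part in SEPARATOR_RE.split(line) if part]

fields = split_any("a;b: c\td\n")
print(fields)
```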
Re: split() and string.whitespace
The documentation I am referencing states...

The sep argument may consist of multiple characters (for example, "'1, 2, 3'.split(', ')" returns "['1', '2', '3']").

So why don't the latter two split on *any* whitespace character, instead of looking for the sep string as a whole?
Re: split() and string.whitespace
On Fri, Oct 31, 2008 at 11:53 AM, Chaim Krause <[EMAIL PROTECTED]> wrote:
> I am unable to figure out why the first two statements work as I
> expect them to and the next two do not. Namely, the first two split the
> sentence into its component words, while the latter two return the
> whole sentence intact.
>
> import string
> from string import whitespace
> mytext = "The quick brown fox jumped over the lazy dog.\n"
>
> print mytext.split()
> print mytext.split(' ')
> print mytext.split(whitespace)
> print string.split(mytext, sep=whitespace)

Also note that a plain 'mytext.split()' with no arguments will split on any whitespace character like you're trying to do here.

Cheers,
Chris
--
Follow the path of the Iguana...
http://rebertia.com
Re: split() and string.whitespace
> I am unable to figure out why the first two statements work as I expect
> them to and the next two do not. Namely, the first two split the sentence
> into its component words, while the latter two return the whole sentence
> intact.
>
> import string
> from string import whitespace
> mytext = "The quick brown fox jumped over the lazy dog.\n"
>
> print mytext.split()
> print mytext.split(' ')
> print mytext.split(whitespace)
> print string.split(mytext, sep=whitespace)

split() does its work on a literal separator string or, if no separator is specified, splits on arbitrary runs of whitespace. For an example, try

    s = "abcdefgbcdefgh"
    s.split("c")    # ['ab', 'defgb', 'defgh']
    s.split("fgb")  # ['abcde', 'cdefgh']

string.whitespace is a string, so split() tries to split on that literal whitespace string, not on a set of whitespace characters.

-tkc
Re: split() and string.whitespace
On Fri, 31 Oct 2008 11:53:30 -0700, Chaim Krause wrote:

> I am unable to figure out why the first two statements work as I expect
> them to and the next two do not. Namely, the first two split the sentence
> into its component words, while the latter two return the whole sentence
> intact.
>
> import string
> from string import whitespace
> mytext = "The quick brown fox jumped over the lazy dog.\n"
>
> print mytext.split()
> print mytext.split(' ')

This splits at the string ' '.

> print mytext.split(whitespace)

This splits at the string '\t\n\x0b\x0c\r ' which doesn't occur in `mytext`. The argument is a string not a set of characters.

> print string.split(mytext, sep=whitespace)

Same here.

Ciao,
Marc 'BlackJack' Rintsch
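The explanation above can be checked directly. The sketch below assumes Python 3, where the characters in string.whitespace appear in a different order than the Python 2 value quoted above, but the point is the same: the whole string is searched for as a single separator and never occurs in mytext.

```python
from string import whitespace

mytext = "The quick brown fox jumped over the lazy dog.\n"

# The separator is the entire whitespace string, taken literally.
separator_found = whitespace in mytext

# Because that literal separator never occurs, nothing is split.
unsplit = mytext.split(whitespace)

# With no argument, split() breaks on runs of any whitespace character.
words = mytext.split()

print(separator_found)
print(unsplit)
print(words)
```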
split() and string.whitespace
I am unable to figure out why the first two statements work as I expect them to and the next two do not. Namely, the first two split the sentence into its component words, while the latter two return the whole sentence intact.

    import string
    from string import whitespace
    mytext = "The quick brown fox jumped over the lazy dog.\n"

    print mytext.split()
    print mytext.split(' ')
    print mytext.split(whitespace)
    print string.split(mytext, sep=whitespace)