Re: split() and string.whitespace

2008-11-04 Thread bearophileHUGS
MRAB:
> I also had the thought that the backtick (`), which is not used in
> Python 3, could be used to form character set literals (`aeiou` =>
> set("aeiou")), although that might only be worth while if character
> sets were introduced as a specialised form of set.

Python developers removed it from the syntax mostly because a lot of
keyboards (probably most in the world) don't have "`" on them.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list


Re: split() and string.whitespace

2008-11-04 Thread MRAB
On Nov 4, 8:00 pm, [EMAIL PROTECTED] wrote:
> MRAB:
>
> > It's interesting, if you think about it, that here we have someone who
> > wants to split on a set of characters but 'split' splits on a string,
> > and others sometimes want to strip off a string but 'strip' strips on
> > a set of characters (passed as a string).
>
> That can be seen as a little inconsistency in the language. But with
> some practice you learn it.
>
> > You could imagine that if
> > Python had had (character) sets from the start then 'split' and
> > 'strip' could have accepted a string or a set depending on whether you
> > wanted to split on or strip off a string or a set.
>
> Too bad you didn't suggest this when they were designing Python
> 3 :-)
> This may be suggested for Python 3.1.
>
I might also add that str.startswith can accept a tuple of strings;
shouldn't that have been a set? :-)
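
A minimal sketch of what str.startswith accepts, assuming Python 3
(the thread itself uses Python 2 syntax). The URL and prefixes are
made-up illustration values.

```python
# str.startswith accepts a single string or a tuple of strings.
prefixes = ("http://", "https://")
url = "https://example.org/page"  # hypothetical value for illustration

matches_tuple = url.startswith(prefixes)

# A set is rejected with TypeError, so it has to be converted first.
try:
    url.startswith({"http://", "https://"})
    set_accepted = True
except TypeError:
    set_accepted = False

matches_converted = url.startswith(tuple({"http://", "https://"}))
print(matches_tuple, set_accepted, matches_converted)
```

So a set of prefixes works only after an explicit tuple() conversion.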

I also had the thought that the backtick (`), which is not used in
Python 3, could be used to form character set literals (`aeiou` =>
set("aeiou")), although that might only be worth while if character
sets were introduced as a specialised form of set.


Re: split() and string.whitespace

2008-11-04 Thread bearophileHUGS
MRAB:
> It's interesting, if you think about it, that here we have someone who
> wants to split on a set of characters but 'split' splits on a string,
> and others sometimes want to strip off a string but 'strip' strips on
> a set of characters (passed as a string).

That can be seen as a little inconsistency in the language. But with
some practice you learn it.


> You could imagine that if
> Python had had (character) sets from the start then 'split' and
> 'strip' could have accepted a string or a set depending on whether you
> wanted to split on or strip off a string or a set.

Too bad you didn't suggest this when they were designing Python
3 :-)
This may be suggested for Python 3.1.

Bye,
bearophile


Re: split() and string.whitespace

2008-11-04 Thread Chaim Krause
That is a very creative solution! Thank you Scott.

> Or, for faster per-repetition (blending in to your use-case):
>
>      import string
>      SEP = string.maketrans('abc \t', '     ')
>      ...
>      parts = 'whatever, abalone dudes'.translate(SEP).split()
>      print parts
>
> ['wh', 'tever,', 'lone', 'dudes']


Re: split() and string.whitespace

2008-11-03 Thread Scott David Daniels

Steven D'Aprano wrote:
> On Fri, 31 Oct 2008 12:18:32 -0700, Chaim Krause wrote:
>
> > I have arrived here while attempting to break down a larger problem. I
> > got to this question when attempting to split a line on any whitespace
> > character so that I could then add several other characters like ';' and
> > ':'. Ultimately splitting a line on any char in a union of
> > string.whitespace and some pre-designated chars.
> >
> > I am now beginning to think that I have outgrown split() and must move
> > up to regular expressions. If that is the case, I will go off and RTFM
> > on RegEx.
>
> Or just do this:
>
> s = "the quickbrown\tdog\njumps over\r\n\t the lazy dog"
> s = s.replace('\t', ' ').replace('\n', ' ').replace('\r', ' ')
> s.split(' ')
>
> or even simpler:
>
> s.split()

Or, for faster per-repetition (blending in to your use-case):

    import string
    SEP = string.maketrans('abc \t', '     ')
    ...
    parts = 'whatever, abalone dudes'.translate(SEP).split()
    print parts

['wh', 'tever,', 'lone', 'dudes']
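
A sketch of the same translate-then-split idea, assuming Python 3,
where string.maketrans no longer exists and str.maketrans is used
instead. The name `separators` is only for illustration.

```python
# Map each extra separator character to a space, then let split()
# with no argument break the result on runs of whitespace.
separators = "abc \t"
SEP = str.maketrans(separators, " " * len(separators))

parts = "whatever, abalone dudes".translate(SEP).split()
print(parts)
```

Both arguments to maketrans must have the same length, which is why
the replacement string is built from len(separators).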


--Scott David Daniels
[EMAIL PROTECTED]


Re: split() and string.whitespace

2008-10-31 Thread Steven D'Aprano
On Fri, 31 Oct 2008 12:18:32 -0700, Chaim Krause wrote:

> I have arrived here while attempting to break down a larger problem. I
> got to this question when attempting to split a line on any whitespace
> character so that I could then add several other characters like ';' and
> ':'. Ultimately splitting a line on any char in a union of
> string.whitespace and some pre-designated chars.
> 
> I am now beginning to think that I have outgrown split() and must move
> up to regular expressions. If that is the case, I will go off and RTFM
> on RegEx.

Or just do this:

s = "the quickbrown\tdog\njumps over\r\n\t the lazy dog"
s = s.replace('\t', ' ').replace('\n', ' ').replace('\r', ' ')
s.split(' ')


or even simpler:

s.split()
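
A small sketch, assuming Python 3, of how the two calls above differ.
split(' ') splits at every single space, so a run of replaced
whitespace leaves empty strings behind, while split() with no argument
treats any run of whitespace as one separator. The variable names are
only for illustration.

```python
s = "the quickbrown\tdog\njumps over\r\n\t the lazy dog"

# Replace-then-split on a single space: adjacent spaces yield '' items.
replaced = s.replace('\t', ' ').replace('\n', ' ').replace('\r', ' ')
by_space = replaced.split(' ')

# No argument: runs of any whitespace count as a single separator.
by_whitespace = s.split()

print(by_space)
print(by_whitespace)
```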


-- 
Steven


Re: split() and string.whitespace

2008-10-31 Thread MRAB
On Oct 31, 6:57 pm, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote:
> On Fri, 31 Oct 2008 11:53:30 -0700, Chaim Krause wrote:
> > I am unable to figure out why the first two statements work as I expect
> > them to and the next two do not. Namely, the first two split the sentence
> > into its component words, while the latter two return the whole sentence
> > intact.
>
> > import string
> > from string import whitespace
> > mytext = "The quick brown fox jumped over the lazy dog.\n"
>
> > print mytext.split()
> > print mytext.split(' ')
>
> This splits at the string ' '.
>
> > print mytext.split(whitespace)
>
> This splits at the string '\t\n\x0b\x0c\r ' which doesn't occur in
> `mytext`.  The argument is a string not a set of characters.
>
> > print string.split(mytext, sep=whitespace)
>
> Same here.
>

It's interesting, if you think about it, that here we have someone who
wants to split on a set of characters but 'split' splits on a string,
and others sometimes want to strip off a string but 'strip' strips on
a set of characters (passed as a string). You could imagine that if
Python had had (character) sets from the start then 'split' and
'strip' could have accepted a string or a set depending on whether you
wanted to split on or strip off a string or a set.
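
A short sketch of that asymmetry, assuming Python 3. The helper
strip_suffix is hypothetical, written here only to show what
"stripping off a string" would mean; it is not a str method.

```python
def strip_suffix(s, suffix):
    """Remove suffix from s once, as a whole string (hypothetical helper)."""
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s

# rstrip treats its argument as a set of characters to remove.
as_char_set = "data.tar".rstrip(".tar")

# The helper treats its argument as one string.
as_string = strip_suffix("data.tar", ".tar")

# split treats its argument as one string, never as a set of characters.
as_separator = "a, b,c".split(", ")

print(as_char_set, as_string, as_separator)
```

Here rstrip eats every trailing '.', 't', 'a' and 'r', leaving only
'd', while split(', ') ignores the lone comma in 'b,c'.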



Re: split() and string.whitespace

2008-10-31 Thread Chaim Krause
On Oct 31, 2:12 pm, Chaim Krause <[EMAIL PROTECTED]> wrote:
> The documentation I am referencing states...
>
> The sep argument may consist of multiple characters (for example, "'1,
> 2, 3'.split(', ')" returns "['1', '2', '3']").
>
> So why don't the latter two split on *any* whitespace character,
> instead of looking for the sep string as a whole?

Now, rereading the documentation in light of the replies to my
original posting, I see that I misinterpreted the example as using
"comma OR space" when it was actually "commaspace". I am now properly
enlightened.

Thank you all for your help.


Re: split() and string.whitespace

2008-10-31 Thread Chaim Krause
I have arrived here while attempting to break down a larger problem. I
got to this question when attempting to split a line on any whitespace
character so that I could then add several other characters like ';'
and ':'. Ultimately splitting a line on any char in a union of
string.whitespace and some pre-designated chars.

I am now beginning to think that I have outgrown split() and must move
up to regular expressions. If that is the case, I will go off and RTFM
on RegEx.
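
One way the regular-expression route could look, as a sketch assuming
Python 3. The extra separators ';' and ':' come from the message
above; the sample line and the variable names are made up.

```python
import re
import string

extra = ";:"  # the additional separator characters

# Character class matching one or more of: any whitespace, ';' or ':'.
pattern = re.compile("[" + re.escape(string.whitespace + extra) + "]+")

line = "alpha;beta: gamma\tdelta\n"  # hypothetical input

# A leading or trailing separator produces an empty field; drop those.
fields = [f for f in pattern.split(line) if f]
print(fields)
```

re.escape keeps the characters literal inside the class, and the
trailing '+' makes a run such as ': ' count as a single separator.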


Re: split() and string.whitespace

2008-10-31 Thread Chaim Krause
The documentation I am referencing states...

The sep argument may consist of multiple characters (for example, "'1,
2, 3'.split(', ')" returns "['1', '2', '3']").

So why don't the latter two split on *any* whitespace character,
instead of looking for the sep string as a whole?


Re: split() and string.whitespace

2008-10-31 Thread Chris Rebert
On Fri, Oct 31, 2008 at 11:53 AM, Chaim Krause <[EMAIL PROTECTED]> wrote:
> I am unable to figure out why the first two statements work as I
> expect them to and the next two do not. Namely, the first two split the
> sentence into its component words, while the latter two return the
> whole sentence intact.
>
> import string
> from string import whitespace
> mytext = "The quick brown fox jumped over the lazy dog.\n"
>
> print mytext.split()
> print mytext.split(' ')
> print mytext.split(whitespace)
> print string.split(mytext, sep=whitespace)

Also note that a plain 'mytext.split()' with no arguments will split
on any whitespace character like you're trying to do here.

Cheers,
Chris
-- 
Follow the path of the Iguana...
http://rebertia.com



Re: split() and string.whitespace

2008-10-31 Thread Tim Chase

> I am unable to figure out why the first two statements work as I
> expect them to and the next two do not. Namely, the first two split the
> sentence into its component words, while the latter two return the
> whole sentence intact.
>
> import string
> from string import whitespace
> mytext = "The quick brown fox jumped over the lazy dog.\n"
>
> print mytext.split()
> print mytext.split(' ')
> print mytext.split(whitespace)
> print string.split(mytext, sep=whitespace)



split() works on a literal separator string or, if no separator is
specified, splits on runs of arbitrary whitespace.


For an example, try

  s = "abcdefgbcdefgh"
  s.split("c") # ['ab', 'defgb', 'defgh']
  s.split("fgb") # ['abcde', 'cdefgh']


string.whitespace is a string, so split() tries to split on that
whole literal string, not on any one of the whitespace characters in it.


-tkc







Re: split() and string.whitespace

2008-10-31 Thread Marc 'BlackJack' Rintsch
On Fri, 31 Oct 2008 11:53:30 -0700, Chaim Krause wrote:

> I am unable to figure out why the first two statements work as I expect
> them to and the next two do not. Namely, the first two split the sentence
> into its component words, while the latter two return the whole sentence
> intact.
> 
> import string
> from string import whitespace
> mytext = "The quick brown fox jumped over the lazy dog.\n"
> 
> print mytext.split()
> print mytext.split(' ')

This splits at the string ' '.

> print mytext.split(whitespace)

This splits at the string '\t\n\x0b\x0c\r ' which doesn't occur in 
`mytext`.  The argument is a string not a set of characters.

> print string.split(mytext, sep=whitespace)

Same here.
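
A minimal sketch of the point above, assuming Python 3 (print as a
function; string.split no longer exists there, so only the method form
is shown). The "left"/"right" text is made up for illustration.

```python
from string import whitespace

mytext = "The quick brown fox jumped over the lazy dog.\n"

# The full whitespace string never occurs in mytext, so nothing is split.
whole = mytext.split(whitespace)
contains = whitespace in mytext

# It only splits where that exact sequence of characters appears.
artificial = ("left" + whitespace + "right").split(whitespace)

print(whole)
print(contains)
print(artificial)
```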

Ciao,
Marc 'BlackJack' Rintsch



split() and string.whitespace

2008-10-31 Thread Chaim Krause
I am unable to figure out why the first two statements work as I
expect them to and the next two do not. Namely, the first two split the
sentence into its component words, while the latter two return the
whole sentence intact.

import string
from string import whitespace
mytext = "The quick brown fox jumped over the lazy dog.\n"

print mytext.split()
print mytext.split(' ')
print mytext.split(whitespace)
print string.split(mytext, sep=whitespace)