Re: Remove empty strings from list

2009-09-16 Thread Bruno Desthuilliers

Sion Arrowsmith wrote:
> Bruno Desthuilliers wrote:
>> mylist = line.strip().split()
>>
>> will already do the RightThing(tm).
>
> So will
>
> mylist = line.split()

Yep, it's at least the second time someone has reminded me that the call to
str.strip is redundant here... Pity my poor old neuron :(
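
A minimal sketch of why the extra strip() is redundant, assuming Python 3 (the thread itself uses Python 2 syntax); the sample lines are made up for illustration:

```python
# Sample lines are invented for this sketch; they are not from the thread.
samples = ["  44     0.0\n", "\t1 2\r\n", "", "   \n", "a b"]

for line in samples:
    # With no separator, split() already ignores leading and trailing whitespace,
    # so stripping first cannot change the result.
    assert line.strip().split() == line.split()

fields = "  44     0.0\n".split()
print(fields)  # ['44', '0.0']
```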


--
http://mail.python.org/mailman/listinfo/python-list


Re: Remove empty strings from list

2009-09-15 Thread Rhodri James

On Tue, 15 Sep 2009 02:55:13 +0100, Chris Rebert wrote:

> On Mon, Sep 14, 2009 at 6:49 PM, Helvin wrote:
>> Hi,
>>
>> Sorry I did not want to bother the group, but I really do not
>> understand this seemingly trivial problem.
>> I am reading from a textfile, where each line has 2 values, with
>> spaces before and between the values.
>> I would like to read in these values, but of course, I don't want the
>> whitespaces between them.
>> I have looked at documentation, and how strings and lists work, but I
>> cannot understand the behaviour of the following:
>>    line = f.readline()
>>    line = line.lstrip() # take away whitespace at the beginning of the readline.
>>    list = line.split(' ') # split the str line into a list
>>
>>    # the list has empty strings in it, so now, remove these empty strings

[snip]

> Block quoting from http://effbot.org/zone/python-list.htm
> """
> Note that the for-in statement maintains an internal index, which is
> incremented for each loop iteration. This means that if you modify the
> list you’re looping over, the indexes will get out of sync, and you
> may end up skipping over items, or process the same item multiple
> times.
> """
>
> Thus why your code is skipping over some elements and not removing them.
> Moral: Don't modify a list while iterating over it. Use the loop to
> create a separate, new list from the old one instead.


In this case, your life would be improved by using

l = line.split()

instead of

l = line.split(' ')

and not getting the empty strings in the first place.
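
As a rough illustration for the "two values per line" case, here is a sketch assuming Python 3 and assuming both values are numeric (as in the '44' and '0.0' sample); parse_pair is a hypothetical helper name, not something from the thread:

```python
def parse_pair(line):
    """Return the two whitespace-separated values on a line as floats.

    parse_pair is a hypothetical helper used only for this sketch.
    """
    # split() with no argument never yields empty strings;
    # the unpacking raises ValueError unless there are exactly two fields.
    first, second = line.split()
    return float(first), float(second)

pair = parse_pair("   44     0.0\n")
print(pair)  # (44.0, 0.0)
```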

--
Rhodri James *-* Wildebeest Herder to the Masses


Re: Remove empty strings from list

2009-09-15 Thread Sion Arrowsmith
Bruno Desthuilliers   wrote:
> >> mylist = line.strip().split()
>
>will already do the RightThing(tm).

So will

mylist = line.split()

-- 
\S

   under construction



Re: Remove empty strings from list

2009-09-15 Thread Bruno Desthuilliers

Dennis Lee Bieber wrote:
(snip)
> All of which can be condensed into a simple
>
> for ln in f:
>     wrds = ln.strip()
>     # do something with the words -- no whitespace to be seen



I assume you meant:
wrds = ln.strip().split()

?-)


Re: Remove empty strings from list

2009-09-15 Thread Bruno Desthuilliers

Dave Angel wrote:
(snip)
> As Chris says, you're modifying the list while you're iterating through
> it, and that's undefined behavior.  Why not do the following?
>
> mylist = line.strip().split(' ')
> mylist = [item for item in mylist if item]

Mmmm... because the second line is plain useless when calling
str.split() without a delimiter ?-)

mylist = line.strip().split()

will already do the RightThing(tm).



Re: Remove empty strings from list

2009-09-15 Thread Bruno Desthuilliers

Helvin wrote:
> Hi,
>
> Sorry I did not want to bother the group, but I really do not
> understand this seemingly trivial problem.
> I am reading from a textfile, where each line has 2 values, with
> spaces before and between the values.
> I would like to read in these values, but of course, I don't want the
> whitespaces between them.
> I have looked at documentation, and how strings and lists work, but I
> cannot understand the behaviour of the following:
>
> line = f.readline()
> line = line.lstrip() # take away whitespace at the beginning of the readline.


file.readline() returns the line including its trailing newline character
(which str.strip() treats as whitespace), so you may want to
use line.strip() instead of line.lstrip().
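
A small sketch of the difference, assuming Python 3; io.StringIO stands in for the real text file, which the thread does not show:

```python
import io

# io.StringIO is only a stand-in for the file object 'f' in the original code.
f = io.StringIO("   44     0.0\n")
line = f.readline()

left_only = line.lstrip()   # leading spaces removed, trailing newline kept
both_ends = line.strip()    # leading spaces and trailing newline removed

print(repr(left_only))  # '44     0.0\n'
print(repr(both_ends))  # '44     0.0'
```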



> list = line.split(' ')


Slightly OT, but: don't use the names of builtin types or functions as
identifiers - this shadows the builtin object.
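
To make the shadowing concrete, here is a sketch assuming Python 3; shadowing_demo is a hypothetical function name, and the rebinding is kept inside it so the builtin stays usable elsewhere:

```python
def shadowing_demo():
    # Rebinding the name 'list' inside this function hides the builtin type here.
    list = "44 0.0".split()
    try:
        list("abc")   # 'list' now names a list instance, which is not callable
    except TypeError as exc:
        return str(exc)
    return None

message = shadowing_demo()
print(message)

# Outside the function the builtin is untouched.
assert list("abc") == ["a", "b", "c"]
```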


Also, the default behaviour of str.split (no separator given) is to split on
runs of whitespace and to ignore leading and trailing whitespace, so it never
returns empty strings. You would get better results by not specifying the
delimiter here:


>>> " a  a  a  a ".split(' ')
['', 'a', '', 'a', '', 'a', '', 'a', '']
>>> " a  a  a  a ".split()
['a', 'a', 'a', 'a']
>>>


> # the list has empty strings in it, so now, remove these empty strings


A problem you could have avoided right from the start !-)


> for item in list:
>     if item is ' ':


Don't use identity comparison when you want to test for equality. It 
happens to kind of work in your above example but only because CPython 
implements a cache for _some_ small strings, but you should _never_ rely 
on such implementation details. A string containing accented characters 
would not have been cached:

>>> s = 'ééé'
>>> s is 'ééé'
False
>>>
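
A short sketch of equality versus identity, assuming Python 3; the strings are invented for the example, and the result of the identity test is deliberately not stated because it depends on the implementation:

```python
# Build an equal string at run time so that it is usually a distinct object
# from the literal (this is an implementation detail, not a guarantee).
a = "".join(["sp", "am"])
b = "spam"

print(a == b)   # True: same characters
print(a is b)   # implementation detail; do not rely on the result

# Equality (or plain truthiness) is the right test for empty strings.
items = ["44", "", "0.0"]
non_empty = [s for s in items if s != ""]
print(non_empty)  # ['44', '0.0']
```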


Also, this is surely not your actual code : ' ' is not an empty string, 
it's a string with a single space character. The empty string is ''. And 
FWIW, empty strings (like most empty sequences and collections, all 
numerical zeros, and the None object) have a false value in a boolean 
context, so you can just test the string directly:


for s in ['', 0, 0.0, [], {}, (), None]:
    if not s:
        print "'%s' is empty, so it's false" % str(s)



>         print 'discard these: ',item
>         index = list.index(item)
>         del list[index] # remove this item from the list


And then you do have a big problem : the internal pointer used by the 
iterator is not in sync with the list anymore, so the next iteration 
will skip one item.


As a general rule: *don't* add or remove elements to/from a sequence while
iterating over it. If you really need to modify the sequence while
iterating over it, iterate in reverse - but there are usually better
solutions.
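
For completeness, a sketch of the reverse-iteration approach, assuming Python 3 and reusing the sample data from the question; it is shown only to illustrate the idea, not as the recommended fix:

```python
alist = ['44', '', '', '', '', '', '0.0']

# Walk the indexes from the last one down to 0. Deleting an item only shifts
# the items after it, and those have already been visited.
for i in range(len(alist) - 1, -1, -1):
    if not alist[i]:
        del alist[i]

print(alist)  # ['44', '0.0']
```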



>     else:
>         print 'keep this: ',item
> The problem is,


Make it a plural - there's more than 1 problem here !-)


> when my list is :  ['44', '', '', '', '', '', '0.0\n']
> The output is:
>     len of list:  7
>     keep this:  44
>     discard these:
>     discard these:
>     discard these:
> So finally the list is:   ['44', '', '', '0.0\n']
> The code above removes all the empty strings in the middle, all except
> two. My code seems to miss two of the empty strings.
>
> Would you know why this is occurring?



cf above... and below:

>>> alist = ['44', '', '', '', '', '', '0.0']
>>> for i, it in enumerate(alist):
...     print 'i : %s -  it : "%s"' % (i, it)
...     if not it:
...         del alist[i]
...     print "alist is now %s" % alist
...
i : 0 -  it : "44"
alist is now ['44', '', '', '', '', '', '0.0']
i : 1 -  it : ""
alist is now ['44', '', '', '', '', '0.0']
i : 2 -  it : ""
alist is now ['44', '', '', '', '0.0']
i : 3 -  it : ""
alist is now ['44', '', '', '0.0']
>>>


Ok, now for practical answers:

1/ in the above case, use line.strip().split(), you'll have no more 
problem !-)


2/ as a general rule, if you need to filter a sequence, don't try to do 
it in place (unless  it's a *very* big sequence and you run into memory 
problems but then there are probably better solutions).


The common idioms for filtering a sequence are:

* filter(predicate, sequence):

the 'predicate' param is a callback function which takes an item from the
sequence and returns a boolean value (True to keep the item, False to
discard it). The following example will filter out even integers:


def is_odd(n):
   return n % 2

alist = range(10)
odds = filter(is_odd, alist)
print alist
print odds

Alternatively, filter() can take None as its first param, in which case
it will filter out items that have a false value in a boolean context, i.e.:


alist = ['', 'a', 0, 1, [], [1], None, object, False, True]
result = filter(None, alist)
print result
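
One caveat when running this on Python 3 (an assumption; the thread uses Python 2): filter() returns a lazy iterator there rather than a list, so a sketch of the same example needs an explicit list() call:

```python
alist = ['', 'a', 0, 1, [], [1], None, object, False, True]

lazy = filter(None, alist)   # Python 3: a lazy filter object, not a list
result = list(lazy)          # materialise it to get a real list

print(result)  # ['a', 1, [1], <class 'object'>, True]
```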


* list comprehensions

Here you directly build the result list:

alist = range(10)
odds = [n for n in alist if n % 2]

alist = ['', 'a', 0, 1, [], [1], None, object, False, True]
result = [item for item in alist if item]
print result



HTH


Re: Remove empty strings from list

2009-09-14 Thread Join hack
Good solution, thanks!

2009/9/15 Steven D'Aprano wrote:
(snip)


Re: Remove empty strings from list

2009-09-14 Thread Steven D'Aprano
On Mon, 14 Sep 2009 18:55:13 -0700, Chris Rebert wrote:

> On Mon, Sep 14, 2009 at 6:49 PM, Helvin  wrote:
...
> > I have looked at documentation, and how strings and lists work, but I
> > cannot understand the behaviour of the following:
...
> >                        for item in list:
> >                                if item is ' ':
> >                                        print 'discard these: ',item
> >                                        index = list.index(item)
> >                                        del list[index]

...

> Moral: Don't modify a list while iterating over it. Use the loop to
> create a separate, new list from the old one instead.


This doesn't just apply to Python, it is good advice in every language
I'm familiar with. At the very least, if you have to modify a list
in place and you are deleting or inserting items, work *backwards*:

for i in xrange(len(alist) - 1, -1, -1):
    item = alist[i]
    if item == 'delete me':
        del alist[i]


This is almost never the right solution in Python, but as a general 
technique, it works in all sorts of situations. (E.g. when varnishing a 
floor, don't start at the doorway and varnish towards the end of the 
room, because you'll be walking all over the fresh varnish. Do it the 
other way, starting at the end of the room, and work backwards towards 
the door.)

In Python, the right solution is almost always to make a new copy of the 
list. Here are three ways to do that:


newlist = []
for item in alist:
    if item != 'delete me':
        newlist.append(item)


newlist = [item for item in alist if item != 'delete me']

newlist = filter(lambda item: item != 'delete me', alist)



Once you have newlist, you can then rebind it to alist:

alist = newlist

or you can replace the contents of alist with the contents of newlist:

alist[:] = newlist


The two have a subtle difference in behavior that may not be apparent 
unless you have multiple names bound to alist.
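
The difference can be sketched as follows, assuming Python 3; the names alias, blist and other are made up for the illustration:

```python
alist = ['44', '', '0.0']
alias = alist                       # a second name bound to the same list object

alist = [s for s in alist if s]     # rebinding: alist now names a brand new list
print(alias)                        # ['44', '', '0.0']  (old object unchanged)

blist = ['44', '', '0.0']
other = blist
blist[:] = [s for s in blist if s]  # slice assignment: the existing object is modified
print(other)                        # ['44', '0.0']
assert other is blist
```

With rebinding, any other name that still points at the old list keeps seeing the unfiltered contents; with slice assignment, every name sees the change.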



-- 
Steven


Re: Remove empty strings from list

2009-09-14 Thread Gabriel Genellina

On Mon, 14 Sep 2009 23:33:05 -0300, tec wrote:

> or use filter
> list=filter(lambda x: len(x)>0, list)


For strings, len(x)>0, len(x) and x itself all have the same truth value,
so the above statement is equivalent to:


list=filter(lambda x: x, list)

which according to the documentation is the same as:

list=filter(None, list)

which is the fastest variant AFAIK.

(Of course, it's even better to use the right split() call so there are no
empty strings to filter out in the first place)
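
A quick check that the three spellings agree on a list of strings, assuming Python 3 (hence the list() calls); it makes no claim about relative speed:

```python
words = " a  a  a  a ".split(' ')   # contains empty strings between the spaces

by_len = list(filter(lambda x: len(x) > 0, words))
by_truth = list(filter(lambda x: x, words))
by_none = list(filter(None, words))

# All three predicates keep exactly the non-empty strings.
assert by_len == by_truth == by_none == ['a', 'a', 'a', 'a']
```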


--
Gabriel Genellina



Re: Remove empty strings from list

2009-09-14 Thread Dave Angel

Helvin wrote:

(snip)

  
(list already is a defined name, so you really should call it something
else.)



As Chris says, you're modifying the list while you're iterating through 
it, and that's undefined behavior.  Why not do the following?


mylist = line.strip().split(' ')
mylist = [item for item in mylist if item]

DaveA


Re: Remove empty strings from list

2009-09-14 Thread Helvin Lui
Thanks Chris! Thanks for the quick reply. Indeed this is the case! I have
now written out a new list, instead of modifying the list I am iterating
over.
Logged at my blog:
http://learnwithhelvin.blogspot.com/2009/09/python-loop-and-modify-list.html

Regards,
Helvin  =)

On Tue, Sep 15, 2009 at 1:55 PM, Chris Rebert wrote:
(snip)



-- 
Helvin

"Though the world may promise me more, I'm just made to be filled with the
Lord."


Re: Remove empty strings from list

2009-09-14 Thread tec

Helvin wrote:

(snip)


You can use the default argument of split:
list = line.split()

From the Python documentation:

"If the optional second argument sep is absent or None, the words are 
separated by arbitrary strings of whitespace characters (space, tab, 
newline, return, formfeed)."


So it is suitable for most cases and does not introduce empty strings.
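
The documented behaviour can be sketched like this, assuming Python 3; the sample line mixing spaces, a tab and a carriage return is made up for the example:

```python
line = " 44\t 0.0 \r\n"

no_sep = line.split()        # any run of whitespace separates the words
none_sep = line.split(None)  # sep=None behaves the same as omitting it
space_sep = line.split(' ')  # only single spaces separate; other whitespace is kept

print(no_sep)     # ['44', '0.0']
print(space_sep)  # ['', '44\t', '0.0', '\r\n']
```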


Re: Remove empty strings from list

2009-09-14 Thread tec

Chris Rebert wrote:

On Mon, Sep 14, 2009 at 6:49 PM, Helvin  wrote:

(snip)

> Thus why your code is skipping over some elements and not removing them.
> Moral: Don't modify a list while iterating over it. Use the loop to
> create a separate, new list from the old one instead.


or use filter
list=filter(lambda x: len(x)>0, list)






Re: Remove empty strings from list

2009-09-14 Thread Chris Rebert
On Mon, Sep 14, 2009 at 6:49 PM, Helvin  wrote:
> Hi,
>
> Sorry I did not want to bother the group, but I really do not
> understand this seemingly trivial problem.
> I am reading from a textfile, where each line has 2 values, with
> spaces before and between the values.
> I would like to read in these values, but of course, I don't want the
> whitespaces between them.
> I have looked at documentation, and how strings and lists work, but I
> cannot understand the behaviour of the following:
>     line = f.readline()
>     line = line.lstrip() # take away whitespace at the beginning of the readline.
>     list = line.split(' ') # split the str line into a list
>
>     # the list has empty strings in it, so now, remove these empty strings
>     for item in list:
>         if item is ' ':
>             print 'discard these: ',item
>             index = list.index(item)
>             del list[index]         # remove this item from the list
>         else:
>             print 'keep this: ',item
> The problem is, when my list is :  ['44', '', '', '', '', '',
> '0.0\n']
> The output is:
>    len of list:  7
>    keep this:  44
>    discard these:
>    discard these:
>    discard these:
> So finally the list is:   ['44', '', '', '0.0\n']
> The code above removes all the empty strings in the middle, all except
> two. My code seems to miss two of the empty strings.
>
> Would you know why this is occurring?

Block quoting from http://effbot.org/zone/python-list.htm
"""
Note that the for-in statement maintains an internal index, which is
incremented for each loop iteration. This means that if you modify the
list you’re looping over, the indexes will get out of sync, and you
may end up skipping over items, or process the same item multiple
times.
"""

This is why your code skips over some elements and does not remove them.
Moral: Don't modify a list while iterating over it. Use the loop to
create a separate, new list from the old one instead.
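
A sketch that reproduces the skipping and then applies the moral, assuming Python 3 and assuming the original test was effectively a comparison with the empty string (the posted `is ' '` would not match ''); the names broken and kept are made up:

```python
items = ['44', '', '', '', '', '', '0.0\n']

# Reproduce the problem: deleting inside the loop shifts later items left
# while the loop's internal index keeps advancing, so some items are skipped.
broken = list(items)
for item in broken:
    if item == '':
        del broken[broken.index(item)]
print(broken)  # ['44', '', '', '0.0\n']

# Fix: leave the original alone and collect the wanted items in a new list.
kept = []
for item in items:
    if item != '':
        kept.append(item)
print(kept)  # ['44', '0.0\n']
```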

Cheers,
Chris
--
http://blog.rebertia.com