Re: Remove empty strings from list
Sion Arrowsmith wrote:
> Bruno Desthuilliers wrote:
>> mylist = line.strip().split()
>>
>> will already do the RightThing(tm).
>
> So will
>     mylist = line.split()

Yeps, it's at least the second time someone reminds me that the call to str.strip is just useless here... Pity my poor old neuron :(

-- http://mail.python.org/mailman/listinfo/python-list
Re: Remove empty strings from list
On Tue, 15 Sep 2009 02:55:13 +0100, Chris Rebert wrote: On Mon, Sep 14, 2009 at 6:49 PM, Helvin wrote: Hi, Sorry I did not want to bother the group, but I really do not understand this seeming trivial problem. I am reading from a textfile, where each line has 2 values, with spaces before and between the values. I would like to read in these values, but of course, I don't want the whitespaces between them. I have looked at documentation, and how strings and lists work, but I cannot understand the behaviour of the following: line = f.readline() line = line.lstrip() # take away whitespace at the beginning of the readline. list = line.split(' ') # split the str line into a list # the list has empty strings in it, so now, remove these empty strings [snip] Block quoting from http://effbot.org/zone/python-list.htm """ Note that the for-in statement maintains an internal index, which is incremented for each loop iteration. This means that if you modify the list you’re looping over, the indexes will get out of sync, and you may end up skipping over items, or process the same item multiple times. """ Thus why your code is skipping over some elements and not removing them. Moral: Don't modify a list while iterating over it. Use the loop to create a separate, new list from the old one instead. In this case, your life would be improved by using l = line.split() instead of l = line.split(' ') and not getting the empty strings in the first place. -- Rhodri James *-* Wildebeest Herder to the Masses -- http://mail.python.org/mailman/listinfo/python-list
Re: Remove empty strings from list
Bruno Desthuilliers wrote:
> mylist = line.strip().split()
>
> will already do the RightThing(tm).

So will

    mylist = line.split()

--
\S
under construction
Re: Remove empty strings from list
Dennis Lee Bieber wrote:
(snip)
> All of which can be condensed into a simple
>
>     for ln in f:
>         wrds = ln.strip()
>         # do something with the words -- no whitespace to be seen

I assume you meant:

    wrds = ln.strip().split()

?-)
Re: Remove empty strings from list
Dave Angel wrote:
(snip)
> As Chris says, you're modifying the list while you're iterating through
> it, and that's undefined behavior. Why not do the following?
>
>     mylist = line.strip().split(' ')
>     mylist = [item for item in mylist if item]

Mmmm... because the second line is plain useless when calling str.split() without a delimiter ?-)

    mylist = line.strip().split()

will already do the RightThing(tm).
Re: Remove empty strings from list
Helvin wrote:
> Hi,
>
> Sorry I did not want to bother the group, but I really do not
> understand this seeming trivial problem.
> I am reading from a textfile, where each line has 2 values, with
> spaces before and between the values.
> I would like to read in these values, but of course, I don't want the
> whitespaces between them.
> I have looked at documentation, and how strings and lists work, but I
> cannot understand the behaviour of the following:
>     line = f.readline()
>     line = line.lstrip()  # take away whitespace at the beginning of the readline.

file.readline returns the line with the ending newline character (which is considered whitespace by the str.strip method), so you may want to use line.strip instead of line.lstrip.

>     list = line.split(' ')

Slightly OT but: don't use builtin type or function names as identifiers - this shadows the builtin object.

Also, the default behaviour of str.split is to split on whitespace and remove the delimiter. You would have better results not specifying the delimiter here:

    >>> " a  a  a  a ".split(' ')
    ['', 'a', '', 'a', '', 'a', '', 'a', '']
    >>> " a  a  a  a ".split()
    ['a', 'a', 'a', 'a']
    >>>

> # the list has empty strings in it, so now, remove these empty strings

A problem you could have avoided right from the start !-)

>     for item in list:
>         if item is ' ':

Don't use identity comparison when you want to test for equality. It happens to kind of work in your above example but only because CPython implements a cache for _some_ small strings, and you should _never_ rely on such implementation details. A string containing accented characters would not have been cached:

    >>> s = 'ééé'
    >>> s is 'ééé'
    False
    >>>

Also, this is surely not your actual code: ' ' is not an empty string, it's a string with a single space character. The empty string is ''.

And FWIW, empty strings (like most empty sequences and collections, all numerical zeros, and the None object) have a false value in a boolean context, so you can just test the string directly:

    for s in ['', 0, 0.0, [], {}, (), None]:
        if not s:
            print "'%s' is empty, so it's false" % str(s)

>             print 'discard these: ',item
>             index = list.index(item)
>             del list[index]  # remove this item from the list

And then you do have a big problem: the internal pointer used by the iterator is not in sync with the list anymore, so the next iteration will skip one item.

As a general rule: *don't* add / remove elements to/from a sequence while iterating over it. If you really need to modify the sequence while iterating over it, do a reverse iteration - but there are usually better solutions.

>         else:
>             print 'keep this: ',item
> The problem is,

Make it a plural - there's more than 1 problem here !-)

> when my list is : ['44', '', '', '', '', '', '0.0\n']
> The output is:
>     len of list: 7
>     keep this: 44
>     discard these:
>     discard these:
>     discard these:
> So finally the list is: ['44', '', '', '0.0\n']
> The code above removes all the empty strings in the middle, all except
> two. My code seems to miss two of the empty strings.
>
> Would you know why this is occuring?

cf above... and below:

    >>> alist = ['44', '', '', '', '', '', '0.0']
    >>> for i, it in enumerate(alist):
    ...     print 'i : %s - it : "%s"' % (i, it)
    ...     if not it:
    ...         del alist[i]
    ...     print "alist is now %s" % alist
    ...
    i : 0 - it : "44"
    alist is now ['44', '', '', '', '', '', '0.0']
    i : 1 - it : ""
    alist is now ['44', '', '', '', '', '0.0']
    i : 2 - it : ""
    alist is now ['44', '', '', '', '0.0']
    i : 3 - it : ""
    alist is now ['44', '', '', '0.0']
    >>>

Ok, now for practical answers:

1/ in the above case, use line.strip().split(), you'll have no more problem !-)

2/ as a general rule, if you need to filter a sequence, don't try to do it in place (unless it's a *very* big sequence and you run into memory problems, but then there are probably better solutions).

The common idioms for filtering a sequence are:

* filter(predicate, sequence):

the 'predicate' param is a callback function which takes an item from the sequence and returns a boolean value (True to keep the item, False to discard it). The following example will filter out even integers:

    def is_odd(n):
        return n % 2

    alist = range(10)
    odds = filter(is_odd, alist)
    print alist
    print odds

Alternatively, filter() can take None as its first param, in which case it will filter out items that have a false value in a boolean context, ie:

    alist = ['', 'a', 0, 1, [], [1], None, object, False, True]
    result = filter(None, alist)
    print result

* list comprehensions

Here you directly build the result list:

    alist = range(10)
    odds = [n for n in alist if n % 2]

    alist = ['', 'a', 0, 1, [], [1], None, object, False, True]
    result = [item for item in alist if item]
    print result

HTH
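For readers on Python 3, the two filtering idioms can be sketched as below. This is only an illustration, not code from the thread: the sample list `fields` is made up, and the `list()` wrapper is needed because in Python 3 `filter()` returns a lazy iterator rather than a list.

```python
# Hypothetical sample data: the kind of list produced by line.split(' ').
fields = ['44', '', '', '', '', '', '0.0']

# filter(None, ...) keeps only the items that are true in a boolean context.
# In Python 3 filter() returns an iterator, hence the list() call.
kept_with_filter = list(filter(None, fields))

# The equivalent list comprehension builds the result list directly.
kept_with_comprehension = [item for item in fields if item]

print(kept_with_filter)
print(kept_with_comprehension)
```

Both forms leave `fields` untouched and produce a new list, which is the point of the advice: nothing is deleted from the sequence being looped over.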
Re: Remove empty strings from list
good solution, thanks~!

2009/9/15 Steven D'Aprano wrote:
> [snip]
Re: Remove empty strings from list
On Mon, 14 Sep 2009 18:55:13 -0700, Chris Rebert wrote:

> On Mon, Sep 14, 2009 at 6:49 PM, Helvin wrote:
...
> > I have looked at documentation, and how strings and lists work, but I
> > cannot understand the behaviour of the following:
...
> >     for item in list:
> >         if item is ' ':
> >             print 'discard these: ',item
> >             index = list.index(item)
> >             del list[index]
...
> Moral: Don't modify a list while iterating over it. Use the loop to
> create a separate, new list from the old one instead.

This doesn't just apply to Python, it is good advice in every language I'm familiar with. At the very least, if you have to modify a list in place and you are deleting or inserting items, work *backwards*:

    for i in xrange(len(alist) - 1, -1, -1):
        item = alist[i]
        if item == 'delete me':
            del alist[i]

This is almost never the right solution in Python, but as a general technique, it works in all sorts of situations. (E.g. when varnishing a floor, don't start at the doorway and varnish towards the end of the room, because you'll be walking all over the fresh varnish. Do it the other way, starting at the end of the room, and work backwards towards the door.)

In Python, the right solution is almost always to make a new copy of the list. Here are three ways to do that:

    newlist = []
    for item in alist:
        if item != 'delete me':
            newlist.append(item)

    newlist = [item for item in alist if item != 'delete me']

    newlist = filter(lambda item: item != 'delete me', alist)

Once you have newlist, you can then rebind it to alist:

    alist = newlist

or you can replace the contents of alist with the contents of newlist:

    alist[:] = newlist

The two have a subtle difference in behavior that may not be apparent unless you have multiple names bound to alist.

--
Steven
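A minimal Python 3 sketch of the two points in this message: deleting while walking backwards, and the difference between rebinding a name and replacing a list's contents. The helper name `delete_backwards` and all sample data are made up for illustration; the loop starts at `len(items) - 1`, the last valid index.

```python
def delete_backwards(items, unwanted):
    """Delete every occurrence of `unwanted` from `items` in place."""
    # Start at the last valid index and walk down to 0, so that deleting
    # an item never shifts the items that are still to be visited.
    for i in range(len(items) - 1, -1, -1):
        if items[i] == unwanted:
            del items[i]


alist = ['44', '', '', '', '', '', '0.0']
delete_backwards(alist, '')
print(alist)

# Rebinding: the name `original` now points at a new list, but the second
# name `alias` still refers to the old, unfiltered list object.
original = ['a', 'delete me', 'b']
alias = original
original = [item for item in original if item != 'delete me']
rebound_alias_unchanged = (alias == ['a', 'delete me', 'b'])

# Slice assignment: the existing list object is emptied and refilled, so
# every name bound to it sees the filtered contents.
shared = ['a', 'delete me', 'b']
other_name = shared
shared[:] = [item for item in shared if item != 'delete me']
slice_alias_updated = (other_name == ['a', 'b'])

print(rebound_alias_unchanged, slice_alias_updated)
```

The rebinding form is usually enough; slice assignment matters when another part of the program holds a reference to the same list and is expected to see the change.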
Re: Remove empty strings from list
On Mon, 14 Sep 2009 23:33:05 -0300, tec wrote:

> or use filter
>     list=filter(lambda x: len(x)>0, list)

For strings, len(x)>0 <=> len(x) <=> x, so the above statement is equivalent to:

    list=filter(lambda x: x, list)

which according to the documentation is the same as:

    list=filter(None, list)

which is the fastest variant AFAIK.

(Of course, it's even better to use the right split() call so there are no empty strings to filter out in the first place)

--
Gabriel Genellina
Re: Remove empty strings from list
Helvin wrote:
> Hi,
>
> Sorry I did not want to bother the group, but I really do not
> understand this seeming trivial problem.
> I am reading from a textfile, where each line has 2 values, with
> spaces before and between the values.
> I would like to read in these values, but of course, I don't want the
> whitespaces between them.
> I have looked at documentation, and how strings and lists work, but I
> cannot understand the behaviour of the following:
>     line = f.readline()
>     line = line.lstrip()  # take away whitespace at the beginning of the readline.
>     list = line.split(' ')  # split the str line into a list
>
>     # the list has empty strings in it, so now, remove these empty strings
>     for item in list:
>         if item is ' ':
>             print 'discard these: ',item
>             index = list.index(item)
>             del list[index]  # remove this item from the list
>         else:
>             print 'keep this: ',item
> The problem is, when my list is : ['44', '', '', '', '', '', '0.0\n']
> The output is:
>     len of list: 7
>     keep this: 44
>     discard these:
>     discard these:
>     discard these:
> So finally the list is: ['44', '', '', '0.0\n']
> The code above removes all the empty strings in the middle, all except
> two. My code seems to miss two of the empty strings.
>
> Would you know why this is occuring?
>
> Regards,
> Helvin

(list already is a defined name, so you really should call it something else.)

As Chris says, you're modifying the list while you're iterating through it, and that's undefined behavior. Why not do the following?

    mylist = line.strip().split(' ')
    mylist = [item for item in mylist if item]

DaveA
Re: Remove empty strings from list
Thanks Chris! Thanks for the quick reply. Indeed this is the case! I have now written out a new list, instead of modifying the list I am iterating over. Logged at my blog: http://learnwithhelvin.blogspot.com/2009/09/python-loop-and-modify-list.html Regards, Helvin =) On Tue, Sep 15, 2009 at 1:55 PM, Chris Rebert wrote: > On Mon, Sep 14, 2009 at 6:49 PM, Helvin wrote: > > Hi, > > > > Sorry I did not want to bother the group, but I really do not > > understand this seeming trivial problem. > > I am reading from a textfile, where each line has 2 values, with > > spaces before and between the values. > > I would like to read in these values, but of course, I don't want the > > whitespaces between them. > > I have looked at documentation, and how strings and lists work, but I > > cannot understand the behaviour of the following: > >line = f.readline() > >line = line.lstrip() # take away whitespace at the > beginning of the > > readline. > >list = line.split(' ') # split the str line into a > list > > > ># the list has empty strings in it, so now, > > remove these empty strings > >for item in list: > >if item is ' ': > >print 'discard these: ',item > >index = list.index(item) > >del list[index] # remove > this item from the list > >else: > >print 'keep this: ',item > > The problem is, when my list is : ['44', '', '', '', '', '', > > '0.0\n'] > > The output is: > >len of list: 7 > >keep this: 44 > >discard these: > >discard these: > >discard these: > > So finally the list is: ['44', '', '', '0.0\n'] > > The code above removes all the empty strings in the middle, all except > > two. My code seems to miss two of the empty strings. > > > > Would you know why this is occuring? > > Block quoting from http://effbot.org/zone/python-list.htm > """ > Note that the for-in statement maintains an internal index, which is > incremented for each loop iteration. 
This means that if you modify the > list you’re looping over, the indexes will get out of sync, and you > may end up skipping over items, or process the same item multiple > times. > """ > > Thus why your code is skipping over some elements and not removing them. > Moral: Don't modify a list while iterating over it. Use the loop to > create a separate, new list from the old one instead. > > Cheers, > Chris > -- > http://blog.rebertia.com > -- Helvin "Though the world may promise me more, I'm just made to be filled with the Lord." -- http://mail.python.org/mailman/listinfo/python-list
Re: Remove empty strings from list
Helvin wrote:
> [snip]
> Would you know why this is occuring?
>
> Regards,
> Helvin

You can use the default argument of split:

    list = line.split()

From the Python documentation, "If the optional second argument sep is absent or None, the words are separated by arbitrary strings of whitespace characters (space, tab, newline, return, formfeed)."

So it is suitable for most cases, without introducing empty strings.
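The behaviour quoted from the documentation can be checked with a short Python 3 snippet. The sample `line` is invented to resemble the file described in the original post (leading spaces, several spaces between two values, trailing newline).

```python
# Hypothetical input line, shaped like what readline() would return.
line = "   44     0.0\n"

# Splitting on a single space yields an empty string for every extra space,
# and the newline stays attached to the last value.
split_on_space = line.split(' ')

# Splitting with no argument treats any run of whitespace as one separator
# and ignores leading and trailing whitespace, newline included.
split_on_whitespace = line.split()

print(split_on_space)
print(split_on_whitespace)
```

With the no-argument form there is nothing left to clean up afterwards, so neither the `lstrip()` call nor the removal loop is needed.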
Re: Remove empty strings from list
Chris Rebert wrote:
> On Mon, Sep 14, 2009 at 6:49 PM, Helvin wrote:
>> [snip]
>
> [snip]
>
> Thus why your code is skipping over some elements and not removing them.
> Moral: Don't modify a list while iterating over it. Use the loop to
> create a separate, new list from the old one instead.

or use filter

    list=filter(lambda x: len(x)>0, list)

> Cheers,
> Chris
> --
> http://blog.rebertia.com
Re: Remove empty strings from list
On Mon, Sep 14, 2009 at 6:49 PM, Helvin wrote: > Hi, > > Sorry I did not want to bother the group, but I really do not > understand this seeming trivial problem. > I am reading from a textfile, where each line has 2 values, with > spaces before and between the values. > I would like to read in these values, but of course, I don't want the > whitespaces between them. > I have looked at documentation, and how strings and lists work, but I > cannot understand the behaviour of the following: > line = f.readline() > line = line.lstrip() # take away whitespace at the > beginning of the > readline. > list = line.split(' ') # split the str line into a list > > # the list has empty strings in it, so now, > remove these empty strings > for item in list: > if item is ' ': > print 'discard these: ',item > index = list.index(item) > del list[index] # remove this > item from the list > else: > print 'keep this: ',item > The problem is, when my list is : ['44', '', '', '', '', '', > '0.0\n'] > The output is: > len of list: 7 > keep this: 44 > discard these: > discard these: > discard these: > So finally the list is: ['44', '', '', '0.0\n'] > The code above removes all the empty strings in the middle, all except > two. My code seems to miss two of the empty strings. > > Would you know why this is occuring? Block quoting from http://effbot.org/zone/python-list.htm """ Note that the for-in statement maintains an internal index, which is incremented for each loop iteration. This means that if you modify the list you’re looping over, the indexes will get out of sync, and you may end up skipping over items, or process the same item multiple times. """ Thus why your code is skipping over some elements and not removing them. Moral: Don't modify a list while iterating over it. Use the loop to create a separate, new list from the old one instead. Cheers, Chris -- http://blog.rebertia.com -- http://mail.python.org/mailman/listinfo/python-list
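The skipping described in the effbot quote can be reproduced with a small Python 3 sketch. It is a simplified stand-in for the original loop, not the poster's exact code: it compares with `==` and deletes by the loop index instead of `list.index(item)`, and both function names are made up. The end result on the sample list is the same as the one reported.

```python
def remove_while_iterating(items):
    """Buggy on purpose: deletes empty strings from `items` while looping over it."""
    visited = []
    for index, item in enumerate(items):
        visited.append(index)
        if item == '':
            # Deleting shifts every later item one slot to the left, but the
            # loop's internal index still advances, so the next item is skipped.
            del items[index]
    return visited


def remove_by_building_new_list(items):
    """Safe version: leave `items` untouched and return a filtered copy."""
    return [item for item in items if item != '']


buggy = ['44', '', '', '', '', '', '0.0\n']
visited_indexes = remove_while_iterating(buggy)
print(visited_indexes, buggy)

safe = remove_by_building_new_list(['44', '', '', '', '', '', '0.0\n'])
print(safe)
```

The loop only ever visits four positions of a seven-item list, because the list shrinks underneath it; building a new list avoids the problem entirely.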