subject:"Sequence splitting"

Re: Sequence splitting

2009-07-03 Thread MRAB


Mel wrote:

Steven D'Aprano wrote:


The most important difference between my suggestion and that of the OP is
that he limited the key function to something which returns a truth
value, while I'm looking for something more general which can split the
input into an arbitrary number of collated sublists.


Generally along the lines of

def scatter (seq, criterion):
lists = {}
for s in seq:
lists.setdefault (criterion (s), []).append (s)
return lists

modulo a defaultdict or some such ?


Somehow this reminds me of the new Counter class, except that it's
making lists instead of counts.
--
http://mail.python.org/mailman/listinfo/python-list

Re: Sequence splitting

2009-07-03 Thread Paul Rubin

Terry Reedy  writes:
> > This isn't so attractive, since filter takes a sequence input but
> > returns a list.
> 
> In modern Python (py3), it returns an iterator.

Still not much help in the proposed version that would return two
iterators.  
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Sequence splitting

2009-07-03 Thread Terry Reedy


Paul Rubin wrote:

Brad  writes:

Maybe this would be difficult to get into the core, but how about this
idea: Rename the current filter function to something like "split" or
"partition" (which I agree may be a better name) and modify it to
return the desired true and false sequences. Then recreate the
existing "filter" function with a wrapper that throws away the false
sequence.


This isn't so attractive, since filter takes a sequence input but
returns a list. 


In modern Python (py3), it returns an iterator.

--
http://mail.python.org/mailman/listinfo/python-list

Re: Sequence splitting

2009-07-03 Thread Terry Reedy


Brad wrote:

On Jul 2, 8:17 pm, "Pablo Torres N."  wrote:

This sounds like it belongs to the python-ideas list.  I suggest
posting there for better feedback, since the core developers check
that list more often than this one.


I tried posting on python-ideas and received a "You are not allowed to
post to this mailing list" reply. Perhaps because I am posting through
Google groups? Or maybe one must be an approved member to post?


Spammers post thru Google groups

Either subscribe or access via news.gmane.org as gmane.comp.python.ideas.

As to your main question: this was discuss some years ago. There did not 
seem enough use cases for 'bifilter' or 'distributor' to warrent 
inclusion in the stdlib. Easy enough to do on one's own for those few 
cases. Just filter twice.


Usually, one wants to do one thing or another to each item with the 
original scan


for item in collection:
  if pred(item):
proc_true(item)
  else:
proc_false(item)

Having proc_true and proc_false both be append(item) is a special case.

Note that filter has been changed from returning a list to returning an 
iterator.  So a proposal for a function to return two lists would seem 
backwards. A function that returns two iterators would be possible. 
Something like itertools.tee except that each item would go on one tee 
or the other instead of both. It would have two queues (deques) instead 
of one. And it should be lazy -- items should only be pulled from the 
original iterator when an item is requested from an empty queue.


I am not sure it is worth it. In the worst case, all items are say 
'True' and you request 'False' items and all items get copied to the 
True queue before the False iterator raised StopIteration.


The way to avoid calling keyfunc(item) twice on each item is to make a 
new list first: newlist = list(map(keyfunc,iterable)). Then scan newlist 
twice.


If you write such a function, first in Python, then C, you can register 
code on PyPI and see how much interest it gets.


Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list

Re: Sequence splitting

2009-07-03 Thread Mel

Steven D'Aprano wrote:

> The most important difference between my suggestion and that of the OP is
> that he limited the key function to something which returns a truth
> value, while I'm looking for something more general which can split the
> input into an arbitrary number of collated sublists.

Generally along the lines of

def scatter (seq, criterion):
lists = {}
for s in seq:
lists.setdefault (criterion (s), []).append (s)
return lists

modulo a defaultdict or some such ?

Mel.
> 
> 
> 


-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Sequence splitting

2009-07-03 Thread Scott David Daniels


Steven D'Aprano wrote:
I've never needed such a split function, and I don't like the name, and 
the functionality isn't general enough. I'd prefer something which splits 
the input sequence into as many sublists as necessary, according to the 
output of the key function. Something like itertools.groupby(), except it 
runs through the entire sequence and collates all the elements with 
identical keys.


splitby(range(10), lambda n: n%3)
=> [ (0, [0, 3, 6, 9]),
 (1, [1, 4, 7]), 
 (2, [2, 5, 8]) ]


Your split() would be nearly equivalent to this with a key function that 
returns a Boolean.


Well, here is my go at doing the original with iterators:

def splitter(source, test=bool):
a, b = itertools.tee((x, test(x)) for x in source)
return (data for data, decision in a if decision), (
data for data, decision in b if not decision)

This has the advantage that it can operate on infinite lists.  For
something like splitby for grouping, I seem to need to know the cases
up front:

def _make_gen(particular, src):
 return (x for x, c in src if c == particular)

def splitby(source, cases, case):
'''Produce a dict of generators for case(el) for el in source'''
decided = itertools.tee(((x, case(x)) for x in source), len(cases))
return dict((c, _make_gen(c, src))
for c, src in zip(cases, decided))

example:

def classify(n):
'''Least prime factor of a few'''
for prime in [2, 3, 5, 7]:
if n % prime == 0:
return prime
return 0

for k,g in splitby(range(50), (2, 3, 5, 7, 0), classify).items():
print('%s: %s' % (k, list(g)))

0: [1, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
2: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24,
26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48]
3: [3, 9, 15, 21, 27, 33, 39, 45]
5: [5, 25, 35]
7: [7, 49]

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list

Re: Sequence splitting

2009-07-03 Thread Lie Ryan

Brad wrote:
> On Jul 3, 12:57 am, Steven D'Aprano  cybersource.com.au> wrote:
>> I've never needed such a split function, and I don't like the name, and
>> the functionality isn't general enough. I'd prefer something which splits
>> the input sequence into as many sublists as necessary, according to the
>> output of the key function.
> 
> That's not a bad idea, I'll have to experiment with the alternatives.
> My thought process for this, however, was that filter itself already
> splits the sequence and it would have been more useful had it not
> thrown away "half" of what it discovers. It could have been written to
> returned two sequences with very litter perf hit for all but very
> large input sequences, and been useful in more situations. What I
> *really* wanted was a way to make filter itself more useful, since it
> seems a bit silly to have two very similar functions.

It's not that easy. filter is nearly by definition a lazy function. The
various split/group functions is impossible to be made an efficient
iterator, since they must traverse the whole list before being sure that
they have finished the group/part of the split (or even to be sure that
they have all the group keys).

> Maybe this would be difficult to get into the core, but how about this
> idea: Rename the current filter function to something like "split" or
> "partition" (which I agree may be a better name) and modify it to
> return the desired true and false sequences. Then recreate the
> existing "filter" function with a wrapper that throws away the false
> sequence.

filter has a very deep root in functional programming, I doubt it will
change any time soon. Also, consider infinite iterators. With the
"filter as wrapper to partition" scheme, it is impossible for
split/group/partition to use infinite iterator.
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Sequence splitting

2009-07-03 Thread Paul Rubin

Brad  writes:
> Maybe this would be difficult to get into the core, but how about this
> idea: Rename the current filter function to something like "split" or
> "partition" (which I agree may be a better name) and modify it to
> return the desired true and false sequences. Then recreate the
> existing "filter" function with a wrapper that throws away the false
> sequence.

This isn't so attractive, since filter takes a sequence input but
returns a list.  So if I have an iterator that produces a billion
elements of which I expect three to satisfy some predicate, then
   xs = filter(func, seq)
as currently implemented will build a 3-element list and return it.  
Under your suggestion, it would also build and throw away an (almost)
billion element list.
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Sequence splitting

2009-07-03 Thread Brad

On Jul 3, 12:57 am, Steven D'Aprano  wrote:
> I've never needed such a split function, and I don't like the name, and
> the functionality isn't general enough. I'd prefer something which splits
> the input sequence into as many sublists as necessary, according to the
> output of the key function.

That's not a bad idea, I'll have to experiment with the alternatives.
My thought process for this, however, was that filter itself already
splits the sequence and it would have been more useful had it not
thrown away "half" of what it discovers. It could have been written to
returned two sequences with very litter perf hit for all but very
large input sequences, and been useful in more situations. What I
*really* wanted was a way to make filter itself more useful, since it
seems a bit silly to have two very similar functions.

Maybe this would be difficult to get into the core, but how about this
idea: Rename the current filter function to something like "split" or
"partition" (which I agree may be a better name) and modify it to
return the desired true and false sequences. Then recreate the
existing "filter" function with a wrapper that throws away the false
sequence.

Here are two simplified re-creations of situations where I could have
used partition (aka split):

words = ['this', 'is', 'a', 'bunch', 'of', 'words']
short, long = partition(words, lambda w: len(w) < 3)

d = {1 : 'w', 2 : 'x' ,3 : 'y' ,4 : 'z'}
keys = [1, 3, 4, 9]
found, missing = partition(keys, d.has_key)

There are probably a dozen other approaches, but the existing "filter"
is fast, clear, and *almost* good enough. So when is this useful in
general: Whenever filter itself is useful, but you want to use both
sides of the partitioning work it already does.

-Brad
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Sequence splitting

2009-07-03 Thread Lie Ryan

tsangpo wrote:
> Just a shorter implementation:
> 
> from itertools import groupby
> def split(lst, func):
> gs = groupby(lst, func)
> return list(gs[True]), list(gs[False])
> 

As you're replying to my post, I assume you meant a shorter
implementation my function. But it doesn't do the same thing. The idea
with my group() is similar to what Steven D'Aprano is describing in
another branch of this thread (i.e. splitting not only based on True and
False, but arbitrary groupings, e.g. 'tru', 'flash' or perhaps -1, 0, 1).

For example:
>>> def group(seq, func=bool):
... ret = {}
... for item in seq:
... fitem = func(item)
... try:
... ret[fitem].append(item)
... except KeyError:
... ret[fitem] = [item]
... return ret
...
>>> group(range(10), lambda x: x%3)
{0: [0, 3, 6, 9], 1: [1, 4, 7], 2: [2, 5, 8]}
>>> # the next one is the OP's split
>>> group(['true', '', [], [''], 'false'], bool)
{False: ['', []], True: ['true', [''], 'false']}
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Sequence splitting

2009-07-03 Thread Pablo Torres N.

On Fri, Jul 3, 2009 at 06:01, Paul Moore wrote:
> 2009/7/3 Brad :
>> Perhaps true, but it would be a nice convenience (for me) as a built-
>> in written in either Python or C. Although the default case of a bool
>> function would surely be faster.
>
> The chance of getting this accepted as a builtin is essentially zero.
> To be a builtin, as opposed to being in the standard library,
> something has to have a very strong justification.

That's right.  Mr. schickb, I think what you need is a few concrete
examples as to where this function would be beneficial, so it can be
judged objectively.

-- 
Pablo Torres N.
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Sequence splitting

2009-07-03 Thread Paul Moore

2009/7/3 Brad :
> Perhaps true, but it would be a nice convenience (for me) as a built-
> in written in either Python or C. Although the default case of a bool
> function would surely be faster.

The chance of getting this accepted as a builtin is essentially zero.
To be a builtin, as opposed to being in the standard library,
something has to have a very strong justification.

This suggestion may find a home in the standard library, although it's
not entirely clear where (maybe itertools, although it's not entirely
obvious that it's a fit there).

You'd have to justify this against the argument "not every 2-3 line
function needs to be built in". Personally, I'm not sure it passes
that test - sure, it's a useful function, but it's not that hard to
write when you need it. It may be better as a recipe in the cookbook.
Or if it's close enough to the spirit of the itertools, it may be
suitable as a sample in the itertools documentation (the "recipes"
section).

Paul.
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Sequence splitting

2009-07-03 Thread Steven D'Aprano

On Fri, 03 Jul 2009 01:39:27 -0700, Paul Rubin wrote:

> Steven D'Aprano  writes:
>> groupby() works on lists.
> 
 a = [1,3,4,6,7]
 from itertools import groupby
 b = groupby(a, lambda x: x%2==1)  # split into even and odd 
 c = list(b)
 print len(c)
> 3
 d = list(c[1][1])# should be [4,6] print d  # oops.
> []

I didn't say it worked properly *wink*

Seriously, this behaviour caught me out too. The problem isn't that the 
input data is a list, the same problem occurs for arbitrary iterators. 
From the docs:

[quote]
The operation of groupby() is similar to the uniq filter in Unix. It 
generates a break or new group every time the value of the key function 
changes (which is why it is usually necessary to have sorted the data 
using the same key function). That behavior differs from SQL’s GROUP BY 
which aggregates common elements regardless of their input order.

The returned group is itself an iterator that shares the underlying 
iterable with groupby(). Because the source is shared, when the groupby() 
object is advanced, the previous group is no longer visible. So, if that 
data is needed later, it should be stored as a list
[end quote]

http://www.python.org/doc/2.6/library/itertools.html#itertools.groupby

-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Sequence splitting

2009-07-03 Thread tsangpo

Just a shorter implementation:

from itertools import groupby
def split(lst, func):
gs = groupby(lst, func)
return list(gs[True]), list(gs[False])


"Lie Ryan"  
Ð´ÈëÏûÏ¢ÐÂÎÅ:nfi3m.2341$ze1.1...@news-server.bigpond.net.au...
> Brad wrote:
>> On Jul 2, 9:40 pm, "Pablo Torres N."  wrote:
>>> If it is speed that we are after, it's my understanding that map and
>>> filter are faster than iterating with the for statement (and also
>>> faster than list comprehensions).  So here is a rewrite:
>>>
>>> def split(seq, func=bool):
>>> t = filter(func, seq)
>>> f = filter(lambda x: not func(x), seq)
>>> return list(t), list(f)
>>>
>>
>> In my simple tests, that takes 1.8x as long as the original solution.
>> Better than the itertools solution, when "func" is short and fast. I
>> think the solution here would worse if func was more complex.
>>
>> Either way, what I am still wondering is if people would find a built-
>> in implementation useful?
>>
>> -Brad
>
> A built-in/itertools should always try to provide the general solution
> to be as useful as possible, something like this:
>
> def group(seq, func=bool):
>ret = {}
>for item in seq:
>fitem = func(item)
>try:
>ret[fitem].append(item)
>except KeyError:
>ret[fitem] = [item]
>return ret
>
> definitely won't be faster, but it is a much more general solution.
> Basically, the function allows you to group sequences based on the
> result of func(item). It is similar to itertools.groupby() except that
> this also group non-contiguous items. 


-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Sequence splitting

2009-07-03 Thread Paul Rubin

Steven D'Aprano  writes:
> groupby() works on lists.

>>> a = [1,3,4,6,7]
>>> from itertools import groupby
>>> b = groupby(a, lambda x: x%2==1)  # split into even and odd
>>> c = list(b)
>>> print len(c)
3
>>> d = list(c[1][1])# should be [4,6]
>>> print d  # oops.
[]

> The difference between what I'm suggesting and what groupby() does is 
> that my suggestion would collate *all* the elements with the same key, 
> not just runs of them. This (as far as I can tell) requires returning 
> lists rather than iterators.

I guess that is reasonable.

> The most important difference between my suggestion and that of the OP is 
> that he limited the key function to something which returns a truth 
> value, while I'm looking for something more general which can split the 
> input into an arbitrary number of collated sublists.

Also ok.
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Sequence splitting

2009-07-03 Thread Steven D'Aprano

On Fri, 03 Jul 2009 01:02:56 -0700, Paul Rubin wrote:

> Steven D'Aprano  writes:
>> I've never needed such a split function, and I don't like the name, and
>> the functionality isn't general enough. I'd prefer something which
>> splits the input sequence into as many sublists as necessary, according
>> to the output of the key function. Something like itertools.groupby(),
>> except it runs through the entire sequence and collates all the
>> elements with identical keys.
> 
> No really, groupby makes iterators, not lists, and it you have to
> develop quite a delicate sense of when you can use it without having
> bugs caused by the different iterators it makes getting advanced at the
> wrong times.  The concept of a split function that actually works on
> lists is useful.  I'm neutral about whether it's worth having a C
> version in the stdlib.

groupby() works on lists.

The difference between what I'm suggesting and what groupby() does is 
that my suggestion would collate *all* the elements with the same key, 
not just runs of them. This (as far as I can tell) requires returning 
lists rather than iterators.

The most important difference between my suggestion and that of the OP is 
that he limited the key function to something which returns a truth 
value, while I'm looking for something more general which can split the 
input into an arbitrary number of collated sublists.

-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Sequence splitting

2009-07-03 Thread Chris Rebert

On Thu, Jul 2, 2009 at 11:31 PM, Brad wrote:
> On Jul 2, 9:40 pm, "Pablo Torres N."  wrote:
>>
>> If it is speed that we are after, it's my understanding that map and
>> filter are faster than iterating with the for statement (and also
>> faster than list comprehensions).  So here is a rewrite:
>>
>> def split(seq, func=bool):
>>         t = filter(func, seq)
>>         f = filter(lambda x: not func(x), seq)
>>         return list(t), list(f)
>>
>
> In my simple tests, that takes 1.8x as long as the original solution.
> Better than the itertools solution, when "func" is short and fast. I
> think the solution here would worse if func was more complex.
>
> Either way, what I am still wondering is if people would find a built-
> in implementation useful?

FWIW, Ruby has Enumerable#partition, which does the same thing as
split() and has a better name IMHO.
http://www.ruby-doc.org/core/classes/Enumerable.html#M003130

Cheers,
Chris
-- 
http://blog.rebertia.com
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Sequence splitting

2009-07-03 Thread Paul Rubin

Steven D'Aprano  writes:
> I've never needed such a split function, and I don't like the name, and 
> the functionality isn't general enough. I'd prefer something which splits 
> the input sequence into as many sublists as necessary, according to the 
> output of the key function. Something like itertools.groupby(), except it 
> runs through the entire sequence and collates all the elements with 
> identical keys.

No really, groupby makes iterators, not lists, and it you have to
develop quite a delicate sense of when you can use it without having
bugs caused by the different iterators it makes getting advanced at
the wrong times.  The concept of a split function that actually works
on lists is useful.  I'm neutral about whether it's worth having a C
version in the stdlib.
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Sequence splitting

2009-07-03 Thread Steven D'Aprano

On Thu, 02 Jul 2009 22:10:14 -0500, Pablo Torres N. wrote:

> This sounds like it belongs to the python-ideas list.  I suggest posting
> there for better feedback, since the core developers check that list
> more often than this one.

If you post to python-ideas, you'll probably be told to gather feedback 
here first. The core-developers aren't hugely interested in arbitrary new 
features unless they have significant community support.

I've never needed such a split function, and I don't like the name, and 
the functionality isn't general enough. I'd prefer something which splits 
the input sequence into as many sublists as necessary, according to the 
output of the key function. Something like itertools.groupby(), except it 
runs through the entire sequence and collates all the elements with 
identical keys.

E.g.:

splitby(range(10), lambda n: n%3)
=> [ (0, [0, 3, 6, 9]),
 (1, [1, 4, 7]), 
 (2, [2, 5, 8]) ]

Your split() would be nearly equivalent to this with a key function that 
returns a Boolean.

-- 
Steven

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Sequence splitting

2009-07-03 Thread Lie Ryan

Rickard Lindberg wrote:
>> I tried posting on python-ideas and received a "You are not allowed to
>> post to this mailing list" reply. Perhaps because I am posting through
>> Google groups? Or maybe one must be an approved member to post?
> 
> If you got an "awaiting moderator approval" message you post might appear on
> the list soon. The reason for getting those can be that it is a member only
> list and you posted from another address. I am not sure if that was the 
> message
> you got.
> 

AFAIK, python-ideas is not moderated (I followed python-ideas). I've
never used Google Groups to access it though. Try subscribing to the
mailing list directly (instead of using Google Group's web-gateway)
here: http://mail.python.org/mailman/listinfo/python-ideas
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Sequence splitting

2009-07-03 Thread Lie Ryan

Brad wrote:
> On Jul 2, 9:40 pm, "Pablo Torres N."  wrote:
>> If it is speed that we are after, it's my understanding that map and
>> filter are faster than iterating with the for statement (and also
>> faster than list comprehensions).  So here is a rewrite:
>>
>> def split(seq, func=bool):
>> t = filter(func, seq)
>> f = filter(lambda x: not func(x), seq)
>> return list(t), list(f)
>>
> 
> In my simple tests, that takes 1.8x as long as the original solution.
> Better than the itertools solution, when "func" is short and fast. I
> think the solution here would worse if func was more complex.
> 
> Either way, what I am still wondering is if people would find a built-
> in implementation useful?
> 
> -Brad

A built-in/itertools should always try to provide the general solution
to be as useful as possible, something like this:

def group(seq, func=bool):
ret = {}
for item in seq:
fitem = func(item)
try:
ret[fitem].append(item)
except KeyError:
ret[fitem] = [item]
return ret

definitely won't be faster, but it is a much more general solution.
Basically, the function allows you to group sequences based on the
result of func(item). It is similar to itertools.groupby() except that
this also group non-contiguous items.
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Sequence splitting

2009-07-02 Thread Rickard Lindberg

> I tried posting on python-ideas and received a "You are not allowed to
> post to this mailing list" reply. Perhaps because I am posting through
> Google groups? Or maybe one must be an approved member to post?

If you got an "awaiting moderator approval" message you post might appear on
the list soon. The reason for getting those can be that it is a member only
list and you posted from another address. I am not sure if that was the message
you got.

-- 
Rickard Lindberg
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Sequence splitting

2009-07-02 Thread Brad

On Jul 2, 8:17 pm, "Pablo Torres N."  wrote:
>
> This sounds like it belongs to the python-ideas list.  I suggest
> posting there for better feedback, since the core developers check
> that list more often than this one.

I tried posting on python-ideas and received a "You are not allowed to
post to this mailing list" reply. Perhaps because I am posting through
Google groups? Or maybe one must be an approved member to post?

-Brad
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Sequence splitting

2009-07-02 Thread Brad

On Jul 2, 9:40 pm, "Pablo Torres N."  wrote:
>
> If it is speed that we are after, it's my understanding that map and
> filter are faster than iterating with the for statement (and also
> faster than list comprehensions).  So here is a rewrite:
>
> def split(seq, func=bool):
>         t = filter(func, seq)
>         f = filter(lambda x: not func(x), seq)
>         return list(t), list(f)
>

In my simple tests, that takes 1.8x as long as the original solution.
Better than the itertools solution, when "func" is short and fast. I
think the solution here would worse if func was more complex.

Either way, what I am still wondering is if people would find a built-
in implementation useful?

-Brad
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Sequence splitting

2009-07-02 Thread Gabriel Genellina


En Fri, 03 Jul 2009 01:58:22 -0300, > escribió:


"Pablo Torres N."  writes:

def split(seq, func=bool):
t = filter(func, seq)
f = filter(lambda x: not func(x), seq)
return list(t), list(f)


That is icky--you're calling func (which might be slow) twice instead
of once on every element of the seq.


In addition, this doesn't work if seq is an iterator instead of a sequence.

--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list

Re: Sequence splitting

2009-07-02 Thread Paul Rubin

"Pablo Torres N."  writes:
> def split(seq, func=bool):
>   t = filter(func, seq)
>   f = filter(lambda x: not func(x), seq)
>   return list(t), list(f)

That is icky--you're calling func (which might be slow) twice instead
of once on every element of the seq.
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Sequence splitting

2009-07-02 Thread Pablo Torres N.

On Thu, Jul 2, 2009 at 23:34, Brad wrote:
> On Jul 2, 9:08 pm, Paul Rubin  wrote:
>> Brad  writes:
>> > On Jul 2, 8:14 pm, Paul Rubin  wrote:
>> > > schickb  writes:
>> > > > def split(seq, func=None):
>> > > >     if func is None:
>> > > >         func = bool
>> > > >     t, f = [], []
>> > > >     for item in seq:
>> > > >         if func(item):
>> > > >             t.append(item)
>> > > >         else:
>> > > >             f.append(item)
>> > > >     return (t, f)
>>
>> > > untested:
>>
>> > >    def split(seq, func=bool):
>> > >       xs = zip(seq, itertools.imap(func, seq))
>> > >       t = list(x for (x,y) in xs if y)
>> > >       f = list(x for (x,y) in xs if not y)
>> > >       return (t, f)
>>
>> > In my testing that is 3.5x slower than the original solution (and less
>> > clear imo). I fixed my version to take a bool default. Either way, I'm
>> > not really looking for additional ways to do this in Python unless
>> > I've totally missed something. What I am considering is writing it in
>> > C, much like filter.
>>
>> I'm a little skeptical that the C version will help much, if it's
>> evaluating a python function at every list element.
>
> Perhaps true, but it would be a nice convenience (for me) as a built-
> in written in either Python or C. Although the default case of a bool
> function would surely be faster.
>
>> Here's a variant of your version:
>>
>>  def split(seq, func=bool):
>>      t, f = [], []
>>      ta, fa = t.append, f.append
>>      for item in seq:
>>          (ta if func(item) else fa)(item)
>>      return (t, f)
>>
>> This avoids some dict lookups and copying.  I wonder if that helps
>> significantly.
>
> Faster, but in tests of a few short sequences only 1% so.
>
> -Brad
> --
> http://mail.python.org/mailman/listinfo/python-list
>

If it is speed that we are after, it's my understanding that map and
filter are faster than iterating with the for statement (and also
faster than list comprehensions).  So here is a rewrite:

def split(seq, func=bool):
t = filter(func, seq)
f = filter(lambda x: not func(x), seq)
return list(t), list(f)

The lambda thing is kinda ugly, but I can't think of anything else.
Also, is it ok to return lists?  Py3k saw a lot of APIs changed to
return iterables instead of lists, so maybe my function should have
'return t, f' as it's last statement.


-- 
Pablo Torres N.
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Sequence splitting

2009-07-02 Thread Brad

On Jul 2, 9:08 pm, Paul Rubin  wrote:
> Brad  writes:
> > On Jul 2, 8:14 pm, Paul Rubin  wrote:
> > > schickb  writes:
> > > > def split(seq, func=None):
> > > >     if func is None:
> > > >         func = bool
> > > >     t, f = [], []
> > > >     for item in seq:
> > > >         if func(item):
> > > >             t.append(item)
> > > >         else:
> > > >             f.append(item)
> > > >     return (t, f)
>
> > > untested:
>
> > >    def split(seq, func=bool):
> > >       xs = zip(seq, itertools.imap(func, seq))
> > >       t = list(x for (x,y) in xs if y)
> > >       f = list(x for (x,y) in xs if not y)
> > >       return (t, f)
>
> > In my testing that is 3.5x slower than the original solution (and less
> > clear imo). I fixed my version to take a bool default. Either way, I'm
> > not really looking for additional ways to do this in Python unless
> > I've totally missed something. What I am considering is writing it in
> > C, much like filter.
>
> I'm a little skeptical that the C version will help much, if it's
> evaluating a python function at every list element.  

Perhaps true, but it would be a nice convenience (for me) as a built-
in written in either Python or C. Although the default case of a bool
function would surely be faster.

> Here's a variant of your version:
>
>  def split(seq, func=bool):
>      t, f = [], []
>      ta, fa = t.append, f.append
>      for item in seq:
>          (ta if func(item) else fa)(item)
>      return (t, f)
>
> This avoids some dict lookups and copying.  I wonder if that helps
> significantly.

Faster, but in tests of a few short sequences only 1% so.

-Brad
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Sequence splitting

2009-07-02 Thread Paul Rubin

Brad  writes:

> On Jul 2, 8:14 pm, Paul Rubin  wrote:
> > schickb  writes:
> > > def split(seq, func=None):
> > >     if func is None:
> > >         func = bool
> > >     t, f = [], []
> > >     for item in seq:
> > >         if func(item):
> > >             t.append(item)
> > >         else:
> > >             f.append(item)
> > >     return (t, f)
> >
> > untested:
> >
> >    def split(seq, func=bool):
> >       xs = zip(seq, itertools.imap(func, seq))
> >       t = list(x for (x,y) in xs if y)
> >       f = list(x for (x,y) in xs if not y)
> >       return (t, f)
> 
> In my testing that is 3.5x slower than the original solution (and less
> clear imo). I fixed my version to take a bool default. Either way, I'm
> not really looking for additional ways to do this in Python unless
> I've totally missed something. What I am considering is writing it in
> C, much like filter.

I'm a little skeptical that the C version will help much, if it's
evaluating a python function at every list element.  Here's a variant
of your version:

 def split(seq, func=bool):
     t, f = [], []
 ta, fa = t.append, f.append
     for item in seq:
 (ta if func(item) else fa)(item)
     return (t, f)

This avoids some dict lookups and copying.  I wonder if that helps
significantly.
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Sequence splitting

2009-07-02 Thread Brad

On Jul 2, 8:17 pm, "Pablo Torres N."  wrote:
> On Jul 2, 9:56 pm, schickb  wrote:
>
> > I have fairly often found the need to split a sequence into two groups
> > based on a function result.
>
> This sounds like it belongs to the python-ideas list.  I suggest
> posting there for better feedback, since the core developers check
> that list more often than this one.

Thanks, I didn't know about that list.
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Sequence splitting

2009-07-02 Thread Brad

On Jul 2, 8:14 pm, Paul Rubin  wrote:
> schickb  writes:
> > def split(seq, func=None):
> >     if func is None:
> >         func = bool
> >     t, f = [], []
> >     for item in seq:
> >         if func(item):
> >             t.append(item)
> >         else:
> >             f.append(item)
> >     return (t, f)
>
> untested:
>
>    def split(seq, func=bool):
>       xs = zip(seq, itertools.imap(func, seq))
>       t = list(x for (x,y) in xs if y)
>       f = list(x for (x,y) in xs if not y)
>       return (t, f)

In my testing that is 3.5x slower than the original solution (and less
clear imo). I fixed my version to take a bool default. Either way, I'm
not really looking for additional ways to do this in Python unless
I've totally missed something. What I am considering is writing it in
C, much like filter.

-Brad
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Sequence splitting

2009-07-02 Thread Pablo Torres N.

On Jul 2, 9:56 pm, schickb  wrote:
> I have fairly often found the need to split a sequence into two groups
> based on a function result. Much like the existing filter function,
> but returning a tuple of true, false sequences. In Python, something
> like:
>
> def split(seq, func=None):
>     if func is None:
>         func = bool
>     t, f = [], []
>     for item in seq:
>         if func(item):
>             t.append(item)
>         else:
>             f.append(item)
>     return (t, f)
>
> The discussion linked to below has various approaches for doing this
> now, but most traverse the sequence twice and many don't apply a
> function to spit the 
> sequence.http://stackoverflow.com/questions/949098/python-split-a-list-based-o...
>
> Is there any interest in a C implementation of this? Seems too trivial
> to write a PEP, so I'm just trying to measure interest before diving
> in. This wouldn't really belong in intertool. Would it be best
> implemented as a top level built-in?
>
> -Brad

This sounds like it belongs to the python-ideas list.  I suggest
posting there for better feedback, since the core developers check
that list more often than this one.

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Sequence splitting

2009-07-02 Thread Pablo Torres N.

On Thu, Jul 2, 2009 at 21:56, schickb wrote:
> I have fairly often found the need to split a sequence into two groups
> based on a function result. Much like the existing filter function,
> but returning a tuple of true, false sequences. In Python, something
> like:
>
> def split(seq, func=None):
>    if func is None:
>        func = bool
>    t, f = [], []
>    for item in seq:
>        if func(item):
>            t.append(item)
>        else:
>            f.append(item)
>    return (t, f)
>
> The discussion linked to below has various approaches for doing this
> now, but most traverse the sequence twice and many don't apply a
> function to spit the sequence.
> http://stackoverflow.com/questions/949098/python-split-a-list-based-on-a-condition
>
> Is there any interest in a C implementation of this? Seems too trivial
> to write a PEP, so I'm just trying to measure interest before diving
> in. This wouldn't really belong in intertool. Would it be best
> implemented as a top level built-in?
>
> -Brad
> --
> http://mail.python.org/mailman/listinfo/python-list
>

This sounds like it belongs to the python-ideas list.  I suggest
posting there for better feedback, since the core developers check
that list more often than this one.


-- 
Pablo Torres N.
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Sequence splitting

2009-07-02 Thread Paul Rubin

schickb  writes:
> def split(seq, func=None):
> if func is None:
> func = bool
> t, f = [], []
> for item in seq:
> if func(item):
> t.append(item)
> else:
> f.append(item)
> return (t, f)

untested:

   def split(seq, func=bool):
  xs = zip(seq, itertools.imap(func, seq))
  t = list(x for (x,y) in xs if y)
  f = list(x for (x,y) in xs if not y)
  return (t, f)
-- 
http://mail.python.org/mailman/listinfo/python-list

Sequence splitting

2009-07-02 Thread schickb

I have fairly often found the need to split a sequence into two groups
based on a function result. Much like the existing filter function,
but returning a tuple of true, false sequences. In Python, something
like:

def split(seq, func=None):
if func is None:
func = bool
t, f = [], []
for item in seq:
if func(item):
t.append(item)
else:
f.append(item)
return (t, f)

The discussion linked to below has various approaches for doing this
now, but most traverse the sequence twice and many don't apply a
function to spit the sequence.
http://stackoverflow.com/questions/949098/python-split-a-list-based-on-a-condition

Is there any interest in a C implementation of this? Seems too trivial
to write a PEP, so I'm just trying to measure interest before diving
in. This wouldn't really belong in intertool. Would it be best
implemented as a top level built-in?

-Brad
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Sequence splitting

Re: Sequence splitting

Re: Sequence splitting

Re: Sequence splitting

Re: Sequence splitting

Re: Sequence splitting

Re: Sequence splitting

Re: Sequence splitting

Re: Sequence splitting

Re: Sequence splitting

Re: Sequence splitting

Re: Sequence splitting

Re: Sequence splitting

Re: Sequence splitting

Re: Sequence splitting

Re: Sequence splitting

Re: Sequence splitting

Re: Sequence splitting

Re: Sequence splitting

Re: Sequence splitting

Re: Sequence splitting

Re: Sequence splitting

Re: Sequence splitting

Re: Sequence splitting

Re: Sequence splitting

Re: Sequence splitting

Re: Sequence splitting

Re: Sequence splitting

Re: Sequence splitting

Re: Sequence splitting

Re: Sequence splitting

Re: Sequence splitting

Re: Sequence splitting

Re: Sequence splitting

Sequence splitting

35 matches

Site Navigation

Mail list logo

Footer information