Re: Sequence splitting

2009-07-03 Thread MRAB

Mel wrote:

Steven D'Aprano wrote:


The most important difference between my suggestion and that of the OP is
that he limited the key function to something which returns a truth
value, while I'm looking for something more general which can split the
input into an arbitrary number of collated sublists.


Generally along the lines of

def scatter (seq, criterion):
lists = {}
for s in seq:
lists.setdefault (criterion (s), []).append (s)
return lists

modulo a defaultdict or some such ?


Somehow this reminds me of the new Counter class, except that it's
making lists instead of counts.
--
http://mail.python.org/mailman/listinfo/python-list


Re: Sequence splitting

2009-07-03 Thread Paul Rubin
Terry Reedy  writes:
> > This isn't so attractive, since filter takes a sequence input but
> > returns a list.
> 
> In modern Python (py3), it returns an iterator.

Still not much help in the proposed version that would return two
iterators.  
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sequence splitting

2009-07-03 Thread Terry Reedy

Paul Rubin wrote:

Brad  writes:

Maybe this would be difficult to get into the core, but how about this
idea: Rename the current filter function to something like "split" or
"partition" (which I agree may be a better name) and modify it to
return the desired true and false sequences. Then recreate the
existing "filter" function with a wrapper that throws away the false
sequence.


This isn't so attractive, since filter takes a sequence input but
returns a list. 


In modern Python (py3), it returns an iterator.

--
http://mail.python.org/mailman/listinfo/python-list


Re: Sequence splitting

2009-07-03 Thread Terry Reedy

Brad wrote:

On Jul 2, 8:17 pm, "Pablo Torres N."  wrote:

This sounds like it belongs to the python-ideas list.  I suggest
posting there for better feedback, since the core developers check
that list more often than this one.


I tried posting on python-ideas and received a "You are not allowed to
post to this mailing list" reply. Perhaps because I am posting through
Google groups? Or maybe one must be an approved member to post?


Spammers post thru Google groups

Either subscribe or access via news.gmane.org as gmane.comp.python.ideas.

As to your main question: this was discuss some years ago. There did not 
seem enough use cases for 'bifilter' or 'distributor' to warrent 
inclusion in the stdlib. Easy enough to do on one's own for those few 
cases. Just filter twice.


Usually, one wants to do one thing or another to each item with the 
original scan


for item in collection:
  if pred(item):
proc_true(item)
  else:
proc_false(item)

Having proc_true and proc_false both be append(item) is a special case.

Note that filter has been changed from returning a list to returning an 
iterator.  So a proposal for a function to return two lists would seem 
backwards. A function that returns two iterators would be possible. 
Something like itertools.tee except that each item would go on one tee 
or the other instead of both. It would have two queues (deques) instead 
of one. And it should be lazy -- items should only be pulled from the 
original iterator when an item is requested from an empty queue.


I am not sure it is worth it. In the worst case, all items are say 
'True' and you request 'False' items and all items get copied to the 
True queue before the False iterator raised StopIteration.


The way to avoid calling keyfunc(item) twice on each item is to make a 
new list first: newlist = list(map(keyfunc,iterable)). Then scan newlist 
twice.


If you write such a function, first in Python, then C, you can register 
code on PyPI and see how much interest it gets.


Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list


Re: Sequence splitting

2009-07-03 Thread Mel
Steven D'Aprano wrote:

> The most important difference between my suggestion and that of the OP is
> that he limited the key function to something which returns a truth
> value, while I'm looking for something more general which can split the
> input into an arbitrary number of collated sublists.

Generally along the lines of

def scatter (seq, criterion):
lists = {}
for s in seq:
lists.setdefault (criterion (s), []).append (s)
return lists

modulo a defaultdict or some such ?

Mel.
> 
> 
> 


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sequence splitting

2009-07-03 Thread Scott David Daniels

Steven D'Aprano wrote:
I've never needed such a split function, and I don't like the name, and 
the functionality isn't general enough. I'd prefer something which splits 
the input sequence into as many sublists as necessary, according to the 
output of the key function. Something like itertools.groupby(), except it 
runs through the entire sequence and collates all the elements with 
identical keys.


splitby(range(10), lambda n: n%3)
=> [ (0, [0, 3, 6, 9]),
 (1, [1, 4, 7]), 
 (2, [2, 5, 8]) ]


Your split() would be nearly equivalent to this with a key function that 
returns a Boolean.


Well, here is my go at doing the original with iterators:

def splitter(source, test=bool):
a, b = itertools.tee((x, test(x)) for x in source)
return (data for data, decision in a if decision), (
data for data, decision in b if not decision)

This has the advantage that it can operate on infinite lists.  For
something like splitby for grouping, I seem to need to know the cases
up front:

def _make_gen(particular, src):
 return (x for x, c in src if c == particular)

def splitby(source, cases, case):
'''Produce a dict of generators for case(el) for el in source'''
decided = itertools.tee(((x, case(x)) for x in source), len(cases))
return dict((c, _make_gen(c, src))
for c, src in zip(cases, decided))

example:

def classify(n):
'''Least prime factor of a few'''
for prime in [2, 3, 5, 7]:
if n % prime == 0:
return prime
return 0

for k,g in splitby(range(50), (2, 3, 5, 7, 0), classify).items():
print('%s: %s' % (k, list(g)))

0: [1, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
2: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24,
26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48]
3: [3, 9, 15, 21, 27, 33, 39, 45]
5: [5, 25, 35]
7: [7, 49]

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: Sequence splitting

2009-07-03 Thread Lie Ryan
Brad wrote:
> On Jul 3, 12:57 am, Steven D'Aprano  cybersource.com.au> wrote:
>> I've never needed such a split function, and I don't like the name, and
>> the functionality isn't general enough. I'd prefer something which splits
>> the input sequence into as many sublists as necessary, according to the
>> output of the key function.
> 
> That's not a bad idea, I'll have to experiment with the alternatives.
> My thought process for this, however, was that filter itself already
> splits the sequence and it would have been more useful had it not
> thrown away "half" of what it discovers. It could have been written to
> returned two sequences with very litter perf hit for all but very
> large input sequences, and been useful in more situations. What I
> *really* wanted was a way to make filter itself more useful, since it
> seems a bit silly to have two very similar functions.

It's not that easy. filter is nearly by definition a lazy function. The
various split/group functions is impossible to be made an efficient
iterator, since they must traverse the whole list before being sure that
they have finished the group/part of the split (or even to be sure that
they have all the group keys).

> Maybe this would be difficult to get into the core, but how about this
> idea: Rename the current filter function to something like "split" or
> "partition" (which I agree may be a better name) and modify it to
> return the desired true and false sequences. Then recreate the
> existing "filter" function with a wrapper that throws away the false
> sequence.

filter has a very deep root in functional programming, I doubt it will
change any time soon. Also, consider infinite iterators. With the
"filter as wrapper to partition" scheme, it is impossible for
split/group/partition to use infinite iterator.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sequence splitting

2009-07-03 Thread Paul Rubin
Brad  writes:
> Maybe this would be difficult to get into the core, but how about this
> idea: Rename the current filter function to something like "split" or
> "partition" (which I agree may be a better name) and modify it to
> return the desired true and false sequences. Then recreate the
> existing "filter" function with a wrapper that throws away the false
> sequence.

This isn't so attractive, since filter takes a sequence input but
returns a list.  So if I have an iterator that produces a billion
elements of which I expect three to satisfy some predicate, then
   xs = filter(func, seq)
as currently implemented will build a 3-element list and return it.  
Under your suggestion, it would also build and throw away an (almost)
billion element list.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sequence splitting

2009-07-03 Thread Brad
On Jul 3, 12:57 am, Steven D'Aprano  wrote:
> I've never needed such a split function, and I don't like the name, and
> the functionality isn't general enough. I'd prefer something which splits
> the input sequence into as many sublists as necessary, according to the
> output of the key function.

That's not a bad idea, I'll have to experiment with the alternatives.
My thought process for this, however, was that filter itself already
splits the sequence and it would have been more useful had it not
thrown away "half" of what it discovers. It could have been written to
returned two sequences with very litter perf hit for all but very
large input sequences, and been useful in more situations. What I
*really* wanted was a way to make filter itself more useful, since it
seems a bit silly to have two very similar functions.

Maybe this would be difficult to get into the core, but how about this
idea: Rename the current filter function to something like "split" or
"partition" (which I agree may be a better name) and modify it to
return the desired true and false sequences. Then recreate the
existing "filter" function with a wrapper that throws away the false
sequence.

Here are two simplified re-creations of situations where I could have
used partition (aka split):

words = ['this', 'is', 'a', 'bunch', 'of', 'words']
short, long = partition(words, lambda w: len(w) < 3)

d = {1 : 'w', 2 : 'x' ,3 : 'y' ,4 : 'z'}
keys = [1, 3, 4, 9]
found, missing = partition(keys, d.has_key)


There are probably a dozen other approaches, but the existing "filter"
is fast, clear, and *almost* good enough. So when is this useful in
general: Whenever filter itself is useful, but you want to use both
sides of the partitioning work it already does.


-Brad
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sequence splitting

2009-07-03 Thread Lie Ryan
tsangpo wrote:
> Just a shorter implementation:
> 
> from itertools import groupby
> def split(lst, func):
> gs = groupby(lst, func)
> return list(gs[True]), list(gs[False])
> 

As you're replying to my post, I assume you meant a shorter
implementation my function. But it doesn't do the same thing. The idea
with my group() is similar to what Steven D'Aprano is describing in
another branch of this thread (i.e. splitting not only based on True and
False, but arbitrary groupings, e.g. 'tru', 'flash' or perhaps -1, 0, 1).

For example:
>>> def group(seq, func=bool):
... ret = {}
... for item in seq:
... fitem = func(item)
... try:
... ret[fitem].append(item)
... except KeyError:
... ret[fitem] = [item]
... return ret
...
>>> group(range(10), lambda x: x%3)
{0: [0, 3, 6, 9], 1: [1, 4, 7], 2: [2, 5, 8]}
>>> # the next one is the OP's split
>>> group(['true', '', [], [''], 'false'], bool)
{False: ['', []], True: ['true', [''], 'false']}
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sequence splitting

2009-07-03 Thread Pablo Torres N.
On Fri, Jul 3, 2009 at 06:01, Paul Moore wrote:
> 2009/7/3 Brad :
>> Perhaps true, but it would be a nice convenience (for me) as a built-
>> in written in either Python or C. Although the default case of a bool
>> function would surely be faster.
>
> The chance of getting this accepted as a builtin is essentially zero.
> To be a builtin, as opposed to being in the standard library,
> something has to have a very strong justification.

That's right.  Mr. schickb, I think what you need is a few concrete
examples as to where this function would be beneficial, so it can be
judged objectively.

-- 
Pablo Torres N.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sequence splitting

2009-07-03 Thread Paul Moore
2009/7/3 Brad :
> Perhaps true, but it would be a nice convenience (for me) as a built-
> in written in either Python or C. Although the default case of a bool
> function would surely be faster.

The chance of getting this accepted as a builtin is essentially zero.
To be a builtin, as opposed to being in the standard library,
something has to have a very strong justification.

This suggestion may find a home in the standard library, although it's
not entirely clear where (maybe itertools, although it's not entirely
obvious that it's a fit there).

You'd have to justify this against the argument "not every 2-3 line
function needs to be built in". Personally, I'm not sure it passes
that test - sure, it's a useful function, but it's not that hard to
write when you need it. It may be better as a recipe in the cookbook.
Or if it's close enough to the spirit of the itertools, it may be
suitable as a sample in the itertools documentation (the "recipes"
section).

Paul.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sequence splitting

2009-07-03 Thread Steven D'Aprano
On Fri, 03 Jul 2009 01:39:27 -0700, Paul Rubin wrote:

> Steven D'Aprano  writes:
>> groupby() works on lists.
> 
 a = [1,3,4,6,7]
 from itertools import groupby
 b = groupby(a, lambda x: x%2==1)  # split into even and odd 
 c = list(b)
 print len(c)
> 3
 d = list(c[1][1])# should be [4,6] print d  # oops.
> []

I didn't say it worked properly *wink*

Seriously, this behaviour caught me out too. The problem isn't that the 
input data is a list, the same problem occurs for arbitrary iterators. 
From the docs:

[quote]
The operation of groupby() is similar to the uniq filter in Unix. It 
generates a break or new group every time the value of the key function 
changes (which is why it is usually necessary to have sorted the data 
using the same key function). That behavior differs from SQL’s GROUP BY 
which aggregates common elements regardless of their input order.

The returned group is itself an iterator that shares the underlying 
iterable with groupby(). Because the source is shared, when the groupby() 
object is advanced, the previous group is no longer visible. So, if that 
data is needed later, it should be stored as a list
[end quote]

http://www.python.org/doc/2.6/library/itertools.html#itertools.groupby




-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sequence splitting

2009-07-03 Thread tsangpo
Just a shorter implementation:

from itertools import groupby
def split(lst, func):
gs = groupby(lst, func)
return list(gs[True]), list(gs[False])


"Lie Ryan"  
дÈëÏûÏ¢ÐÂÎÅ:nfi3m.2341$ze1.1...@news-server.bigpond.net.au...
> Brad wrote:
>> On Jul 2, 9:40 pm, "Pablo Torres N."  wrote:
>>> If it is speed that we are after, it's my understanding that map and
>>> filter are faster than iterating with the for statement (and also
>>> faster than list comprehensions).  So here is a rewrite:
>>>
>>> def split(seq, func=bool):
>>> t = filter(func, seq)
>>> f = filter(lambda x: not func(x), seq)
>>> return list(t), list(f)
>>>
>>
>> In my simple tests, that takes 1.8x as long as the original solution.
>> Better than the itertools solution, when "func" is short and fast. I
>> think the solution here would worse if func was more complex.
>>
>> Either way, what I am still wondering is if people would find a built-
>> in implementation useful?
>>
>> -Brad
>
> A built-in/itertools should always try to provide the general solution
> to be as useful as possible, something like this:
>
> def group(seq, func=bool):
>ret = {}
>for item in seq:
>fitem = func(item)
>try:
>ret[fitem].append(item)
>except KeyError:
>ret[fitem] = [item]
>return ret
>
> definitely won't be faster, but it is a much more general solution.
> Basically, the function allows you to group sequences based on the
> result of func(item). It is similar to itertools.groupby() except that
> this also group non-contiguous items. 


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sequence splitting

2009-07-03 Thread Paul Rubin
Steven D'Aprano  writes:
> groupby() works on lists.

>>> a = [1,3,4,6,7]
>>> from itertools import groupby
>>> b = groupby(a, lambda x: x%2==1)  # split into even and odd
>>> c = list(b)
>>> print len(c)
3
>>> d = list(c[1][1])# should be [4,6]
>>> print d  # oops.
[]

> The difference between what I'm suggesting and what groupby() does is 
> that my suggestion would collate *all* the elements with the same key, 
> not just runs of them. This (as far as I can tell) requires returning 
> lists rather than iterators.

I guess that is reasonable.

> The most important difference between my suggestion and that of the OP is 
> that he limited the key function to something which returns a truth 
> value, while I'm looking for something more general which can split the 
> input into an arbitrary number of collated sublists.

Also ok.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sequence splitting

2009-07-03 Thread Steven D'Aprano
On Fri, 03 Jul 2009 01:02:56 -0700, Paul Rubin wrote:

> Steven D'Aprano  writes:
>> I've never needed such a split function, and I don't like the name, and
>> the functionality isn't general enough. I'd prefer something which
>> splits the input sequence into as many sublists as necessary, according
>> to the output of the key function. Something like itertools.groupby(),
>> except it runs through the entire sequence and collates all the
>> elements with identical keys.
> 
> No really, groupby makes iterators, not lists, and it you have to
> develop quite a delicate sense of when you can use it without having
> bugs caused by the different iterators it makes getting advanced at the
> wrong times.  The concept of a split function that actually works on
> lists is useful.  I'm neutral about whether it's worth having a C
> version in the stdlib.

groupby() works on lists.

The difference between what I'm suggesting and what groupby() does is 
that my suggestion would collate *all* the elements with the same key, 
not just runs of them. This (as far as I can tell) requires returning 
lists rather than iterators.

The most important difference between my suggestion and that of the OP is 
that he limited the key function to something which returns a truth 
value, while I'm looking for something more general which can split the 
input into an arbitrary number of collated sublists.



-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sequence splitting

2009-07-03 Thread Chris Rebert
On Thu, Jul 2, 2009 at 11:31 PM, Brad wrote:
> On Jul 2, 9:40 pm, "Pablo Torres N."  wrote:
>>
>> If it is speed that we are after, it's my understanding that map and
>> filter are faster than iterating with the for statement (and also
>> faster than list comprehensions).  So here is a rewrite:
>>
>> def split(seq, func=bool):
>>         t = filter(func, seq)
>>         f = filter(lambda x: not func(x), seq)
>>         return list(t), list(f)
>>
>
> In my simple tests, that takes 1.8x as long as the original solution.
> Better than the itertools solution, when "func" is short and fast. I
> think the solution here would worse if func was more complex.
>
> Either way, what I am still wondering is if people would find a built-
> in implementation useful?

FWIW, Ruby has Enumerable#partition, which does the same thing as
split() and has a better name IMHO.
http://www.ruby-doc.org/core/classes/Enumerable.html#M003130

Cheers,
Chris
-- 
http://blog.rebertia.com
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sequence splitting

2009-07-03 Thread Paul Rubin
Steven D'Aprano  writes:
> I've never needed such a split function, and I don't like the name, and 
> the functionality isn't general enough. I'd prefer something which splits 
> the input sequence into as many sublists as necessary, according to the 
> output of the key function. Something like itertools.groupby(), except it 
> runs through the entire sequence and collates all the elements with 
> identical keys.

No really, groupby makes iterators, not lists, and it you have to
develop quite a delicate sense of when you can use it without having
bugs caused by the different iterators it makes getting advanced at
the wrong times.  The concept of a split function that actually works
on lists is useful.  I'm neutral about whether it's worth having a C
version in the stdlib.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sequence splitting

2009-07-03 Thread Steven D'Aprano
On Thu, 02 Jul 2009 22:10:14 -0500, Pablo Torres N. wrote:

> This sounds like it belongs to the python-ideas list.  I suggest posting
> there for better feedback, since the core developers check that list
> more often than this one.

If you post to python-ideas, you'll probably be told to gather feedback 
here first. The core-developers aren't hugely interested in arbitrary new 
features unless they have significant community support.

I've never needed such a split function, and I don't like the name, and 
the functionality isn't general enough. I'd prefer something which splits 
the input sequence into as many sublists as necessary, according to the 
output of the key function. Something like itertools.groupby(), except it 
runs through the entire sequence and collates all the elements with 
identical keys.

E.g.:

splitby(range(10), lambda n: n%3)
=> [ (0, [0, 3, 6, 9]),
 (1, [1, 4, 7]), 
 (2, [2, 5, 8]) ]

Your split() would be nearly equivalent to this with a key function that 
returns a Boolean.



-- 
Steven


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sequence splitting

2009-07-03 Thread Lie Ryan
Rickard Lindberg wrote:
>> I tried posting on python-ideas and received a "You are not allowed to
>> post to this mailing list" reply. Perhaps because I am posting through
>> Google groups? Or maybe one must be an approved member to post?
> 
> If you got an "awaiting moderator approval" message you post might appear on
> the list soon. The reason for getting those can be that it is a member only
> list and you posted from another address. I am not sure if that was the 
> message
> you got.
> 

AFAIK, python-ideas is not moderated (I followed python-ideas). I've
never used Google Groups to access it though. Try subscribing to the
mailing list directly (instead of using Google Group's web-gateway)
here: http://mail.python.org/mailman/listinfo/python-ideas
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sequence splitting

2009-07-03 Thread Lie Ryan
Brad wrote:
> On Jul 2, 9:40 pm, "Pablo Torres N."  wrote:
>> If it is speed that we are after, it's my understanding that map and
>> filter are faster than iterating with the for statement (and also
>> faster than list comprehensions).  So here is a rewrite:
>>
>> def split(seq, func=bool):
>> t = filter(func, seq)
>> f = filter(lambda x: not func(x), seq)
>> return list(t), list(f)
>>
> 
> In my simple tests, that takes 1.8x as long as the original solution.
> Better than the itertools solution, when "func" is short and fast. I
> think the solution here would worse if func was more complex.
> 
> Either way, what I am still wondering is if people would find a built-
> in implementation useful?
> 
> -Brad

A built-in/itertools should always try to provide the general solution
to be as useful as possible, something like this:

def group(seq, func=bool):
ret = {}
for item in seq:
fitem = func(item)
try:
ret[fitem].append(item)
except KeyError:
ret[fitem] = [item]
return ret

definitely won't be faster, but it is a much more general solution.
Basically, the function allows you to group sequences based on the
result of func(item). It is similar to itertools.groupby() except that
this also group non-contiguous items.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sequence splitting

2009-07-02 Thread Rickard Lindberg
> I tried posting on python-ideas and received a "You are not allowed to
> post to this mailing list" reply. Perhaps because I am posting through
> Google groups? Or maybe one must be an approved member to post?

If you got an "awaiting moderator approval" message you post might appear on
the list soon. The reason for getting those can be that it is a member only
list and you posted from another address. I am not sure if that was the message
you got.

-- 
Rickard Lindberg
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sequence splitting

2009-07-02 Thread Brad
On Jul 2, 8:17 pm, "Pablo Torres N."  wrote:
>
> This sounds like it belongs to the python-ideas list.  I suggest
> posting there for better feedback, since the core developers check
> that list more often than this one.

I tried posting on python-ideas and received a "You are not allowed to
post to this mailing list" reply. Perhaps because I am posting through
Google groups? Or maybe one must be an approved member to post?

-Brad
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sequence splitting

2009-07-02 Thread Brad
On Jul 2, 9:40 pm, "Pablo Torres N."  wrote:
>
> If it is speed that we are after, it's my understanding that map and
> filter are faster than iterating with the for statement (and also
> faster than list comprehensions).  So here is a rewrite:
>
> def split(seq, func=bool):
>         t = filter(func, seq)
>         f = filter(lambda x: not func(x), seq)
>         return list(t), list(f)
>

In my simple tests, that takes 1.8x as long as the original solution.
Better than the itertools solution, when "func" is short and fast. I
think the solution here would worse if func was more complex.

Either way, what I am still wondering is if people would find a built-
in implementation useful?

-Brad
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sequence splitting

2009-07-02 Thread Gabriel Genellina

En Fri, 03 Jul 2009 01:58:22 -0300, > escribió:


"Pablo Torres N."  writes:

def split(seq, func=bool):
t = filter(func, seq)
f = filter(lambda x: not func(x), seq)
return list(t), list(f)


That is icky--you're calling func (which might be slow) twice instead
of once on every element of the seq.


In addition, this doesn't work if seq is an iterator instead of a sequence.

--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list


Re: Sequence splitting

2009-07-02 Thread Paul Rubin
"Pablo Torres N."  writes:
> def split(seq, func=bool):
>   t = filter(func, seq)
>   f = filter(lambda x: not func(x), seq)
>   return list(t), list(f)

That is icky--you're calling func (which might be slow) twice instead
of once on every element of the seq.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sequence splitting

2009-07-02 Thread Pablo Torres N.
On Thu, Jul 2, 2009 at 23:34, Brad wrote:
> On Jul 2, 9:08 pm, Paul Rubin  wrote:
>> Brad  writes:
>> > On Jul 2, 8:14 pm, Paul Rubin  wrote:
>> > > schickb  writes:
>> > > > def split(seq, func=None):
>> > > >     if func is None:
>> > > >         func = bool
>> > > >     t, f = [], []
>> > > >     for item in seq:
>> > > >         if func(item):
>> > > >             t.append(item)
>> > > >         else:
>> > > >             f.append(item)
>> > > >     return (t, f)
>>
>> > > untested:
>>
>> > >    def split(seq, func=bool):
>> > >       xs = zip(seq, itertools.imap(func, seq))
>> > >       t = list(x for (x,y) in xs if y)
>> > >       f = list(x for (x,y) in xs if not y)
>> > >       return (t, f)
>>
>> > In my testing that is 3.5x slower than the original solution (and less
>> > clear imo). I fixed my version to take a bool default. Either way, I'm
>> > not really looking for additional ways to do this in Python unless
>> > I've totally missed something. What I am considering is writing it in
>> > C, much like filter.
>>
>> I'm a little skeptical that the C version will help much, if it's
>> evaluating a python function at every list element.
>
> Perhaps true, but it would be a nice convenience (for me) as a built-
> in written in either Python or C. Although the default case of a bool
> function would surely be faster.
>
>> Here's a variant of your version:
>>
>>  def split(seq, func=bool):
>>      t, f = [], []
>>      ta, fa = t.append, f.append
>>      for item in seq:
>>          (ta if func(item) else fa)(item)
>>      return (t, f)
>>
>> This avoids some dict lookups and copying.  I wonder if that helps
>> significantly.
>
> Faster, but in tests of a few short sequences only 1% so.
>
> -Brad
> --
> http://mail.python.org/mailman/listinfo/python-list
>

If it is speed that we are after, it's my understanding that map and
filter are faster than iterating with the for statement (and also
faster than list comprehensions).  So here is a rewrite:

def split(seq, func=bool):
t = filter(func, seq)
f = filter(lambda x: not func(x), seq)
return list(t), list(f)

The lambda thing is kinda ugly, but I can't think of anything else.
Also, is it ok to return lists?  Py3k saw a lot of APIs changed to
return iterables instead of lists, so maybe my function should have
'return t, f' as it's last statement.


-- 
Pablo Torres N.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sequence splitting

2009-07-02 Thread Brad
On Jul 2, 9:08 pm, Paul Rubin  wrote:
> Brad  writes:
> > On Jul 2, 8:14 pm, Paul Rubin  wrote:
> > > schickb  writes:
> > > > def split(seq, func=None):
> > > >     if func is None:
> > > >         func = bool
> > > >     t, f = [], []
> > > >     for item in seq:
> > > >         if func(item):
> > > >             t.append(item)
> > > >         else:
> > > >             f.append(item)
> > > >     return (t, f)
>
> > > untested:
>
> > >    def split(seq, func=bool):
> > >       xs = zip(seq, itertools.imap(func, seq))
> > >       t = list(x for (x,y) in xs if y)
> > >       f = list(x for (x,y) in xs if not y)
> > >       return (t, f)
>
> > In my testing that is 3.5x slower than the original solution (and less
> > clear imo). I fixed my version to take a bool default. Either way, I'm
> > not really looking for additional ways to do this in Python unless
> > I've totally missed something. What I am considering is writing it in
> > C, much like filter.
>
> I'm a little skeptical that the C version will help much, if it's
> evaluating a python function at every list element.  

Perhaps true, but it would be a nice convenience (for me) as a built-
in written in either Python or C. Although the default case of a bool
function would surely be faster.

> Here's a variant of your version:
>
>  def split(seq, func=bool):
>      t, f = [], []
>      ta, fa = t.append, f.append
>      for item in seq:
>          (ta if func(item) else fa)(item)
>      return (t, f)
>
> This avoids some dict lookups and copying.  I wonder if that helps
> significantly.

Faster, but in tests of a few short sequences only 1% so.

-Brad
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sequence splitting

2009-07-02 Thread Paul Rubin
Brad  writes:

> On Jul 2, 8:14 pm, Paul Rubin  wrote:
> > schickb  writes:
> > > def split(seq, func=None):
> > >     if func is None:
> > >         func = bool
> > >     t, f = [], []
> > >     for item in seq:
> > >         if func(item):
> > >             t.append(item)
> > >         else:
> > >             f.append(item)
> > >     return (t, f)
> >
> > untested:
> >
> >    def split(seq, func=bool):
> >       xs = zip(seq, itertools.imap(func, seq))
> >       t = list(x for (x,y) in xs if y)
> >       f = list(x for (x,y) in xs if not y)
> >       return (t, f)
> 
> In my testing that is 3.5x slower than the original solution (and less
> clear imo). I fixed my version to take a bool default. Either way, I'm
> not really looking for additional ways to do this in Python unless
> I've totally missed something. What I am considering is writing it in
> C, much like filter.

I'm a little skeptical that the C version will help much, if it's
evaluating a python function at every list element.  Here's a variant
of your version:

 def split(seq, func=bool):
     t, f = [], []
 ta, fa = t.append, f.append
     for item in seq:
 (ta if func(item) else fa)(item)
     return (t, f)

This avoids some dict lookups and copying.  I wonder if that helps
significantly.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sequence splitting

2009-07-02 Thread Brad
On Jul 2, 8:17 pm, "Pablo Torres N."  wrote:
> On Jul 2, 9:56 pm, schickb  wrote:
>
> > I have fairly often found the need to split a sequence into two groups
> > based on a function result.
>
> This sounds like it belongs to the python-ideas list.  I suggest
> posting there for better feedback, since the core developers check
> that list more often than this one.

Thanks, I didn't know about that list.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sequence splitting

2009-07-02 Thread Brad
On Jul 2, 8:14 pm, Paul Rubin  wrote:
> schickb  writes:
> > def split(seq, func=None):
> >     if func is None:
> >         func = bool
> >     t, f = [], []
> >     for item in seq:
> >         if func(item):
> >             t.append(item)
> >         else:
> >             f.append(item)
> >     return (t, f)
>
> untested:
>
>    def split(seq, func=bool):
>       xs = zip(seq, itertools.imap(func, seq))
>       t = list(x for (x,y) in xs if y)
>       f = list(x for (x,y) in xs if not y)
>       return (t, f)

In my testing that is 3.5x slower than the original solution (and less
clear imo). I fixed my version to take a bool default. Either way, I'm
not really looking for additional ways to do this in Python unless
I've totally missed something. What I am considering is writing it in
C, much like filter.

-Brad
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sequence splitting

2009-07-02 Thread Pablo Torres N.
On Jul 2, 9:56 pm, schickb  wrote:
> I have fairly often found the need to split a sequence into two groups
> based on a function result. Much like the existing filter function,
> but returning a tuple of true, false sequences. In Python, something
> like:
>
> def split(seq, func=None):
>     if func is None:
>         func = bool
>     t, f = [], []
>     for item in seq:
>         if func(item):
>             t.append(item)
>         else:
>             f.append(item)
>     return (t, f)
>
> The discussion linked to below has various approaches for doing this
> now, but most traverse the sequence twice and many don't apply a
> function to spit the 
> sequence.http://stackoverflow.com/questions/949098/python-split-a-list-based-o...
>
> Is there any interest in a C implementation of this? Seems too trivial
> to write a PEP, so I'm just trying to measure interest before diving
> in. This wouldn't really belong in intertool. Would it be best
> implemented as a top level built-in?
>
> -Brad

This sounds like it belongs to the python-ideas list.  I suggest
posting there for better feedback, since the core developers check
that list more often than this one.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sequence splitting

2009-07-02 Thread Pablo Torres N.
On Thu, Jul 2, 2009 at 21:56, schickb wrote:
> I have fairly often found the need to split a sequence into two groups
> based on a function result. Much like the existing filter function,
> but returning a tuple of true, false sequences. In Python, something
> like:
>
> def split(seq, func=None):
>    if func is None:
>        func = bool
>    t, f = [], []
>    for item in seq:
>        if func(item):
>            t.append(item)
>        else:
>            f.append(item)
>    return (t, f)
>
> The discussion linked to below has various approaches for doing this
> now, but most traverse the sequence twice and many don't apply a
> function to spit the sequence.
> http://stackoverflow.com/questions/949098/python-split-a-list-based-on-a-condition
>
> Is there any interest in a C implementation of this? Seems too trivial
> to write a PEP, so I'm just trying to measure interest before diving
> in. This wouldn't really belong in intertool. Would it be best
> implemented as a top level built-in?
>
> -Brad
> --
> http://mail.python.org/mailman/listinfo/python-list
>

This sounds like it belongs to the python-ideas list.  I suggest
posting there for better feedback, since the core developers check
that list more often than this one.


-- 
Pablo Torres N.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sequence splitting

2009-07-02 Thread Paul Rubin
schickb  writes:
> def split(seq, func=None):
> if func is None:
> func = bool
> t, f = [], []
> for item in seq:
> if func(item):
> t.append(item)
> else:
> f.append(item)
> return (t, f)

untested:

   def split(seq, func=bool):
  xs = zip(seq, itertools.imap(func, seq))
  t = list(x for (x,y) in xs if y)
  f = list(x for (x,y) in xs if not y)
  return (t, f)
-- 
http://mail.python.org/mailman/listinfo/python-list


Sequence splitting

2009-07-02 Thread schickb
I have fairly often found the need to split a sequence into two groups
based on a function result. Much like the existing filter function,
but returning a tuple of true, false sequences. In Python, something
like:

def split(seq, func=None):
if func is None:
func = bool
t, f = [], []
for item in seq:
if func(item):
t.append(item)
else:
f.append(item)
return (t, f)

The discussion linked to below has various approaches for doing this
now, but most traverse the sequence twice and many don't apply a
function to spit the sequence.
http://stackoverflow.com/questions/949098/python-split-a-list-based-on-a-condition

Is there any interest in a C implementation of this? Seems too trivial
to write a PEP, so I'm just trying to measure interest before diving
in. This wouldn't really belong in intertool. Would it be best
implemented as a top level built-in?

-Brad
-- 
http://mail.python.org/mailman/listinfo/python-list