Re: [Tutor] Help with iterators

2013-03-28 Thread Matthew Johnson
Dear list,

Sorry for the delay -- it has taken some time for me to get these emails.

It appears I made a dumb error when typing out the description.

Mitya Sirenef was correct to ignore my words and to focus on my code.

Thanks for your help. I may come back for more help once I feel I
have tried hard enough to absorb the answers below.

Thanks again

mj

On 22/03/2013, at 6:24 PM, tutor-requ...@python.org
<tutor-requ...@python.org> wrote:



 Today's Topics:

   1. Re: Help with iterators (Mitya Sirenef)
   2. Re: Help with iterators (Steven D'Aprano)
   3. Re: Help with iterators (Steven D'Aprano)
   4. Re: Help with iterators (Mitya Sirenef)
   5. Please Help (Arijit Ukil)


 --

 Message: 1
 Date: Thu, 21 Mar 2013 21:39:12 -0400
 From: Mitya Sirenef <msire...@lightbird.net>
 To: tutor@python.org
 Subject: Re: [Tutor] Help with iterators
 Message-ID: <514bb640.5050...@lightbird.net>
 Content-Type: text/plain; charset=ISO-8859-1; format=flowed

 On 03/21/2013 08:39 PM, Matthew Johnson wrote:
 Dear list,

 I have been trying to figure out how to use iterators, and in
 particular groupby statements. I am, however, quite lost.

 I wish to subset the below list, selecting the observations that have
 an ID ('realtime_start') value that is greater than some date (I've
 used the variable name maxDate), and in the case that there is more
 than one such record, returning only the one that has the largest ID
 ('realtime_start').

 The code below does the job; however, I have the impression that it
 might be done in a more Pythonic way using iterators and groupby
 statements.

 Could someone please help me understand how to go from this code to
 the Pythonic idiom?

 thanks in advance,

 Matt Johnson

 _

 ## Code example

import pprint

obs = [{'date': '2012-09-01',
        'realtime_end': '2013-02-18',
        'realtime_start': '2012-10-15',
        'value': '231.951'},
       {'date': '2012-09-01',
        'realtime_end': '2013-02-18',
        'realtime_start': '2012-11-15',
        'value': '231.881'},
       {'date': '2012-10-01',
        'realtime_end': '2013-02-18',
        'realtime_start': '2012-11-15',
        'value': '231.751'},
       {'date': '2012-10-01',
        'realtime_end': '-12-31',
        'realtime_start': '2012-12-19',
        'value': '231.623'},
       {'date': '2013-02-01',
        'realtime_end': '-12-31',
        'realtime_start': '2013-03-21',
        'value': '231.157'},
       {'date': '2012-11-01',
        'realtime_end': '2013-02-18',
        'realtime_start': '2012-12-14',
        'value': '231.025'},
       {'date': '2012-11-01',
        'realtime_end': '-12-31',
        'realtime_start': '2013-01-19',
        'value': '231.071'},
       {'date': '2012-12-01',
        'realtime_end': '2013-02-18',
        'realtime_start': '2013-01-16',
        'value': '230.979'},
       {'date': '2012-12-01',
        'realtime_end': '-12-31',
        'realtime_start': '2013-02-19',
        'value': '231.137'},
       {'date': '2012-12-01',
        'realtime_end': '-12-31',
        'realtime_start': '2013-03-19',
        'value': '231.197'},
       {'date': '2013-01-01',
        'realtime_end': '-12-31',
        'realtime_start': '2013-02-21',
        'value': '231.198'},
       {'date': '2013-01-01',
        'realtime_end': '-12-31',
        'realtime_start': '2013-03-21',
        'value': '231.222'}]

maxDate = '2013-03-21'

# Group the observations by their 'date' field.
dobs = dict([(d, []) for d in set([e['date'] for e in obs])])

for o in obs:
    dobs[o['date']].append(o)

# Within each date, keep only records that started on or before maxDate.
dobs_subMax = dict([(k, [d for d in v if d['realtime_start'] <= maxDate])
                    for k, v in dobs.items()])

rts = lambda x: x['realtime_start']

# For each non-empty group, take the record with the latest realtime_start.
mmax = [sorted(e, key=rts)[-1] for e in dobs_subMax.values() if e]

mmax.sort(key=lambda x: x['date'])

pprint.pprint(mmax)
 ___
 Tutor maillist - Tutor@python.org
 To unsubscribe or change subscription options:
 http://mail.python.org/mailman/listinfo/tutor
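For comparison, here is a minimal sketch of the same per-date selection written with `collections.defaultdict` and `max()` instead of nested `dict()` and list comprehensions. It assumes the filter keeps records whose realtime_start is on or before maxDate (`<=`), which is the reading consistent with Steven's later remark that four of the six results start before maxDate. `ROWS`, `MAX_DATE` and `latest_per_date` are illustrative names, and `realtime_end` is dropped because nothing in the selection reads it.

```python
from collections import defaultdict
from pprint import pprint

# Sample data from the thread as (date, realtime_start, value) tuples;
# realtime_end is omitted because the selection never uses it.
ROWS = [
    ("2012-09-01", "2012-10-15", "231.951"),
    ("2012-09-01", "2012-11-15", "231.881"),
    ("2012-10-01", "2012-11-15", "231.751"),
    ("2012-10-01", "2012-12-19", "231.623"),
    ("2013-02-01", "2013-03-21", "231.157"),
    ("2012-11-01", "2012-12-14", "231.025"),
    ("2012-11-01", "2013-01-19", "231.071"),
    ("2012-12-01", "2013-01-16", "230.979"),
    ("2012-12-01", "2013-02-19", "231.137"),
    ("2012-12-01", "2013-03-19", "231.197"),
    ("2013-01-01", "2013-02-21", "231.198"),
    ("2013-01-01", "2013-03-21", "231.222"),
]
obs = [{"date": d, "realtime_start": rs, "value": v} for d, rs, v in ROWS]

MAX_DATE = "2013-03-21"  # a string, so it compares like the data does


def latest_per_date(records, cutoff):
    """For each 'date', keep the record with the latest realtime_start
    that is on or before `cutoff`; return them sorted by 'date'."""
    by_date = defaultdict(list)
    for rec in records:
        if rec["realtime_start"] <= cutoff:
            by_date[rec["date"]].append(rec)
    # max() stands in for sorted(...)[-1]. They differ only on ties
    # (max keeps the first tied record, sorted()[-1] the last), and no
    # date in this sample has two equal realtime_start values.
    best = [max(group, key=lambda r: r["realtime_start"])
            for group in by_date.values()]
    return sorted(best, key=lambda r: r["date"])


mmax = latest_per_date(obs, MAX_DATE)
pprint(mmax)
```

On this sample it returns six records, one per date, four of which have a realtime_start earlier than 2013-03-21.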


 You can do it with groupby like so:


import pprint
from itertools import groupby
from operator import itemgetter


maxDate = '2013-03-21'
mmax = []

obs.sort(key=itemgetter('date'))

for k, group in groupby(obs, key=itemgetter('date')):
    # Keep records on or before maxDate, then take the latest of them.
    group = [dob for dob in group if dob['realtime_start'] <= maxDate]
    if group:
        group.sort(key=itemgetter('realtime_start'))
        mmax.append(group[-1])

pprint.pprint(mmax)


 Note that writing multiply-nested comprehensions like you did results in
 very unreadable code. Do you find this code more readable?

  -m


 --
 Lark's Tongue Guide to Python: http://lightbird.net/larks/

 Many a man fails as an original thinker simply because his memory is too
 good. -- Friedrich Nietzsche
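The `obs.sort(key=itemgetter('date'))` call in Mitya's version is not optional: `itertools.groupby` only merges runs of *adjacent* items with equal keys. A small sketch using three records taken from the sample data and deliberately reordered (the variable names are illustrative):

```python
from itertools import groupby
from operator import itemgetter

# Three records from the sample data, put out of date order on purpose.
records = [
    {"date": "2012-10-01", "realtime_start": "2012-11-15"},
    {"date": "2012-09-01", "realtime_start": "2012-10-15"},
    {"date": "2012-10-01", "realtime_start": "2012-12-19"},
]
key = itemgetter("date")

# Unsorted input: the two 2012-10-01 records are not adjacent,
# so groupby starts a new group for each run and the key appears twice.
unsorted_keys = [k for k, _ in groupby(records, key=key)]

# Sorting by the same key first makes every key form a single run.
sorted_keys = [k for k, _ in groupby(sorted(records, key=key), key=key)]

print(unsorted_keys)  # ['2012-10-01', '2012-09-01', '2012-10-01']
print(sorted_keys)    # ['2012-09-01', '2012-10-01']
```

Using `sorted()` instead of `obs.sort()` gives the same grouping without reordering the caller's list.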





Re: [Tutor] Help with iterators

2013-03-21 Thread Steven D'Aprano

On 22/03/13 11:39, Matthew Johnson wrote:

Dear list,

I have been trying to figure out how to use iterators, and in
particular groupby statements. I am, however, quite lost.


groupby is a very specialized function that is not very intuitive to
use. Sometimes I think that groupby is an excellent solution in search
of a problem.
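One concrete way in which groupby is unintuitive: every group shares the single underlying iterator, so a group can only be read until the groupby object moves on to the next key. A minimal sketch of the pitfall, as it behaves in Python 3 (the string `"aabbb"` is just illustrative input):

```python
from itertools import groupby

data = "aabbb"

# Correct: turn each group into a list while it is still the current group.
eager = [(k, list(g)) for k, g in groupby(data)]

# Pitfall: list() runs groupby to the end first, so every saved group
# object has already been passed over and yields nothing afterwards.
saved = list(groupby(data))
stale = [(k, list(g)) for k, g in saved]

print(eager)  # [('a', ['a', 'a']), ('b', ['b', 'b', 'b'])]
print(stale)  # [('a', []), ('b', [])]
```

The Python documentation for `itertools.groupby` recommends the first pattern: store `list(g)` if a group is needed later.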



I wish to subset the below list, selecting the observations that have
an ID ('realtime_start') value that is greater than some date (I've
used the variable name maxDate), and in the case that there is more
than one such record, returning only the one that has the largest ID
('realtime_start').



The code that you show does not do what you describe here. The most
obvious difference is that it doesn't return or display a single record,
but shows multiple records.

In your case, it selects six records, four of which have a realtime_start
that occurs BEFORE the given maxDate.

To solve the problem you describe here, of finding at most a single
record, the solution is much simpler than what you have done. Prepare a
list of observations, sorted by realtime_start. Take the latest such
observation. If its realtime_start is on or after maxDate, you have
your answer. If not, there is no answer.

The simplest solution is usually the best. The simpler your code, the fewer
bugs it will contain.


obs.sort(key=lambda rec: rec['realtime_start'])
rec = obs[-1]
if rec['realtime_start'] >= maxDate:
    print(rec)
else:
    print("no record found")


which prints:

{'date': '2013-01-01', 'realtime_start': '2013-03-21', 'realtime_end': '-12-31', 'value': '231.222'}
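Which record Steven's approach returns depends on how ties are broken. Two sample records share the latest realtime_start (2013-03-21), and because Python's sort is stable, `obs[-1]` after sorting is the tied record that came *last* in the input, which is why the 2013-01-01 record is printed. Swapping in `max(obs, key=...)` would avoid a full sort (and would not reorder `obs`), but it returns the *first* tied record instead. A small sketch using three records from the sample, kept in their original relative order:

```python
from operator import itemgetter

# Three sample records in their original relative order; the first
# and the last share the latest realtime_start.
obs = [
    {"date": "2013-02-01", "realtime_start": "2013-03-21", "value": "231.157"},
    {"date": "2012-12-01", "realtime_start": "2013-03-19", "value": "231.197"},
    {"date": "2013-01-01", "realtime_start": "2013-03-21", "value": "231.222"},
]
start = itemgetter("realtime_start")

# sorted() is stable: equal keys keep their input order,
# so [-1] is the last of the tied records.
via_sort = sorted(obs, key=start)[-1]

# max() returns the first maximal item it encounters.
via_max = max(obs, key=start)

print(via_sort["value"])  # 231.222
print(via_max["value"])   # 231.157
```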




--
Steven


Re: [Tutor] Help with iterators

2013-03-21 Thread Steven D'Aprano

On 22/03/13 12:39, Mitya Sirenef wrote:


You can do it with groupby like so:


import pprint
from itertools import groupby
from operator import itemgetter

maxDate = '2013-03-21'
mmax = []

obs.sort(key=itemgetter('date'))

for k, group in groupby(obs, key=itemgetter('date')):
    group = [dob for dob in group if dob['realtime_start'] <= maxDate]
    if group:
        group.sort(key=itemgetter('realtime_start'))
        mmax.append(group[-1])

pprint.pprint(mmax)



This suffers from the same problems: it finds six records instead of one,
and four of the six have start dates before the given date instead
of after it.

Here's another solution that finds all the records that start on or after
the given date (the poorly named maxDate) and displays them sorted by
date.


selected = [rec for rec in obs if rec['realtime_start'] >= maxDate]
selected.sort(key=lambda rec: rec['date'])
print(selected)
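Every filter in this thread compares realtime_start strings with maxDate. That is only sound when maxDate is itself a zero-padded 'YYYY-MM-DD' string (or when both sides are converted to `datetime.date`), because such strings sort lexicographically in date order. Written without quotes, 2013-03-21 is not a date at all: in Python 2 it is integer arithmetic (03 being an octal literal) that evaluates to 1989, and Python 3 rejects the leading zero as a syntax error. A short sketch; `starts` and `cutoff` are illustrative names:

```python
from datetime import date

# A few realtime_start values from the sample data.
starts = ["2013-03-19", "2012-12-19", "2013-03-21", "2013-01-19"]
cutoff = "2013-03-21"

# Zero-padded ISO strings compare in date order, so a plain string test works.
on_or_after = [s for s in starts if s >= cutoff]
print(on_or_after)  # ['2013-03-21']

# The same test on real date objects gives the same answer for this format.
as_dates = [s for s in starts
            if date.fromisoformat(s) >= date.fromisoformat(cutoff)]
print(as_dates == on_or_after)  # True

# String order and date order agree for the whole list.
print(sorted(starts) == sorted(starts, key=date.fromisoformat))  # True

# Without quotes the "date" is just integer subtraction.
print(2013 - 3 - 21)  # 1989
```

`date.fromisoformat` needs Python 3.7 or later; converting once up front is the safer choice if the data might ever contain dates that are not zero-padded.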




--
Steven


Re: [Tutor] Help with iterators

2013-03-21 Thread Mitya Sirenef

On 03/21/2013 10:20 PM, Steven D'Aprano wrote:

On 22/03/13 12:39, Mitya Sirenef wrote:


 You can do it with groupby like so:


import pprint
from itertools import groupby
from operator import itemgetter

maxDate = '2013-03-21'
mmax = []

obs.sort(key=itemgetter('date'))

for k, group in groupby(obs, key=itemgetter('date')):
    group = [dob for dob in group if dob['realtime_start'] <= maxDate]
    if group:
        group.sort(key=itemgetter('realtime_start'))
        mmax.append(group[-1])

pprint.pprint(mmax)


 This suffers from the same problems: it finds six records instead of one,
 and four of the six have start dates before the given date instead
 of after it.


The OP said his code produces the needed result, and I think his description
probably doesn't match what he really intends to do (he also said he
wants the same code rewritten using groupby). I reproduced the logic of
his code; hopefully he can step in and clarify!
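Mitya's point that the groupby version reproduces the logic of the original code can be checked directly. A minimal sketch, assuming the filter keeps records on or before maxDate (`<=`), the reading consistent with Steven's count of records; `original_logic`, `groupby_logic` and the five sample records are illustrative, and both functions use `sorted()` rather than `obs.sort()` so the input list is left untouched:

```python
from itertools import groupby
from operator import itemgetter

# Five records from the sample data (realtime_end omitted).
obs = [
    {"date": "2012-09-01", "realtime_start": "2012-10-15", "value": "231.951"},
    {"date": "2012-09-01", "realtime_start": "2012-11-15", "value": "231.881"},
    {"date": "2013-02-01", "realtime_start": "2013-03-21", "value": "231.157"},
    {"date": "2012-12-01", "realtime_start": "2013-01-16", "value": "230.979"},
    {"date": "2012-12-01", "realtime_start": "2013-03-19", "value": "231.197"},
]
by_date = itemgetter("date")
by_start = itemgetter("realtime_start")


def original_logic(records, cutoff):
    """Dict-of-lists grouping, as in the original post."""
    groups = {}
    for rec in records:
        groups.setdefault(rec["date"], []).append(rec)
    picked = []
    for group in groups.values():
        kept = [r for r in group if r["realtime_start"] <= cutoff]
        if kept:
            picked.append(sorted(kept, key=by_start)[-1])
    return sorted(picked, key=by_date)


def groupby_logic(records, cutoff):
    """Same selection via itertools.groupby on date-sorted input."""
    picked = []
    for _, group in groupby(sorted(records, key=by_date), key=by_date):
        kept = [r for r in group if r["realtime_start"] <= cutoff]
        if kept:
            picked.append(sorted(kept, key=by_start)[-1])
    return picked


print(original_logic(obs, "2013-03-21") == groupby_logic(obs, "2013-03-21"))  # True
```

The groupby version needs no final sort by date, because its groups already come out in date order.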






 Here's another solution that finds all the records that start on or after
 the given date (the poorly named maxDate) and displays them sorted by
 date.


selected = [rec for rec in obs if rec['realtime_start'] >= maxDate]
selected.sort(key=lambda rec: rec['date'])
print(selected)






--
Lark's Tongue Guide to Python: http://lightbird.net/larks/

A little bad taste is like a nice dash of paprika.
Dorothy Parker
