Re: [Tutor] Help with iterators

2013-03-27 Thread Matthew Johnson
Dear list,

Sorry for the delay -- it has taken some time for me to get these emails.

It appears I made a dumb error when typing out the description.

Mitya Sirenef was correct to ignore my words and to focus on my code.

Thanks for your help. I may ask again for more help once I feel I
have tried sufficiently hard to absorb the answers below.

Thanks again

mj


Re: [Tutor] Help with iterators

2013-03-21 Thread Mitya Sirenef

On 03/21/2013 10:20 PM, Steven D'Aprano wrote:

On 22/03/13 12:39, Mitya  Sirenef wrote:

>
>> You can do it with groupby like so:
>>
>>
>> from itertools import groupby
>> from operator import itemgetter
>>
>> maxDate = "2013-03-21"
>> mmax = list()
>>
>> obs.sort(key=itemgetter('date'))
>>
>> for k, group in groupby(obs, key=itemgetter('date')):
>>     group = [dob for dob in group if dob['realtime_start'] <= maxDate]
>>     if group:
>>         group.sort(key=itemgetter('realtime_start'))
>>         mmax.append(group[-1])
>>
>> pprint.pprint(mmax)
>
>
> This suffers from the same problem of finding six records instead of one,
> and that four of the six have start dates before the given date instead
> of after it.


OP said his code produces the needed result and I think his description
probably doesn't match what he really intends to do (he also said he
wants the same code rewritten using groupby). I reproduced the logic of
his code... hopefully he can step in and clarify!






> Here's another solution that finds all the records that start on or after
> the given date (the poorly named "maxDate") and displays them sorted by
> date.
>
>
> selected = [rec for rec in obs if rec['realtime_start'] >= maxDate]
> selected.sort(key=lambda rec: rec['date'])
> print(selected)
>
>
>
>


--
Lark's Tongue Guide to Python: http://lightbird.net/larks/

A little bad taste is like a nice dash of paprika.
Dorothy Parker

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Help with iterators

2013-03-21 Thread Steven D'Aprano

On 22/03/13 12:39, Mitya Sirenef wrote:


> You can do it with groupby like so:
>
>
> from itertools import groupby
> from operator import itemgetter
>
> maxDate = "2013-03-21"
> mmax = list()
>
> obs.sort(key=itemgetter('date'))
>
> for k, group in groupby(obs, key=itemgetter('date')):
>     group = [dob for dob in group if dob['realtime_start'] <= maxDate]
>     if group:
>         group.sort(key=itemgetter('realtime_start'))
>         mmax.append(group[-1])
>
> pprint.pprint(mmax)



This suffers from the same problem of finding six records instead of one,
and that four of the six have start dates before the given date instead
of after it.

Here's another solution that finds all the records that start on or after
the given date (the poorly named "maxDate") and displays them sorted by
date.


selected = [rec for rec in obs if rec['realtime_start'] >= maxDate]
selected.sort(key=lambda rec: rec['date'])
print(selected)
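
The filter-then-sort idea above could also be wrapped in a small function
that leaves the input list untouched. This is only a sketch: the function
name on_or_after and the three-record sample are made up for illustration,
and it relies on the dates being ISO-formatted strings, which compare
correctly as plain text.

```python
from operator import itemgetter

def on_or_after(records, cutoff):
    """Return records whose realtime_start is >= cutoff, sorted by date."""
    selected = [rec for rec in records if rec['realtime_start'] >= cutoff]
    # sorted() builds a new list, so the caller's list is not reordered.
    return sorted(selected, key=itemgetter('date'))

# Hypothetical sample, cut down from the data posted in the thread.
sample = [
    {'date': '2013-02-01', 'realtime_start': '2013-03-21', 'value': '231.157'},
    {'date': '2012-12-01', 'realtime_start': '2013-03-19', 'value': '231.197'},
    {'date': '2013-01-01', 'realtime_start': '2013-03-21', 'value': '231.222'},
]

for rec in on_or_after(sample, '2013-03-21'):
    print(rec)
```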




--
Steven


Re: [Tutor] Help with iterators

2013-03-21 Thread Steven D'Aprano

On 22/03/13 11:39, Matthew Johnson wrote:

> Dear list,
>
> I have been trying to understand how to use iterators and in
> particular groupby statements.  I am, however, quite lost.


groupby is a very specialist function which is not very intuitive to
use. Sometimes I think that groupby is an excellent solution in search
of a problem.
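
One reason groupby can feel unintuitive is that it only groups items that
are next to each other, so the input normally has to be sorted by the same
key first. A minimal sketch of that behaviour, using a made-up
three-record list rather than the full data from the thread:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical miniature of the data: the two '2012-09-01' records are
# deliberately not adjacent.
records = [
    {'date': '2012-09-01', 'realtime_start': '2012-10-15'},
    {'date': '2012-10-01', 'realtime_start': '2012-11-15'},
    {'date': '2012-09-01', 'realtime_start': '2012-11-15'},
]

by_date = itemgetter('date')

# Unsorted input: groupby starts a new group every time the key changes,
# so '2012-09-01' shows up as two separate groups.
unsorted_keys = [key for key, _ in groupby(records, key=by_date)]

# Sorting by the grouping key first makes equal keys adjacent.
sorted_keys = [key for key, _ in
               groupby(sorted(records, key=by_date), key=by_date)]

print(unsorted_keys)
print(sorted_keys)
```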



> I wish to subset the below list, selecting the observations that have
> an ID ('realtime_start') value that is greater than some date (i've
> used the variable name maxDate), and in the case that there is more
> than one such record, returning only the one that has the largest ID
> ('realtime_start').



The code that you show does not do what you describe here. The most
obvious difference is that it doesn't return or display a single record,
but shows multiple records.

In your case, it selects six records, four of which have a realtime_start
that occurs BEFORE the given maxDate.

To solve the problem you describe here, of finding at most a single
record, the solution is much simpler than what you have done. Prepare a
list of observations, sorted by realtime_start. Take the latest such
observation. If the realtime_start is greater than the maxDate, you have
your answer. If not, there is no answer.

The simplest solution is usually the best. The simpler your code, the fewer
bugs it will contain.


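# Sort in place so the record with the latest realtime_start comes last.
obs.sort(key=lambda rec: rec['realtime_start'])
rec = obs[-1]
# Only report it if it is strictly later than the cutoff.
if rec['realtime_start'] > maxDate:
    print(rec)
else:
    print("no record found")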


which prints:

{'date': '2013-01-01', 'realtime_start': '2013-03-21', 'realtime_end': 
'-12-31', 'value': '231.222'}
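
Since only the single latest record is wanted, the sort can be replaced by
max() with a key function. The following is a sketch under assumptions: the
function name latest_after and the cut-down sample are invented for
illustration. Note that the comparison is strict, so a cutoff equal to the
latest realtime_start yields no record, and that on ties max() returns the
first maximal record whereas sort-then-take-last returns the last one.

```python
def latest_after(records, cutoff):
    """Return the record with the largest realtime_start if that value is
    strictly greater than cutoff; otherwise return None."""
    if not records:
        return None
    latest = max(records, key=lambda rec: rec['realtime_start'])
    if latest['realtime_start'] > cutoff:
        return latest
    return None

# Hypothetical sample, cut down from the data posted in the thread.
sample = [
    {'date': '2012-12-01', 'realtime_start': '2013-03-19', 'value': '231.197'},
    {'date': '2013-01-01', 'realtime_start': '2013-02-21', 'value': '231.198'},
    {'date': '2013-01-01', 'realtime_start': '2013-03-21', 'value': '231.222'},
]

print(latest_after(sample, '2013-03-20'))   # the '231.222' record
print(latest_after(sample, '2013-03-21'))   # None: not strictly greater
```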




--
Steven


Re: [Tutor] Help with iterators

2013-03-21 Thread Mitya Sirenef

On 03/21/2013 08:39 PM, Matthew Johnson wrote:

> Dear list,
>
> I have been trying to understand how to use iterators and in
> particular groupby statements. I am, however, quite lost.
>
> I wish to subset the below list, selecting the observations that have
> an ID ('realtime_start') value that is greater than some date (i've
> used the variable name maxDate), and in the case that there is more
> than one such record, returning only the one that has the largest ID
> ('realtime_start').
>
> The code below does the job, however i have the impression that it
> might be done in a more python way using iterators and groupby
> statements.
>
> could someone please help me understand how to go from this code to
> the pythonic idiom?
>
> thanks in advance,
>
> Matt Johnson
>
> _
>
> ## Code example
>
> import pprint
>
> obs = [{'date': '2012-09-01',
> 'realtime_end': '2013-02-18',
> 'realtime_start': '2012-10-15',
> 'value': '231.951'},
> {'date': '2012-09-01',
> 'realtime_end': '2013-02-18',
> 'realtime_start': '2012-11-15',
> 'value': '231.881'},
> {'date': '2012-10-01',
> 'realtime_end': '2013-02-18',
> 'realtime_start': '2012-11-15',
> 'value': '231.751'},
> {'date': '2012-10-01',
> 'realtime_end': '-12-31',
> 'realtime_start': '2012-12-19',
> 'value': '231.623'},
> {'date': '2013-02-01',
> 'realtime_end': '-12-31',
> 'realtime_start': '2013-03-21',
> 'value': '231.157'},
> {'date': '2012-11-01',
> 'realtime_end': '2013-02-18',
> 'realtime_start': '2012-12-14',
> 'value': '231.025'},
> {'date': '2012-11-01',
> 'realtime_end': '-12-31',
> 'realtime_start': '2013-01-19',
> 'value': '231.071'},
> {'date': '2012-12-01',
> 'realtime_end': '2013-02-18',
> 'realtime_start': '2013-01-16',
> 'value': '230.979'},
> {'date': '2012-12-01',
> 'realtime_end': '-12-31',
> 'realtime_start': '2013-02-19',
> 'value': '231.137'},
> {'date': '2012-12-01',
> 'realtime_end': '-12-31',
> 'realtime_start': '2013-03-19',
> 'value': '231.197'},
> {'date': '2013-01-01',
> 'realtime_end': '-12-31',
> 'realtime_start': '2013-02-21',
> 'value': '231.198'},
> {'date': '2013-01-01',
> 'realtime_end': '-12-31',
> 'realtime_start': '2013-03-21',
> 'value': '231.222'}]
>
> maxDate = "2013-03-21"
>
> dobs = dict([(d, []) for d in set([e['date'] for e in obs])])
>
> for o in obs:
>     dobs[o['date']].append(o)
>
> dobs_subMax = dict([(k, [d for d in v if d['realtime_start'] <= maxDate])
>                     for k, v in dobs.items()])
>
> rts = lambda x: x['realtime_start']
>
> mmax = [sorted(e, key=rts)[-1] for e in dobs_subMax.values() if e]
>
> mmax.sort(key = lambda x: x['date'])
>
> pprint.pprint(mmax)


You can do it with groupby like so:


from itertools import groupby
from operator import itemgetter


maxDate = "2013-03-21"
mmax = list()

obs.sort(key=itemgetter('date'))

for k, group in groupby(obs, key=itemgetter('date')):
    group = [dob for dob in group if dob['realtime_start'] <= maxDate]
    if group:
        group.sort(key=itemgetter('realtime_start'))
        mmax.append(group[-1])

pprint.pprint(mmax)


Note that writing multiply-nested comprehensions like you did results in
very unreadable code. Do you find this code more readable?
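
A possible variation on the groupby loop above, offered only as a sketch:
it wraps the logic in a function, uses sorted() so the caller's list is not
reordered, and picks the latest record in each group with max() instead of
sorting the group. The name latest_per_date and the small sample are
assumptions made for illustration; the <= comparison mirrors the code in
this thread.

```python
from itertools import groupby
from operator import itemgetter

def latest_per_date(records, cutoff):
    """For each 'date', return the record with the largest realtime_start
    that is <= cutoff. Dates with no eligible record are skipped."""
    by_date = itemgetter('date')
    by_start = itemgetter('realtime_start')
    result = []
    # groupby needs equal keys to be adjacent, hence the sort by date.
    for _, group in groupby(sorted(records, key=by_date), key=by_date):
        eligible = [rec for rec in group if by_start(rec) <= cutoff]
        if eligible:
            result.append(max(eligible, key=by_start))
    return result

# Hypothetical sample, cut down from the data posted in the thread.
sample = [
    {'date': '2012-09-01', 'realtime_start': '2012-10-15', 'value': '231.951'},
    {'date': '2012-10-01', 'realtime_start': '2012-12-19', 'value': '231.623'},
    {'date': '2012-09-01', 'realtime_start': '2012-11-15', 'value': '231.881'},
]

for rec in latest_per_date(sample, '2012-12-01'):
    print(rec)
```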

 -m


--
Lark's Tongue Guide to Python: http://lightbird.net/larks/

Many a man fails as an original thinker simply because his memory is too
good.  Friedrich Nietzsche



[Tutor] Help with iterators

2013-03-21 Thread Matthew Johnson
Dear list,

I have been trying to understand how to use iterators and in
particular groupby statements.  I am, however, quite lost.

I wish to subset the below list, selecting the observations that have
an ID ('realtime_start') value that is greater than some date (I've
used the variable name maxDate), and in the case that there is more
than one such record, returning only the one that has the largest ID
('realtime_start').

The code below does the job; however, I have the impression that it
might be done in a more Pythonic way using iterators and groupby
statements.

Could someone please help me understand how to go from this code to
the Pythonic idiom?

thanks in advance,

Matt Johnson

_

## Code example

import pprint

obs = [{'date': '2012-09-01',
  'realtime_end': '2013-02-18',
  'realtime_start': '2012-10-15',
  'value': '231.951'},
 {'date': '2012-09-01',
  'realtime_end': '2013-02-18',
  'realtime_start': '2012-11-15',
  'value': '231.881'},
 {'date': '2012-10-01',
  'realtime_end': '2013-02-18',
  'realtime_start': '2012-11-15',
  'value': '231.751'},
 {'date': '2012-10-01',
  'realtime_end': '-12-31',
  'realtime_start': '2012-12-19',
  'value': '231.623'},
 {'date': '2013-02-01',
  'realtime_end': '-12-31',
  'realtime_start': '2013-03-21',
  'value': '231.157'},
 {'date': '2012-11-01',
  'realtime_end': '2013-02-18',
  'realtime_start': '2012-12-14',
  'value': '231.025'},
 {'date': '2012-11-01',
  'realtime_end': '-12-31',
  'realtime_start': '2013-01-19',
  'value': '231.071'},
 {'date': '2012-12-01',
  'realtime_end': '2013-02-18',
  'realtime_start': '2013-01-16',
  'value': '230.979'},
 {'date': '2012-12-01',
  'realtime_end': '-12-31',
  'realtime_start': '2013-02-19',
  'value': '231.137'},
 {'date': '2012-12-01',
  'realtime_end': '-12-31',
  'realtime_start': '2013-03-19',
  'value': '231.197'},
 {'date': '2013-01-01',
  'realtime_end': '-12-31',
  'realtime_start': '2013-02-21',
  'value': '231.198'},
 {'date': '2013-01-01',
  'realtime_end': '-12-31',
  'realtime_start': '2013-03-21',
  'value': '231.222'}]

maxDate = "2013-03-21"

dobs = dict([(d, []) for d in set([e['date'] for e in obs])])

for o in obs:
    dobs[o['date']].append(o)

dobs_subMax = dict([(k, [d for d in v if d['realtime_start'] <= maxDate])
                    for k, v in dobs.items()])

rts = lambda x: x['realtime_start']

mmax = [sorted(e, key=rts)[-1] for e in dobs_subMax.values() if e]

mmax.sort(key = lambda x: x['date'])

pprint.pprint(mmax)
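
The grouping step in the code above (building dobs with empty lists and
then appending to them) could be written with collections.defaultdict,
which creates each list on first use. This is a sketch only: the function
name group_by_date and the cut-down sample are assumptions for illustration
and are not part of the code posted here.

```python
from collections import defaultdict

def group_by_date(records):
    """Map each 'date' value to the list of records carrying that date."""
    grouped = defaultdict(list)
    for rec in records:
        # A missing key gets an empty list automatically.
        grouped[rec['date']].append(rec)
    return grouped

# Hypothetical sample, cut down from the data posted in the thread.
sample = [
    {'date': '2012-09-01', 'realtime_start': '2012-10-15', 'value': '231.951'},
    {'date': '2012-09-01', 'realtime_start': '2012-11-15', 'value': '231.881'},
    {'date': '2012-10-01', 'realtime_start': '2012-11-15', 'value': '231.751'},
]

grouped = group_by_date(sample)
for date in sorted(grouped):
    print(date, len(grouped[date]))
```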