Re: [Tutor] Help with iterators

Matthew Johnson Wed, 27 Mar 2013 23:31:49 -0700

Dear list,

Sorry for the delay -- it has taken some time for me to get these emails.


It appears I made some dumb error when typing out the description.

Mitya Sirenef was correct to ignore my words and to focus on my code.

Thanks for your help. I may ask again for more help once I feel I
have tried sufficiently hard to absorb the answers below.

Thanks again

mj

On 22/03/2013, at 6:24 PM, "tutor-requ...@python.org"
<tutor-requ...@python.org> wrote:

> Today's Topics:
>
>   1. Re: Help with iterators (Mitya Sirenef)
>   2. Re: Help with iterators (Steven D'Aprano)
>   3. Re: Help with iterators (Steven D'Aprano)
>   4. Re: Help with iterators (Mitya Sirenef)
>   5. Please Help (Arijit Ukil)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 21 Mar 2013 21:39:12 -0400
> From: Mitya Sirenef <msire...@lightbird.net>
> To: tutor@python.org
> Subject: Re: [Tutor] Help with iterators
> Message-ID: <514bb640.5050...@lightbird.net>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> On 03/21/2013 08:39 PM, Matthew Johnson wrote:
>> Dear list,
>>
>> I have been trying to work out how to use iterators and in
>> particular groupby statements. I am, however, quite lost.
>>
>> I wish to subset the below list, selecting the observations that have
>> an ID ('realtime_start') value that is greater than some date (i've
>> used the variable name maxDate), and in the case that there is more
>> than one such record, returning only the one that has the largest ID
>> ('realtime_start').
>>
>> The code below does the job, however i have the impression that it
>> might be done in a more Pythonic way using iterators and groupby
>> statements.
>>
>> could someone please help me understand how to go from this code to
>> the pythonic idiom?
>>
>> thanks in advance,
>>
>> Matt Johnson
>>
>> _________________
>>
>> ## Code example
>>
>> import pprint
>>
>> obs = [{'date': '2012-09-01',
>> 'realtime_end': '2013-02-18',
>> 'realtime_start': '2012-10-15',
>> 'value': '231.951'},
>> {'date': '2012-09-01',
>> 'realtime_end': '2013-02-18',
>> 'realtime_start': '2012-11-15',
>> 'value': '231.881'},
>> {'date': '2012-10-01',
>> 'realtime_end': '2013-02-18',
>> 'realtime_start': '2012-11-15',
>> 'value': '231.751'},
>> {'date': '2012-10-01',
>> 'realtime_end': '9999-12-31',
>> 'realtime_start': '2012-12-19',
>> 'value': '231.623'},
>> {'date': '2013-02-01',
>> 'realtime_end': '9999-12-31',
>> 'realtime_start': '2013-03-21',
>> 'value': '231.157'},
>> {'date': '2012-11-01',
>> 'realtime_end': '2013-02-18',
>> 'realtime_start': '2012-12-14',
>> 'value': '231.025'},
>> {'date': '2012-11-01',
>> 'realtime_end': '9999-12-31',
>> 'realtime_start': '2013-01-19',
>> 'value': '231.071'},
>> {'date': '2012-12-01',
>> 'realtime_end': '2013-02-18',
>> 'realtime_start': '2013-01-16',
>> 'value': '230.979'},
>> {'date': '2012-12-01',
>> 'realtime_end': '9999-12-31',
>> 'realtime_start': '2013-02-19',
>> 'value': '231.137'},
>> {'date': '2012-12-01',
>> 'realtime_end': '9999-12-31',
>> 'realtime_start': '2013-03-19',
>> 'value': '231.197'},
>> {'date': '2013-01-01',
>> 'realtime_end': '9999-12-31',
>> 'realtime_start': '2013-02-21',
>> 'value': '231.198'},
>> {'date': '2013-01-01',
>> 'realtime_end': '9999-12-31',
>> 'realtime_start': '2013-03-21',
>> 'value': '231.222'}]
>>
>> maxDate = "2013-03-21"
>>
>> dobs = dict([(d, []) for d in set([e['date'] for e in obs])])
>>
>> for o in obs:
>>     dobs[o['date']].append(o)
>>
>> dobs_subMax = dict([(k, [d for d in v if d['realtime_start'] <= maxDate])
>>                     for k, v in dobs.items()])
>>
>> rts = lambda x: x['realtime_start']
>>
>> mmax = [sorted(e, key=rts)[-1] for e in dobs_subMax.values() if e]
>>
>> mmax.sort(key = lambda x: x['date'])
>>
>> pprint.pprint(mmax)
>> _______________________________________________
>> Tutor maillist - Tutor@python.org
>> To unsubscribe or change subscription options:
>> http://mail.python.org/mailman/listinfo/tutor
>
>
> You can do it with groupby like so:
>
>
> from itertools import groupby
> from operator import itemgetter
>
>
> maxDate = "2013-03-21"
> mmax    = list()
>
> obs.sort(key=itemgetter('date'))
>
> for k, group in groupby(obs, key=itemgetter('date')):
>     group = [dob for dob in group if dob['realtime_start'] <= maxDate]
>     if group:
>         group.sort(key=itemgetter('realtime_start'))
>         mmax.append(group[-1])
>
> pprint.pprint(mmax)
>
>
> Note that writing multiply-nested comprehensions like you did results in
> very unreadable code. Do you find this code more readable?
>
>  -m
>
>
> --
> Lark's Tongue Guide to Python: http://lightbird.net/larks/
>
> Many a man fails as an original thinker simply because his memory is too
> good.  Friedrich Nietzsche
>
>
>
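For anyone following along, here is a minimal sketch of the same groupby idea wrapped in a function, using max() in place of sort-then-take-last. The names latest_per_date, sample and cutoff are made up for illustration, and the last sample record is invented to show a group with no eligible entry. It relies on the dates being ISO 'YYYY-MM-DD' strings, which compare correctly as plain strings.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample in the same shape as the obs list in the thread.
sample = [
    {'date': '2012-09-01', 'realtime_start': '2012-10-15', 'value': '231.951'},
    {'date': '2012-09-01', 'realtime_start': '2012-11-15', 'value': '231.881'},
    {'date': '2013-02-01', 'realtime_start': '2013-03-21', 'value': '231.157'},
    # Invented record: its only realtime_start is after the cutoff.
    {'date': '2012-10-01', 'realtime_start': '2013-04-01', 'value': '0'},
]
cutoff = '2013-03-21'


def latest_per_date(records, cutoff):
    """For each 'date', keep the record with the largest realtime_start
    that is not later than cutoff; dates with no such record are skipped."""
    by_date = itemgetter('date')
    start = itemgetter('realtime_start')
    result = []
    # groupby only merges adjacent items with equal keys, so sort first.
    for _, group in groupby(sorted(records, key=by_date), key=by_date):
        eligible = [rec for rec in group if start(rec) <= cutoff]
        if eligible:
            result.append(max(eligible, key=start))
    return result


latest = latest_per_date(sample, cutoff)
print([rec['value'] for rec in latest])
```

Using sorted() instead of obs.sort() leaves the caller's list untouched. That is a design preference in this sketch, not something the thread asked for.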
> ------------------------------
>
> Message: 2
> Date: Fri, 22 Mar 2013 13:05:38 +1100
> From: Steven D'Aprano <st...@pearwood.info>
> To: tutor@python.org
> Subject: Re: [Tutor] Help with iterators
> Message-ID: <514bbc72.8040...@pearwood.info>
> Content-Type: text/plain; charset=UTF-8; format=flowed
>
> On 22/03/13 11:39, Matthew Johnson wrote:
>> Dear list,
>>
>> I have been trying to work out how to use iterators and in
>> particular groupby statements.  I am, however, quite lost.
>
> groupby is a very specialist function which is not very intuitive to
> use. Sometimes I think that groupby is an excellent solution in search
> of a problem.
>
>
>> I wish to subset the below list, selecting the observations that have
>> an ID ('realtime_start') value that is greater than some date (i've
>> used the variable name maxDate), and in the case that there is more
>> than one such record, returning only the one that has the largest ID
>> ('realtime_start').
>
>
> The code that you show does not do what you describe here. The most
> obvious difference is that it doesn't return or display a single record,
> but shows multiple records.
>
> In your case, it selects six records, four of which have a realtime_start
> that occurs BEFORE the given maxDate.
>
> To solve the problem you describe here, of finding at most a single
> record, the solution is much simpler than what you have done. Prepare a
> list of observations, sorted by realtime_start. Take the latest such
> observation. If its realtime_start is on or after the maxDate, you have
> your answer. If not, there is no answer.
>
> The simplest solution is usually the best. The simpler your code, the fewer
> bugs it will contain.
>
>
> obs.sort(key=lambda rec: rec['realtime_start'])
> rec = obs[-1]
> # >= rather than >: the latest realtime_start in the data equals maxDate
> if rec['realtime_start'] >= maxDate:
>     print rec
> else:
>     print "no record found"
>
>
> which prints:
>
> {'date': '2013-01-01', 'realtime_start': '2013-03-21', 'realtime_end': 
> '9999-12-31', 'value': '231.222'}
>
>
>
>
> --
> Steven
>
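A small sketch of the single-record idea without sorting the whole list, assuming that a realtime_start equal to the cutoff counts as a match (that choice is an assumption here). The function name latest_on_or_after and the two-record sample are hypothetical.

```python
def latest_on_or_after(records, cutoff):
    """Return the record with the largest realtime_start, or None if that
    start falls before cutoff (or if there are no records at all)."""
    if not records:
        return None
    newest = max(records, key=lambda rec: rec['realtime_start'])
    return newest if newest['realtime_start'] >= cutoff else None


# Hypothetical two-record sample in the same shape as obs.
sample = [
    {'date': '2012-12-01', 'realtime_start': '2013-03-19', 'value': '231.197'},
    {'date': '2013-01-01', 'realtime_start': '2013-03-21', 'value': '231.222'},
]

found = latest_on_or_after(sample, '2013-03-21')
missing = latest_on_or_after(sample, '2013-04-01')
print(found)
print(missing)
```

One difference worth knowing: when several records share the largest realtime_start, max() returns the first of them, whereas sorting and taking the last element returns the last one.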
>
> ------------------------------
>
> Message: 3
> Date: Fri, 22 Mar 2013 13:20:09 +1100
> From: Steven D'Aprano <st...@pearwood.info>
> To: tutor@python.org
> Subject: Re: [Tutor] Help with iterators
> Message-ID: <514bbfd9.3090...@pearwood.info>
> Content-Type: text/plain; charset=UTF-8; format=flowed
>
> On 22/03/13 12:39, Mitya Sirenef wrote:
>
>> You can do it with groupby like so:
>>
>>
>> from itertools import groupby
>> from operator import itemgetter
>>
>> maxDate = "2013-03-21"
>> mmax    = list()
>>
>> obs.sort(key=itemgetter('date'))
>>
>> for k, group in groupby(obs, key=itemgetter('date')):
>>     group = [dob for dob in group if dob['realtime_start'] <= maxDate]
>>     if group:
>>         group.sort(key=itemgetter('realtime_start'))
>>         mmax.append(group[-1])
>>
>> pprint.pprint(mmax)
>
>
> This suffers from the same problem of finding six records instead of one,
> and that four of the six have start dates before the given date instead
> of after it.
>
> Here's another solution that finds all the records that start on or after
> the given date (the poorly named "maxDate") and displays them sorted by
> date.
>
>
> selected = [rec for rec in obs if rec['realtime_start'] >= maxDate]
> selected.sort(key=lambda rec: rec['date'])
> print selected
>
>
>
>
> --
> Steven
>
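All the comparisons in this thread are done on plain strings. That works because 'YYYY-MM-DD' strings sort in the same order as the dates they represent. The sketch below checks this on a few values taken from the data; the helper name parse and the cutoff chosen here are only for illustration.

```python
from datetime import datetime


def parse(day):
    """Turn a 'YYYY-MM-DD' string into a datetime.date."""
    return datetime.strptime(day, '%Y-%m-%d').date()


starts = ['2012-10-15', '2013-03-21', '2012-12-19', '2013-01-19']
cutoff = '2013-01-19'

# Filter once on the raw strings and once on parsed dates.
by_string = [s for s in starts if s >= cutoff]
by_date = [s for s in starts if parse(s) >= parse(cutoff)]

print(by_string)
print(by_string == by_date)
```

If the dates ever arrive in another layout (for example day first), the string comparison no longer holds and parsing becomes necessary.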
>
> ------------------------------
>
> Message: 4
> Date: Thu, 21 Mar 2013 22:31:53 -0400
> From: Mitya Sirenef <msire...@lightbird.net>
> To: tutor@python.org
> Subject: Re: [Tutor] Help with iterators
> Message-ID: <514bc299.1070...@lightbird.net>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> On 03/21/2013 10:20 PM, Steven D'Aprano wrote:
>> On 22/03/13 12:39, Mitya  Sirenef wrote:
>>
>>> You can do it with groupby like so:
>>>
>>>
>>> from itertools import groupby
>>> from operator import itemgetter
>>>
>>> maxDate = "2013-03-21"
>>> mmax = list()
>>>
>>> obs.sort(key=itemgetter('date'))
>>>
>>> for k, group in groupby(obs, key=itemgetter('date')):
>>>     group = [dob for dob in group if dob['realtime_start'] <= maxDate]
>>>     if group:
>>>         group.sort(key=itemgetter('realtime_start'))
>>>         mmax.append(group[-1])
>>>
>>> pprint.pprint(mmax)
>>
>>
>> This suffers from the same problem of finding six records instead of one,
>> and that four of the six have start dates before the given date instead
>> of after it.
>
>
> OP said his code produces the needed result and I think his description
> probably doesn't match what he really intends to do (he also said he
> wants the same code rewritten using groupby). I reproduced the logic of
> his code... hopefully he can step in and clarify!
>
>
>
>>
>> Here's another solution that finds all the records that start on or after
>> the given date (the poorly named "maxDate") and displays them sorted by
>> date.
>>
>>
>> selected = [rec for rec in obs if rec['realtime_start'] >= maxDate]
>> selected.sort(key=lambda rec: rec['date'])
>> print selected
>
>
> --
> Lark's Tongue Guide to Python: http://lightbird.net/larks/
>
> A little bad taste is like a nice dash of paprika.
> Dorothy Parker
>
>
>
> ------------------------------
>
> Message: 5
> Date: Fri, 22 Mar 2013 12:54:01 +0530
> From: Arijit Ukil <arijit.u...@tcs.com>
> To: tutor@python.org
> Subject: [Tutor] Please Help
> Message-ID:
>    <of6fbda2e4.aea5d238-on65257b36.00282978-65257b36.0028a...@tcs.com>
> Content-Type: text/plain; charset="us-ascii"
>
> Hi,
>
> I have another small problem. Pls help.
>
> I have written the following code:
>
> f = open ("digi_2.txt", "r+")
> lines = f.readlines()
> for line in lines:
>    number_list = []
>    for number in line.split(','):
>        number_list.append(float(number))
>
> s_data = []
> for i in range(len(number_list)):
>    if number_list[i] > 5:
>        s_data = number_list[i]
>
> print 'Data val:', s_data
>
>
> The problem is: it is printing only the last value, not all the values. In
> this case '10', not '9,8,6,10'.
>
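A possible fix, sketched under assumptions: in the code above, s_data = number_list[i] rebinds s_data to a single number on every match, so only the last match survives, and number_list is reset for every line, so only the last line is examined. Appending inside the loop keeps every match. The function name values_above and the sample lines are hypothetical; the real contents of digi_2.txt are not shown in the thread.

```python
def values_above(lines, threshold=5):
    """Collect every number greater than threshold from comma-separated lines."""
    selected = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines, such as a trailing empty line
        for token in line.split(','):
            number = float(token)
            if number > threshold:
                selected.append(number)  # append keeps the earlier matches
    return selected


# Hypothetical stand-in for f.readlines().
lines = ['1,9,3,8\n', '6,2,10\n']
s_data = values_above(lines)
print(s_data)
```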
>
>
> Regards,
> Arijit Ukil
> Tata Consultancy Services
> Mailto: arijit.u...@tcs.com
> Website: http://www.tcs.com
> ____________________________________________
> Experience certainty.   IT Services
>                        Business Solutions
>                        Outsourcing
> ____________________________________________
>
>
>
> From:
> Amit Saha <amitsaha...@gmail.com>
> To:
> Arijit Ukil <arijit.u...@tcs.com>
> Cc:
> tutor@python.org
> Date:
> 03/21/2013 05:30 PM
> Subject:
> Re: [Tutor] Please Help
>
>
>
> Hi Arijit,
>
> On Thu, Mar 21, 2013 at 8:42 PM, Arijit Ukil <arijit.u...@tcs.com> wrote:
>>
>> I am new to python. I like to calculate average of the numbers by
>> reading the file 'digi_2.txt'. I have written the following code:
>>
>> def average(s): return sum(s) * 1.0 / len(s)
>>
>> f = open ("digi_2.txt", "r+")
>>
>> list_of_lists1 = f.readlines()
>>
>>
>> for index in range(len(list_of_lists1)):
>>
>>
>>    tt = list_of_lists1[index]
>>
>>    print 'Current value :', tt
>>
>> avg =average (tt)
>>
>>
>> This gives an error:
>>
>> def average(s): return sum(s) * 1.0 / len(s)
>> TypeError: unsupported operand type(s) for +: 'int' and 'str'
>>
>> I also attach the file i am reading.
>>
>>
>>
>> Please help to rectify.
>
> The main issue here is that when you read from a file, to Python it's
> all strings. And although 'abc' + 'def' is valid, 'abc' + 5 isn't (for
> example). Hence, besides the fact that your average calculation is not
> right, you will have to convert each string to an integer or float
> before doing any arithmetic on it. (If you know C, this is similar to
> typecasting.) So, coming back to your program, I will first show you a
> few things and then you can write the program yourself.
>
> If you were to break down this program into simple steps, they would be:
>
> 1. Read the lines from a file (Assume a generic case, where you have
> more than one line in the file, and you have to calculate the average
> for each such row)
> 2. Create a list of floating point numbers for each of those lines
> 3. And call your average function on each of these lists
>
> You could of course do 2 & 3 together, so you create the list and call
> the average function.
>
> So, here is step 1:
>
> with open('digi_2.txt','r') as f:
>    lines = f.readlines()
>
> Please refer to
> http://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects
> for an explanation of the advantage of using 'with'.
>
> Now, you have *all* the lines of the file in 'lines'. Now, you want to
> perform step 2 for each line in this file. Here you go:
>
> for line in lines:
>    number_list = []
>    for number in line.split(','):
>        number_list.append(float(number))
>
> (To learn more about Python lists, see
> http://effbot.org/zone/python-list.htm). It is certainly possible to
> use the index of an element to access elements of a list, but iterating
> directly is the more Pythonic way of doing it. To understand this
> better: the variable 'line' holds one line of the file as a string of
> comma-separated numbers, for example: 1350696461, 448.0, 538660.0,
> 1350696466, 448.0. Note how they are separated by a ','? To get each
> element, we use the split() method, which returns a list of the
> individual numbers as strings. (See:
> http://docs.python.org/2/library/stdtypes.html#str.split). And then
> we use float() and the .append() method to build the list. Now you have
> a number_list which is a list of floating point numbers for each line.
>
> Now, step 2 & 3 combined:
>
> for line in lines:
>    number_list = []
>    for number in line.split(','):
>        number_list.append(float(number))
>    print average(number_list)
>
> Where average( ) is defined as:
>
> def average(num_list):
>    return sum(num_list)/len(num_list)
>
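Putting the three steps together, here is a minimal sketch that works on a list of lines, as f.readlines() would return. The float() around len() guards against integer division under Python 2 if the list ever holds ints. The function name line_averages, the blank-line check and the sample lines are additions for illustration, not part of the advice above.

```python
def average(num_list):
    """Arithmetic mean; float() avoids integer division under Python 2."""
    return sum(num_list) / float(len(num_list))


def line_averages(lines):
    """Return one average per non-blank line of comma-separated numbers."""
    averages = []
    for line in lines:
        if not line.strip():
            continue  # a blank line would otherwise make float('') fail
        numbers = [float(token) for token in line.split(',')]
        averages.append(average(numbers))
    return averages


# Hypothetical stand-in for the lines read from the file.
lines = ['1, 2, 3\n', '\n', '10,20\n']
result = line_averages(lines)
print(result)
```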
>
>
> I may have mentioned a number of things that are new to you, but I
> hope the links will help you learn more and write your program now.
>
> Good Luck.
> -Amit.
>
>
> --
> http://amitsaha.github.com/
>
>
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: 
> <http://mail.python.org/pipermail/tutor/attachments/20130322/bcf827ca/attachment.html>
> -------------- next part --------------
> An embedded and charset-unspecified text was scrubbed...
> Name: digi_2.txt
> URL: 
> <http://mail.python.org/pipermail/tutor/attachments/20130322/bcf827ca/attachment.txt>
>
> ------------------------------
>
> End of Tutor Digest, Vol 109, Issue 75
> **************************************