Dear list, Sorry for the delay -- it has taken some time for me to get these emails.
It appears I made some dumb error when typing out the description.
Mitya Sirenef was correct to ignore my words and to focus on my code.

Thanks for your help. I may ask again / for more help when I feel I have
tried sufficiently hard to absorb the answers below.

Thanks again

mj

On 22/03/2013, at 6:24 PM, "tutor-requ...@python.org" <tutor-requ...@python.org> wrote:

> Send Tutor mailing list submissions to
>     tutor@python.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>     http://mail.python.org/mailman/listinfo/tutor
> or, via email, send a message with subject or body 'help' to
>     tutor-requ...@python.org
>
> You can reach the person managing the list at
>     tutor-ow...@python.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Tutor digest..."
>
>
> Today's Topics:
>
>    1. Re: Help with iterators (Mitya Sirenef)
>    2. Re: Help with iterators (Steven D'Aprano)
>    3. Re: Help with iterators (Steven D'Aprano)
>    4. Re: Help with iterators (Mitya Sirenef)
>    5. Please Help (Arijit Ukil)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 21 Mar 2013 21:39:12 -0400
> From: Mitya Sirenef <msire...@lightbird.net>
> To: tutor@python.org
> Subject: Re: [Tutor] Help with iterators
> Message-ID: <514bb640.5050...@lightbird.net>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> On 03/21/2013 08:39 PM, Matthew Johnson wrote:
>> Dear list,
>>
>> I have been trying to understand how to use iterators and in
>> particular groupby statements. I am, however, quite lost.
>>
>> I wish to subset the below list, selecting the observations that have
>> an ID ('realtime_start') value that is greater than some date (i've
>> used the variable name maxDate), and in the case that there is more
>> than one such record, returning only the one that has the largest ID
>> ('realtime_start').
>>
>> The code below does the job, however i have the impression that it
>> might be done in a more python way using iterators and groupby
>> statements.
>>
>> could someone please help me understand how to go from this code to
>> the pythonic idiom?
>>
>> thanks in advance,
>>
>> Matt Johnson
>>
>> _________________
>>
>> ## Code example
>>
>> import pprint
>>
>> obs = [{'date': '2012-09-01',
>>         'realtime_end': '2013-02-18',
>>         'realtime_start': '2012-10-15',
>>         'value': '231.951'},
>>        {'date': '2012-09-01',
>>         'realtime_end': '2013-02-18',
>>         'realtime_start': '2012-11-15',
>>         'value': '231.881'},
>>        {'date': '2012-10-01',
>>         'realtime_end': '2013-02-18',
>>         'realtime_start': '2012-11-15',
>>         'value': '231.751'},
>>        {'date': '2012-10-01',
>>         'realtime_end': '9999-12-31',
>>         'realtime_start': '2012-12-19',
>>         'value': '231.623'},
>>        {'date': '2013-02-01',
>>         'realtime_end': '9999-12-31',
>>         'realtime_start': '2013-03-21',
>>         'value': '231.157'},
>>        {'date': '2012-11-01',
>>         'realtime_end': '2013-02-18',
>>         'realtime_start': '2012-12-14',
>>         'value': '231.025'},
>>        {'date': '2012-11-01',
>>         'realtime_end': '9999-12-31',
>>         'realtime_start': '2013-01-19',
>>         'value': '231.071'},
>>        {'date': '2012-12-01',
>>         'realtime_end': '2013-02-18',
>>         'realtime_start': '2013-01-16',
>>         'value': '230.979'},
>>        {'date': '2012-12-01',
>>         'realtime_end': '9999-12-31',
>>         'realtime_start': '2013-02-19',
>>         'value': '231.137'},
>>        {'date': '2012-12-01',
>>         'realtime_end': '9999-12-31',
>>         'realtime_start': '2013-03-19',
>>         'value': '231.197'},
>>        {'date': '2013-01-01',
>>         'realtime_end': '9999-12-31',
>>         'realtime_start': '2013-02-21',
>>         'value': '231.198'},
>>        {'date': '2013-01-01',
>>         'realtime_end': '9999-12-31',
>>         'realtime_start': '2013-03-21',
>>         'value': '231.222'}]
>>
>> maxDate = "2013-03-21"
>>
>> dobs = dict([(d, []) for d in set([e['date'] for e in obs])])
>>
>> for o in obs:
>>     dobs[o['date']].append(o)
>>
>> dobs_subMax = dict([(k, [d for d in v if d['realtime_start'] <= maxDate])
>>                     for k, v in dobs.items()])
>>
>> rts = lambda x: x['realtime_start']
>>
>> mmax = [sorted(e, key=rts)[-1] for e in dobs_subMax.values() if e]
>>
>> mmax.sort(key = lambda x: x['date'])
>>
>> pprint.pprint(mmax)
>> _______________________________________________
>> Tutor maillist - Tutor@python.org
>> To unsubscribe or change subscription options:
>> http://mail.python.org/mailman/listinfo/tutor
>
>
> You can do it with groupby like so:
>
>
> from itertools import groupby
> from operator import itemgetter
>
>
> maxDate = "2013-03-21"
> mmax = list()
>
> obs.sort(key=itemgetter('date'))
>
> for k, group in groupby(obs, key=itemgetter('date')):
>     group = [dob for dob in group if dob['realtime_start'] <= maxDate]
>     if group:
>         group.sort(key=itemgetter('realtime_start'))
>         mmax.append(group[-1])
>
> pprint.pprint(mmax)
>
>
> Note that writing multiply-nested comprehensions like you did results in
> very unreadable code. Do you find this code more readable?
>
> -m
>
>
> --
> Lark's Tongue Guide to Python: http://lightbird.net/larks/
>
> Many a man fails as an original thinker simply because his memory is too
> good. Friedrich Nietzsche
>
>
>
> ------------------------------
>
> Message: 2
> Date: Fri, 22 Mar 2013 13:05:38 +1100
> From: Steven D'Aprano <st...@pearwood.info>
> To: tutor@python.org
> Subject: Re: [Tutor] Help with iterators
> Message-ID: <514bbc72.8040...@pearwood.info>
> Content-Type: text/plain; charset=UTF-8; format=flowed
>
> On 22/03/13 11:39, Matthew Johnson wrote:
>> Dear list,
>>
>> I have been trying to understand how to use iterators and in
>> particular groupby statements. I am, however, quite lost.
>
> groupby is a very specialist function which is not very intuitive to
> use. Sometimes I think that groupby is an excellent solution in search
> of a problem.
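A minimal sketch of the same groupby idea, written for Python 3 and using
max() instead of sorting each group. The function name latest_per_date, the
three-record sample and the cut-off "2012-12-31" are my own assumptions for
illustration; the records themselves are copied from the list above. It
relies on ISO-formatted date strings comparing correctly as plain strings.

```python
from itertools import groupby
from operator import itemgetter


def latest_per_date(obs, max_date):
    """For each 'date', keep the record with the largest
    'realtime_start' that is still <= max_date (hypothetical helper)."""
    by_date = itemgetter('date')
    start = itemgetter('realtime_start')
    result = []
    # groupby only merges adjacent items, so sort by the grouping key first.
    for _, group in groupby(sorted(obs, key=by_date), key=by_date):
        candidates = [rec for rec in group if start(rec) <= max_date]
        if candidates:
            # max() with a key avoids sorting the whole group.
            result.append(max(candidates, key=start))
    return result


# Three records taken from the thread; the cut-off date is an assumption.
sample = [
    {'date': '2012-09-01', 'realtime_end': '2013-02-18',
     'realtime_start': '2012-10-15', 'value': '231.951'},
    {'date': '2012-09-01', 'realtime_end': '2013-02-18',
     'realtime_start': '2012-11-15', 'value': '231.881'},
    {'date': '2013-02-01', 'realtime_end': '9999-12-31',
     'realtime_start': '2013-03-21', 'value': '231.157'},
]

picked = latest_per_date(sample, "2012-12-31")
print(picked)
```

With this sample, only the 2012-09-01 group has records at or before the
cut-off, so a single record (the one with realtime_start '2012-11-15') is
kept and the 2013-02-01 group is dropped entirely.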
>
>
>> I wish to subset the below list, selecting the observations that have
>> an ID ('realtime_start') value that is greater than some date (i've
>> used the variable name maxDate), and in the case that there is more
>> than one such record, returning only the one that has the largest ID
>> ('realtime_start').
>
>
> The code that you show does not do what you describe here. The most
> obvious difference is that it doesn't return or display a single record,
> but shows multiple records.
>
> In your case, it selects six records, four of which have a realtime_start
> that occurs BEFORE the given maxDate.
>
> To solve the problem you describe here, of finding at most a single
> record, the solution is much simpler than what you have done. Prepare a
> list of observations, sorted by realtime_start. Take the latest such
> observation. If the realtime_start is greater than the maxDate, you have
> your answer. If not, there is no answer.
>
> The simplest solution is usually the best. The simpler your code, the fewer
> bugs it will contain.
>
>
> obs.sort(key=lambda rec: rec['realtime_start'])
> rec = obs[-1]
> if rec['realtime_start'] > maxDate:
>     print rec
> else:
>     print "no record found"
>
>
> which prints:
>
> {'date': '2013-01-01', 'realtime_start': '2013-03-21', 'realtime_end':
> '9999-12-31', 'value': '231.222'}
>
>
>
>
> --
> Steven
>
>
> ------------------------------
>
> Message: 3
> Date: Fri, 22 Mar 2013 13:20:09 +1100
> From: Steven D'Aprano <st...@pearwood.info>
> To: tutor@python.org
> Subject: Re: [Tutor] Help with iterators
> Message-ID: <514bbfd9.3090...@pearwood.info>
> Content-Type: text/plain; charset=UTF-8; format=flowed
>
> On 22/03/13 12:39, Mitya Sirenef wrote:
>
>> You can do it with groupby like so:
>>
>>
>> from itertools import groupby
>> from operator import itemgetter
>>
>> maxDate = "2013-03-21"
>> mmax = list()
>>
>> obs.sort(key=itemgetter('date'))
>>
>> for k, group in groupby(obs, key=itemgetter('date')):
>>     group = [dob for dob in group if dob['realtime_start'] <= maxDate]
>>     if group:
>>         group.sort(key=itemgetter('realtime_start'))
>>         mmax.append(group[-1])
>>
>> pprint.pprint(mmax)
>
>
> This suffers from the same problem of finding six records instead of one,
> and that four of the six have start dates before the given date instead
> of after it.
>
> Here's another solution that finds all the records that start on or after
> the given date (the poorly named "maxDate") and displays them sorted by
> date.
>
>
> selected = [rec for rec in obs if rec['realtime_start'] >= maxDate]
> selected.sort(key=lambda rec: rec['date'])
> print selected
>
>
>
>
> --
> Steven
>
>
> ------------------------------
>
> Message: 4
> Date: Thu, 21 Mar 2013 22:31:53 -0400
> From: Mitya Sirenef <msire...@lightbird.net>
> To: tutor@python.org
> Subject: Re: [Tutor] Help with iterators
> Message-ID: <514bc299.1070...@lightbird.net>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> On 03/21/2013 10:20 PM, Steven D'Aprano wrote:
>> On 22/03/13 12:39, Mitya Sirenef wrote:
>>
>>> You can do it with groupby like so:
>>>
>>>
>>> from itertools import groupby
>>> from operator import itemgetter
>>>
>>> maxDate = "2013-03-21"
>>> mmax = list()
>>>
>>> obs.sort(key=itemgetter('date'))
>>>
>>> for k, group in groupby(obs, key=itemgetter('date')):
>>>     group = [dob for dob in group if dob['realtime_start'] <= maxDate]
>>>     if group:
>>>         group.sort(key=itemgetter('realtime_start'))
>>>         mmax.append(group[-1])
>>>
>>> pprint.pprint(mmax)
>>
>>
>> This suffers from the same problem of finding six records instead of one,
>> and that four of the six have start dates before the given date instead
>> of after it.
>
>
> OP said his code produces the needed result and I think his description
> probably doesn't match what he really intends to do (he also said he
> wants the same code rewritten using groupby). I reproduced the logic of
> his code... hopefully he can step in and clarify!
>
>
>
>>
>> Here's another solution that finds all the records that start on or after
>> the given date (the poorly named "maxDate") and displays them sorted by
>> date.
>>
>>
>> selected = [rec for rec in obs if rec['realtime_start'] >= maxDate]
>> selected.sort(key=lambda rec: rec['date'])
>> print selected
>
>
> --
> Lark's Tongue Guide to Python: http://lightbird.net/larks/
>
> A little bad taste is like a nice dash of paprika.
> Dorothy Parker
>
>
>
> ------------------------------
>
> Message: 5
> Date: Fri, 22 Mar 2013 12:54:01 +0530
> From: Arijit Ukil <arijit.u...@tcs.com>
> To: tutor@python.org
> Subject: [Tutor] Please Help
> Message-ID:
>     <of6fbda2e4.aea5d238-on65257b36.00282978-65257b36.0028a...@tcs.com>
> Content-Type: text/plain; charset="us-ascii"
>
> Hi,
>
> I have another small problem. Pls help.
>
> I have written the following code:
>
> f = open ("digi_2.txt", "r+")
> lines = f.readlines()
> for line in lines:
>     number_list = []
>     for number in line.split(','):
>         number_list.append(float(number))
>
> s_data = []
> for i in range(len(number_list)):
>     if number_list[i] > 5:
>         s_data = number_list[i]
>
> print 'Data val:', s_data
>
>
> The problem is: it is printing only the last value, not all the values. In
> this case '10', not '9,8,6,10'.
>
>
>
> Regards,
> Arijit Ukil
> Tata Consultancy Services
> Mailto: arijit.u...@tcs.com
> Website: http://www.tcs.com
> ____________________________________________
> Experience certainty. IT Services
>                       Business Solutions
>                       Outsourcing
> ____________________________________________
>
>
>
> From:
> Amit Saha <amitsaha...@gmail.com>
> To:
> Arijit Ukil <arijit.u...@tcs.com>
> Cc:
> tutor@python.org
> Date:
> 03/21/2013 05:30 PM
> Subject:
> Re: [Tutor] Please Help
>
>
>
> Hi Arijit,
>
> On Thu, Mar 21, 2013 at 8:42 PM, Arijit Ukil <arijit.u...@tcs.com> wrote:
>>
>> I am new to python. I like to calculate average of the numbers by reading
>> the file 'digi_2.txt'.
>> I have written the following code:
>>
>> def average(s): return sum(s) * 1.0 / len(s)
>>
>> f = open ("digi_2.txt", "r+")
>>
>> list_of_lists1 = f.readlines()
>>
>>
>> for index in range(len(list_of_lists1)):
>>
>>
>>     tt = list_of_lists1[index]
>>
>>     print 'Current value :', tt
>>
>> avg =average (tt)
>>
>>
>> This gives an error:
>>
>> def average(s): return sum(s) * 1.0 / len(s)
>> TypeError: unsupported operand type(s) for +: 'int' and 'str'
>>
>> I also attach the file i am reading.
>>
>>
>>
>> Please help to rectify.
>
> The main issue here is that when you are reading from a file, to
> Python, it's all strings. And although 'abc' + 'def' is valid, 'abc' +
> 5 isn't (for example). Hence, besides the fact that your average
> calculation is not right, you will have to 'convert' the string to an
> integer/float to do any arithmetic operation on them. (If you know C,
> this is similar to typecasting). So, coming back to your program, I
> will first show you a few things and then you can write the
> program yourself.
>
> If you were to break down this program into simple steps, they would be:
>
> 1. Read the lines from a file (Assume a generic case, where you have
> more than one line in the file, and you have to calculate the average
> for each such row)
> 2. Create a list of floating point numbers for each of those lines
> 3. And call your average function on each of these lists
>
> You could of course do 2 & 3 together, so you create the list and call
> the average function.
>
> So, here is step 1:
>
> with open('digi.txt','r') as f:
>     lines = f.readlines()
>
> Please refer to
> http://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects
> for an explanation of the advantage of using 'with'.
>
> Now, you have *all* the lines of the file in 'lines'. Now, you want to
> perform step 2 for each line in this file.
> Here you go:
>
> for line in lines:
>     number_list = []
>     for number in line.split(','):
>         number_list.append(float(number))
>
> (To learn more about Python lists, see
> http://effbot.org/zone/python-list.htm). It is certainly possible to
> use the index of an element to access elements from a list, but this
> is a more Pythonic way of doing it. To understand this better, in the
> variable 'line', you will have the numbers from a single line as one
> string. For example: 1350696461, 448.0, 538660.0, 1350696466, 448.0.
> Note how they are separated by a ','? To get each element, we use the
> split() method, which returns a list of the individual numbers (still
> as strings). (See:
> http://docs.python.org/2/library/stdtypes.html#str.split). And then,
> we use float() and the .append() method to build the list. Now, you
> have a number_list which is a list of floating point numbers for each
> line.
>
> Now, step 2 & 3 combined:
>
> for line in lines:
>     number_list = []
>     for number in line.split(','):
>         number_list.append(float(number))
>     print average(number_list)
>
> Where average() is defined as:
>
> def average(num_list):
>     return sum(num_list)/len(num_list)
>
>
>
> There may be a number of unknown things I may have talked about, but i
> hope the links will help you learn more and write your program now.
>
> Good Luck.
> -Amit.
>
>
> --
> http://amitsaha.github.com/
>
>
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL:
> <http://mail.python.org/pipermail/tutor/attachments/20130322/bcf827ca/attachment.html>
> -------------- next part --------------
> An embedded and charset-unspecified text was scrubbed...
> Name: digi_2.txt
> URL:
> <http://mail.python.org/pipermail/tutor/attachments/20130322/bcf827ca/attachment.txt>
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> Tutor maillist - Tutor@python.org
> http://mail.python.org/mailman/listinfo/tutor
>
>
> ------------------------------
>
> End of Tutor Digest, Vol 109, Issue 75
> **************************************
_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
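
On the "Please Help" question in Message 5 of the digest: the loop assigns
s_data = number_list[i], which replaces s_data on every match, so only the
last value above 5 survives. A minimal sketch of one way to collect all of
them, written for Python 3. The helper name values_above and the sample line
are my own assumptions, since the contents of digi_2.txt are not shown in the
thread.

```python
def values_above(line, threshold=5.0):
    """Return every number on a comma-separated line that exceeds
    threshold (hypothetical helper, not from the thread)."""
    # Skip empty tokens so a trailing comma or a blank line does not break float().
    numbers = [float(tok) for tok in line.split(',') if tok.strip()]
    # Build a new list instead of overwriting a single variable.
    return [n for n in numbers if n > threshold]


# Assumed sample line; the real file contents are unknown.
sample_line = "1,9,3,8,6,2,10\n"
selected = values_above(sample_line)
print('Data val:', selected)
```

The same effect can be had in the original loop by calling
s_data.append(number_list[i]) instead of assigning to s_data.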