Re: sorting tuples...
Steve Holden wrote:
> Dan Sommers wrote:
>> On 27 Sep 2005 19:01:38 -0700, [EMAIL PROTECTED] wrote:
>>> with the binary stuff out of the way, what i have is this string data:
>>>
>>>     20050922 # date line
>>>     mike
>>>     mike's message...
>>>     20040825 # date line
>>>     jeremy
>>>     jeremy's message...
>>>     ...
>>>
>>> what i want to do is to use the date line as the first data in a tuple and the succeeding lines go into the tuple, like:
>>>
>>>     (20050922, mike, mike's message)
>>>
>>> then when it matches another date line it makes another new tuple with that date line as the header data and the succeeding data, etc..
>>>
>>>     (20050922, mike, mike's message)
>>>     (20040825, jeremy, jeremy's message)
>>>     ...
>>>
>>> then i would sort the tuples according to the date. is there an easier/proper way of doing this without generating a lot of tuples?
>>
>> You want a dictionary. Python dictionaries map keys to values (in other languages, these data structures are known as hashes, maps, or associative arrays). The keys will be the dates; the values will depend on whether or not you have multiple messages for one date.
>>
>> If the dates are unique (which, looking at your data, is probably not true), then each item in the dictionary can be just one (who, message) tuple. If the dates are not unique, then you'll have to manage each item of the dictionary as a list of (who, message) tuples.
>>
>> And before you ask: no, dictionaries are *not* sorted; you'll have to sort a separate list of the keys or the items at the appropriate time.
>
> I'm not sure this advice is entirely helpful, since it introduces complexities not really required by the simplistic tuple notation the OP seems to be struggling for.
>
> Following the old adage "First, make it work; then (if it doesn't work fast enough) make it faster", and making the *dangerous* assumption that each message genuinely is exactly three lines, we might write:
>
>     msglist = []
>     f = open("theDataFile.txt", "r")
>     for date in f:
>         who = f.next() # pulls a line from the file
>         msg = f.next() # pulls a line from the file
>         msglist.append((date, who, msg))
>     # now have list of messages as tuples
>     msglist.sort()
>
> After this, msglist should be a date-sorted list of messages. Though who knows what needs to happen to them next ...

Just to spit it all out to stdout in a nicely formatted form so I can save it to a file. I'm still confused though, but I'm working on it. struct is nice.

> regards
>  Steve
> --
> Steve Holden +44 150 684 7255 +1 800 494 3119
> Holden Web LLC www.holdenweb.com
> PyCon TX 2006 www.pycon.org

--
http://mail.python.org/mailman/listinfo/python-list
Re: sorting tuples...
Dan Sommers wrote:
> On 27 Sep 2005 19:01:38 -0700, [EMAIL PROTECTED] wrote:
>> with the binary stuff out of the way, what i have is this string data:
>>
>>     20050922 # date line
>>     mike
>>     mike's message...
>>     20040825 # date line
>>     jeremy
>>     jeremy's message...
>>     ...
>>
>> what i want to do is to use the date line as the first data in a tuple and the succeeding lines go into the tuple, like:
>>
>>     (20050922, mike, mike's message)
>>
>> then when it matches another date line it makes another new tuple with that date line as the header data and the succeeding data, etc..
>>
>>     (20050922, mike, mike's message)
>>     (20040825, jeremy, jeremy's message)
>>     ...
>>
>> then i would sort the tuples according to the date. is there an easier/proper way of doing this without generating a lot of tuples?
>
> You want a dictionary. Python dictionaries map keys to values (in other languages, these data structures are known as hashes, maps, or associative arrays). The keys will be the dates; the values will depend on whether or not you have multiple messages for one date.
>
> If the dates are unique (which, looking at your data, is probably not true), then each item in the dictionary can be just one (who, message) tuple. If the dates are not unique, then you'll have to manage each item of the dictionary as a list of (who, message) tuples.
>
> And before you ask: no, dictionaries are *not* sorted; you'll have to sort a separate list of the keys or the items at the appropriate time.

I'm not sure this advice is entirely helpful, since it introduces complexities not really required by the simplistic tuple notation the OP seems to be struggling for.

Following the old adage "First, make it work; then (if it doesn't work fast enough) make it faster", and making the *dangerous* assumption that each message genuinely is exactly three lines, we might write:

    msglist = []
    f = open("theDataFile.txt", "r")
    for date in f:
        who = f.next() # pulls a line from the file
        msg = f.next() # pulls a line from the file
        msglist.append((date, who, msg))
    # now have list of messages as tuples
    msglist.sort()

After this, msglist should be a date-sorted list of messages. Though who knows what needs to happen to them next ...

regards
 Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.pycon.org
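
For readers on Python 3, where file objects no longer have a .next() method, the three-line loop above might be sketched as follows. This is only an illustration under the same "exactly three lines per message" assumption; the helper name read_messages and the in-memory sample data (standing in for theDataFile.txt) are hypothetical and not part of the thread.

```python
import io

def read_messages(lines):
    """Group an iterable of lines into (date, who, msg) tuples, three lines each."""
    it = iter(lines)
    messages = []
    for date in it:
        # Same dangerous assumption as in the thread: three lines per record.
        # next() with a default avoids StopIteration on a truncated last record.
        who = next(it, "")
        msg = next(it, "")
        messages.append((date.strip(), who.strip(), msg.strip()))
    return messages

# Hypothetical sample data standing in for theDataFile.txt
sample = io.StringIO(
    "20050922\nmike\nmike's message...\n"
    "20040825\njeremy\njeremy's message...\n"
)

# Tuples compare element by element, so sorting orders by the date string first.
msglist = sorted(read_messages(sample))
for record in msglist:
    print(record)
```

Because the dates are fixed-width digit strings, plain string comparison puts them in chronological order; that would not hold for dates written in another layout.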
Re: sorting tuples...
Magnus Lycka wrote:
> Why? It seems you are trying to use a string as some kind of container, and Python has those in the box. Just use a list of tuples, rather than a list of strings. That will work fine for .sort(), and it's much more convenient to access your data.
>
> Using the typical tool for extracting binary data from files/strings will give you tuples by default.

my problem with tuple lists is that i don't know how to assign data to them properly. i'm quite new in python ;)

with the binary stuff out of the way, what i have is this string data:

    20050922 # date line
    mike
    mike's message...
    20040825 # date line
    jeremy
    jeremy's message...
    ...

what i want to do is to use the date line as the first data in a tuple and the succeeding lines go into the tuple, like:

    (20050922, mike, mike's message)

then when it matches another date line it makes another new tuple with that date line as the header data and the succeeding data, etc..

    (20050922, mike, mike's message)
    (20040825, jeremy, jeremy's message)
    ...

then i would sort the tuples according to the date. is there an easier/proper way of doing this without generating a lot of tuples?

thanks! for the help :)
Re: sorting tuples...
On 27 Sep 2005 19:01:38 -0700, [EMAIL PROTECTED] wrote:
> with the binary stuff out of the way, what i have is this string data:
>
>     20050922 # date line
>     mike
>     mike's message...
>     20040825 # date line
>     jeremy
>     jeremy's message...
>     ...
>
> what i want to do is to use the date line as the first data in a tuple and the succeeding lines go into the tuple, like:
>
>     (20050922, mike, mike's message)
>
> then when it matches another date line it makes another new tuple with that date line as the header data and the succeeding data, etc..
>
>     (20050922, mike, mike's message)
>     (20040825, jeremy, jeremy's message)
>     ...
>
> then i would sort the tuples according to the date. is there an easier/proper way of doing this without generating a lot of tuples?

You want a dictionary. Python dictionaries map keys to values (in other languages, these data structures are known as hashes, maps, or associative arrays). The keys will be the dates; the values will depend on whether or not you have multiple messages for one date.

If the dates are unique (which, looking at your data, is probably not true), then each item in the dictionary can be just one (who, message) tuple. If the dates are not unique, then you'll have to manage each item of the dictionary as a list of (who, message) tuples.

And before you ask: no, dictionaries are *not* sorted; you'll have to sort a separate list of the keys or the items at the appropriate time.

Regards,
Dan

--
Dan Sommers
http://www.tombstonezero.net/dan/
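
The dictionary approach described above could look roughly like this in present-day Python 3. It is a sketch, not Dan's code: the sample records and the names by_date and ordered are made up for illustration, and it takes the "dates are not unique" branch, so every value is a list of (who, message) tuples.

```python
from collections import defaultdict

# Hypothetical records; two of them share a date on purpose.
records = [
    ("20050922", "mike", "mike's message..."),
    ("20040825", "jeremy", "jeremy's message..."),
    ("20050922", "clyde", "a second message on the same date"),
]

# Map each date to the list of (who, message) tuples seen for it.
by_date = defaultdict(list)
for date, who, message in records:
    by_date[date].append((who, message))

# Dictionaries are not sorted by key, so sort the keys at output time.
ordered = [(date, by_date[date]) for date in sorted(by_date)]
for date, msgs in ordered:
    for who, message in msgs:
        print(date, who, message)
```

Modern dictionaries remember insertion order, but that is still not key order, so the explicit sorted() call remains necessary.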
Re: sorting tuples...
[EMAIL PROTECTED] wrote:
> I edited my code earlier and came up with stringing the groups (200501202010, sender, message_string) into one string delimited by '%%%'.

Why? It seems you are trying to use a string as some kind of container, and Python has those in the box. Just use a list of tuples, rather than a list of strings. That will work fine for .sort(), and it's much more convenient to access your data.

Using the typical tool for extracting binary data from files/strings will give you tuples by default.

    >>> import struct # Check this out in library ref.
    >>> # I'm inventing a simple binary format with everything
    >>> # as strings in fixed positions. There's just one string
    >>> # below, adjacent string literals are concatenated by
    >>> # Python. I split it over three lines for readability.
    >>> bin = (
    ... "200501221530John    *** long string here ***        "
    ... "200504151625Clyde   *** clyde's long string here ***"
    ... "200503130935Jeremy  *** jeremy string here          "
    ... )
    >>> fmt = "@12s8s32s" # imagined binary format.
    >>> l = 52 # 12+8+32, from previous line
    >>> msgs = []
    >>> for i in range(3):
    ...     # struct.unpack will return a tuple. It works well
    ...     # with numeric data too.
    ...     msgs.append(struct.unpack(fmt, bin[i*l:(i+1)*l]))
    ...
    >>> msgs.sort()
    >>> for msg in msgs:
    ...     print msg
    ...
    ('200501221530', 'John    ', '*** long string here ***        ')
    ('200503130935', 'Jeremy  ', '*** jeremy string here          ')
    ('200504151625', 'Clyde   ', "*** clyde's long string here ***")

> I could then sort the messages with the date string at the beginning as the one being sorted with the big string in its tail being sorted too.

This works equally well with a list of tuples. Another benefit of the list of tuples approach is that you don't need to cast everything to strings. If part of your data is e.g. numeric, just let it be an int, a long or a float in your struct, and sorting will work correctly without any need to format the number in such a way as to make string sorting work exactly as numeric sorting.

Here's an example with numeric data:

    b = (
        '\x00\x00\x07\xd5\x00\x00\x00\x01\x00\x00\x00\x16\x00\x00\x00'
        '\x0f\x00\x00\x00\x1eJohn\x00\x00\x00\x00*** long string here'
        ' ***\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x07\xd5\x00\x00'
        '\x00\x03\x00\x00\x00\r\x00\x00\x00\t\x00\x00\x00#Jeremy\x00'
        '\x00*** jeremy string here     \x00\x00\x00\x00\x00\x00\x00'
        '\x07\xd5\x00\x00\x00\x04\x00\x00\x00\x0f\x00\x00\x00\x10\x00'
        '\x00\x00\x19Clyde\x00\x00\x00*** clyde\'s long string here ***')
    fmt = "!5i8s32s"
    l = 60 # five ints (5*4) + 8 + 32
    bin_msgs = []
    for i in range(3):
        bin_msgs.append(struct.unpack(fmt, b[i*l:(i+1)*l]))
    bin_msgs.reverse() # unsort...
    bin_msgs.sort()
    for msg in bin_msgs:
        print msg

    (2005, 1, 22, 15, 30, 'John\x00\x00\x00\x00', '*** long string here ***\x00\x00\x00\x00\x00\x00\x00\x00')
    (2005, 3, 13, 9, 35, 'Jeremy\x00\x00', '*** jeremy string here     \x00\x00\x00\x00\x00')
    (2005, 4, 15, 16, 25, 'Clyde\x00\x00\x00', "*** clyde's long string here ***")
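
In Python 3 the struct module works on bytes rather than str, so the numeric example above needs small changes. The following is a sketch under that assumption; it reuses the "!5i8s32s" layout (five big-endian ints, then 8-byte and 32-byte fields), while the helper pack_msg and the names FMT, SIZE and blob are invented here for illustration.

```python
import struct

FMT = "!5i8s32s"             # five big-endian ints, then two fixed-width byte strings
SIZE = struct.calcsize(FMT)  # 5*4 + 8 + 32 = 60 bytes per record

def pack_msg(year, month, day, hour, minute, who, text):
    # struct pads short "s" fields with NUL bytes up to the declared width.
    return struct.pack(FMT, year, month, day, hour, minute, who, text)

# Build an unsorted blob of three records (hypothetical stand-in for file contents).
blob = b"".join([
    pack_msg(2005, 4, 15, 16, 25, b"Clyde", b"*** clyde's long string here ***"),
    pack_msg(2005, 1, 22, 15, 30, b"John", b"*** long string here ***"),
    pack_msg(2005, 3, 13, 9, 35, b"Jeremy", b"*** jeremy string here"),
])

# iter_unpack walks the buffer in SIZE-byte steps and yields one tuple per record.
msgs = sorted(struct.iter_unpack(FMT, blob))
for *stamp, who, text in msgs:
    print(stamp, who.rstrip(b"\x00").decode(), text.rstrip(b"\x00").decode())
```

The integer fields sort numerically, which is the benefit described above: no zero-padding tricks are needed to make string order match numeric order.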
Re: sorting tuples...
Thank you very much. I'll look into this immediately.

I edited my code earlier and came up with stringing the groups (200501202010, sender, message_string) into one string delimited by '%%%'. I could then sort the messages with the date string at the beginning as the one being sorted with the big string in its tail being sorted too.

    200501202010%%%sender%%%message_string
    200502160821%%%sender%%%message_string
    ...

After sorting this list of long strings, I could then split them up using the '%%%' delimiter and arrange them properly for output. It's crude, but at least I achieved what I wanted.

Both posters gave good advice, if a bit too advanced for me. I'll play with their suggestions and keep tweaking my code.

Thanks so much!

--
/nh
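
For comparison, the '%%%' approach described above might be sketched like this in Python 3. The sample groups and the names DELIM, joined and restored are hypothetical; only the delimiter and the join, sort, split sequence come from the message.

```python
DELIM = "%%%"

# Hypothetical (date, sender, message) groups, deliberately out of order.
groups = [
    ("200502160821", "sender_b", "second message"),
    ("200501202010", "sender_a", "first message"),
]

# Join each group into one long string, then sort the strings.
joined = sorted(DELIM.join(group) for group in groups)

# Split back into fields; maxsplit=2 keeps any '%%%' inside the message text intact.
restored = [tuple(s.split(DELIM, 2)) for s in joined]
for record in restored:
    print(record)
```

With distinct, fixed-width timestamps this gives the same order as sorting the tuples directly, so the join and split steps can usually be dropped.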
Re: sorting tuples...
Uhm, if the file is clean you can use something like this:

    data = """\
    200501221530
    John
    *** long string here ***

    200504151625
    Clyde
    *** clyde's long string here ***

    200503130935
    Jeremy
    *** jeremy string here"""

    records = [rec.split("\n") for rec in data.split("\n\n")]
    records.sort()
    print records

If it's not clean, you have to add some more checks/cleaning.

Bye,
bearophile
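
One possible shape for the "more checks/cleaning" mentioned above, sketched in Python 3. The rules chosen here (exactly three lines per record, a 12-digit timestamp on the first line) are assumptions read off the sample data, and parse_records is a hypothetical helper name.

```python
import re

# Sample data with deliberate noise: a doubled blank line and a malformed record.
data = """\
200501221530
John
*** long string here ***


200504151625
Clyde
*** clyde's long string here ***

not-a-date
orphan line

200503130935
Jeremy
*** jeremy string here
"""

def parse_records(text):
    records = []
    # Split on runs of blank (or whitespace-only) lines.
    for chunk in re.split(r"\n\s*\n", text.strip()):
        fields = [line.strip() for line in chunk.splitlines()]
        # Keep only three-line chunks whose first field is a 12-digit timestamp.
        if len(fields) == 3 and re.fullmatch(r"\d{12}", fields[0]):
            records.append(tuple(fields))
    return sorted(records)

records = parse_records(data)
for record in records:
    print(record)
```

Malformed chunks are silently dropped here; logging or raising an error instead may suit real data better.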
Re: sorting tuples...
On 17 Sep 2005 06:41:08 -0700, [EMAIL PROTECTED] wrote:
> Hello guys,
>
> I made a script that extracts strings from a binary file. It works. My next problem is sorting those strings. Output is like:
>
> <snip>
> 200501221530
> John
> *** long string here ***
>
> 200504151625
> Clyde
> *** clyde's long string here ***
>
> 200503130935
> Jeremy
> *** jeremy string here
> <snip>
>
> How can I go about sorting this list based on the date string that marks the start of each message? Should I be using lists, dictionaries or tuples? What should I look into? Is there a way to generate variables in a loop? Like:
>
>     x=0
>     while (x<10):
>         # assign variable-x = [...list...]
>         x = x+1
>
> Thanks.

Assuming your groups of strings are all non-blank lines delimited by blank lines, and using StringIO as a line iterable playing the role of your source of lines (not tested beyond what you see ;-)

    >>> from StringIO import StringIO
    >>> lines = StringIO("""\
    ... 200501221530
    ... John
    ... *** long string here ***
    ...
    ... 200504151625
    ... Clyde
    ... *** clyde's long string here ***
    ...
    ... 200503130935
    ... Jeremy
    ... *** jeremy string here 
    ... """)
    >>> from itertools import groupby
    >>> for t in sorted(tuple(g) for k, g in groupby(lines,
    ...         lambda line:line.strip()!='') if k):
    ...     print t
    ...
    ('200501221530\n', 'John\n', '*** long string here ***\n')
    ('200503130935\n', 'Jeremy\n', '*** jeremy string here \n')
    ('200504151625\n', 'Clyde\n', "*** clyde's long string here ***\n")

The lambda computes a grouping key that groupby uses to collect group members as long as the value doesn't change, so this groups non-blank vs blank lines, and the "if k" throws out the blank-line groups.

Obviously you could do something else with the sorted line tuples t, e.g.,

    >>> lines.seek(0)

(just needed that to rewind the StringIO data here)

    >>> for t in sorted(tuple(g) for k, g in groupby(lines,
    ...         lambda line:line.strip()!='') if k):
    ...     width = max(map(lambda x:len(x.rstrip()), t))
    ...     topbot = '+-%s-+'%('-'*width)
    ...     print topbot
    ...     for line in t: print '| %s |' % line.rstrip().ljust(width)
    ...     print topbot
    ...     print
    ...
    +--------------------------+
    | 200501221530             |
    | John                     |
    | *** long string here *** |
    +--------------------------+

    +------------------------+
    | 200503130935           |
    | Jeremy                 |
    | *** jeremy string here |
    +------------------------+

    +----------------------------------+
    | 200504151625                     |
    | Clyde                            |
    | *** clyde's long string here *** |
    +----------------------------------+

Or of course you can just print the sorted groups bare:

    >>> lines.seek(0)
    >>> for t in sorted(tuple(g) for k, g in groupby(lines,
    ...         lambda line:line.strip()!='') if k):
    ...     print ''.join(t)
    ...
    200501221530
    John
    *** long string here ***

    200503130935
    Jeremy
    *** jeremy string here

    200504151625
    Clyde
    *** clyde's long string here ***

If your source of line groups is not delimited by blank lines, or has other non-blank lines, you will have to change the source or change the lambda to some other key function that produces one value for the lines to include (True if you want to use "if k" as above) and another (False) for the ones to exclude.

HTH

Regards,
Bengt Richter
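
The groupby technique above carries over to Python 3 with io.StringIO and the print() function. The following is a sketch along the same lines rather than a tested replacement; is_content and groups are names introduced here, and trailing whitespace is stripped up front instead of at print time.

```python
import io
from itertools import groupby

# In-memory stand-in for the real source of lines, as in the session above.
lines = io.StringIO("""\
200501221530
John
*** long string here ***

200504151625
Clyde
*** clyde's long string here ***

200503130935
Jeremy
*** jeremy string here
""")

def is_content(line):
    """Grouping key: True for non-blank lines, False for blank separators."""
    return line.strip() != ""

# Each run of consecutive non-blank lines becomes one tuple; blank runs are dropped.
groups = sorted(
    tuple(line.rstrip() for line in group)
    for nonblank, group in groupby(lines, key=is_content)
    if nonblank
)

# Print each group in a box sized to its longest field.
for t in groups:
    width = max(len(field) for field in t)
    border = "+-" + "-" * width + "-+"
    print(border)
    for field in t:
        print("| " + field.ljust(width) + " |")
    print(border)
    print()
```

Each group is turned into a tuple before groupby advances, which matters because the group iterators share the underlying line iterator.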