Greetings,

>I have timestamped log files I need to read through and keep track
>of the most upto date information.
>
>For example lets say we had a log file
>
>timeStamp,name,marblesHeld,timeNow,timeSinceLastEaten

I do not quite understand the distinction between timeStamp and
timeNow.

>I need to keep track of every 'name' in this table, I don't want
>duplicate values so if values come in from a later timestamp that
>is different then that needs to get updated. For example if a later
>timestamp showed 'dave' with less marbles that should get updated.
>
>I thought a dictionary would be a good idea because of the key
>restrictions ensuring no duplicates, so the data would always
>update -

Yes. A dictionary seems reasonable.

>However because they are unordered and I need to do some more
>processing on the data afterwards I'm having trouble.

Ordered how? For each name, do you need to keep the stream of data
ordered? That is what I'm assuming based on your problem description.

If the order of names (dave, steve and jenny) is important, then you
should look at OrderedDict, as JM has suggested. I am inferring from
your description that the order of events (along a timeline) is what
is important, not the sequence of players relative to each other
(since that is already in the logfile).

>For example lets assume that once I have the most upto date values
>from dave,steve,jenny I wanted to do timeNow - timeSinceLastEaten
>to get an interval then write all the info together to some other
>database. Crucially order is important here.

Again, it's not entirely clear what "order" means. If the order of
events for a single player is important, then see below.

>I don't know of a particular name will appear in the records or
>not, so it needs to created on the first instance and updated from
>then on.

Again, a dictionary is great for this. It seems that you could also
benefit from a list (to store an event and the time at which the
event occurred). But you don't want to store all of history, so you
want a bounded-length list. You may find a collections.deque useful
here.

>Could anyone suggest some good approaches or suggested data
>structures for this?
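
To make the plain-dictionary idea concrete, here is a minimal sketch that keeps only the newest row for each name. It rests on assumptions the original post does not confirm: that the log is comma-separated with the header you showed, that timeStamp and timeNow are numeric seconds, and that timeSinceLastEaten is a duration in the same unit. The sample rows and the helper name latest_by_name are made up for illustration.

```python
import csv
import io

# Made-up sample data; only the column names come from the original post.
SAMPLE_LOG = """\
timeStamp,name,marblesHeld,timeNow,timeSinceLastEaten
100,dave,12,1000,30
100,steve,7,1000,45
200,dave,9,1100,10
150,jenny,3,1050,20
120,steve,8,1020,5
"""


def latest_by_name(lines):
    """Return {name: row}, keeping the row with the newest timeStamp per name."""
    latest = {}
    for row in csv.DictReader(lines):
        name = row["name"]
        stamp = float(row["timeStamp"])
        # Create the entry on first sight; afterwards replace it only if
        # the incoming row is at least as recent as the stored one.
        if name not in latest or stamp >= float(latest[name]["timeStamp"]):
            latest[name] = row
    return latest


latest = latest_by_name(io.StringIO(SAMPLE_LOG))
for name, row in latest.items():
    # Assumed meaning: moment of the last meal = timeNow - timeSinceLastEaten.
    last_eaten = float(row["timeNow"]) - float(row["timeSinceLastEaten"])
    print(name, row["marblesHeld"], last_eaten)
```

On Python 3.7 and later a plain dict iterates in insertion order, so the names come out in order of first appearance in the log. If you need some other order, sorting the items before writing them out is probably simpler than maintaining order during the read.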

First, JM already pointed you to OrderedDict, which may help
depending on exactly what you are trying to order. There are two
other data structures in the collections module that may be helpful
for you.

I perceive the following (from your description):

  You have a set of names (players).
  You wish to store, for each name, a value (marblesHeld).
  You wish to store, for each name, a value (timeSinceLastEaten).

I recommend learning how to use both:

  collections.defaultdict [0]: so you can dynamically create
    entries for new players in the marble game without checking
    whether they already exist in the dictionary (very convenient!)

  collections.deque [1]: in this case, I'm suggesting using it as a
    bounded-length list; you keep adding stuff to it and, once it
    holds X entries, the oldest ones will "fall off"

Note, I fabricated players and data, but the bit that you are
probably interested in is the interaction between the dictionary,
whose keys are the names of the players, and its values, each of
which is a deque capturing the last 10 entries of that user's marble
count and the time at which each count was recorded.

  mydeque = functools.partial(collections.deque, maxlen=10)
  record = collections.defaultdict(mydeque)

Storing both the marble count and the time will allow you to
calculate, at any later time, the duration since the user's marble
count last changed.

I don't understand how the eating fits into your problem, but maybe
my code (below) will afford you an example of how to approach the
problem with a few of Python's wonderfully convenient standard
library data structures.

Good luck,

-Martin

P.S. I just read your reply to JM, and it looks like you are also
trying to figure out how to read the input data. Is it CSV? Could
you simply use the csv module [2]?

 [0] https://docs.python.org/3/library/collections.html#collections.defaultdict
 [1] https://docs.python.org/3/library/collections.html#collections.deque
 [2] https://docs.python.org/3/library/csv.html

#! /usr/bin/python3

import time
import random
import functools
import collections
import pprint

players = ['Steve', 'Jenny', 'Dave', 'Samuel', 'Jerzy', 'Ellen']

# factory for a deque that keeps only the 10 most recent entries
mydeque = functools.partial(collections.deque, maxlen=10)


def marblegame(rounds):
    # a name not seen before gets a fresh bounded deque on first access
    record = collections.defaultdict(mydeque)
    for _ in range(rounds):
        now = time.time()
        who = random.choice(players)
        marbles = random.randint(0, 100)
        # store the count together with the time it was recorded
        record[who].append((marbles, now))
    for whom, marblehistory in record.items():
        print(whom, end=": ")
        pprint.pprint(marblehistory)


if __name__ == '__main__':
    import sys
    # optional first argument: number of rounds to simulate (default 30)
    if len(sys.argv) > 1:
        count = int(sys.argv[1])
    else:
        count = 30
    marblegame(count)

# -- end of file

-- 
Martin A. Brown
http://linux-ip.net/