Hi Juan. That would probably be the best way to do it, alright. It has the added side effect that I can display stats before I have read the complete dataset, which would be useful.
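Juan's approach below (one pass over the log, feeding every line to a set of small analyzers) could look roughly like this minimal sketch. It assumes a made-up log format of one `<kind> <txn_id>` pair per line. `Message`, `parse_line`, `KindCounter`, `TransactionMatcher` and `analyze` are hypothetical names, not anything from the thread. `TransactionMatcher` is one reading of Juan's "keep the transaction ID in a dictionary" suggestion: an ID stays in the dictionary only until its type-B message turns up, so memory grows with the number of open transactions rather than with the size of the log.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Message:
    # Hypothetical parsed log entry: the real fields are not given in the thread.
    kind: str    # message type, e.g. "A" or "B"
    txn_id: str  # transaction ID used to pair messages


def parse_line(line):
    # Hypothetical log format: "<kind> <txn_id>" on each line.
    kind, txn_id = line.split()
    return Message(kind, txn_id)


class KindCounter:
    """Running count of each message type seen so far."""

    def __init__(self):
        self.counts = Counter()

    def feed(self, msg):
        self.counts[msg.kind] += 1


class TransactionMatcher:
    """Pairs each type-A message with a later type-B message by transaction ID.

    Only transactions still waiting for their B message are kept in memory.
    """

    def __init__(self):
        self.pending = {}        # txn_id -> unmatched type-A message
        self.unexpected_b = []   # type-B messages with no earlier type-A

    def feed(self, msg):
        if msg.kind == "A":
            self.pending[msg.txn_id] = msg
        elif msg.kind == "B":
            # Remove the matching A; if there was none, record the stray B.
            if self.pending.pop(msg.txn_id, None) is None:
                self.unexpected_b.append(msg)


def analyze(lines, analyzers):
    """Stream the log once, handing each parsed line to every analyzer."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        msg = parse_line(line)
        for analyzer in analyzers:
            analyzer.feed(msg)
        # Partial stats could be reported here, before the whole log is read.
    return analyzers
```

Something like `with open("app.log") as f: analyze(f, [KindCounter(), TransactionMatcher()])` (the filename is made up) would run both analyzers in a single pass; any IDs still in `pending` afterwards are A messages that never got a matching B.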
Thanks everyone else too! You gave me some nice ideas and got my brain working again; that's pretty much what I was hoping for.

Thanks again,
Dan.

2008/12/15 Juan Hernandez Gomez <[email protected]>:
> Some time ago I started writing a custom stats analyzer, and the idea was not to load the full log (it was huge) but to update the stats as I was reading it.
>
> I registered <individual stats analyzers> within the analyzer, so for every line read, each <individual analyzer> did its own calculation (some kind of aggregation of data), and they didn't consume too much memory either.
>
> I think you can do something similar, adding as many analyzers as you have criteria. For example, you can have one analyzer that reads the transaction ID and keeps it in a dictionary until all the different types of message for that transaction have been found. It all depends on what output you want to get (missing messages, repeated messages, ...).
>
> Hope you get the idea too.
>
> Daniel Kersten wrote:
>> PS: So, yes, they are read-only.
>>
>> 2008/12/15 Daniel Kersten <[email protected]>:
>>> Ok, I'll give a few more details as to what I'm doing.
>>>
>>> Basically, I have a little Python app which analyses log files (these log files are large; I have one here that's incomplete and is already 200 MB). Each entry contains a number of fields, which I package into convenient little objects.
>>>
>>> The objects represent "messages", and the fields are addresses of those messages (transaction IDs, etc.). I need to verify that if I get a message of type A, I then receive a message of type B with a matching transaction ID and address X in range Y... you get the idea, I hope.
>>>
>>> I could use SQLite for this (and that might even be a good solution), but I'd like to keep it in plain Python if possible, since it's meant to be just a little script that I can run over the log files on whatever machine they happen to be on. I may settle for SQLite if there's no alternative.
>>>
>>> 2008/12/15 Juan Hernandez Gomez <[email protected]>:
>>>> Hi,
>>>>
>>>> you could create an SQLite table with the properties you want to filter on as columns, plus an extra column with the index of each object in your large list (if not the full object). Then you have the full power of SQL, and you can create indexes as needed. It's quite flexible.
>>>>
>>>> You haven't said whether the list of objects can be updated or is just read-only.
>>>>
>>>> Juan
>>>>
>>>> Daniel Kersten wrote:
>>>>> Hi all,
>>>>>
>>>>> I have a large list of objects which I'd like to filter on various criteria. For example, I'd like to do something like:
>>>>> give me all objects o where o.a == "A" and o.b == "B" and o.c in [...]
>>>>>
>>>>> I thought of storing references to these objects in dictionaries so that I can look them up by their values. For example, dict_of_a would map each value of 'a' to the objects that have it, so dict_of_a[o.a] gives back [o] (or more elements, if other objects have the same value). I would then look up each field and take the set intersection of the results to get all objects that match the desired criteria (though this doesn't work for the `in` operator). I hope that made sense.
>>>>>
>>>>> The problem is that I have a large list of these objects (well over 100k), and I was wondering if there is a better way of doing this. Perhaps a super-efficient built-in query object? Anything?
>>>>>
>>>>> I'm probably doing it wrong anyway, so any tips or ideas to push me towards a proper solution would be greatly appreciated.
>>>>>
>>>>> Thanks,
>>>>> Dan.
>>>
>>> --
>>> Daniel Kersten.
>>> Leveraging dynamic paradigms since the synergies of 1985.
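For the dictionary idea in the original question, a minimal sketch in plain Python might index each field as a dictionary of sets, combine the per-field results with set intersection (every condition must hold), and handle `o.c in [...]` by taking the union of the lookups for each allowed value. `build_index`, `lookup` and `query` are hypothetical helper names; the sketch assumes the fields are plain attributes with hashable values, and it stores list positions rather than the objects themselves. Building an index is one pass over the list per field; after that, a query only touches the matching sets instead of scanning every object.

```python
from collections import defaultdict


def build_index(objects, attr):
    """Map each value of `attr` to the set of list positions holding that value."""
    index = defaultdict(set)
    for pos, obj in enumerate(objects):
        index[getattr(obj, attr)].add(pos)
    return index


def lookup(index, values):
    """Positions whose attribute equals any of `values` (covers `o.c in [...]`)."""
    hits = set()
    for value in values:
        hits |= index.get(value, set())
    return hits


def query(objects, indexes, **criteria):
    """Return the objects satisfying *all* criteria.

    Each keyword is an attribute name; its value is either a single value
    (equality test) or a list/tuple/set of allowed values (membership test).
    Per-attribute results are combined with set intersection because every
    condition has to hold.
    """
    result = None
    for attr, wanted in criteria.items():
        if isinstance(wanted, (list, tuple, set, frozenset)):
            values = wanted
        else:
            values = [wanted]
        hits = lookup(indexes[attr], values)
        result = hits if result is None else result & hits
        if not result:
            return []  # nothing can match the remaining criteria either
    if result is None:
        return list(objects)  # no criteria given
    return [objects[pos] for pos in sorted(result)]
```

With `indexes = {name: build_index(objects, name) for name in ("a", "b", "c")}`, the query from the original question becomes `query(objects, indexes, a="A", b="B", c=[...])`. Since the objects are read-only, the indexes only need to be built once.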

--
Daniel Kersten.
Leveraging dynamic paradigms since the synergies of 1985.

You received this message because you are subscribed to the Google Groups "Python Ireland" group.
For more options, visit this group at http://groups.google.ie/group/pythonireland?hl=en
