Some time ago I started writing a custom stats analyzer. The idea was not to load the full log (it was huge) but to update
the stats as I read it.

I registered individual stats analyzers with the main analyzer, so for every line read each individual analyzer did its own calculation (some kind of aggregation of data). That way they didn't consume too much memory either.

I think you can do something similar, and add as many analyzers as you have different criteria.
For example, you can have one analyzer that reads the transaction id and keeps it in a dictionary until all the different types of messages have been found. It all depends on what output you want to get (missing messages, repeated messages, ...).
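
A minimal sketch of that idea, not my original code. The log format ("<type> <txid>" per line), the class names and the feed() method are all assumptions made up for the example:

```python
from collections import Counter


class TypeCounter:
    """Counts how many messages of each type were seen."""

    def __init__(self):
        self.counts = Counter()

    def feed(self, msg):
        self.counts[msg["type"]] += 1


class UnmatchedRequests:
    """Keeps transaction ids that got an 'A' but no 'B' yet."""

    def __init__(self):
        self.pending = {}

    def feed(self, msg):
        if msg["type"] == "A":
            self.pending[msg["txid"]] = msg
        elif msg["type"] == "B":
            # Drop the entry once the pair is complete to keep memory low.
            self.pending.pop(msg["txid"], None)


def parse_line(line):
    """Hypothetical log format: '<type> <txid>' on each line."""
    msg_type, txid = line.split()
    return {"type": msg_type, "txid": txid}


def run(lines, analyzers):
    # One pass over the log; every line is handed to every analyzer.
    for line in lines:
        if not line.strip():
            continue
        msg = parse_line(line)
        for analyzer in analyzers:
            analyzer.feed(msg)


counter = TypeCounter()
unmatched = UnmatchedRequests()
sample = ["A 1", "A 2", "B 1", "A 3", "B 3"]
run(sample, [counter, unmatched])
print(dict(counter.counts))
print(sorted(unmatched.pending))
```

With a real log you would pass the open file object instead of the sample list, so only one line is in memory at a time.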

Hope you get the idea too.


Daniel Kersten wrote:
PS: So, yes, they are readonly.

2008/12/15 Daniel Kersten <[email protected]>:
  
Ok, I'll give a few more details as to what I'm doing.

Basically, I have a little Python app which analyses log files (these
log files are large; I have one here that's incomplete and is already
200MB). Each entry contains a number of fields which I package into
convenient little objects.

The objects represent "messages" and the fields are addresses of those
messages (transaction IDs etc.), and I need to verify that if I get a
message of type A, I then receive a message of type B with a
matching transaction ID and address X in range Y... you get the
idea, I hope.
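
As a rough illustration of that kind of check (the Message fields, the problem strings and the address range are made up for the example; the real log layout is different):

```python
from collections import namedtuple

# Hypothetical record layout, only for illustration.
Message = namedtuple("Message", "kind txid address")


def check_pairs(messages, low, high):
    """Yield (txid, problem) for every A/B pair that breaks the rule."""
    waiting = {}  # txid -> the A message still waiting for its B
    for msg in messages:
        if msg.kind == "A":
            waiting[msg.txid] = msg
        elif msg.kind == "B":
            first = waiting.pop(msg.txid, None)
            if first is None:
                yield msg.txid, "B without a preceding A"
            elif not (low <= msg.address <= high):
                yield msg.txid, "B address out of range"
    # Anything still waiting never got its B.
    for txid in waiting:
        yield txid, "A without a B"


log = [
    Message("A", 1, 0x10),
    Message("B", 1, 0x20),
    Message("A", 2, 0x10),
    Message("B", 2, 0x99),
    Message("A", 3, 0x10),
    Message("B", 4, 0x20),
]
problems = list(check_pairs(log, low=0x00, high=0x7F))
for txid, problem in problems:
    print(txid, problem)
```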

I could use sqlite for this (and that might even be a good solution),
though I'd like to keep it in plain Python, if possible, since it's
meant to just be a little script which I can run over the log files on
whatever machine it happens to be on. I may settle for using
sqlite if there's no alternative.


2008/12/15 Juan Hernandez Gomez <[email protected]>:
    
Hi,

You could create an SQLite table with the properties you want to filter on as
columns and an extra column with the index of the object in your large list
(if not the full object).
Then you have the full power of SQL and you can create indexes as needed. It
is quite flexible.
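
A minimal sketch of what I mean, using the sqlite3 module from the standard library. The Obj class, the table name, the column types and the sample data are assumptions for the example:

```python
import sqlite3


class Obj:
    """Stand-in for the real objects; only a, b and c are used here."""

    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c


objects = [Obj("A", "B", 1), Obj("A", "X", 2), Obj("A", "B", 7), Obj("Z", "B", 1)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objs (idx INTEGER PRIMARY KEY, a TEXT, b TEXT, c INTEGER)"
)
# idx is the position of the object in the Python list.
conn.executemany(
    "INSERT INTO objs (idx, a, b, c) VALUES (?, ?, ?, ?)",
    [(i, o.a, o.b, o.c) for i, o in enumerate(objects)],
)
conn.execute("CREATE INDEX objs_a_b ON objs (a, b)")

# o.a == "A" and o.b == "B" and o.c in [1, 2, 3]
wanted_c = [1, 2, 3]
placeholders = ",".join("?" * len(wanted_c))
rows = conn.execute(
    "SELECT idx FROM objs WHERE a = ? AND b = ? AND c IN (%s) ORDER BY idx"
    % placeholders,
    ["A", "B"] + wanted_c,
)
matches = [objects[idx] for (idx,) in rows]
print(len(matches))
```

Only the number of "?" placeholders is formatted into the SQL string; the values themselves are passed as parameters.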

You haven't said if the list of objects can be updated or is just readonly.

Juan


Daniel Kersten wrote:

Hi all,

I have a large list of objects which I'd like to filter on various criteria.
For example, I'd like to do something like:
  give me all objects o where o.a == "A" and o.b == "B" and o.c in [...]

I thought of storing references to these objects in dictionaries, one
per field, so that I can look them up by value (e.g. dict_of_a maps
each value of 'a' to the list of objects that have it, so
dict_of_a[o.a] gives back [o], or more elements if other objects share
that value). I would then look up each field and take the set
intersection of the results to get all objects which match the desired
criteria (though this doesn't work for the `in` operator). I hope that
made sense.
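
Roughly, the dictionary approach would look like the sketch below. The Obj class, the build_index helper and the sample data are invented for illustration. Combining criteria with "and" means intersecting the per-field results, and one way to cover `in` is to union the lookups for each allowed value first:

```python
from collections import defaultdict


class Obj:
    """Stand-in for the real objects; only a, b and c are used here."""

    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c


def build_index(objects, field):
    """Map each value of `field` to the set of list positions that have it."""
    index = defaultdict(set)
    for pos, obj in enumerate(objects):
        index[getattr(obj, field)].add(pos)
    return index


objects = [Obj("A", "B", 1), Obj("A", "X", 2), Obj("A", "B", 7), Obj("A", "B", 3)]
by_a = build_index(objects, "a")
by_b = build_index(objects, "b")
by_c = build_index(objects, "c")

# `o.c in [1, 2, 3]` is the union of the per-value lookups.
c_hits = set()
for value in [1, 2, 3]:
    c_hits |= by_c.get(value, set())

# "and" between the criteria is a set intersection.
hits = by_a.get("A", set()) & by_b.get("B", set()) & c_hits
matches = [objects[pos] for pos in sorted(hits)]
print(sorted(hits))
```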

The problem is that I have a large list of these objects (well over
100k) and I was wondering if there is a better way of doing this.
Perhaps a super-efficient built-in query object? Anything?

I'm probably doing it wrong anyway, so any tips or ideas to push me
towards a proper solution would be greatly appreciated.

Thanks,
Dan.



      

--
Daniel Kersten.
Leveraging dynamic paradigms since the synergies of 1985.

    



  
