Hi,
I've seen metakit mentioned in various places over the last couple of years, and I finally started using it with Python. It's great! It definitely fills the large-application persistence void between object serialization (or Gadfly) and relational database servers that I had been straddling until a few days ago.
Even though I've been working with it for a few days and have gone through much of the list archives and the available Mk4py docs, I still don't quite have my brain wrapped around it yet, so I thought I would ask for some suggestions.
Here's what I'm doing:
On a remote server, tracking data is entered into a database by a tracking device about every 3 seconds. My Python program then requests data for a block of time in that remote database through a Java servlet that returns xml.
This request, usually for about 1 minute of tracking data, happens every 10 seconds or so.
In each XML document I download there are tracking "targets", each with an id, a timestamp, and a location coordinate. I parse the XML and store each target and its attributes as a row in a metakit view.
Next I remove duplicates from the view because there is often overlap in the incoming data sets.
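Just to show what I mean by duplicates, here's the rule I'm applying, sketched in plain Python rather than Mk4py calls (the function and field names are just illustrative):

```python
def dedupe(rows):
    """Keep only the first occurrence of each (id, date) pair.

    `rows` is a list of dicts with at least 'id' and 'date' keys;
    this mirrors dropping the overlap between successive XML downloads.
    """
    seen = set()
    out = []
    for row in rows:
        key = (row['id'], row['date'])
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out
```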
Within the view, a single target id may have several earlier recorded tracks with the same id but different timestamps and locations. We call all the data for a target prior to its latest timestamp the target "trail".
I then sort the view by the timestamps to get the latest recorded time.
Here's where I'm starting to get lost. Using that latest timestamp I have to get rid of all targets (and their trails) whose latest record is more than 30 seconds before the latest timestamp. Any target that hasn't been active is considered to have dropped out of the tracking system and should be removed.
I created a filtered subview which contains the indices of the targets whose latest recorded timestamp is within thirty seconds of the overall latest timestamp.
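To make the pruning rule concrete, here's what I'm after in plain Python (illustrative only, not Mk4py API; I'd like to express the same thing with view operations):

```python
def prune_stale(rows, latest_epoch, window=30.0):
    """Drop every target (head and trail) whose most recent record
    is more than `window` seconds before `latest_epoch`.

    `rows` is a list of dicts with 'id' and 'epoch' keys.
    """
    # Find the latest epoch seen for each target id.
    latest = {}
    for row in rows:
        tid = row['id']
        latest[tid] = max(latest.get(tid, 0.0), row['epoch'])
    cutoff = latest_epoch - window
    # Keep only rows belonging to still-active targets.
    return [row for row in rows if latest[row['id']] >= cutoff]
```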
So I'm not sure where to go from here because I haven't quite figured out all of the MK4Py methods yet.
So far we've had roughly 1,000 targets at any given time, with trails of widely varying length. But I don't foresee the database ever getting bigger than 200 megs.
So each time I add data to metakit I want to:
1. Throw out duplicates (done)
2. Get the latest timestamp (done)
3. Remove targets and their trails 30+ seconds older than the latest timestamp. (not sure)
4. Group the targets by id (no problem)
5. Access each target id by group and sort it by head (the latest point) and trail (all the other track points). (not sure)
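To make step 5 concrete, the grouping and head/trail split I'm after looks like this in plain Python (again illustrative names, not Mk4py calls):

```python
def head_and_trails(rows):
    """Group rows by target id and split each group into its head
    (the latest point) and trail (all earlier points, newest first).

    `rows` is a list of dicts with 'id' and 'epoch' keys.
    Returns {id: (head_row, [trail_rows...])}.
    """
    groups = {}
    for row in rows:
        groups.setdefault(row['id'], []).append(row)
    result = {}
    for tid, track in groups.items():
        # Newest first: head is track[0], trail is the rest.
        track.sort(key=lambda r: r['epoch'], reverse=True)
        result[tid] = (track[0], track[1:])
    return result
```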
Here's pretty much what the view I have looks like:
id     date                 latitude       longitude       epoch
-----  -------------------  -------------  --------------  ------------
2873   2004-04-15 20:00:31  38.2749996185  -75.1578979492  1087520384.0
2873   2004-04-15 20:00:34  38.2749996185  -75.1579971313  1087520384.0
2878   2004-04-15 20:00:31  38.2283992767  -75.1417999268  1087520384.0
2878   2004-04-15 20:00:34  38.2285003662  -75.141998291   1087520384.0
2878   2004-04-15 20:00:37  38.2285003662  -75.141998291   1087520384.0
2878   2004-04-15 20:00:40  38.2285003662  -75.141998291   1087520384.0
2906   2004-04-15 20:00:37  38.3486003876  -75.0955963135  1087520384.0
2906   2004-04-15 20:00:40  38.3486003876  -75.0955963135  1087520384.0
2909   2004-04-15 20:00:31  38.3435001373  -75.1462020874  1087520384.0
2909   2004-04-15 20:00:34  38.3435001373  -75.1462020874  1087520384.0
I add the "epoch" column for time sorting so I don't run into any problems sorting the date column, which is a string.
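For reference, one way to turn the date string into a sortable epoch float (assuming the timestamps are UTC; calendar.timegm avoids local-timezone surprises — this is just a sketch, not necessarily how my epoch column was produced):

```python
import calendar
import time

def date_to_epoch(date_str):
    """Convert a 'YYYY-MM-DD HH:MM:SS' string to seconds since the
    Unix epoch, treating the timestamp as UTC."""
    return float(calendar.timegm(time.strptime(date_str, "%Y-%m-%d %H:%M:%S")))
```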
I'm trying to use metakit for as much of the sorting as possible because it's so darn fast.
I haven't quite grasped several of the view operators such as "remapwith", "reduce", etc. and haven't found good examples.
Any suggestions on which metakit methods can or can't be used to do the five data processing steps above?
Thanks,
Joel

_____________________________________________
Metakit mailing list - [EMAIL PROTECTED]
http://www.equi4.com/mailman/listinfo/metakit
