On Tue, 2005-10-11 at 17:09 +0200, Dirk Meyer wrote:

> We should keep some things in mind. First, your IPC code is nice but
> not secure. Everyone can connect to the server and call python
> functions and such things.
I can add authentication, but the data still wouldn't be encrypted, so
this solution isn't suitable for use over a public network in any case.
And I don't think we should bother trying to encrypt the IPC channel:
not only would it hurt performance, it's probably impossible to get
right anyway. (And if we used m2crypto, say, our program would leak
like crazy.) I think it's good enough for IPC to rely on filesystem
access control (in the case of unix sockets). If the user wants to use
it over the LAN, I can add some basic challenge/response authentication
to kaa's ipc. But for the purposes of epg and vfs, I agree with your
basic architecture: database on the local machine, db reads in a
thread, db writes over ipc. For something like managing recording
schedules in kaa.record, a simple authentication mechanism in
kaa.base.ipc might do.

> local. We also can't use mbus because it is designed to be a message
> bus, not a bus to transport that much data (but it is secure btw).

mbus is secure, is it? High praise indeed. I wouldn't use that word
about any software. :) Not even about openvpn, which may well be the
best piece of software I use on my computer.

> (async). The thread will not only query the db, it will also create
> nice 'Program' objects so the main thread can use it without creating
> something. There should also be a cache to speed up stuff by not using
> the thread with db lookup at all.

Herein lies the main benefit of doing reads in a thread: the thread
can also take care of putting the data in a manageable form for the
main application. This is particularly important since Python is a hog
when it comes to object creation.

> Freevo knows what channels would be visible when entering the tv
> grid. So Freevo will request these channels with programs +- 6 hours
> at startup.

Using my rewrite of kaa.epg (which I've cleverly called kaa.epg2 for
now), this query takes 0.2 seconds and returns 1978 program objects.
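For illustration, the shape of that +- 6 hour window query in plain
sqlite looks roughly like the following. The schema, column names, and
sample data here are invented for the sketch; they are not kaa.epg2's
actual schema:

```python
import sqlite3
import time

# Toy schema standing in for the real epg tables -- illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE programs "
             "(channel TEXT, start INTEGER, stop INTEGER, title TEXT)")
conn.execute("CREATE INDEX programs_start_idx ON programs (start)")

# Fake guide data: back-to-back 30-minute programs around "now".
now = int(time.time())
conn.executemany("INSERT INTO programs VALUES (?, ?, ?, ?)",
                 [("ch1", now + i * 1800, now + (i + 1) * 1800,
                   "show %d" % i)
                  for i in range(-20, 20)])

# Everything overlapping the window [now - 6h, now + 6h]: a program is
# in the window if it ends after the window opens and starts before it
# closes.
window = 6 * 3600
rows = conn.execute(
    "SELECT title FROM programs WHERE stop > ? AND start < ?",
    (now - window, now + window)).fetchall()
```

The overlap condition (stop > window_start AND start < window_end) is
the part that needs an index on the start column to stay fast as the
table grows.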
As a point of interest, the query itself takes 0.02 seconds to
execute, another 0.1 seconds to convert the rows to tuples, another
0.05 seconds to normalize the tuples into dicts (including unpickling
ATTR_SIMPLE attributes), and another 0.03 seconds to convert those
dicts to Program objects. The 0.05 seconds of normalize time is some
low-hanging fruit, and fixing it would bring that query down to 0.15
seconds (on my system, at least). Not slow, but I agree that it's
worth prefetching.

The original kaa.epg executes that same query in 0.17 seconds, so
performance is pretty comparable there. Keyword searches are a
different story, of course. Searching for "simpsons" with kaa.epg
returns 120 rows and takes 0.15 seconds. With kaa.epg2, using the
keyword support in kaa.base.db, the same query takes 0.015 seconds.

BTW, when parsing my 17MB xmltv file, kaa.epg takes 74 minutes ($!#@)
and uses 377MB RSS. My rewrite (whose improvement is mainly due to my
use of libxml2, of course) takes 94 seconds and uses less than half
that memory. That's roughly a 50X performance improvement. About 55%
of that 94 seconds is due to keyword indexing (ATTR_KEYWORDS
attributes). I could probably improve that time quite a bit by adding
mass-add functionality to the API. (Sort of like the difference
between pysqlite's execute and executemany.)

> cache. When you go to the right, freevo will ask the db for data + 10
> hours, just to be sure in case the user needs it. So we can cache in
> the background what we think is needed next in a thread and the main
> loop can display stuff without using the db.

Probably not a bad idea to do prefetches like that. There's a pretty
high fixed overhead per query, so it's better to fetch more rows than
you need. For example, querying for the next 2 hours of program data
takes 0.1 seconds and returns 200 rows; querying for the next 12 hours
returns 2000 rows and takes 0.2 seconds. 10X the data for only 2X the
execution time.
Actually, now that I think about it, something doesn't seem right
there. Smells like an index isn't getting used (or doesn't exist).
Anyway, I agree, prefetching in another thread seems to be the way to
go.

> Back to client / server. When we want to add data to the epg, we spawn
> a child. First we check if a write child is already running (try to
> connect to the ipc, if it doesn't work, start child, I have some test
> code for that). Same for kaa.vfs. One reading thread in each app, one
> writing app on each machine.

Sounds sensible.

I think I need to change my opinion a bit about kaa.base.ipc. My
original thinking was that you don't need to write a client API: you
just grab a proxy to a remote object and use it as if it were local.
This works in terms of functionality, but in practice things aren't so
clear-cut. For example, in the epg case, you do a query and get back a
list of 2000 Program objects. Since objects get proxied by default,
all those Program objects are proxies. So if we assume epg is a
proxied remote object:

    for prog in epg.search(keywords="simpsons"):
        prog.description

That would be fairly slow: since 'prog' is a proxied Program object,
each access of the description attribute goes over the wire.
Alternatively you could do this:

    for prog in epg.search(keywords="simpsons", __ipc_copy_result=True):
        prog.description

That'd be fast, because each Program object is pickled (rather than
just a reference to it), so all those objects are local. But if the
Program object holds a reference to the epg (prog._epg in my case) and
you use __ipc_copy_result, the epg Guide object also gets pickled.
That's not good. Ideally you'd want the Program objects to be pickled
(so that attribute accesses are local) while the epg reference remains
a remote object. This isn't something kaa.base.ipc can do
automatically; it needs some supporting logic. So in reality we'll
need a client API that uses IPC and does intelligent things for the
API it's wrapping.
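As a sketch of the kind of supporting logic such a client API could
add: customize pickling so that Program attributes travel by value,
but the guide reference is dropped and re-attached on the client side.
Class and attribute names below are illustrative, not kaa's actual
code:

```python
import pickle

class Program(object):
    def __init__(self, title, description, epg=None):
        self.title = title
        self.description = description
        self._epg = epg                # may be an unpicklable IPC proxy

    def __getstate__(self):
        # Copy everything by value except the guide reference, which
        # must never be shipped over the wire.
        state = self.__dict__.copy()
        state["_epg"] = None
        return state

class Guide(object):
    def search(self, keywords):
        # Stand-in for the real query; returns Programs that point
        # back at this guide.
        return [Program("The Simpsons", "Homer forgets something.",
                        epg=self)]

guide = Guide()
prog = guide.search("simpsons")[0]
wire = pickle.dumps(prog)              # what would cross the IPC channel
local = pickle.loads(wire)             # attributes are now local copies
local._epg = guide                     # client re-attaches its own
                                       # (possibly remote) reference
```

The same effect could be had with __reduce__ or a custom pickler; the
point is just that the wrapping client API, not kaa.base.ipc itself,
decides which references cross the wire by value.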
This isn't really a problem, it just means that kaa ipc isn't magic
pixie dust like I claimed it was. :)

> And kaa.epg has different sources. One is
> xmltv, a new one could be sync-from-other-db.

As I mentioned on IRC, unless this is just a straight copy of the
sqlite file, it probably isn't worth it. Syncing individual rows means
accessing the db through pysqlite, in which case we're not really
saving anything. With libxml2, parsing the xml file is very quick;
almost all the time is due to db accesses, so we don't save much by
syncing at the row level from another db. Copying the epgdb.sqlite
file straight over would be a big win, of course. We could implement
that eventually.

Cheers,
Jason.