Events remove the need to subclass Graph and override Graph.add/remove to
add special behavior. This keeps complex stuff out of Graph. In fact, I
imagine some complex stuff that is now in Graph could be refactored into
event handlers. One thing I discovered is that it is absurdly difficult
to subclass Graph these days: to enhance Graph you have to subclass both
Graph and ConjunctiveGraph, and override CG.parse, at a bare minimum.
Event handlers remove much of the need to subclass, as I've found
overriding add/remove to be the most common case.
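
To make the contrast with subclassing concrete, here is a minimal sketch of the idea, not the actual rdflib code. The Graph class below is a toy stand-in, and the names TripleAdded, TripleRemoved, subscribe and _dispatch are assumptions chosen for illustration only.

```python
from collections import defaultdict


class TripleAdded:
    """Hypothetical event: a triple was just added."""

    def __init__(self, triple):
        self.triple = triple


class TripleRemoved:
    """Hypothetical event: a triple was just removed."""

    def __init__(self, triple):
        self.triple = triple


class Graph:
    """Toy stand-in for a graph; NOT rdflib's Graph."""

    def __init__(self):
        self._triples = set()
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        # Handlers are called in subscription order.
        self._handlers[event_type].append(handler)

    def _dispatch(self, event):
        for handler in self._handlers[type(event)]:
            handler(event)

    def add(self, triple):
        if triple not in self._triples:
            self._triples.add(triple)
            self._dispatch(TripleAdded(triple))

    def remove(self, triple):
        if triple in self._triples:
            self._triples.discard(triple)
            self._dispatch(TripleRemoved(triple))

    def __len__(self):
        return len(self._triples)


# Special behavior is attached from outside; Graph is never subclassed.
log = []
g = Graph()
g.subscribe(TripleAdded, lambda e: log.append(("add", e.triple)))
g.subscribe(TripleRemoved, lambda e: log.append(("remove", e.triple)))
g.add(("s", "p", "o"))
g.remove(("s", "p", "o"))
```

The point of the sketch is only that add/remove stay trivial and the extra behavior lives in the handlers.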

Interesting. The event handling adds another dimension to working
with graphs that I hadn't thought of before. Could you give some
examples of current APIs that could be refactored to work as event
handlers? I agree that the Graph classes should be as 'vanilla' as
possible to encourage extending their behavior; I'm just more used to
thinking of such abstraction in terms of separate classes and mixins
than via event handling.

For instance, do you see Graph aggregation working in this way also?
I.e. working with the union of one or more graphs as though they were
one. I imagine if each graph registered handlers for the major events
(add, remove, triples) then you could achieve that. Graph aggregation
is one useful thing I don't think we have a mechanism for.
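
One way the aggregation idea might look, purely as a sketch: a union view subscribes to the add and remove events of each member graph and keeps a per-triple count, so a triple only disappears from the union once every member has dropped it. ObservableGraph, UnionView and the string event names are all hypothetical, and this version answers reads from its own bookkeeping rather than handling a "triples" event.

```python
from collections import Counter, defaultdict


class ObservableGraph:
    """Toy graph that announces add/remove to subscribers (hypothetical)."""

    def __init__(self):
        self.triples = set()
        self._handlers = defaultdict(list)

    def subscribe(self, event_name, handler):
        self._handlers[event_name].append(handler)

    def add(self, triple):
        if triple not in self.triples:
            self.triples.add(triple)
            for handler in self._handlers["add"]:
                handler(triple)

    def remove(self, triple):
        if triple in self.triples:
            self.triples.remove(triple)
            for handler in self._handlers["remove"]:
                handler(triple)


class UnionView:
    """Read-only union of several graphs, kept current by their events."""

    def __init__(self, *graphs):
        self._counts = Counter()
        for graph in graphs:
            # Pick up whatever the graph already holds, then follow changes.
            for triple in graph.triples:
                self._counts[triple] += 1
            graph.subscribe("add", self._on_add)
            graph.subscribe("remove", self._on_remove)

    def _on_add(self, triple):
        self._counts[triple] += 1

    def _on_remove(self, triple):
        self._counts[triple] -= 1
        if self._counts[triple] <= 0:
            del self._counts[triple]

    def __contains__(self, triple):
        return triple in self._counts

    def __len__(self):
        return len(self._counts)


a, b = ObservableGraph(), ObservableGraph()
union = UnionView(a, b)
t = ("s", "p", "o")
a.add(t)
b.add(t)
a.remove(t)
still_there = t in union   # b still holds the triple
b.remove(t)
gone = t not in union      # no member holds it any more
```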

In addition to facilitating text indexing, the event system adds some
other bonus functionality: multiple stores can subscribe to one graph,
and when that graph changes, all the stores change as well. Note
that the graph is still backed by only one store, which is the one
actually queried and considered the backend for the graph, but other
stores can "follow along" the changes made to the primary store. For example, a
graph can have an in memory backend, and also have a database store
subscribe to the graph, so that any changes to the in memory graph are
written through to the db as well. I have also used a subscriber that
keeps a count of all the triples added to the graph, and commits a ZODB
subtransaction or prints a progress message when a threshold is hit.
This could solve the "upkeep" use case Chimezie mentioned earlier.
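
A rough sketch of the two subscribers described above, under stated assumptions: Store and Graph are toy classes, and the subscriber interface (triple_added / triple_removed) is invented for the example. The threshold callback is where something like a subtransaction commit or a progress message would go; here it just records the count.

```python
class Store:
    """Toy in-memory store (hypothetical)."""

    def __init__(self):
        self.triples = set()

    def add(self, triple):
        self.triples.add(triple)

    def remove(self, triple):
        self.triples.discard(triple)


class Graph:
    """Toy graph backed by exactly one primary store."""

    def __init__(self, store):
        self.store = store          # the only store that gets queried
        self._subscribers = []

    def subscribe(self, subscriber):
        self._subscribers.append(subscriber)

    def add(self, triple):
        self.store.add(triple)
        for sub in self._subscribers:
            sub.triple_added(triple)

    def remove(self, triple):
        self.store.remove(triple)
        for sub in self._subscribers:
            sub.triple_removed(triple)


class FollowerStore:
    """Writes every change through to a secondary store."""

    def __init__(self, store):
        self.store = store

    def triple_added(self, triple):
        self.store.add(triple)

    def triple_removed(self, triple):
        self.store.remove(triple)


class ProgressCounter:
    """Calls on_threshold(count) every `threshold` added triples."""

    def __init__(self, threshold, on_threshold):
        self.threshold = threshold
        self.on_threshold = on_threshold
        self.count = 0

    def triple_added(self, triple):
        self.count += 1
        if self.count % self.threshold == 0:
            self.on_threshold(self.count)

    def triple_removed(self, triple):
        pass  # removals do not affect the progress count in this sketch


primary, mirror = Store(), Store()
g = Graph(primary)
g.subscribe(FollowerStore(mirror))
messages = []
g.subscribe(ProgressCounter(2, messages.append))
for i in range(5):
    g.add(("s", "p", i))
g.remove(("s", "p", 0))
```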

Yes, I can imagine a Triples remove event handler that would keep
count of when the removes hit a certain threshold to trigger garbage
collection of identifiers - as a fallback for when the application
doesn't care to do this itself.

Comments? I'll probably be rolling this in this week if there is no
conflicting work going on. This work was paid for by the good folks at
Six Feet Up and we are starting to use this code in production.

I'm looking forward to the new APIs; they seem very well thought through.

Search is very fast even for very large data sets: I was
able to index 2MT from a 10MT Swoogle dump and still get subsecond
searches. I think this could really differentiate
rdflib, especially given that we are not text indexing using xapian or
any kind of black box, but instead keep all the index data in rdf
itself. This makes it ultimately portable.
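
To illustrate what "index data in rdf itself" could mean, here is a small sketch. The actual index layout is not described in this message, so everything below is an assumption: the idx:hasTerm predicate, the Literal marker class and the tokenizer are invented for the example. The only point is that index entries are ordinary triples and travel with the rest of the data.

```python
import re

HAS_TERM = "idx:hasTerm"  # hypothetical predicate linking a subject to a word


class Literal(str):
    """Toy marker for literal text, as opposed to an identifier."""


def index_entries(triple):
    """Yield one index triple per distinct word in a literal object."""
    subject, _, obj = triple
    if isinstance(obj, Literal):
        for word in sorted(set(re.findall(r"\w+", obj.lower()))):
            yield (subject, HAS_TERM, word)


def search(triples, word):
    """Return the subjects whose literals contain `word`."""
    return {s for (s, p, o) in triples if p == HAS_TERM and o == word.lower()}


data = {
    ("ex:doc1", "dc:title", Literal("Event handling in graphs")),
    ("ex:doc2", "dc:title", Literal("Graphs and stores")),
    ("ex:doc1", "rdf:type", "ex:Document"),
}

# The index is just more triples, stored alongside the data.
index = set()
for t in data:
    index.update(index_entries(t))
graph = data | index

hits_graphs = search(graph, "graphs")
hits_event = search(graph, "Event")
hits_document = search(graph, "document")  # not a literal, so not indexed
```

In an event-driven setup, index_entries would be called from a triple-added handler instead of a loop over existing data.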