On May 2, 2006, at 1:02 AM, Michel Pelletier wrote:

even very large data sets, and I was able to index 2MT from a 10MT Swoogle dump and still get subsecond search speeds. I think this could really differentiate rdflib, especially given that we are not text indexing using xapian or any kind of black box, but instead keep all the index data in rdf itself.

This sounds neat, but I keep running into bugs and poor code support and documentation in rdflib, and it's getting to the point that I'm wondering whether I should be using it in future production scenarios -- just a few, er, highlights:

1. ntriples serialization is broken, as I reported last week

2. the whole *InputSource thing seems totally undocumented -- if it is documented, I'd love to know where?

3. the support for datatyped literals is remarkably poor. I've been complaining about this for years, have sent design ideas and even a patch or two, as I recall, and it's still really bad... (Every time I teach someone rdflib, or use it in a project, I have to start by showing them how to do real, *minimally* decent datatyped-literal support by writing a wrapper around Literal, which maps basic Python types to RDF datatypes. This gets *so* old!)

4. no *real* query language support, though it seems Chimezie has fixed this to some extent by allowing Versa querying... Would be *really* nice to have that rolled into the next major release. SPARQL would be nice too, but *any* query language is better than none.

5. Is there any inference support at all, even for just simple RDFS? It's not clear what the state of play here is, and that's a problem in addition to what appears to be the lack of support...

6. A shocking amount of API instability for a 2.x versioned library.

7. Sparta -- which I hear may be rolled into a future release -- is really unhelpful in many cases; for anything but a very simple RDF graph, it's a lot more trouble than it's worth to create a bunch of in-mem Py objects and then just interact with *them*... That's not my idea at all of a Python-RDF databinding tool. Yeah, I know, Sparta isn't *really* a part of rdflib; fine, so where is rdflib's good Python-RDF databinding tool?

(FWIW: Sparta's use of OWL cardinality constraints is *completely* broken. OWL cardinality constraints are *not* database constraints, at all, but that's how Sparta uses and describes them. Which just *spreads confusion*. That's just broken by design. Nothing prevents anyone from doing database constraints, but those properties should *not* be in the OWL namespace. Database integrity constraints are a *very* good thing, but that's *not* what OWL max cardinality is about.)

8. Graph.value() is also completely broken with respect to RDF semantics, and the explanation in the error message that you get when you call Graph.value() is misleading and unhelpful. In fact, the docstring for this method is *flatly* wrong; it says "Useful if one knows that there may only be one value"... Hmm, actually, value() is *totally* broken if there is more than one value, regardless of what one knows or whether one knows that there *may* be more than one value (what does that even mean?)... And the any keyword arg really makes it worse... the docstring says if any=True, then value() will "return any value in the case there is more than one" -- huh? Is there any guarantee as to which one it returns? The first? A random one? Is it deterministic? How is this useful? Why not return a Python set of the values? Or a list? Or a tuple? (And if the value is an *RDF* list, make a class to distinguish that case and return an instance of that class...)
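A rough sketch of the set-returning behaviour asked for in point 8. It uses a toy in-memory graph (a plain Python set of (subject, predicate, object) tuples) instead of rdflib's own classes, and the helper name values and the example.org URIs are made up for illustration. It returns every matching object, so nothing depends on which value happens to come back first:

```python
# Toy stand-in for an RDF graph: a set of (subject, predicate, object) tuples.
# This is NOT rdflib's API; every name here is hypothetical.
EX = "http://example.org/"

graph = {
    (EX + "kendall", EX + "knows", EX + "michel"),
    (EX + "kendall", EX + "knows", EX + "chimezie"),
    (EX + "kendall", EX + "name", "Kendall Clark"),
}


def values(triples, subject, predicate):
    """Return the set of all objects for (subject, predicate).

    An empty set means "no value"; one value is just a set of size one,
    so there is no special case for "exactly one".
    """
    return {o for (s, p, o) in triples if s == subject and p == predicate}


known = values(graph, EX + "kendall", EX + "knows")    # two objects
names = values(graph, EX + "kendall", EX + "name")     # one object
missing = values(graph, EX + "kendall", EX + "age")    # empty set
```

A set fits because RDF attaches no order to the values of a property; a caller who really expects a single value can check the size of the result explicitly.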

I really shouldn't have to suffer Sparta to get this kind of sane support from bare rdflib.
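To make point 3 concrete, here is a minimal sketch of the kind of mapping from basic Python types to RDF datatypes that such a wrapper has to do. It is written without rdflib, so it returns a plain (lexical form, datatype URI) pair instead of a Literal; the function name typed_literal and the exact choice of XSD types are assumptions, not anything rdflib ships:

```python
from datetime import date, datetime
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"

# Exact Python type -> (XSD datatype URI, function producing the lexical form).
# Lookup is by exact type, so bool is not mistaken for int and datetime is
# not mistaken for date. Special floats (inf, nan) are not handled here.
_PY_TO_XSD = {
    bool: (XSD + "boolean", lambda v: "true" if v else "false"),
    int: (XSD + "integer", str),
    float: (XSD + "double", repr),
    Decimal: (XSD + "decimal", lambda v: format(v, "f")),
    str: (XSD + "string", str),
    datetime: (XSD + "dateTime", datetime.isoformat),
    date: (XSD + "date", date.isoformat),
}


def typed_literal(value):
    """Return (lexical_form, datatype_uri) for a basic Python value.

    Hypothetical helper: raises TypeError for types with no mapping.
    """
    try:
        datatype, to_lexical = _PY_TO_XSD[type(value)]
    except KeyError:
        raise TypeError("no XSD mapping for %r" % type(value))
    return to_lexical(value), datatype


lexical, datatype = typed_literal(42)
```

The resulting pair could then be handed to whatever literal constructor the library provides, which keeps the type mapping in one place instead of in every project.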

So, this is open source, which means, basically, "put up or shut up". And I'm prepared to do just that, since I have projects where I need sane RDF support in Python. If you're interested in working on rdflib with funding, ping me and we can talk. I don't know if fixing rdflib, or forking it, or starting over is the right thing. But something needs to be done, and soon, to improve the state of RDF libraries for Python.

Cheers,
Kendall Clark


_______________________________________________
Dev mailing list
[email protected]
http://rdflib.net/mailman/listinfo/dev
