On May 2, 2006, at 1:02 AM, Michel Pelletier wrote:
> even very large data sets, and I was able to index 2MT from a 10MT
> Swoogle dump and still get subsecond search speeds for searches. I
> think this could really differentiate rdflib, especially given that
> we are not text indexing using xapian or any kind of black box, but
> instead keep all the index data in rdf itself.
This sounds neat, but I keep running into bugs and poor code support
and documentation in rdflib, and it's getting to the point that I'm
wondering whether I should be using it in future production scenarios
-- just a few, er, highlights:
1. ntriples serialization is broken, as I reported last week
2. the whole *InputSource thing seems totally undocumented -- if it
is documented, I'd love to know where?
3. the support for datatyped literals is remarkably poor. I've been
complaining about this for years, have sent design ideas and even a
patch or two, as I recall, and it's still really bad... (Every time I
teach someone rdflib, or use it in a project, I have to start by
showing them how to do real, *minimally* decent datatyped-literal
support by writing a wrapper around Literal, which maps basic Python
types to RDF datatypes. This gets *so* old!)
4. no *real* query language support, though it seems Chimezie has
fixed this to some extent by allowing Versa querying... Would be
*really* nice to have that rolled into the next major release. SPARQL
would be nice too, but *any* query language is better than none.
5. Is there any inference support at all, even for just simple RDFS?
It's not clear what the state of play here is, and that's a problem
in addition to what appears to be the lack of support...
6. A shocking amount of API instability for a 2.x versioned library.
7. Sparta -- which I hear may be rolled into a future release -- is
really unhelpful in many cases; for anything but a very simple RDF
graph, it's a lot more trouble than it's worth to create a bunch of
in-mem Py objects and then just interact with *them*... That's not my
idea at all of a Python-RDF databinding tool. Yeah, I know, Sparta
isn't *really* a part of rdflib; fine, so where is rdflib's good
Python-RDF databinding tool?
(FWIW: Sparta's use of OWL cardinality constraints is *completely*
broken. OWL cardinality constraints are *not* database constraints,
at all, but that's how Sparta uses and describes them. Which just
*spreads confusion*. That's just broken by design. Nothing prevents
anyone from doing database constraints, but those properties should
*not* be in the OWL namespace. Database integrity constraints are a
*very* good thing, but that's *not* what OWL max cardinality is about.)
8. Graph.value() is also completely broken with respect to RDF
semantics, and the explanation in the error message that you get when
you call Graph.value() is misleading and unhelpful. In fact, the
docstring for this method is *flatly* wrong; it says "Useful if one
knows that there may only be one value"... Hmm, actually, value() is
*totally* broken if there is more than one value, regardless of what
one knows or whether one knows that there *may* be more than one
value (what does that even mean?)... And the any keyword arg really
makes it worse... the docstring says if any=True, then value() will
"return any value in the case there is more than one" -- huh? Is
there any guarantee as to which one it returns? The first? A random
one? Is it deterministic? How is this useful? Why not return a Python
set of the values? Or a list? Or a tuple? (And if the value is an
*RDF* list, make a class to distinguish that case and return an
instance of that class...)
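To make point 8 concrete, here is a minimal sketch of the "return a set of the values" idea. It deliberately does not use rdflib: plain Python tuples stand in for triples, and the function name values() and the sample data are hypothetical, chosen only for illustration.

```python
# Hypothetical sketch: plain (subject, predicate, object) tuples stand in
# for an RDF graph. This is NOT rdflib's API.

def values(triples, subject, predicate):
    """Return the set of all objects for (subject, predicate).

    The result is an empty set when there is no match, so the caller never
    has to guess which of several values was picked.
    """
    return {o for (s, p, o) in triples if s == subject and p == predicate}


# Made-up sample data: one subject with two creators and one title.
triples = {
    ("ex:doc", "dc:creator", "Alice"),
    ("ex:doc", "dc:creator", "Bob"),
    ("ex:doc", "dc:title", "Notes"),
}

creators = values(triples, "ex:doc", "dc:creator")
missing = values(triples, "ex:doc", "dc:date")
```

Returning a set sidesteps the "which one do I get?" question entirely: zero, one, or many values all come back the same way, and the caller decides what to do when there is more than one.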
I really shouldn't have to suffer Sparta to get this kind of sane
support from bare rdflib.
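For point 3, the kind of mapping from basic Python types to RDF datatypes could be sketched roughly as follows. This is an assumption-laden illustration, not rdflib code: the helper to_typed_literal() and the PYTHON_TO_XSD table are hypothetical names, the result is a plain (lexical form, datatype URI) pair rather than a Literal, and only a handful of XML Schema datatypes are covered.

```python
from datetime import date, datetime
from decimal import Decimal

# XML Schema datatype namespace.
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical lookup table. Order matters: bool is a subclass of int and
# datetime is a subclass of date, so the more specific type comes first.
PYTHON_TO_XSD = [
    (bool, XSD + "boolean"),
    (int, XSD + "integer"),
    (float, XSD + "double"),
    (Decimal, XSD + "decimal"),
    (datetime, XSD + "dateTime"),
    (date, XSD + "date"),
    (str, XSD + "string"),
]


def to_typed_literal(value):
    """Return (lexical_form, datatype_uri) for a basic Python value.

    Sketch only: special floats (inf, nan) and time zones are not handled.
    """
    for py_type, datatype in PYTHON_TO_XSD:
        if isinstance(value, py_type):
            if py_type is bool:
                lexical = "true" if value else "false"
            elif py_type in (datetime, date):
                lexical = value.isoformat()
            else:
                lexical = str(value)
            return lexical, datatype
    raise TypeError("no RDF datatype mapping for %r" % type(value))


flag = to_typed_literal(True)
count = to_typed_literal(42)
day = to_typed_literal(date(2006, 5, 2))
```

A real wrapper would hand the pair to whatever literal class the library provides; the point here is only that the type-to-datatype decision can live in one small table.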
So, this is open source, which means, basically, "put up or shut up".
And I'm prepared to do just that, since I have projects where I need
sane RDF support in Python. If you're interested in working on rdflib
with funding, ping me and we can talk. I don't know if fixing rdflib,
or forking it, or starting over is the right thing. But something
needs to be done, and soon, to improve the state of RDF libraries for
Python.
Cheers,
Kendall Clark
_______________________________________________
Dev mailing list
http://rdflib.net/mailman/listinfo/dev