Hi Scott (hi all),
first of all, thank you for your email, and nice to "meet" you, even if
only via email and even though we have never had the chance to interact
before. (We clearly have common contacts, though!)
We (@Talis) use Lucene as well as Solr (and possibly something else in the
future) to provide our free text search capabilities. However, we do not
actually store RDF in Lucene indexes. For that, we use a "proper" RDF
store with SPARQL support, which you would otherwise need to implement on
top of Lucene (and that is not a trivial task).
I am very interested in the topic of free text search in the context
of RDF and how free text searches can be 'integrated' with SPARQL.
I'd like to know more about your project plans and, indeed, your motivations.
I am not completely sure whether your attachment made it to the jena-dev
mailing list. I received it anyway, since you added my work-related
email (which I try to protect from evil spammers) to the To: field.
I am subscribed to the [email protected] mailing list, so we
can discuss here.
Coming back to the idea of "placing Lucene and Solr into Jena as persistent
store", can I suggest you take a look at SIREn [1]? There is a good chapter
(a case study) on it in the book "Lucene in Action, Second Edition" [2].
I really recommend the book, it's a good one.
SIREn's aim is to use Lucene indexes to provide a complete storage system
for RDF. However, I cannot comment on its support for RDF store APIs or
on its level of compliance with SPARQL queries, for example.
A different approach is the one taken by LARQ [3] (and/or similar):
"LARQ is a combination of ARQ and Lucene. It gives ARQ the ability to
perform free text searches. Lucene indexes are additional information
for accessing the RDF graph, not storage for the graph itself."
-- http://openjena.org/ARQ/lucene-arq.html
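To make the "index next to the graph, not instead of it" idea concrete,
here is a minimal sketch in Python. It is not LARQ code and does not use
Lucene or Jena: the toy triples, the helper names (build_literal_index,
text_match, subjects_with_literal) and the whitespace-style tokenizer are
all assumptions made up for illustration. The point is only that the text
index maps words to literals, and the graph is still what answers the
structural part of the question.

```python
import re
from collections import defaultdict

# Toy graph: a set of (subject, predicate, object) triples.
# Plain strings stand in for URIs and literals (illustrative data only).
TRIPLES = {
    ("ex:book1", "dc:title", "Lucene in Action"),
    ("ex:book1", "dc:creator", "ex:hatcher"),
    ("ex:book2", "dc:title", "Semantic Web Programming"),
    ("ex:hatcher", "foaf:name", "Erik Hatcher"),
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_literal_index(triples, literal_predicates):
    """Map each token to the literals containing it.

    The index holds only literals; the triples themselves stay in the graph.
    """
    index = defaultdict(set)
    for _, predicate, obj in triples:
        if predicate in literal_predicates:
            for token in tokenize(obj):
                index[token].add(obj)
    return index

def text_match(index, query):
    """Return the literals that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

def subjects_with_literal(triples, predicate, literals):
    """Go back to the graph: which subjects carry one of these literals?"""
    return {s for s, p, o in triples if p == predicate and o in literals}

index = build_literal_index(TRIPLES, {"dc:title", "foaf:name"})
hits = text_match(index, "lucene action")
books = subjects_with_literal(TRIPLES, "dc:title", hits)
```

The free text step only narrows down candidate literals; the last call is
an ordinary graph lookup, which is the part a SPARQL engine would do.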
LARQ is, at the moment, included in ARQ, but we have an open JIRA issue
(JENA-9 [4]) to split it out as a separate module depending on ARQ.
A development version of LARQ as a separate module, ready to be tested,
is available here: https://jena.svn.sourceforge.net/svnroot/jena/LARQ/trunk/
If you or some of your students have time to try it, let me know if you
have problems with it.
As an experiment, I did a similar thing with Solr. It's called SARQ
and it's available here: https://github.com/castagna/SARQ.
It is labeled "experimental (and unsupported)" since I did it out-of-band
as a proof of concept but, because the design and functionality are the
same as LARQ's, it should not require a lot of effort to make it ready
for production, if others think this might be useful.
While I was writing SARQ, I thought: "wouldn't it be nice to make it
extremely easy for developers to plug in different indexing systems
such as Lucene, Solr or Elastic Search?". So I had a go at it with EARQ.
It's available here: https://github.com/castagna/EARQ.
Again, it's labeled "experimental (and unsupported)" but, if people are
interested in it, it might need only minor improvements.
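The kind of "plug in a different indexing system" design I have in mind
can be sketched roughly as follows, in Python for brevity. This is not
EARQ's actual interface: TextIndex, InMemoryIndex and index_labels are
hypothetical names, and the in-memory backend is just a stand-in for a
Lucene, Solr or Elastic Search implementation.

```python
from abc import ABC, abstractmethod

class TextIndex(ABC):
    """Hypothetical backend contract (an assumption, not EARQ's real API)."""

    @abstractmethod
    def add(self, key, text):
        """Index `text` under `key` (for example, a node identifier)."""

    @abstractmethod
    def remove(self, key):
        """Drop whatever is indexed under `key`, if anything."""

    @abstractmethod
    def search(self, term):
        """Return the set of keys whose text contains the word `term`."""

class InMemoryIndex(TextIndex):
    """Trivial backend; a real one would delegate to an external index."""

    def __init__(self):
        self._docs = {}

    def add(self, key, text):
        self._docs[key] = text.lower()

    def remove(self, key):
        self._docs.pop(key, None)

    def search(self, term):
        term = term.lower()
        return {k for k, text in self._docs.items() if term in text.split()}

def index_labels(index, labels):
    """Backend-agnostic indexing code: it only sees the TextIndex contract."""
    for node, label in labels.items():
        index.add(node, label)

backend = InMemoryIndex()
index_labels(backend, {"ex:a": "Apache Jena", "ex:b": "Apache Lucene"})
```

Swapping the backend then means writing another TextIndex subclass, while
index_labels and the query side stay unchanged.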
One of the biggest problems I had with LARQ, SARQ and EARQ is how to
manage deletes/removals. I've used a Jena Model as the source for
a poor man's reference counting to decide when to remove a document
from the Lucene index. The source code should be clear on this.
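As a rough illustration of the problem (not the actual LARQ/SARQ/EARQ
code, which is in the repositories), here is a minimal Python sketch.
It assumes one index document per literal, so the same literal can be
referenced by several triples. The class IndexedGraph and its fields are
hypothetical; a plain set of triples stands in for the Jena Model and
another set stands in for the documents in the Lucene index.

```python
class IndexedGraph:
    """Toy graph plus a stand-in text index, keyed by literal."""

    def __init__(self):
        self.triples = set()   # stands in for the Jena Model
        self.indexed = set()   # stands in for documents in the text index

    def add(self, subject, predicate, literal):
        self.triples.add((subject, predicate, literal))
        # Adding the same literal twice leaves a single index document.
        self.indexed.add(literal)

    def remove(self, subject, predicate, literal):
        self.triples.discard((subject, predicate, literal))
        # "Poor man's reference counting": ask the graph whether any
        # remaining triple still uses this literal before deleting it.
        still_used = any(o == literal for _, _, o in self.triples)
        if not still_used:
            self.indexed.discard(literal)
```

For example, if two resources share the label "Paris", removing the first
triple must leave the document in the index; only removing the second one
deletes it. Scanning all triples on every removal is slow in this sketch;
it is only meant to show the decision being made.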
Last but not least, in relation to part of the content of your attachment:
Jena is still in its incubating phase at Apache, but things work almost
the same as for the rest of the Apache Software Foundation. Please have
a look at "How the ASF works" [5].
Let's keep the discussion flowing, and please invite your students to
interact with us on jena-dev.
Let me know your motivations for wanting to store RDF in a Lucene/Solr
index.
Regarding the "cloud" references in your project proposal, we should
probably discuss them in a separate thread/message, still on jena-dev.
Paolo
[1] http://siren.sindice.com/
[2] http://www.manning.com/hatcher3/
[3] http://openjena.org/ARQ/lucene-arq.html
[4] https://issues.apache.org/jira/browse/JENA-9
[5] http://www.apache.org/foundation/how-it-works.html
Damian Steer wrote:
(I didn't get a moderation message about this, but Paolo was Cc'd and
forwarded it to me. Is moderation working for anyone?)
Begin forwarded message:
---------- Forwarded message ----------
From: Scott Streit <[email protected]>
Date: Wed, Feb 9, 2011 at 12:44 PM
Subject: Lucene/Solr and Jena
To: [email protected], [email protected]
Jena-dev,
A group of my students at Villanova would like their Master's Degree
project to include placing lucene and solr into Jena as a persistent
store. We are adding two more students.
Attached is an overall project plan. Upon your approval, the next
step is a design document.
Scott Streit
--
'Moveable feast' is a metaphor for things which change over time.
"If you are lucky enough to have lived in Paris as a young man, then
wherever you go for the rest of your life, it stays with you, for
Paris is a moveable feast." - Ernest Hemingway
http://www.intervise.com/cs/scott_streit
------------------------------------------------------------------------