Hi Paolo,

On 06/03/12 14:56, Paolo Castagna wrote:
> Alexander Dutton wrote:
>> This is the way we're going with our site, data.ox.ac.uk. After
>> each update to the triplestore we'll regenerate an ElasticSearch
>> index from a SPARQL query. […]
>
> interesting...
>
> How do you update your triplestore (SPARQL Update, Jena APIs via
> custom code, manually from command line, ...)?
Our administration interface manages grabbing data from elsewhere,
transforming it in various ways, and then uses the graph store HTTP
protocol to push it into Fuseki. Once that's done it fires off a
notification on a redis pubsub channel to say "this update just
completed". Something listening on the relevant channel then performs
the ElasticSearch update. (There are other things that handle
uploading dataset metadata to thedatahub, and archiving datasets for
bulk download.)

There's code at https://github.com/oucs/humfrey, but it's a bit of a
nightmare to set up and (surprise, surprise) lacks documentation. The
ElasticSearch stuff is still in development on the elasticsearch
branch. At some point I'll find the time to make it easier to install
and create a demo site.

(As you may have noticed, the whole thing is an eclectic mix of
technologies: Django, ElasticSearch, redis, PostgreSQL, Apache httpd…)

> We (still) have two related JIRA 'issues':
>
> - LARQ needs to update the Lucene index when a SPARQL Update request
>   is received https://issues.apache.org/jira/browse/JENA-164
>
> - Refactor LARQ so that it becomes easy to plug in different indexes
>   such as Solr or ElasticSearch instead of Lucene
>   https://issues.apache.org/jira/browse/JENA-17
>
> I am still unclear how to intercept all the possible update routes
> (i.e. SPARQL Update, APIs, bulk loaders, etc...).

Our approach is to limit the ways in which updates can happen (i.e.
things will become inconsistent if an update doesn't go through our
admin interface). This obviously doesn't work in the general case, but
could be a useful half-way house (e.g. say "'INSERT … WHERE …' will
leave you with a stale index. If you care, use 'CONSTRUCT' and the
graph store protocol instead").

> But, I think it would be useful to allow people to use Apache Solr
> and/or ElasticSearch indexes (and/or other custom indexes) and keep
> those up-to-date when changes come in.
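For what it's worth, the push-then-notify pipeline I described above
boils down to something like the following simplified sketch. The
channel name, message fields, and helper names here are illustrative
assumptions, not humfrey's actual code:

```python
import json

# Simplified sketch of the update pipeline: after pushing data into
# Fuseki via the graph store protocol, publish a "this update just
# completed" notification on a redis pubsub channel; a separate
# listener rebuilds the ElasticSearch index in response.

CHANNEL = "humfrey:updates"  # hypothetical pubsub channel name

def build_notification(store_url, graph_uri):
    """Serialise the 'this update just completed' message as JSON."""
    return json.dumps({"store": store_url, "graph": graph_uri})

def publish_notification(redis_client, store_url, graph_uri):
    """Fire the notification once the graph store push has finished."""
    redis_client.publish(CHANNEL, build_notification(store_url, graph_uri))

def bindings_to_documents(bindings):
    """Collapse flat SPARQL SELECT rows (?org ?label ?homepage) into
    one JSON-ready document per organisation, for ElasticSearch."""
    docs = {}
    for row in bindings:
        doc = docs.setdefault(row["org"], {"uri": row["org"]})
        if "label" in row:
            doc["label"] = row["label"]
        if "homepage" in row:
            doc.setdefault("homepages", []).append(row["homepage"])
    return list(docs.values())

# A listener in another process would subscribe with redis-py and, on
# each notification, re-run the SPARQL query and reindex, roughly:
#
#   import redis
#   pubsub = redis.Redis().pubsub()
#   pubsub.subscribe(CHANNEL)
#   for message in pubsub.listen():
#       if message["type"] == "message":
#           for doc in bindings_to_documents(run_sparql_query()):
#               index_into_elasticsearch(doc)  # e.g. one HTTP PUT per doc
```

(run_sparql_query and index_into_elasticsearch are hypothetical
placeholders for the query against Fuseki and the ElasticSearch
indexing call.)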
For external indexes presumably you either need something that gets
hooked into the JVM and listens for updates there, or a way to push
notifications to external applications/services when things happen.

> What do you store in ElasticSearch?

Technically, nothing yet, as I'm still implementing it ;-). Once it's
implemented it'll build indexes tailored to the types of modelling
patterns we expect to have in the store. For example, we might SPARQL
for organisations like <http://is.gd/gsc1Zs> and for each create a
chunk of JSON to feed into ElasticSearch. Targets for indexing so far
include organisations, people, vacancies, courses, and equipment.
We'll add more indexes as we add new types of things.

All the best,

Alex

PS. I'd be interested to know whether our approach is generally
considered sane…
