Hi,
I just want to share an approach for doing inference à la the RIOT infer
command line, but faster (i.e. using MapReduce).
I've done only limited testing, but it should work. It's quite simple: it is
just a map-only job.
The driver is InferDriver.java [1] and the map function is InferMapper.java
[2]. Now, I am interested in what parts of OWL can be done in a similar way.
Compared with other (very interesting) approaches (for example: [3]), this
is extremely simple, and that simplicity is a very big plus in practice.
It also covers a lot of use cases.
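To illustrate why a map-only job is enough, here is a rough plain-Java sketch. It is not the code in [2]. It assumes the rules are limited to the RDFS subset (subClassOf, subPropertyOf, domain, range) and that the vocabulary is small enough to hold in memory on every mapper. The class name RdfsInferSketch, the string-based triples and the ex: names are all hypothetical, made up for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: per-triple RDFS inference against an in-memory vocabulary. */
final class RdfsInferSketch {
    static final String TYPE = "rdf:type";
    static final String SUB_CLASS = "rdfs:subClassOf";
    static final String SUB_PROP = "rdfs:subPropertyOf";
    static final String DOMAIN = "rdfs:domain";
    static final String RANGE = "rdfs:range";

    // Vocabulary indexed by subject: class -> direct superclasses, property -> classes, etc.
    private final Map<String, Set<String>> superClasses = new HashMap<>();
    private final Map<String, Set<String>> superProps = new HashMap<>();
    private final Map<String, Set<String>> domains = new HashMap<>();
    private final Map<String, Set<String>> ranges = new HashMap<>();

    /** Each vocabulary triple is a three-element list: subject, predicate, object. */
    RdfsInferSketch(List<List<String>> vocabulary) {
        Map<String, Map<String, Set<String>>> byPredicate = Map.of(
            SUB_CLASS, superClasses, SUB_PROP, superProps,
            DOMAIN, domains, RANGE, ranges);
        for (List<String> t : vocabulary) {
            Map<String, Set<String>> index = byPredicate.get(t.get(1));
            if (index != null) {
                index.computeIfAbsent(t.get(0), k -> new LinkedHashSet<>()).add(t.get(2));
            }
        }
    }

    /** Returns start plus everything reachable from it through the given edges. */
    private static Set<String> closure(String start, Map<String, Set<String>> edges) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.add(start);
        while (!todo.isEmpty()) {
            String node = todo.poll();
            if (seen.add(node)) {
                todo.addAll(edges.getOrDefault(node, Set.of()));
            }
        }
        return seen;
    }

    /** Expands one instance triple into itself plus its RDFS consequences. */
    Set<List<String>> infer(String s, String p, String o) {
        Set<List<String>> out = new LinkedHashSet<>();
        if (p.equals(TYPE)) {
            // Type triples are expanded along the superclass hierarchy only.
            for (String c : closure(o, superClasses)) out.add(List.of(s, TYPE, c));
            return out;
        }
        for (String q : closure(p, superProps)) {
            out.add(List.of(s, q, o));
            for (String d : domains.getOrDefault(q, Set.of()))
                for (String c : closure(d, superClasses)) out.add(List.of(s, TYPE, c));
            // Simplification: no check that the object is a resource rather than a literal.
            for (String r : ranges.getOrDefault(q, Set.of()))
                for (String c : closure(r, superClasses)) out.add(List.of(o, TYPE, c));
        }
        return out;
    }

    public static void main(String[] args) {
        RdfsInferSketch rules = new RdfsInferSketch(List.of(
            List.of("ex:Dog", SUB_CLASS, "ex:Animal"),
            List.of("ex:hasPet", DOMAIN, "ex:Person"),
            List.of("ex:hasPet", RANGE, "ex:Animal")));
        // In a map-only job, a call like this would sit inside map(), once per input triple.
        rules.infer("ex:rex", TYPE, "ex:Dog").forEach(System.out::println);
        rules.infer("ex:alice", "ex:hasPet", "ex:rex").forEach(System.out::println);
    }
}
```

Each input triple is expanded on its own, using only the vocabulary, so no shuffle or reduce phase is needed. The price is that different input triples can produce the same inferred triple, so the output may contain duplicates.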
Next step: how do I do the same when I receive a (typically small) update?
How do I intercept the update?
What if the update deletes stuff, where stuff is 1) vocabulary data or 2) instance
data? I think 2) is more likely to happen in practice.
Cheers,
Paolo
PS:
I've been using Apache Whirr to test this and it works perfectly with small
Hadoop clusters (i.e. < 10 nodes). Unfortunately, I am having issues with
larger clusters (i.e. > 20 nodes) [4]. Apache Whirr just came out of incubation
and it's a really great project; I really recommend you look at it if you
ever need to have a Hadoop cluster running on EC2. Whirr is also not limited
to Hadoop.
[1] https://github.com/castagna/tdbloader3/blob/master/src/main/java/org/apache/jena/tdbloader3/InferDriver.java
[2] https://github.com/castagna/tdbloader3/blob/master/src/main/java/org/apache/jena/tdbloader3/InferMapper.java
[3] http://www.few.vu.nl/~jui200/webpie.html
[4] http://markmail.org/thread/tseifrs7y3kiebih