nat lu wrote:
If you ever do anything with MapReduce and RDF, use N-Quads/N-Triples
format (they are splittable by nature: one statement per line) and be
careful how you deal with blank nodes! :-)
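To see why a line-based format splits well, here is a minimal sketch of the rule a reader can apply to an arbitrary byte range: a statement belongs to the split in which its first byte falls. The sketch works in the spirit of Hadoop's line-oriented record readers but is not Hadoop code, and the class and method names are hypothetical. Note that `_:b0` appears on two lines that can land in different splits. That is exactly why blank nodes need care.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class NTriplesSplits {

    /**
     * Returns the statements (lines) that belong to the byte range [start, end).
     * Convention: a line belongs to the split in which its first byte falls, so a
     * split boundary in the middle of a line never loses or duplicates a statement.
     */
    static List<String> readSplit(byte[] data, int start, int end) {
        List<String> lines = new ArrayList<>();
        int pos = start;
        // Move forward to the first line start at or after 'start'
        // (offset 0, or any offset right after a '\n').
        while (pos > 0 && pos < data.length && data[pos - 1] != '\n') {
            pos++;
        }
        // Read every line that starts inside the split, even if it ends past 'end'.
        while (pos < end && pos < data.length) {
            int eol = pos;
            while (eol < data.length && data[eol] != '\n') {
                eol++;
            }
            String line = new String(data, pos, eol - pos, StandardCharsets.UTF_8).trim();
            if (!line.isEmpty()) {
                lines.add(line);
            }
            pos = eol + 1;
        }
        return lines;
    }

    public static void main(String[] args) {
        String nt =
            "<http://example.org/s1> <http://example.org/p> \"a\" .\n" +
            "_:b0 <http://example.org/p> <http://example.org/o> .\n" +
            "<http://example.org/s2> <http://example.org/p> _:b0 .\n";
        byte[] data = nt.getBytes(StandardCharsets.UTF_8);
        int mid = 30; // arbitrary boundary, deliberately inside the first line
        System.out.println("split 1: " + readSplit(data, 0, mid));
        System.out.println("split 2: " + readSplit(data, mid, data.length));
    }
}
```

With the boundary at byte 30, the first split returns one statement and the second returns the other two, so the two mappers see `_:b0` independently.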
Tips?
See what tdbloader3 does here:
https://github.com/castagna/tdbloader3/blob/master/src/main/java/org/apache/jena/tdbloader3/io/MapReduceLabelToNode.java
... and be extra careful when you have a sequence of multiple MapReduce
jobs: the first job needs to generate blank node labels that respect
the scope of each input file, but the subsequent jobs must
not change the blank node labels, since the data is partitioned and
mixed up each time.
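One way to meet both constraints is sketched below under stated assumptions: the first job derives a new label deterministically from the input file path and the original label, and every later job passes labels through unchanged. This is not the code in MapReduceLabelToNode.java. The hashing scheme (`UUID.nameUUIDFromBytes` over `path|label`) and the class and method names are hypothetical, and the regex is a simplification: it would also rewrite `_:` inside a literal, so a real job should parse each line with an RDF parser. Requires Java 9+ for `Matcher.appendReplacement(StringBuilder, String)`.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BlankNodeScoping {

    // Simplified blank node label pattern: "_:" followed by letters, digits, '_' or '-'.
    // Caveat: it also matches "_:" inside literals or IRIs.
    private static final Pattern BNODE = Pattern.compile("_:([A-Za-z0-9_\\-]+)");

    /**
     * First job only: turns a file-local label into a globally unique one.
     * The result depends only on (inputFile, localLabel), so mappers reading
     * different splits of the same file agree without talking to each other.
     */
    static String scopeToFile(String inputFile, String localLabel) {
        String key = inputFile + "|" + localLabel;
        UUID id = UUID.nameUUIDFromBytes(key.getBytes(StandardCharsets.UTF_8));
        return "b" + id.toString().replace("-", "");
    }

    /** Rewrites every blank node label in one N-Triples line (first job only). */
    static String relabelLine(String line, String inputFile) {
        Matcher m = BNODE.matcher(line);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String global = "_:" + scopeToFile(inputFile, m.group(1));
            m.appendReplacement(out, Matcher.quoteReplacement(global));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Subsequent jobs: labels are already global, so they are passed through untouched.
    // Calling relabelLine again there would split one blank node into several, because
    // the reshuffled data no longer comes from the original input file.

    public static void main(String[] args) {
        // Two splits of the same file, possibly handled by different mappers.
        System.out.println(relabelLine("_:b0 <http://example.org/p> \"x\" .", "a.nt"));
        System.out.println(relabelLine("<http://example.org/s> <http://example.org/q> _:b0 .", "a.nt"));
        // The same local label in a different file: a different blank node.
        System.out.println(relabelLine("_:b0 <http://example.org/p> \"y\" .", "b.nt"));
    }
}
```

The design choice that matters is determinism: because the new label depends only on the file path and the original label, mappers working on different splits of the same file produce the same label for `_:b0` without coordination, while `_:b0` in another file gets a different one.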
If anyone has experience with this or better suggestions, feedback is
welcome.
[...]
Do you have a Hadoop cluster to run your experiments on?
I will have access to an appliance for a limited time, out of hours.
Some other, related experiments have to take place first. Longer term,
though, for the purposes of this discussion and as homework/experiment,
I am more interested in commodity or even "discarded" hardware (Android
devices?! :-) ), but that's more of a Hadoop thing.
Nah... don't make the mistake of thinking of commodity hardware as
machines you would otherwise throw in the bin and can reuse for a Hadoop
cluster instead. It isn't like that. On Android devices? Forget it.
If I come up with anything interesting or contentious (RDF-wise), I'll
report back.
Paolo