On 28/06/11 14:37, Paolo Castagna wrote:
Andy Seaborne wrote:
public Node create(String label) {
    return Node.createAnon(new AnonId(filename + "-" + label)) ;
}
The way I thought was to allocate a UUID per parser run (or any other
sufficiently large random number), xor the label into the UUID to
produce the bNode label. This is a non-localised label allocation scheme.
Hi Andy,
I am not sure this would work with MapReduce, as files are split into multiple
chunks and different machines can process splits from the same file.
Exactly - by "parser run" I mean all the separate parsing actions in one
step of the process. Allocate one large random number per job as the base
of bNode label generation across the whole cluster.
Allocating it per job instance means the labels are different next time,
which is important if the data is merged with other data.
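A minimal sketch of the job-wide scheme described above, in plain Java with no Jena classes. The class name JobScopedLabels is hypothetical, and hashing the label to 128 bits before the XOR is an assumption of this sketch: the thread only says to xor the label into the UUID and does not say how the label is turned into bits. How the job id reaches each task is also left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical helper for illustration; not part of Jena or RIOT.
final class JobScopedLabels {
    private final long baseHi;
    private final long baseLo;

    // jobId is the single random value allocated once per job and shared
    // with every task (distribution of the value is outside this sketch).
    JobScopedLabels(UUID jobId) {
        this.baseHi = jobId.getMostSignificantBits();
        this.baseLo = jobId.getLeastSignificantBits();
    }

    // Assumption: the label is first hashed to 128 bits (an MD5-based
    // name UUID) so that it can be XORed with the 128-bit job id.
    String labelFor(String label) {
        UUID hashed = UUID.nameUUIDFromBytes(label.getBytes(StandardCharsets.UTF_8));
        long hi = baseHi ^ hashed.getMostSignificantBits();
        long lo = baseLo ^ hashed.getLeastSignificantBits();
        return new UUID(hi, lo).toString();
    }
}
```

Any task that is given the same job id computes the same string for the same label without talking to other tasks, and a new job id on the next run gives different strings. Note that this alone does not separate identical labels that come from different files.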
Let's say I have this file, split into two chunks:
----------------------------
<foo:bar> <foo:p> _:bnode1 .     # split 1
_:bnode1 <foo:q> "1" .
----------------------------
_:bnode1 <foo:r> "2" .           # split 2
----------------------------
I need to ensure the 'bnode1' label in split 1 and 2 refers to the same blank
node even if the splits are parsed separately. However, the same 'bnode1' label
from a different file must represent a different blank node. In practice, with
MapReduce, I cannot assume that a file is parsed in a single "parser run".
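The requirement above can be sketched in plain Java as a pure function of (job id, filename, label), so that no state has to be shared between the machines that parse the splits. This is only an illustration: FileScopedLabels and labelFor are hypothetical names, not Jena API, and the sketch hashes the combined key rather than using XOR. It assumes every split can report the name of the file it came from.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical helper for illustration; not part of Jena or RIOT.
final class FileScopedLabels {
    private FileScopedLabels() {}

    // Derives a blank node id from the job id, the file the split belongs
    // to, and the label as written in that file.
    static String labelFor(UUID jobId, String filename, String label) {
        // The length prefix keeps ("a", "bc") and ("ab", "c") from
        // producing the same key.
        String key = jobId + "|" + filename.length() + ":" + filename + "|" + label;
        return UUID.nameUUIDFromBytes(key.getBytes(StandardCharsets.UTF_8)).toString();
    }
}
```

With this, 'bnode1' in split 1 and 'bnode1' in split 2 of the same file map to the same id, while 'bnode1' from another file maps to a different one. The resulting string could then be used as the id of the blank node.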
Therefore, I would like to have my own LabelToNode implementation with an
Allocator<String, Node> which takes into account the filename (or a hash of
it) when it creates a new blank node.
But the LabelToNode constructor is private.
Could we make it protected?
Now public.
Thanks.
Paolo
Or, alternatively, how can I construct a LabelToNode object which will be
using my MapReduceAllocator?
LabelToNode createUseLabelAsGiven()
Andy