All,
I am working on removing memory limits for CONSTRUCT queries. The spec
requires that the triples created by the graph template be combined by a
set union [1] (section 16.2 in the 1.1 WD), so we need to remove
duplicates. As the triples do not need to be ordered (section 16.2.3), it
seems like DistinctDataBag<Triple> will fit the bill as a temporary storage
container/duplicate remover.
DistinctDataBag is currently implemented as an in-memory HashMap that
spills sorted data to disk. When the data is read back using a merge sort,
all duplicates are adjacent and can thus be easily removed.
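
To illustrate the merge step described above, here is a minimal sketch of
merging sorted runs while dropping adjacent duplicates. It is not the actual
DistinctDataBag code: the class name SortedRunMerge, the Cursor helper, and
the use of in-memory lists in place of spill files are all assumptions made
for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class SortedRunMerge {
    // Cursor over one sorted run; a hypothetical stand-in for a spill-file reader.
    private static final class Cursor<T> {
        final List<T> run;
        int pos = 0;
        Cursor(List<T> run) { this.run = run; }
        T head() { return run.get(pos); }
        boolean advance() { return ++pos < run.size(); }
    }

    // Merges already-sorted runs and emits each distinct item once.
    // Items are treated as duplicates when cmp.compare(...) returns 0.
    public static <T> List<T> mergeDistinct(List<List<T>> sortedRuns,
                                            Comparator<? super T> cmp) {
        Comparator<Cursor<T>> byHead = (a, b) -> cmp.compare(a.head(), b.head());
        PriorityQueue<Cursor<T>> heap = new PriorityQueue<>(byHead);
        for (List<T> run : sortedRuns) {
            if (!run.isEmpty()) {
                heap.add(new Cursor<>(run));
            }
        }
        List<T> out = new ArrayList<>();
        T last = null;
        boolean haveLast = false;
        while (!heap.isEmpty()) {
            Cursor<T> c = heap.poll();
            T item = c.head();
            // The merged stream is sorted, so a duplicate can only follow its twin.
            if (!haveLast || cmp.compare(last, item) != 0) {
                out.add(item);
                last = item;
                haveLast = true;
            }
            if (c.advance()) {
                heap.add(c);
            }
        }
        return out;
    }
}
```

Note that in this sketch the comparator alone decides what counts as a
duplicate, which is why the choice of Comparator<Triple> matters.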
Anyway, to make a long story short, I need a Comparator<Triple> to pass to
the DistinctDataBag. Do we have such a thing? If not, what is the best way
to implement this? I have an attempt below, but I need some help on the
NodeComparator class.
-Stephen
[1] It would be nice if it were a bag union and we had something like
CONSTRUCT DISTINCT for set semantics. I'm guessing it is not possible to
change the spec for backwards compatibility reasons with SPARQL 1.0?
==
public class TripleComparator implements Comparator<Triple>
{
    private final NodeComparator nc = new NodeComparator();

    // Compare by subject, then predicate, then object.
    @Override
    public int compare(Triple o1, Triple o2)
    {
        int toReturn = nc.compare(o1.getSubject(), o2.getSubject());
        if (toReturn == 0)
        {
            toReturn = nc.compare(o1.getPredicate(), o2.getPredicate());
            if (toReturn == 0)
            {
                toReturn = nc.compare(o1.getObject(), o2.getObject());
            }
        }
        return toReturn;
    }
}

public class NodeComparator implements Comparator<Node>
{
    @Override
    public int compare(Node o1, Node o2)
    {
        // TODO: What should I be doing here?
        return o1.getIndexingValue().toString()
                 .compareTo(o2.getIndexingValue().toString());
    }
}
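
One property the node comparator probably needs, assuming the bag treats
items that compare as 0 as duplicates: compare() should return 0 only for
nodes that are really equal. Comparing a single string form risks
conflating, say, a URI and a plain literal with the same text. Below is a
rough sketch of one possible ordering (kind first, then lexical form,
datatype, language). Term and Kind are hypothetical stand-ins for
illustration, not Jena's Node API.

```java
import java.util.Comparator;

public class TermOrdering {
    // Hypothetical node kinds; the relative order is arbitrary but fixed.
    public enum Kind { BLANK, URI, LITERAL }

    // Hypothetical stand-in for an RDF node; NOT Jena's Node class.
    // Empty strings stand for "no datatype" and "no language tag".
    public static final class Term {
        final Kind kind;
        final String lexical;
        final String datatype;
        final String lang;

        public Term(Kind kind, String lexical, String datatype, String lang) {
            this.kind = kind;
            this.lexical = lexical;
            this.datatype = datatype;
            this.lang = lang;
        }
    }

    // Total order that returns 0 only when every component matches.
    public static final Comparator<Term> TERM_ORDER =
        Comparator.comparing((Term t) -> t.kind)
                  .thenComparing(t -> t.lexical)
                  .thenComparing(t -> t.datatype)
                  .thenComparing(t -> t.lang);
}
```

The particular order does not matter for duplicate removal; what matters is
that it is consistent and that distinct nodes never compare as equal.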