On 27/09/11 21:14, Stephen Allen wrote:
All,
I am working on removing memory limits for CONSTRUCT queries. Since the
spec requires that the triples created by the graph template be combined
by a set union [1] (section 16.2 in the 1.1 WD), we need to remove
duplicates.
Sort of - something has to remove duplicates, but if the triples are
being inserted into a store that does duplicate suppression anyway, then
the CONSTRUCT need not do so perfectly.
DISTINCT on the WHERE part would be effective in reducing the number of
triples, even if there is still the possibility of duplicates.
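For example (untested, just to show the shape - the inner SELECT
DISTINCT removes duplicate bindings before the template is
instantiated, although in general the constructed triples can still
contain duplicates):

import com.hp.hpl.jena.query.Query;
import com.hp.hpl.jena.query.QueryFactory;
import com.hp.hpl.jena.query.Syntax;

public class DistinctWhereDemo
{
    public static void main(String[] args)
    {
        String qs =
            "CONSTRUCT { ?s ?p ?o } WHERE {" +
            " SELECT DISTINCT ?s ?p ?o WHERE { ?s ?p ?o }" +
            " }" ;
        // Parse with SPARQL 1.1 syntax (sub-SELECT is a 1.1 feature).
        Query q = QueryFactory.create(qs, Syntax.syntaxSPARQL_11) ;
        System.out.println(q) ;
    }
}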
Other simple techniques, like a sliding window of previously generated
triples, might also be a lot easier than perfect duplicate removal.
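Something along these lines, say (untested sketch; the class name and
window size are made up for illustration):

import java.util.LinkedHashMap;
import java.util.Map;

import com.hp.hpl.jena.graph.Triple;

/** Suppress a triple if it was seen within the last WINDOW_SIZE triples.
 *  Imperfect by design: duplicates further apart than the window get
 *  through, but memory use stays bounded. */
public class RecentTripleFilter
{
    private static final int WINDOW_SIZE = 10000 ;

    // Insertion-ordered map; removeEldestEntry turns it into a
    // fixed-size window over the stream of generated triples.
    private final Map<Triple, Boolean> window =
        new LinkedHashMap<Triple, Boolean>()
        {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Triple, Boolean> eldest)
            { return size() > WINDOW_SIZE ; }
        } ;

    /** True if the triple was not seen recently and should be emitted. */
    public boolean accept(Triple t)
    {
        if ( window.containsKey(t) )
            return false ;
        window.put(t, Boolean.TRUE) ;
        return true ;
    }
}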
As the triples do not need to be ordered (section 16.2.3), it
seems like DistinctDataBag<Triple> will fit the bill as a temporary storage
container/duplicate remover.
DistinctDataBag is currently implemented as an in-memory HashMap that
spills sorted data to disk. When the data is read back using a
merge-sort, all duplicates are adjacent and can thus be easily removed.
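(The adjacent-duplicates trick in isolation, without the spill-to-disk
machinery - a sketch over any iterator already known to be sorted:)

import java.util.Iterator;
import java.util.NoSuchElementException;

/** Drops adjacent duplicates from an iterator whose input is sorted,
 *  so equal elements are guaranteed to sit next to each other. */
public class AdjacentDistinctIterator<T> implements Iterator<T>
{
    private final Iterator<T> sorted ;
    private T pending = null ;
    private T last = null ;

    public AdjacentDistinctIterator(Iterator<T> sorted)
    { this.sorted = sorted ; }

    public boolean hasNext()
    {
        // Pull until we find an element different from the last one
        // emitted, or the underlying iterator runs out.
        while ( pending == null && sorted.hasNext() )
        {
            T t = sorted.next() ;
            if ( last == null || !t.equals(last) )
                pending = t ;
        }
        return pending != null ;
    }

    public T next()
    {
        if ( !hasNext() )
            throw new NoSuchElementException() ;
        last = pending ;
        pending = null ;
        return last ;
    }

    public void remove() { throw new UnsupportedOperationException() ; }
}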
Anyway, to make a long story short, I need a Comparator<Triple> to pass to
the DistinctDataBag. Do we have such a thing? If not, what is the best way
to implement this? I have an attempt below, but I need some help on the
NodeComparator class.
Not directly but there is:
NodeUtils.compareRDFTerms
NodeValue.compareAlways
NodeValue.compare
I think you want NodeValue.compareAlways.
NodeValue.compare backs =, <, > in SPARQL.
This is value-based:
   123 > 45
   123 < 456
(a plain comparison of the lexical forms would instead give 123 < 45,
since "123" sorts before "45" as a string).
Not everything is comparable. ARQ sorting does impose a total order on
bindings by using NodeValue.compare, then the SPARQL rules for unlike
objects, then doing a guaranteed-to-have-an-answer
NodeUtils.compareRDFTerms.
The end result is arbitrary - e.g. bNode labels are compared to order
bNodes.
So it's value-comparison, then SPARQL type ordering, then a total
ordering of Nodes.
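To make that concrete (a rough sketch, not from the codebase; the
literal values are arbitrary):

import com.hp.hpl.jena.graph.Node;
import com.hp.hpl.jena.sparql.expr.NodeValue;
import com.hp.hpl.jena.sparql.util.NodeUtils;

public class CompareDemo
{
    public static void main(String[] args)
    {
        NodeValue a = NodeValue.makeInteger(123) ;
        NodeValue b = NodeValue.makeInteger(45) ;

        // Value-based: 123 > 45 as integers, so this is positive.
        System.out.println(NodeValue.compare(a, b)) ;

        // compareAlways adds the fallback layers, so it is safe to use
        // when sorting mixed, possibly non-comparable terms.
        System.out.println(NodeValue.compareAlways(a, b)) ;

        // Purely syntactic total ordering over arbitrary RDF terms.
        Node n1 = a.asNode() ;
        Node n2 = b.asNode() ;
        System.out.println(NodeUtils.compareRDFTerms(n1, n2)) ;
    }
}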
-Stephen
[1] It would be nice if it were a bag union and we had something like
CONSTRUCT DISTINCT for set semantics. I'm guessing it is not possible to
change the spec for backwards-compatibility reasons with SPARQL 1.0?
Correct. And also, an RDF graph is a set of triples, so it's not just
SPARQL.
Other systems (Sesame, Redland) have, or had, duplicated triple results,
arguing it was then up to the app to sort it out if it cared (an
efficiency argument), but these have all gone now, I think. Users wanted
no duplicates.
Andy
==
import java.util.Comparator;

import com.hp.hpl.jena.graph.Triple;

public class TripleComparator implements Comparator<Triple>
{
    private final NodeComparator nc = new NodeComparator();

    public int compare(Triple o1, Triple o2)
    {
        // Compare by subject, then predicate, then object.
        int toReturn = nc.compare(o1.getSubject(), o2.getSubject());
        if (toReturn == 0)
        {
            toReturn = nc.compare(o1.getPredicate(), o2.getPredicate());
            if (toReturn == 0)
            {
                toReturn = nc.compare(o1.getObject(), o2.getObject());
            }
        }
        return toReturn;
    }
}
import java.util.Comparator;

import com.hp.hpl.jena.graph.Node;

public class NodeComparator implements Comparator<Node>
{
    public int compare(Node o1, Node o2)
    {
        // TODO: What should I be doing here?
        // Comparing the string form of the indexing value gives *an*
        // ordering, but probably not a sensible one.
        return o1.getIndexingValue().toString()
                 .compareTo(o2.getIndexingValue().toString());
    }
}
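Following up on Andy's suggestion, the TODO could perhaps become
something like this (an untested sketch; it assumes
NodeValue.makeNode/NodeValue.compareAlways cope with any concrete node,
including bNodes):

import java.util.Comparator;

import com.hp.hpl.jena.graph.Node;
import com.hp.hpl.jena.sparql.expr.NodeValue;

public class NodeComparator implements Comparator<Node>
{
    public int compare(Node o1, Node o2)
    {
        // Value comparison where possible, falling back to a syntactic
        // total ordering of RDF terms - the layering described above.
        return NodeValue.compareAlways(NodeValue.makeNode(o1),
                                       NodeValue.makeNode(o2));
    }
}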