All,

I am working on removing memory limits for CONSTRUCT queries.  Since the
spec requires that the triples created by the graph template be combined by
a set union [1] (section 16.2 in the 1.1 WD), we need to remove duplicates.
As the triples do not need to be ordered (section 16.2.3), it seems like
DistinctDataBag<Triple> will fit the bill as a temporary storage
container/duplicate remover.

DistinctDataBag is currently implemented as an in-memory HashMap that
spills sorted data to disk.  When the data is read back with a merge sort,
all duplicates are adjacent and can thus be easily removed.
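(For context, the read-back step amounts to dropping adjacent equal items
from an already-sorted stream.  A minimal standalone sketch of that idea,
using strings in place of triples and a hypothetical dedup helper:

import java.util.ArrayList;
import java.util.List;

public class DedupSorted {
    // In a sorted list, equal items are adjacent, so one pass comparing
    // each item against the last emitted item removes all duplicates.
    static <T extends Comparable<T>> List<T> dedup(List<T> sorted) {
        List<T> out = new ArrayList<>();
        for (T item : sorted) {
            if (out.isEmpty() || out.get(out.size() - 1).compareTo(item) != 0) {
                out.add(item);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(dedup(List.of("a", "a", "b", "c", "c", "c")));
    }
}

This is why any consistent total order on Triple is enough -- the order
itself never has to be meaningful.)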

Anyway, to make a long story short, I need a Comparator<Triple> to pass to
the DistinctDataBag.  Do we have such a thing?  If not, what is the best way
to implement this?  I have an attempt below, but I need some help on the
NodeComparator class.

-Stephen

[1] It would be nice if it were a bag union and we had something like
CONSTRUCT DISTINCT for set union.  I'm guessing the spec can't be changed,
for backwards compatibility with SPARQL 1.0?

==

public class TripleComparator implements Comparator<Triple>
{
    private final NodeComparator nc = new NodeComparator();

    @Override
    public int compare(Triple o1, Triple o2)
    {
        // Compare subjects first, then predicates, then objects.
        int toReturn = nc.compare(o1.getSubject(), o2.getSubject());
        if (toReturn == 0)
        {
            toReturn = nc.compare(o1.getPredicate(), o2.getPredicate());
            if (toReturn == 0)
            {
                toReturn = nc.compare(o1.getObject(), o2.getObject());
            }
        }

        return toReturn;
    }
}


public class NodeComparator implements Comparator<Node>
{
    @Override
    public int compare(Node o1, Node o2)
    {
        // TODO: What should I be doing here?
        return o1.getIndexingValue().toString()
                 .compareTo(o2.getIndexingValue().toString());
    }
}

