Well, here's what I've done about this for the moment:
1. Get the DISTINCT URIs of all eg:User instances into a distinctUsers
array.
2. Create an empty array, reps, to hold the representatives (i.e. one
URI per synonym group).
3. Recursively filter the distinctUsers array, each time adding one
representative to reps and removing its whole synonym group, until
distinctUsers is empty:
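import com.hp.hpl.jena.rdf.model.Model; // Jena 2.x; newer Jena: org.apache.jena.rdf.model.Model
import java.util.List;

// Call as whittle(distinctUsers, new ArrayList<String>(), model)
private List<String> whittle(List<String> urls, List<String> reps, Model model) {
    if (urls.isEmpty()) {
        return reps; // every synonym group now has a representative
    }
    String currUrl = urls.remove(0); // take the first remaining URI
    reps.add(currUrl); // this is the representative for its group
    List<String> synonyms = getSynonyms(currUrl, model);
    urls.removeAll(synonyms); // drop the rest of its synonym group
    return whittle(urls, reps, model); // recurse on what is left
}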
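The recursive whittle() above can also be written as a loop, which avoids one stack frame per synonym group on large lists. In this self-contained sketch the Jena model and the SPARQL lookup are replaced by a hypothetical in-memory sameAs map (URI to its synonyms) and a hypothetical lookupSynonyms() helper. Like the one-hop query, it assumes the reasoner has already closed owl:sameAs, so each lookup returns the whole group.

```java
import java.util.*;

public class Whittle {
    // Hypothetical stand-in for getSynonyms(currUrl, model): the owl:sameAs
    // closure is an in-memory map here instead of a SPARQL query.
    public static List<String> lookupSynonyms(String uri, Map<String, Set<String>> sameAs) {
        return new ArrayList<>(sameAs.getOrDefault(uri, Set.of()));
    }

    // Same steps as whittle(), but iterative; the caller's list is not modified.
    public static List<String> whittle(List<String> distinctUsers, Map<String, Set<String>> sameAs) {
        List<String> remaining = new ArrayList<>(distinctUsers);
        List<String> reps = new ArrayList<>();
        while (!remaining.isEmpty()) {
            String currUrl = remaining.remove(0);                    // next unclaimed URI
            reps.add(currUrl);                                       // becomes the representative
            remaining.removeAll(lookupSynonyms(currUrl, sameAs));    // drop its synonym group
        }
        return reps;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> sameAs = new HashMap<>();
        sameAs.put("eg:userA", Set.of("eg:userB"));
        sameAs.put("eg:userB", Set.of("eg:userA"));
        List<String> users = List.of("eg:userA", "eg:userB", "eg:userC");
        System.out.println(whittle(users, sameAs)); // prints [eg:userA, eg:userC]
    }
}
```

Which URI ends up as the representative depends on the order of the input list; only the number of representatives is fixed.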
getSynonyms() is just a SPARQL query whose results are collected into
an ArrayList<String>; its pattern is:
"WHERE { <" + currUrl + "> owl:sameAs ?synonym . }"
That gives me a minimal set of URIs, one per synonym group, and it
isn't _too_ computationally expensive (one extra query per group).
Still, I wish I could do this via a single SPARQL query (perhaps with
FILTERs).
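One way to drop the recursion altogether, sketched here under the assumption that the full owl:sameAs closure is available (symmetric and transitive, as an OWL reasoner would infer): keep a URI only if none of its synonyms sorts before it. Each group then keeps exactly its lexicographically smallest URI. Because the test looks at one URI and its synonyms at a time, it is the kind of condition that might be pushed into a single query with a FILTER; that query is not shown or tested here. The sameAs map and the method name representatives are hypothetical.

```java
import java.util.*;

public class MinRepresentative {
    // Keeps URI u only if none of its owl:sameAs synonyms sorts before it,
    // so each synonym group is represented by its lexicographically smallest URI.
    // Assumes sameAs holds the full closure; with only asserted pairs it can keep too many.
    public static List<String> representatives(Collection<String> users,
                                               Map<String, Set<String>> sameAs) {
        List<String> reps = new ArrayList<>();
        for (String u : users) {
            boolean hasSmallerSynonym = false;
            for (String v : sameAs.getOrDefault(u, Set.of())) {
                if (v.compareTo(u) < 0) {
                    hasSmallerSynonym = true;
                    break;
                }
            }
            if (!hasSmallerSynonym) {
                reps.add(u); // u is the smallest URI in its group
            }
        }
        return reps;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> sameAs = new HashMap<>();
        sameAs.put("eg:userA", Set.of("eg:userB"));
        sameAs.put("eg:userB", Set.of("eg:userA"));
        // eg:userA wins its group because "eg:userA" < "eg:userB".
        System.out.println(representatives(List.of("eg:userB", "eg:userA", "eg:userC"), sameAs));
    }
}
```

Unlike whittle(), the choice of representative does not depend on the order of the input list.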
Cheers,
James
On Apr 12, 2008, at 11:30 PM, James Howison wrote:
I'm trying to understand how to get a minimal set of URIs to refer
to a set of individuals[1], where multiple URIs might have been
declared owl:sameAs each other. This would be useful for counting
individuals of a particular owl:Class while respecting owl:sameAs,
but also for a UI, where you don't want to show an individual
multiple times (once for each synonym URI). In the resulting set, all
the URIs would be owl:differentFrom each other, and there would be
exactly one URI for each set of URIs declared owl:sameAs each other.
I note that the COUNT extensions I've looked at, such as ARQ's, count
URIs rather than semantic entities (individuals).
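Counting individuals rather than URIs amounts to counting the connected components formed by owl:sameAs links. A common technique for that is union-find; it is swapped in here for illustration and is not something ARQ provides. The sketch takes the asserted sameAs pairs directly, so it does not rely on the reasoner's closure; the class and method names are hypothetical.

```java
import java.util.*;

public class CountIndividuals {
    private final Map<String, String> parent = new HashMap<>();

    // Returns the root of x's group, adding x as its own group if unseen.
    String find(String x) {
        parent.putIfAbsent(x, x);
        String p = parent.get(x);
        if (!p.equals(x)) {
            p = find(p);
            parent.put(x, p); // path compression
        }
        return p;
    }

    // Records one owl:sameAs assertion by merging the two groups.
    void union(String a, String b) {
        String ra = find(a);
        String rb = find(b);
        if (!ra.equals(rb)) {
            parent.put(ra, rb);
        }
    }

    // Number of individuals = number of distinct group roots among the URIs.
    public static int countIndividuals(Collection<String> uris, List<String[]> sameAsPairs) {
        CountIndividuals uf = new CountIndividuals();
        for (String u : uris) {
            uf.find(u);
        }
        for (String[] pair : sameAsPairs) {
            uf.union(pair[0], pair[1]);
        }
        Set<String> roots = new HashSet<>();
        for (String u : uris) {
            roots.add(uf.find(u));
        }
        return roots.size();
    }

    public static void main(String[] args) {
        List<String> users = List.of("eg:userA", "eg:userB", "eg:userC");
        List<String[]> sameAs = new ArrayList<>();
        sameAs.add(new String[] {"eg:userA", "eg:userB"});
        System.out.println(countIndividuals(users, sameAs)); // prints 2
    }
}
```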
Minimal example:
eg:User rdf:type owl:Class .
eg:userA rdf:type eg:User .
eg:userB rdf:type eg:User .
eg:userC rdf:type eg:User .
# now add new knowledge that eg:userA and eg:userB
# are actually synonyms for the same person, but
# that eg:userC refers to a separate person
eg:userA owl:sameAs eg:userB ;
owl:differentFrom eg:userC .
So there are actually two people, one of whom has two synonym URIs
(eg:userA and eg:userB).
Now if I use OWL inference and SPARQL, I can find the first URI for
any eg:User:
WHERE { ?uri rdf:type eg:User } LIMIT 1
getting, for example, the result eg:userC, and then run a second
query like:
WHERE { ?user owl:differentFrom eg:userC }
but that would give me both eg:userA and eg:userB. If I then count
eg:userC plus that list, I get 3 rather than the desired 2. If I use
it to draw a UI, the same individual is shown twice.
I'm hoping to end up with a set of URIs such that all the member
URIs are owl:differentFrom each other, and there is exactly one URI
for each individual in the set.
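Those two requirements can be written down as a check on a candidate set, for example to test the output of whittle(). This sketch uses a hypothetical in-memory sameAs map holding the full closure. Under the open world assumption, "not known to be owl:sameAs" is weaker than owl:differentFrom, so the first test checks only the former.

```java
import java.util.*;

public class CoverCheck {
    // 1. no two members are known synonyms, and
    // 2. every URI is either a member or a synonym of a member.
    // Assumes sameAs holds the full (symmetric, transitive) closure.
    public static boolean isMinimalCover(Set<String> reps, Collection<String> allUris,
                                         Map<String, Set<String>> sameAs) {
        for (String r : reps) {
            for (String s : sameAs.getOrDefault(r, Set.of())) {
                if (!s.equals(r) && reps.contains(s)) {
                    return false; // two URIs for the same individual were kept
                }
            }
        }
        for (String u : allUris) {
            if (reps.contains(u)) {
                continue;
            }
            boolean covered = false;
            for (String s : sameAs.getOrDefault(u, Set.of())) {
                if (reps.contains(s)) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                return false; // an individual has no representative
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> sameAs = new HashMap<>();
        sameAs.put("eg:userA", Set.of("eg:userB"));
        sameAs.put("eg:userB", Set.of("eg:userA"));
        List<String> all = List.of("eg:userA", "eg:userB", "eg:userC");
        System.out.println(isMinimalCover(Set.of("eg:userA", "eg:userC"), all, sameAs)); // true
        System.out.println(isMinimalCover(Set.of("eg:userA", "eg:userB", "eg:userC"), all, sameAs)); // false
    }
}
```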
Are there any SPARQL methods to do this, or do I need to post-process
the results of the second query to 'whittle' them down, recursively
removing elements that are owl:sameAs each other? It seems like a
problem others would have faced. Perhaps owl:AllDifferent is relevant
here; can it be used in SPARQL queries in some way?
Apologies if people saw a similar query a few days ago on jena-dev;
I didn't get any answers, so I tried to clean it up, cut it down a
bit and find the right venue.
Thanks,
James
P.S. I realize that counting individuals this way violates the open
world assumption (there may, of course, be many more 'out there'),
but for many purposes (like UIs) I think this is still a valid
desire.
[1] Individual as distinct from URI, i.e. if eg:a owl:sameAs eg:b,
there are two URIs but only a single individual (with two
synonyms). I hope that's the right nomenclature; happy to be
corrected.