Re: minimal set of URIs for individuals (in context of owl:sameAs)

James Howison Mon, 14 Apr 2008 08:32:59 -0700


Well here's what I've done about this for the moment:


1. Get DISTINCT URIs for all eg:User into a distinctUsers array

2. Create an empty array to hold representatives (i.e. one URI per synonym group)

3. Recursively filter the distinctUsers array, adding representatives to reps and removing synonym groups until the distinctUsers array is empty

                
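 // Signature reconstructed from the recursive call below; the types are assumed.
 // urls is the distinctUsers array from step 1, reps the array from step 2.
 ArrayList<String> whittle(ArrayList<String> urls, ArrayList<String> reps, Model model) {
   if (urls.isEmpty()) {
     return reps; // nothing left to process
   }
   String currUrl = urls.remove(0); // urls.pop
   reps.add(currUrl); // This is the representative
   ArrayList<String> synonyms = getSynonyms(currUrl, model);
   urls.removeAll(synonyms); // drop every URI that is owl:sameAs currUrl
   // recurse and continue until urls is empty
   return whittle(urls, reps, model);
 }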
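For anyone who wants to try the whittle step without a Jena model, here is a minimal iterative sketch of the same idea. The class name WhittleSketch and the map-based getSynonyms() are hypothetical stand-ins (the real getSynonyms() runs a SPARQL query), and the sketch assumes the synonym lookup already returns every synonym of a URI, as it would with OWL inference switched on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class WhittleSketch {

    // Hypothetical stand-in for getSynonyms(): looks synonyms up in a map
    // instead of running the owl:sameAs SPARQL query against a model.
    static List<String> getSynonyms(String uri, Map<String, List<String>> sameAs) {
        return sameAs.getOrDefault(uri, new ArrayList<>());
    }

    // Iterative form of the whittle step: take the first remaining URI as the
    // representative, then drop it and all of its synonyms from the pool.
    static List<String> whittle(List<String> distinctUsers, Map<String, List<String>> sameAs) {
        Set<String> remaining = new LinkedHashSet<>(distinctUsers); // keeps input order
        List<String> reps = new ArrayList<>();
        while (!remaining.isEmpty()) {
            String currUrl = remaining.iterator().next();
            reps.add(currUrl);
            remaining.remove(currUrl);
            remaining.removeAll(getSynonyms(currUrl, sameAs));
        }
        return reps;
    }

    public static void main(String[] args) {
        // The data from the minimal example: userA and userB are synonyms.
        Map<String, List<String>> sameAs = new HashMap<>();
        sameAs.put("eg:userA", Arrays.asList("eg:userB"));
        sameAs.put("eg:userB", Arrays.asList("eg:userA"));
        List<String> users = Arrays.asList("eg:userA", "eg:userB", "eg:userC");
        System.out.println(whittle(users, sameAs)); // [eg:userA, eg:userC]
    }
}
```

The loop avoids deep recursion on long lists, and the input list is left unmodified because the work happens on a copy.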

getSynonyms() is just a SPARQL query (with results put into an ArrayList<String>)


"WHERE { <" + currUrl +"> owl:sameAs ?synonym .}"

That gives me the minimal set of URIs for synonym groups, and it isn't _too_ computationally expensive. Still wish I could do this via a single SPARQL query (perhaps with FILTERs).
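One possible shape for such a single query, offered as an untested sketch rather than a known-good answer: keep a URI only when none of its owl:sameAs synonyms sorts before it as a string, using the SPARQL 1.0 OPTIONAL / !bound idiom. This assumes the queried model already exposes owl:sameAs as symmetric and transitive (for example via an OWL reasoner); with only the asserted triples it could return more than one URI per group. The class name, method name and the example.org class URI below are placeholders, and the query has not been checked against ARQ or any other engine.

```java
class SingleQuerySketch {

    // Builds a SPARQL 1.0 query that keeps a URI only if no owl:sameAs
    // synonym sorts before it as a string, i.e. one URI per synonym group.
    // Assumption: the queried model already exposes owl:sameAs as symmetric
    // and transitive (for example through an OWL reasoner).
    static String representativesQuery(String classUri) {
        return "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
             + "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
             + "SELECT DISTINCT ?uri\n"
             + "WHERE {\n"
             + "  ?uri rdf:type <" + classUri + "> .\n"
             + "  OPTIONAL {\n"
             + "    ?uri owl:sameAs ?other .\n"
             + "    FILTER (str(?other) < str(?uri))\n"
             + "  }\n"
             + "  FILTER (!bound(?other))\n"
             + "}";
    }

    public static void main(String[] args) {
        // http://example.org/User is a placeholder for the eg:User class URI.
        System.out.println(representativesQuery("http://example.org/User"));
    }
}
```

Under that closure assumption, the representative of each group would be its lexicographically smallest URI, so counting the result rows would count individuals rather than URIs.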


Cheers,
James

On Apr 12, 2008, at 11:30 PM, James Howison wrote:

I'm trying to understand how to get a minimal set of URIs to refer to a set of individuals[1], where multiple URIs might have been declared owl:sameAs each other. This would be useful for counting individuals of a particular owl:Class, while respecting owl:sameAs, but also for UI where you don't want to show the individual multiple times (once for each synonym URI). The set would be such that all the URIs would be owl:differentFrom each other, and there would be one (and only one) for each set of URIs declared owl:sameAs each other.
I note that the COUNT extensions I've looked at, such as ARQ, count URIs rather than attempting to count semantic entities.
Minimal example:

eg:User rdf:type owl:Class .

eg:userA rdf:type eg:User .

eg:userB rdf:type eg:User .

eg:userC rdf:type eg:User .

# now add new knowledge that eg:userA and eg:userB
# are actually synonyms for the same person, but
# that eg:userC refers to a separate person

eg:userA owl:sameAs        eg:userB ;
        owl:differentFrom eg:userC .
So there are actually two people, where one has two synonyms (eg:userA and eg:userB).
Now if I use OWL inference and SPARQL I could find the first URI for any eg:User:
WHERE { ?uri rdf:type eg:User } LIMIT 1
getting, for example, the result eg:userC, and then run a second query like:
WHERE { ?user owl:differentFrom eg:userC }
but that would give me both eg:userA and eg:userB. If I then use that list to count I get 3, rather than the desired 2. If I use it to draw a UI I get repetition of an individual.
I'm hoping to end up with a set of URIs, such that all the member URIs are owl:differentFrom each other, and there is one URI for each individual in the set.
Any SPARQL methods to do this, or do I need to post-process the results of the second query to 'whittle' down the results, recursively removing elements that are owl:sameAs each other? Seems like a problem others would have faced. Perhaps owl:AllDifferent is relevant here; can that be used in SPARQL queries in some way?
Apologies if people saw a similar query a few days ago on jena-dev; I didn't get any answers so I tried to clean it up, cut it down a bit and find the right venue.
Thanks,
James
ps. I realize that the idea of counting individuals this way violates the open world assumption (there may, of course, be many more 'out there') but for many purposes (like UIs) this is still a valid desire, I think.
[1] Individual as distinct from URI, i.e. if eg:a owl:sameAs eg:b there are two URIs but only a single individual (with two synonyms). I hope that's the right nomenclature. Happy to be corrected.
