Re: Way to generate LOD cloud diagram Interlinking Stats from the Virtuoso OpenSource SPARQL endpoint named graphs?

2010-09-09 Thread Peter DeVries
Hi Richard,

I made a slightly different version after looking through the outgoing links in my
RDF.
There are probably other predicates that link out, but this should cover the
majority of them.

---
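PREFIX owl:   <http://www.w3.org/2002/07/owl#>
PREFIX skos:  <http://www.w3.org/2004/02/skos/core#>
PREFIX txn:   
PREFIX foaf:  <http://xmlns.com/foaf/0.1/>
PREFIX umbel: 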

SELECT ?domain_s ?domain_o (COUNT(*) AS ?count)
WHERE {
   {
   SELECT (bif:regexp_substr("http://([^/]*)", STR(?s), 1) AS ?domain_s)
(bif:regexp_substr("http://([^/]*)", STR(?o), 1) AS ?domain_o)
   WHERE {
   { ?s owl:sameAs ?o }
   UNION
   { ?s skos:exactMatch ?o }
   UNION
   { ?s skos:broadMatch ?o }
   UNION
   { ?s skos:narrowMatch ?o }
   UNION
   { ?s skos:relatedMatch ?o }
   UNION
   { ?s skos:closeMatch ?o }
   UNION
   { ?s txn:speciesConceptHasSpeciesNameString ?o }
   UNION
   { ?s txn:speciesNameStringHasSpeciesTaxonConcept ?o }
   UNION
   { ?s txn:speciesConceptHasBasionymNameString  ?o }
   UNION
   { ?s txn:basionymNameStringHasSpeciesTaxonConcept  ?o }
   UNION
   { ?s txn:hasPDFVersion  ?o }
   UNION
   { ?s txn:hasAuthorURI  ?o }
   UNION
   { ?s foaf:page  ?o }
   UNION
   { ?s foaf:topic  ?o }
   UNION
   { ?s txn:inDBpediaClade  ?o }
   UNION
   { ?s txn:occurrenceInContinent  ?o }
   UNION
   { ?s txn:occurrenceInStateProvince  ?o }
   UNION
   { ?s txn:occurrenceInCounty  ?o }
   }
   }
}
GROUP BY ?domain_s ?domain_o

--

This query on the latest TaxonConcept.org RDF gives the following:

domain_s               domain_o                     count
lod.geospecies.org     lod.taxonconcept.org         71143
www.uniprot.org        lod.taxonconcept.org         21570
bio2rdf.org            lod.taxonconcept.org         21570
dbpedia.org            lod.taxonconcept.org         18790
eunis.eea.europa.eu    lod.taxonconcept.org         2986
www.bbc.co.uk          lod.taxonconcept.org         318
lod.taxonconcept.org   lod.geospecies.org           71142
lod.taxonconcept.org   www.uniprot.org              21570
lod.taxonconcept.org   bio2rdf.org                  21799
lod.taxonconcept.org   dbpedia.org                  94441
lod.taxonconcept.org   eunis.eea.europa.eu          5972
lod.taxonconcept.org   www.bbc.co.uk                636
rdf.freebase.com       lod.taxonconcept.org         118
                       lod.taxonconcept.org         72
lod.taxonconcept.org   rdf.freebase.com             118
lod.taxonconcept.org                                24902
sw.opencyc.org         lod.taxonconcept.org         23
lod.taxonconcept.org   sw.opencyc.org               23
lod.taxonconcept.org   gni.globalnames.org          72687
gni.globalnames.org    lod.taxonconcept.org         72687
lod.taxonconcept.org   www.americanarachnology.org  1
lod.taxonconcept.org   assets.geospecies.org        3
lod.taxonconcept.org   www.itis.gov                 42097
lod.taxonconcept.org   data.gbif.org                1152
lod.taxonconcept.org   bugguide.net                 3296
lod.taxonconcept.org   www.eol.org                  516
lod.taxonconcept.org   en.wikipedia.org             18790
lod.taxonconcept.org   species.wikimedia.org        9309
lod.taxonconcept.org   www.boldsystems.org          39
lod.taxonconcept.org   www.catalogueoflife.org      53
lod.taxonconcept.org   lod.taxonconcept.org         284592
lod.taxonconcept.org   mushroomobserver.org         5
assets.geospecies.org  lod.geospecies.org           10
assets.geospecies.org  lod.taxonconcept.org         1
assets.geospecies.org  media.geospecies.org         5
static.flickr.com      www.flickr.com               33
bugguide.net           lod.taxonconcept.org         3245
media.geospecies.org   lod.taxonconcept.org         19
ocs.geospecies.org     lod.taxonconcept.org         26
media.geospecies.org   dbpedia.org                  14
assets.geospecies.org  dbpedia.org                  1
media.geospecies.org   lod.geospecies.org           37
mushroomobserver.org   lod.taxonconcept.org         5
media.geospecies.org   media.geospecies.org         29
ocs.geospecies.org     ocs.geospecies.org           53
media.geospecies.org   assets.geospecies.org        15
media.geospecies.org   static.flickr.com            2
bugguide.net           bugguide.net                 1
mushroomobserver.org   mushroomobserver.org         3
bugguide.net           dbpedia.org                  2
mushroomobserver.org   dbpedia.org                  1
ocs.geospecies.org     sws.geonames.org             39
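To check figures like these offline, or to turn them into per-dataset in/out link numbers of the kind the LOD cloud statistics use, the query's counting can be mimicked in Python. This is a minimal sketch under stated assumptions, not Virtuoso's behaviour: `LINK_PREDICATES`, `host_of`, `count_domain_pairs` and `link_summary` are hypothetical names, the predicate set is an abbreviated subset of the prefixed names in the query, and the regular expression stands in for `bif:regexp_substr("http://([^/]*)", STR(?s), 1)`. Terms that do not match the pattern (literals, `urn:` identifiers, https:// URIs) come back as `None` here.

```python
import re
from collections import Counter

# Hypothetical, abbreviated subset of the link predicates used in the query,
# written as prefixed names for brevity.
LINK_PREDICATES = {"owl:sameAs", "skos:exactMatch", "skos:closeMatch", "foaf:page"}

# Same pattern as the query: everything after "http://" up to the next "/".
HOST_PATTERN = re.compile(r"http://([^/]*)")


def host_of(term):
    """Host part of an http:// URI, or None when the pattern does not match."""
    match = HOST_PATTERN.search(term)
    return match.group(1) if match else None


def count_domain_pairs(triples):
    """Count (subject host, object host) pairs over triples using a link predicate."""
    counts = Counter()
    for s, p, o in triples:
        if p in LINK_PREDICATES:
            counts[(host_of(s), host_of(o))] += 1
    return counts


def link_summary(pair_counts, dataset):
    """Split pair counts into outlinks from, and inlinks to, one dataset host,
    ignoring self-links and pairs where either host is unknown (None)."""
    outlinks, inlinks = Counter(), Counter()
    for (s_host, o_host), n in pair_counts.items():
        if s_host is None or o_host is None or s_host == o_host:
            continue
        if s_host == dataset:
            outlinks[o_host] += n
        elif o_host == dataset:
            inlinks[s_host] += n
    return outlinks, inlinks


# A few rows copied from the table above; the blank domain_o cell is written as None.
rows = {
    ("lod.taxonconcept.org", "dbpedia.org"): 94441,
    ("dbpedia.org", "lod.taxonconcept.org"): 18790,
    ("lod.taxonconcept.org", "www.itis.gov"): 42097,
    ("lod.taxonconcept.org", "lod.taxonconcept.org"): 284592,
    ("lod.taxonconcept.org", None): 24902,
}
out, inn = link_summary(rows, "lod.taxonconcept.org")
print("out:", dict(out))
print("in:", dict(inn))
```

With those rows, lod.taxonconcept.org gets 94441 outlinks to dbpedia.org and 42097 to www.itis.gov, and 18790 inlinks from dbpedia.org; the internal links and the unmatched row are left out of the summary.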

- Pete




On Thu, Sep 9, 2010 at 4:18 PM, Peter DeVries wrote:


Re: Way to generate LOD cloud diagram Interlinking Stats from the Virtuoso OpenSource SPARQL endpoint named graphs?

2010-09-09 Thread Nathan

Richard Cyganiak wrote:
I used this one here a lot. It makes use of Virtuoso's awesome built-in 
function library. Unfortunately it doesn't work on your endpoint; it 
complains about the AS in the SELECT clause. Old Virtuoso version?


PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?domain_s ?domain_o (COUNT(*) AS ?count)
WHERE {
{
SELECT (bif:regexp_substr("http://([^/]*)", STR(?s), 1) AS ?domain_s)
(bif:regexp_substr("http://([^/]*)", STR(?o), 1) AS ?domain_o)


Just wondering if it might be worth considering URIs with an https:// 
scheme too :)
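A quick way to see the effect: with the pattern used in the query, an https:// URI yields no host at all, while a pattern with an optional "s" picks it up. This is a minimal Python sketch; only the `http://([^/]*)` pattern comes from the query, `host` is a hypothetical helper, the example URIs are made up, and whether `https?://([^/]*)` behaves the same way inside Virtuoso's `bif:regexp_substr` is an assumption I have not checked.

```python
import re

# Pattern used in the query: only matches http:// URIs.
HTTP_ONLY = re.compile(r"http://([^/]*)")
# Variant that also accepts https:// (the "s" is optional).
HTTP_OR_HTTPS = re.compile(r"https?://([^/]*)")


def host(pattern, uri):
    """Return the captured host for uri, or None if the pattern does not match."""
    match = pattern.search(uri)
    return match.group(1) if match else None


for uri in ["http://dbpedia.org/resource/Berlin", "https://example.org/id/1"]:
    # Print the URI, then the host found by each pattern.
    print(uri, host(HTTP_ONLY, uri), host(HTTP_OR_HTTPS, uri))
```

The http:// URI gets the same host from both patterns; the https:// one only from the second.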


Best,

Nathan



Re: Way to generate LOD cloud diagram Interlinking Stats from the Virtuoso OpenSource SPARQL endpoint named graphs?

2010-09-09 Thread Peter DeVries
Hi Richard,

You appear to be correct about versions. The public site is running the
Ubuntu package, which is a little older.

I have a private instance running a compiled build, and it
does not have a problem with the AS in the SELECT clause.

I am updating the data set on that machine so that the two match and then
will run the query you sent to get the latest info.

Thanks!

- Pete

On Thu, Sep 9, 2010 at 10:39 AM, Richard Cyganiak wrote:



-- 

Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
TaxonConcept Knowledge Base  / GeoSpecies
Knowledge Base 
About the GeoSpecies Knowledge Base 



Re: Way to generate LOD cloud diagram Interlinking Stats from the Virtuoso OpenSource SPARQL endpoint named graphs?

2010-09-09 Thread Richard Cyganiak

Peter,

On 9 Sep 2010, at 02:54, Peter DeVries wrote:

I was wondering if anyone has figured out a way to generate the LOD
interlinking (InLinks/OutLinks) stats from a Virtuoso OpenSource
SPARQL Endpoint.


I used this one here a lot. It makes use of Virtuoso's awesome built-in
function library. Unfortunately it doesn't work on your endpoint; it
complains about the AS in the SELECT clause. Old Virtuoso version?


Richard



PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?domain_s ?domain_o (COUNT(*) AS ?count)
WHERE {
  {
    SELECT (bif:regexp_substr("http://([^/]*)", STR(?s), 1) AS ?domain_s)
           (bif:regexp_substr("http://([^/]*)", STR(?o), 1) AS ?domain_o)
    WHERE {
      { ?s owl:sameAs ?o }
      UNION
      { ?s skos:exactMatch ?o }
      UNION
      { ?s skos:broadMatch ?o }
      UNION
      { ?s skos:narrowMatch ?o }
      UNION
      { ?s skos:relatedMatch ?o }
      UNION
      { ?s skos:closeMatch ?o }
    }
  }
}
GROUP BY ?domain_s ?domain_o





The two named graphs I am most interested in are:

taxonconcept
geospecies

On this endpoint http://lsd.taxonconcept.org/sparql

Thanks!

- Pete








Re: Propagation of bad sameAs statements

2010-09-09 Thread Juan Sequeda
Hugh,

Great to understand how this all works. I'm now expecting somebody to take
all these sameAs links, run some kind of PageRank-style algorithm over them,
and rank which of them really are sameAs.

Cheers

Juan Sequeda
+1-575-SEQ-UEDA
www.juansequeda.com


On Thu, Sep 9, 2010 at 8:23 AM, Hugh Glaser  wrote:


Re: WordNet RDF

2010-09-09 Thread Ian Davis
On Wed, Sep 8, 2010 at 12:39 PM, Toby Inkster wrote:
> Dear all,
>
> I've created a thin RDF wrapper around the WordNet 3.0 database (nouns
> only). For example:
>
>  http://ontologi.es/WordNet/data/Fool

Great work.

There is a SPARQL-queryable version of WordNet 3.0 available via the Talis Platform:

http://api.talis.com/stores/wordnet

This is based on the RDF conversion at http://semanticweb.cs.vu.nl/lod/wn30/

How similar is your work to this version?

Ian



Re: Way to generate LOD cloud diagram Interlinking Stats from the Virtuoso OpenSource SPARQL endpoint named graphs?

2010-09-09 Thread Kingsley Idehen


I was wondering if anyone has figured out a way to generate the LOD 
interlinking (InLinks/OutLinks) stats from a Virtuoso OpenSource 
SPARQL Endpoint.


The two named graphs I am most interested in are:

taxonconcept
geospecies

On this endpoint http://lsd.taxonconcept.org/sparql

Thanks!

- Pete





Pete,

We'll put together a SPARQL query collection re. this matter.

ETA -- later today :-)

--

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen







Re: Propagation of bad sameAs statements

2010-09-09 Thread Hugh Glaser
Hi,
Thank you for your interest.
Here are some sort of answers to this and other questions.
In fact, this has become something of a dialogue with myself :-)

sameas.org does not itself do any interesting inference, other than
A sameas B & B sameas C => A sameas C when asked about A.
It aims to gather equivalence information from existing sources and serve
the results in a convenient (single) place.
(It also aims to address the problem of owl:sameAs being a pairwise
statement, which gives an unpleasant explosion (n**2) of statements for
groups of equivalences, which can be quite hard to handle.)
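The closure and the n**2 problem can be made concrete with a small sketch: grouping asserted pairs into bundles with a union-find structure gives exactly the A sameas B & B sameas C => A sameas C closure, and a bundle of n URIs needs n*(n-1) directed owl:sameAs statements between distinct members to spell it out pairwise. This is only an illustration; it is not how sameas.org is implemented, and `bundles`, `pairwise_statements` and the `ex:` URIs are hypothetical.

```python
from itertools import permutations


def bundles(pairs):
    """Group URIs into equivalence bundles from asserted (a, b) sameAs pairs using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    # Merge the bundles of the two URIs in every asserted pair.
    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())


def pairwise_statements(bundle):
    """All directed sameAs statements between distinct members of one bundle."""
    return list(permutations(sorted(bundle), 2))


asserted = [("ex:A", "ex:B"), ("ex:B", "ex:C"), ("ex:C", "ex:D")]
for bundle in bundles(asserted):
    print(sorted(bundle), len(pairwise_statements(bundle)))
```

For the four made-up URIs chained by three asserted links this prints one bundle and 12 directed statements; storing the bundle once avoids that quadratic growth.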

Who chooses what data is acceptable?
Er, me.
I look at it and decide.

Is it a spider (people sometimes ask this)?
No - when I am bored with the other things I am doing I add more to it, by
downloading dumps or querying SPARQL endpoints, often as a result of
messages on this and other lists.

Is owl:sameAs the only predicate recognised?
As you have worked out, no.
It is a service giving equivalent URIs, and one of the formats you can get
back is owl:sameAs. But you can get other formats if you want. So the inputs
include things like skos:exactMatch and skos:closeMatch (as I recall).
And we could output other formats such as these if asked.
At the moment we only serve rdf+xml, text/n3, application/json and text/plain;
see http://www.sameas.org/about.php.
What has now been noticed is that I decided that DBpedia redirects should be
treated as equivalent.
I did this because it meant that a lot of expected URIs now
worked.
E.g. http://dbpedia.org/resource/UN/LOCODE:GBLON and
even http://dbpedia.org/resource/Capital_of_the_UK get to
http://data.ordnancesurvey.co.uk/id/70041428 and
http://statistics.data.gov.uk/id/eer/07.
The downside is that there is quite a lot of cruft in the redirects, and so
some strange things happen (as has been observed).

Do I know about errors in sameas.org?
Yes.
I like the Iron Maiden one to opencyc, for example.
But I don't aim to correct these, any more than Google aims to correct
things it links to.

Why such a liberal attitude to equivalence?
I eventually worked out that sameas.org was a discovery service.
We have other sameas services, called crs services, on our systems (eg
http://opencyc.rkbexplorer.com/crs/ is an external one) which are
definitional (I hesitate to use a word like authoritative, with all its
other connotations).
And so in that vein, I have cast the net wider for sameas.org.
This was the case early in its life, as the wordnet equivalence to dbpedia
is in fact the equivalence of the word to the thing, which is wrong at
some/any level.
But I have taken the view that people/agents that come to sameas.org are
looking for things, and might not care about such subtleties, not least
because they may not have understood them when they constructed their RDF.

If I had the time/funding, I would provide other services that took
different views of equivalence, in terms of discovery/definitional or
liberal/conservative (precision/recall is another way of saying that).

Mind you it is probably the case that the sameas.org data is no worse than a
lot of the data in the LOD diagram, in terms of reliably identifying
resources, as I have rejected a bunch of them as being substandard.

On 08/09/2010 15:42, "joel sachs" wrote:

> 
...
> So, a request for the sameas.org folks: Would it be possible to include a
> provenance column for all sameAs assertions you keep track of?  In cases
> where the sameAs assertion isn't actually asserted on the web, you could
> indicate the provenance as "inferred" in the provenance column. Also, have
> you published the heuristics you use (if any) to infer sameAs relations?
> 
...
> 
> Thanks!
> Joel.
> 
> 
> 
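Joel's provenance column, and the difficulty with it, can both be seen in a small sketch: take the asserted pairs with a source label, expand each bundle to every unordered pair, and mark the pairs that were never asserted as "inferred". This is a hypothetical illustration; `provenance_rows`, the source labels and the `ex:` URIs are made up, and it describes nothing about sameas.org's actual data.

```python
from collections import defaultdict
from itertools import combinations


def provenance_rows(asserted):
    """Expand asserted (a, b, source) pairs into every unordered pair within each
    connected bundle, labelling pairs that were never asserted as "inferred"."""
    graph = defaultdict(set)
    sources = {}
    for a, b, source in asserted:
        graph[a].add(b)
        graph[b].add(a)
        sources[frozenset((a, b))] = source

    seen, rows = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        # Collect one bundle (connected component) with a depth-first walk.
        stack, bundle = [start], set()
        while stack:
            node = stack.pop()
            if node not in bundle:
                bundle.add(node)
                stack.extend(graph[node] - bundle)
        seen |= bundle
        # Every pair inside the bundle becomes a row with its provenance.
        for a, b in combinations(sorted(bundle), 2):
            rows.append((a, b, sources.get(frozenset((a, b)), "inferred")))
    return rows


asserted = [
    ("ex:A", "ex:B", "SPARQL endpoint"),
    ("ex:B", "ex:C", "data dump"),
    ("ex:C", "ex:D", "email"),
]
for row in provenance_rows(asserted):
    print(*row, sep="  |  ")
```

With n URIs chained by n-1 asserted links, n(n-1)/2 rows come out and only n-1 of them carry a real source (here 6 rows, 3 of them inferred).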
So finally getting round to your specific question (although hopefully the
other stuff has also helped).
It would be hard to provide the extra column for quite a few reasons.
We do know where we got the data from, but it may be a SPARQL endpoint, a
downloaded dump, or an email sent to me, for example. So it would not be
very easy to interpret.
But only a small number of the pairs would be so identified, as all the rest
are inferred from the other pairwise assertions.
We actually have our own visualisation tools for bundles, with
assertions and dates, etc., but the tool is hard to read if you don't know
what is happening, and...
1) Finding the resources to make it more accessible would be hard.
sameas.org has effectively never been funded - it is my hobby with Ian
Millard, and we would love to have the resources to do this sort of stuff.
I actually have plans for a more sophisticated architecture behind
sameas.org which would facilitate this and a lot of other stuff, but again it is a
question of resources.

2) What is the Ontology?
A big question with giving more information is, what is the ontology?
We live in the Linked Data world (for sameas.org), and machine-interpretable
structures.
So sameas.org is designed to be used by services, and t

Re: WordNet RDF

2010-09-09 Thread Daniel O'Connor
see also
http://blog.freebase.com/2010/03/12/help-us-map-wordnet-to-freebase/