Hi,

I am the principal maintainer of the endpoints provided by OpenLink.

> Hello everyone! I have three questions/requests, hopefully they will be easy 
> ones to answer/implement.
> 
> 
> First, I would like to run an automated, periodic SPARQL query against a 
> somewhat up-to-date DBpedia endpoint, and get, at a minimum, the 
> <http://dbpedia.org/ontology/abstract> and 
> <http://www.w3.org/2000/01/rdf-schema#label> properties. I am currently using 
> the query:
> 
> SELECT * WHERE {
>   <http://dbpedia.org/resource/$title> ?rel ?value .
> }
> 
> where $title is inserted from a list of (currently) 125 articles I am 
> interested in.
> I had been running it against http://dbpedia.org/sparql this past week while 
> I developed the code, but today saw this thread:
> http://answers.semanticweb.com/questions/10204/how-often-dbpedia-is-updated-with-wikipedia-data
> and tried changing the endpoint to http://live.dbpedia.org/sparql .
> That server appears to be rate-limited or actively anti-automation, as I got 
> a 503 error on the 9th request (the first 8 went through in a second or two).
> 
> So, is there a place I can go to, or an API key I can obtain, such that I'd 
> be able to refresh our Wikipedia abstracts on, say, a daily or weekly basis 
> using fresh Wikipedia data? Again, it is a very limited set of articles I am 
> interested in (low hundreds), so the burden on the other end would be fairly 
> minimal, and I can schedule it to whatever time suits you.

There are two live DBpedia endpoints at this time:

        http://live.dbpedia.org/sparql

        http://dbpedia-live.openlinksw.com/sparql

Both have rate limiting to curb over-enthusiastic use by a single IP 
address. Note that at this time the service provided by OpenLink accepts higher 
connection rates.

This does not mean that you can just query an endpoint and bombard it with 
requests without taking a few precautions.

First and foremost, since your requests go to the /sparql endpoint over HTTP, 
you could and should check the HTTP status code of each response to make sure 
your query actually executed successfully. Your code could easily check for a 
503 and do the appropriate thing: sleep for an arbitrary amount of time (say 1 
min) and then retry the same query, which in most cases will return a result 
once the block is lifted. There are other HTTP status codes you should probably 
check too.
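
A minimal sketch of such a status check in Python, using only the standard 
library. The function names, the 60 second wait, the limit of 5 attempts, and 
the JSON result format are assumptions chosen for illustration, not something 
the endpoints require. The title is assumed to be already safe to place 
inside an IRI.

```python
import time
import urllib.error
import urllib.parse
import urllib.request

# One of the two live endpoints listed above.
ENDPOINT = "http://dbpedia-live.openlinksw.com/sparql"
# Assumption: only 503 is retried here; extend this set as needed.
RETRY_STATUSES = {503}


def build_url(title, endpoint=ENDPOINT):
    """Build the GET URL for the query from the original question."""
    query = (
        "SELECT * WHERE { <http://dbpedia.org/resource/%s> ?rel ?value . }"
        % title
    )
    params = urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"}
    )
    return endpoint + "?" + params


def http_get(url):
    """Return (status, body) instead of raising on HTTP error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()


def fetch_with_retry(url, get=http_get, sleep=time.sleep,
                     wait_seconds=60, max_attempts=5):
    """Retry on 503 after a pause; fail loudly on any other non-200 status."""
    for attempt in range(1, max_attempts + 1):
        status, body = get(url)
        if status == 200:
            return body
        if status in RETRY_STATUSES and attempt < max_attempts:
            sleep(wait_seconds)  # wait for the block to be lifted
            continue
        raise RuntimeError("query failed with HTTP status %d" % status)
```

Usage would be along the lines of fetch_with_retry(build_url("Giselle")). 
The get and sleep parameters exist only so the retry logic can be exercised 
without touching the network.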

Secondly, since you state you only want to do this 'periodically', you could 
also just sleep 1 second between queries. Since your service seems to be a 
background lookup, this would ensure you never hit a rate limit either.
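
The pacing could look roughly like the Python sketch below. The run_paced 
name and its parameters are hypothetical, and fetch stands for whatever 
function performs a single query for one title. With 125 titles and a 1 
second pause, the pauses alone add up to about two minutes per run.

```python
import time


def run_paced(titles, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch(title) for each title, pausing between consecutive calls."""
    results = {}
    for index, title in enumerate(titles):
        if index > 0:
            sleep(delay_seconds)  # pause before every query except the first
        results[title] = fetch(title)
    return results
```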

I checked our systems and our endpoints allow at least 20 requests per second 
from a single IP address.

Should this be impossible for your service, please contact me 
<mailto:pkl...@openlinksw.com> and I can see what we can do.

> 
> Secondly, another downside is the two live endpoints mentioned in that thread 
> (the other being http://dbpedia-live.openlinksw.com/sparql) have a different 
> set of triples from both each other and from the biannual regular DBpedia. 
> Neither of them include the full abstract that I am interested in.
> 
> Contrast:
> http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=SELECT+*+WHERE+%7B+dbpedia%3AGiselle+%3Fp+%3Fo+%7D&format=text%2Fhtml&timeout=0&debug=on
> 
> http://live.dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=SELECT+*+WHERE+%7B+dbpedia%3AGiselle+%3Fp+%3Fo+%7D&format=text%2Fhtml&timeout=0&debug=on
> 
> http://dbpedia-live.openlinksw.com/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=select+*+where+%7B+dbpedia%3AGiselle+%3Frel+%3Fvalue+%7D&format=text%2Fhtml&CXML_redir_for_subjs=121&CXML_redir_for_hrefs=&timeout=30000&debug=on
> 
> Against the very detailed:
> http://lod2.openlinksw.com/describe/?url=http%3A%2F%2Fdbpedia.org%2Fresource%2FGiselle&p=1&sid=373834&graph=http%3A%2F%2Fdbpedia.org%2Fdata%2FGiselle.xml&lp=2&op=-1&next=&gp=1
> 
> (though the lod2 one seems to be lacking in @xml:lang attributes on its 
> non-English HTML elements!)
> 
> Does anyone feel like adding these missing predicates to the live DBpediae?

We are currently looking into this and will see if we can add some extra static 
datasets for existing articles.

Note that the various endpoints have different versions of the DBpedia data:

        Endpoint                        Data version            Notes
        lod2.openlinksw.com             DBpedia 3.9             Scheduled for a reload soon; will then load the DBpedia 3.10 / 2014 data
        dbpedia.org                     DBpedia 3.10            Data from Wikipedia dump April/May
        live.dbpedia.org                DBpedia 3.10 + updates  Up to date with latest page updates from Wikipedia
        dbpedia-live.openlinksw.com     DBpedia 3.10 + updates  Up to date with latest page updates from Wikipedia

> 
> 
> My third question is, all of the endpoints provide a foaf:primaryTopicOf edge 
> pointing to the English wikipedia page — surely it should have all languages? 
> Ideally, I would like links to each of the other language Wikipedia articles 
> with some tie between the abstract and the wikipedia URL it came from (as 
> there's not a 1:1 relation between ISO language codes and wikipedia 
> subdomains, e.g. "Chū-jiân"@nan -> 
> http://zh-min-nan.wikipedia.org/wiki/Ch%C5%AB-ji%C3%A2n so trying to generate 
> a URI myself using the refs:label + language code will not always work).
> How could this be done, and is anyone willing to do it? TBH I would be happy 
> if the URIs were merely string literals tagged with the corresponding ISO 
> language, though that's obviously far from ideal in terms of LOD. Perhaps 
> both string literals and an array of (untagged) foaf:primaryTopicOf triples 
> would be good enough.


I will discuss this within the DBpedia Live team.



Patrick
---
OpenLink Software
_______________________________________________
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion