Re: DBpedia hosting burden
On Wed, Apr 14, 2010 at 11:50 PM, Daniel Koller dakol...@googlemail.com wrote: Dan, ...I just set up some torrent files containing the current English and German DBpedia content (as a test/proof of concept; I was just curious to see how fast a network effect builds via p2p networks). To try, go to http://dakoller.net/dbpedia_torrents/dbpedia_torrents.html. I presume that to get it working you just need the first people downloading (and keeping it spreading around with their torrent clients)... as long as the *.torrent files are consistent. (Layout of the link page courtesy of the DBpedia people.) Thanks! OK, let's see if my laptop has enough disk space left ;) Could you post an 'ls -l' too, so we have an idea of the file sizes? Transmission.app on OSX says Downloading from 1 of 1 peers now (for a few of them), and from 0 of 0 peers for others. Perhaps you have some limits/queue in place? Now this is where my grip on the protocol is weak --- I'm behind NAT currently, and I forget how this works - can other peers find my machine via your public seeder? I'll try this on an Ubuntu box too. Would be nice if someone could join with a single simple script... cheers, Dan I was working my way down the list in http://dakoller.net/dbpedia_torrents/dbpedia_torrents.html although when I got to Raw Infobox Property Definitions the first two links 404'd...
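A rough client-side substitute for the requested 'ls -l' is an HTTP HEAD request per file, reading Content-Length from the response. A minimal Python sketch, assuming the requests library; the .torrent filename below is hypothetical and stands in for the real links on the listing page:

import requests

TORRENTS = [
    # Hypothetical filename - take the real links from the listing page above.
    "http://dakoller.net/dbpedia_torrents/example_en.torrent",
]
for url in TORRENTS:
    # HEAD fetches only the headers, so this is cheap even for big files.
    size = requests.head(url).headers.get("Content-Length", "?")
    print(f"{size:>12}  {url}")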
Re: DBpedia hosting burden
Ivan Mikhailov wrote: If I were The Emperor of LOD I'd ask all grand dukes of datasources to put fresh dumps at some torrent with control of UL/DL ratio :) Last time I checked (which was quite a while ago though), loading DBpedia in a normal triple store such as Jena TDB didn't work very well due to many issues with the DBpedia RDF (e.g., problems with the URIs of external links scraped from Wikipedia). I don't know whether this is a bug in TDB or DBpedia but I guess this is one of the problems causing people to use DBpedia online only - even if, due to performance reasons, running it locally would be far better. Regards Malte
Re: DBpedia hosting burden
Last time I checked (which was quite a while ago though), loading DBpedia in a normal triple store such as Jena TDB didn't work very well due to many issues with the DBpedia RDF (e.g., problems with the URIs of external links scraped from Wikipedia). Agree. Common errors in LOD are:
-- single-quoted and double-quoted strings with newlines;
-- bnode predicates (but the SPARQL processor may ignore them!);
-- variables, but triples with variables are ignored;
-- literal subjects, but triples with them are ignored;
-- '/', '#', '%' and '+' in the local part of a QName (QName with path);
-- invalid symbols between '<' and '>', i.e. in relative IRIs.
That's why my own TURTLE parser is configurable to selectively report or ignore these errors. In addition I can relax TURTLE syntax to include popular violations like redundant delimiters and/or try to recover from lexical errors as much as possible, even if I lose some bad triples together with a limited number of proper triples around them (GIGO mode, for Garbage In, Garbage Out). Best Regards, Ivan Mikhailov OpenLink Software http://virtuoso.openlinksw.com
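To make the report-or-ignore idea concrete, here is a minimal Python sketch (not Ivan's Virtuoso parser) of GIGO-mode loading: stream an N-Triples file, keep lines matching a deliberately crude triple shape, and count the rest as dropped. The regex is an illustrative assumption, far looser than the real grammar:

import re
import sys

# Very rough N-Triples shape: IRI-or-bnode subject, IRI predicate,
# any object, terminating dot. Real parsers do far more than this.
TRIPLE = re.compile(r'^\s*(<[^>]*>|_:\S+)\s+<[^>]*>\s+.+\.\s*$')

kept = dropped = 0
with open(sys.argv[1], encoding="utf-8") as src:
    for line in src:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # blank lines and comments are fine
        if TRIPLE.match(line):
            kept += 1
        else:
            dropped += 1  # strict mode would report an error here instead
print(f"kept {kept} triples, dropped {dropped} ill-formed lines")

Note how literal subjects and bnode predicates from Ivan's list fall through to the dropped branch: GIGO mode sacrifices those triples rather than aborting the whole load.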
Re: DBpedia hosting burden
I ran the DBpedia 3.5 files through an N-Triples parser with checking. The report is here (it's 25K lines long): http://www.openjena.org/~afs/DBPedia35-parse-log-2010-04-15.txt It covers both strict errors and warnings of ill-advised forms. A few examples:
Bad IRI: =?(''[[Nepenthes
Bad IRI: http://www.european-athletics.org‎ (the IRI ends with an invisible left-to-right mark)
Bad lexical forms for the value space: "1967-02-31"^^<http://www.w3.org/2001/XMLSchema#date> (there is no February the 31st)
Warning of well-known ports of other protocols: http://stream1.securenetsystems.net:443
Warning about explicit port 80: http://bibliotecadigitalhispanica.bne.es:80/
and use of . and .. in absolute URIs, which are all from the standard list of IRI warnings:
Bad IRI: http://dbpedia.org/resource/.. Code: 8/NON_INITIAL_DOT_SEGMENT in PATH: The path contains a segment /../ not at the beginning of a relative reference, or it contains a /./ These should be removed.
Andy
Software used: The IRI checker, by Jeremy Carroll, is available from http://www.openjena.org/iri/ and Maven. The lexical form checking is done by Apache Xerces. The N-Triples parser is the one from TDB v0.8.5, which bundles the above two together.
On 15/04/2010 9:54 AM, Malte Kiesel wrote: Ivan Mikhailov wrote: If I were The Emperor of LOD I'd ask all grand dukes of datasources to put fresh dumps at some torrent with control of UL/DL ratio :) Last time I checked (which was quite a while ago though), loading DBpedia in a normal triple store such as Jena TDB didn't work very well due to many issues with the DBpedia RDF (e.g., problems with the URIs of external links scraped from Wikipedia). I don't know whether this is a bug in TDB or DBpedia but I guess this is one of the problems causing people to use DBpedia online only - even if, due to performance reasons, running it locally would be far better. Regards Malte
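Andy's 1967-02-31 example illustrates the lexical-form/value-space split: the string matches the xsd:date pattern, yet denotes no actual date. A tiny Python illustration (ignoring timezones and negative years, which the real xsd:date lexical space allows):

from datetime import date

def valid_date_value(lexical: str) -> bool:
    # Accepts only plain YYYY-MM-DD; constructing the date object
    # fails for days that do not exist in the given month.
    try:
        year, month, day = map(int, lexical.split("-"))
        date(year, month, day)
        return True
    except ValueError:
        return False

print(valid_date_value("1967-02-31"))  # False - no February the 31st
print(valid_date_value("1967-02-28"))  # True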
Re: DBpedia hosting burden
Andy Seaborne wrote: On 15/04/2010 2:44 PM, Kingsley Idehen wrote: Andy, Great stuff, this is also why we are going to leave the current DBpedia 3.5 instance to stew for a while (until the end of this week or a little later). DBpedia users: Now is the time to identify problems with the DBpedia 3.5 dataset dumps. We don't want to continue reloading DBpedia (Static Edition and then recalibrating DBpedia-Live) based on faulty-dataset-related matters; we do have other operational priorities etc.. Faulty is a bit strong. Imperfect then, however subjective that might be :-) Many of the warnings are legal RDF, but bad lexical forms for the datatype, or IRIs that trigger some of the standard warnings (but they are still legal IRIs). Should they be included or not? Seems to me you can argue both for and against. external_links_en.nt.bz2 is the largest source of broken IRIs. DBpedia is a wonderful and important dataset, and being derived from elsewhere is unlikely to ever be perfect (for some definition of perfect). Better to have the data than to wait for perfection. That's been the approach thus far. Anyway, as I said, we have a window of opportunity to identify current issues prior to performing a 3.5.1 reload. I just don't want to reduce the reload cycles due to other items on our todo etc.. Andy -- Regards, Kingsley Idehen President CEO OpenLink Software Web: http://www.openlinksw.com Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca: kidehen
Re: DBpedia hosting burden
Kingsley Idehen wrote: Andy Seaborne wrote: On 15/04/2010 2:44 PM, Kingsley Idehen wrote: Andy, Great stuff, this is also why we are going to leave the current DBpedia 3.5 instance to stew for a while (until the end of this week or a little later). DBpedia users: Now is the time to identify problems with the DBpedia 3.5 dataset dumps. We don't want to continue reloading DBpedia (Static Edition and then recalibrating DBpedia-Live) based on faulty-dataset-related matters; we do have other operational priorities etc.. Faulty is a bit strong. Imperfect then, however subjective that might be :-) Many of the warnings are legal RDF, but bad lexical forms for the datatype, or IRIs that trigger some of the standard warnings (but they are still legal IRIs). Should they be included or not? Seems to me you can argue both for and against. external_links_en.nt.bz2 is the largest source of broken IRIs. DBpedia is a wonderful and important dataset, and being derived from elsewhere is unlikely to ever be perfect (for some definition of perfect). Better to have the data than to wait for perfection. That's been the approach thus far. Actually meant to say: Anyway, as I said, we have a window of opportunity to identify current issues prior to performing a 3.5.1 reload. *** I want to reduce the reload cycles due to other items on our todo etc.. *** :-) -- Regards, Kingsley Idehen President CEO OpenLink Software Web: http://www.openlinksw.com Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca: kidehen
Please report bugs to be fixed for the DBpedia 3.5.1 release
Hi all,

Great stuff, this is also why we are going to leave the current DBpedia 3.5 instance to stew for a while (until the end of this week or a little later). DBpedia users: Now is the time to identify problems with the DBpedia 3.5 dataset dumps. We don't want to continue reloading DBpedia (Static Edition and then recalibrating DBpedia-Live) based on faulty-dataset-related matters; we do have other operational priorities etc..

Yes, the testing by the community has exposed enough small and medium bugs in the datasets that we are going to extract a new, fixed 3.5.1 release next week. In my opinion the bugs do not impair Robert's and Anja's great achievement of porting the extraction framework from PHP to Scala. If you rewrite more than 10,000 lines of code for something as complex as a multilingual Wikipedia extraction, I think it is normal that some minor bugs remain even after their thorough testing. So, if you have discovered additional bugs and want them fixed, please report them to the DBpedia bug tracker by Friday EOB. http://sourceforge.net/tracker/?group_id=190976 Cheers, Chris

-----Original Message----- From: public-lod-requ...@w3.org [mailto:public-lod-requ...@w3.org] On behalf of Kingsley Idehen Sent: Thursday, 15 April 2010 15:44 To: Andy Seaborne Cc: public-lod@w3.org; dbpedia-discussion Subject: Re: DBpedia hosting burden

Andy Seaborne wrote: I ran the DBpedia 3.5 files through an N-Triples parser with checking. The report is here (it's 25K lines long): http://www.openjena.org/~afs/DBPedia35-parse-log-2010-04-15.txt It covers both strict errors and warnings of ill-advised forms. A few examples: Bad IRI: =?(''[[Nepenthes Bad IRI: http://www.european-athletics.org‎ Bad lexical forms for the value space: "1967-02-31"^^<http://www.w3.org/2001/XMLSchema#date> (there is no February the 31st) Warning of well-known ports of other protocols: http://stream1.securenetsystems.net:443 Warning about explicit port 80: http://bibliotecadigitalhispanica.bne.es:80/ and use of . and .. in absolute URIs, which are all from the standard list of IRI warnings. Bad IRI: http://dbpedia.org/resource/.. Code: 8/NON_INITIAL_DOT_SEGMENT in PATH: The path contains a segment /../ not at the beginning of a relative reference, or it contains a /./ These should be removed. Andy Software used: The IRI checker, by Jeremy Carroll, is available from http://www.openjena.org/iri/ and Maven. The lexical form checking is done by Apache Xerces. The N-Triples parser is the one from TDB v0.8.5, which bundles the above two together. On 15/04/2010 9:54 AM, Malte Kiesel wrote: Ivan Mikhailov wrote: If I were The Emperor of LOD I'd ask all grand dukes of datasources to put fresh dumps at some torrent with control of UL/DL ratio :) Last time I checked (which was quite a while ago though), loading DBpedia in a normal triple store such as Jena TDB didn't work very well due to many issues with the DBpedia RDF (e.g., problems with the URIs of external links scraped from Wikipedia). I don't know whether this is a bug in TDB or DBpedia but I guess this is one of the problems causing people to use DBpedia online only - even if, due to performance reasons, running it locally would be far better. Regards Malte

Andy, Great stuff, this is also why we are going to leave the current DBpedia 3.5 instance to stew for a while (until the end of this week or a little later). DBpedia users: Now is the time to identify problems with the DBpedia 3.5 dataset dumps. We don't want to continue reloading DBpedia (Static Edition and then recalibrating DBpedia-Live) based on faulty-dataset-related matters; we do have other operational priorities etc.. -- Regards, Kingsley Idehen President CEO OpenLink Software Web: http://www.openlinksw.com Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca: kidehen
Infobox information goes into abstracts
A colleague has just pointed out that the English abstract in DBpedia now contains a set of infobox template data. At least it does for the two Roman emperors I have tried ... [1] Other language abstracts are unaffected. Is this accident or design? Richard [1] http://dbpedia.org/resource/Caligula -- Richard Light
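One way to check other resources for the same leakage is to ask the public endpoint for the English abstract directly. A minimal Python sketch with the requests library; the dbo:abstract property URI and the JSON results format parameter are assumptions about the endpoint's setup:

import requests

QUERY = """
SELECT ?abstract WHERE {
  <http://dbpedia.org/resource/Caligula>
      <http://dbpedia.org/ontology/abstract> ?abstract .
  FILTER (lang(?abstract) = "en")
}
"""

resp = requests.get(
    "http://dbpedia.org/sparql",
    params={"query": QUERY, "format": "application/sparql-results+json"},
)
for row in resp.json()["results"]["bindings"]:
    # Print the first 200 characters - enough to spot leaked infobox data.
    print(row["abstract"]["value"][:200])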
Re: DBpedia hosting burden
On Wed, Apr 14, 2010 at 8:04 PM, Dan Brickley dan...@danbri.org wrote: Bills is the major operative word in a world where the Bill Payer and Database Maintainer is a footnote (at best) re. perception of what constitutes the DBpedia Project. If dbpedia.org linked to the SPARQL endpoints of mirrors then that would be a way of sharing the burden. Ian
Re: DBpedia hosting burden
Ian Davis wrote: On Wed, Apr 14, 2010 at 8:04 PM, Dan Brickley dan...@danbri.org wrote: Bills is the major operative word in a world where the Bill Payer and Database Maintainer is a footnote (at best) re. perception of what constitutes the DBpedia Project. If dbpedia.org linked to the SPARQL endpoints of mirrors then that would be a way of sharing the burden. Ian Ian, When you use the term SPARQL Mirror (note: Leigh's comments yesterday re. not orienting towards this), you open up a different set of issues. I don't want to revisit the SPARQL and SPARQL extensions debate etc.. Esp. as Virtuoso's SPARQL extensions are an integral part of what makes the DBpedia SPARQL endpoint viable, amongst other things. The burden issue is basically veering away from the key points, which are: 1. Use the DBpedia instance properly 2. When the instance enforces restrictions, understand that this is a Virtuoso *feature*, not a bug or server shortcoming. Beyond the dbpedia.org instance, there are other locations for: 1. Data Sets 2. SPARQL endpoints (like yours and a few others, where functionality mirroring isn't an expectation). Descriptor Resource handling via mirrors, BitTorrents, Reverse Proxies, Cache directives, and some 303 heuristics etc. are the real issues of interest. Note: I can send wild SPARQL CONSTRUCTs, DESCRIBEs, and HTTP GETs for Resource Descriptors to a zillion mirrors (maybe next year's April Fool's joke re. beauty of Linked Data crawling) and it will only broaden the scope of my dysfunctional behavior. The behavior itself has to be handled (one or a zillion mirrors). Anyway, we will publish our guide for working with DBpedia very soon. I believe this will add immense clarity to this matter. -- Regards, Kingsley Idehen President CEO OpenLink Software Web: http://www.openlinksw.com Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca: kidehen
Re: DBpedia hosting burden
On Thu, Apr 15, 2010 at 9:57 PM, Kingsley Idehen kide...@openlinksw.com wrote: Ian Davis wrote: When you use the term SPARQL Mirror (note: Leigh's comments yesterday re. not orienting towards this), you open up a different set of issues. I don't want to revisit the SPARQL and SPARQL extensions debate etc.. Esp. as Virtuoso's SPARQL extensions are an integral part of what makes the DBpedia SPARQL endpoint viable, amongst other things. Having the same dataset available via different implementations of SPARQL can only be healthy. If certain extensions are necessary, this will only highlight their importance. If there are public services offering SPARQL-based access to the DBpedia datasets (or subsets) out there on the Web, it would be rather useful if we could have them linked from a single easy-to-find page, along with information about any restrictions, quirks, subsetting, or value-adding features special to that service. I suggest using a section in http://en.wikipedia.org/wiki/DBpedia for this, unless someone cares to handle that on dbpedia.org. The burden issue is basically veering away from the key points, which are: 1. Use the DBpedia instance properly 2. When the instance enforces restrictions, understand that this is a Virtuoso *feature*, not a bug or server shortcoming. Yes, the showcase implementation needs to be used properly if it is going to survive the increasing developer attention LOD is getting. It is perfectly reasonable of you to make clear that when there are limits, they are for everyone's benefit. Beyond the dbpedia.org instance, there are other locations for: 1. Data Sets 2. SPARQL endpoints (like yours and a few others, where functionality mirroring isn't an expectation). Is there a list somewhere of related SPARQL endpoints? (also other Wikipedia-derived datasets in RDF) Descriptor Resource handling via mirrors, BitTorrents, Reverse Proxies, Cache directives, and some 303 heuristics etc. are the real issues of interest. (am chatting with Daniel Koller on Skype now re the BitTorrent experiments...) Note: I can send wild SPARQL CONSTRUCTs, DESCRIBEs, and HTTP GETs for Resource Descriptors to a zillion mirrors (maybe next year's April Fool's joke re. beauty of Linked Data crawling) and it will only broaden the scope of my dysfunctional behavior. The behavior itself has to be handled (one or a zillion mirrors). Sure. But on balance, more mirrors rather than fewer should benefit everyone, particularly if 'good behaviour' is documented and enforced... Anyway, we will publish our guide for working with DBpedia very soon. I believe this will add immense clarity to this matter. Great! cheers, Dan
Re: DBpedia hosting burden
Dan Brickley wrote: On Thu, Apr 15, 2010 at 9:57 PM, Kingsley Idehen kide...@openlinksw.com wrote: Ian Davis wrote: When you use the term SPARQL Mirror (note: Leigh's comments yesterday re. not orienting towards this), you open up a different set of issues. I don't want to revisit the SPARQL and SPARQL extensions debate etc.. Esp. as Virtuoso's SPARQL extensions are an integral part of what makes the DBpedia SPARQL endpoint viable, amongst other things. Having the same dataset available via different implementations of SPARQL can only be healthy. If certain extensions are necessary, this will only highlight their importance. If there are public services offering SPARQL-based access to the DBpedia datasets (or subsets) out there on the Web, it would be rather useful if we could have them linked from a single easy-to-find page, along with information about any restrictions, quirks, subsetting, or value-adding features special to that service. I suggest using a section in http://en.wikipedia.org/wiki/DBpedia for this, unless someone cares to handle that on dbpedia.org. +1 The burden issue is basically veering away from the key points, which are: 1. Use the DBpedia instance properly 2. When the instance enforces restrictions, understand that this is a Virtuoso *feature*, not a bug or server shortcoming. Yes, the showcase implementation needs to be used properly if it is going to survive the increasing developer attention LOD is getting. It is perfectly reasonable of you to make clear that when there are limits, they are for everyone's benefit. Yep, and as promised we will publish a document; this is certainly a missing piece of the puzzle right now. Beyond the dbpedia.org instance, there are other locations for: 1. Data Sets 2. SPARQL endpoints (like yours and a few others, where functionality mirroring isn't an expectation). Is there a list somewhere of related SPARQL endpoints? (also other Wikipedia-derived datasets in RDF) See: http://delicious.com/kidehen/sparql_endpoint, that's how I track SPARQL endpoints at the current time. Descriptor Resource handling via mirrors, BitTorrents, Reverse Proxies, Cache directives, and some 303 heuristics etc. are the real issues of interest. (am chatting with Daniel Koller on Skype now re the BitTorrent experiments...) Yes, seeing progress. Note: I can send wild SPARQL CONSTRUCTs, DESCRIBEs, and HTTP GETs for Resource Descriptors to a zillion mirrors (maybe next year's April Fool's joke re. beauty of Linked Data crawling) and it will only broaden the scope of my dysfunctional behavior. The behavior itself has to be handled (one or a zillion mirrors). Sure. But on balance, more mirrors rather than fewer should benefit everyone, particularly if 'good behaviour' is documented and enforced... Yes, Linked Data DNS remains a personal aspiration of mine, but no matter what we build, enforcement needs to be understood as a *feature* rather than a bug or deficiency etc.. Anyway, we will publish our guide for working with DBpedia very soon. I believe this will add immense clarity to this matter. Great! cheers, Dan -- Regards, Kingsley Idehen President CEO OpenLink Software Web: http://www.openlinksw.com Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca: kidehen
semantic pingback improvement request for foaf
Hi, I often get asked how one solves the friend-request problem on open social networks that use foaf in the hyperdata way. On closed social networks, when you want to make a friend you send them a request, which they can accept or refuse. It is easy to set up, because all the information is located in the same database, owned by the same company. In a distributed social foaf network anyone can link to you, from anywhere, and your acceptance can be expressed most clearly by linking back. The problem is: you need to find out when someone is linking to you. So the problem is how one notifies people that one is linking to them. Here are the solutions in order of simplicity.

0. Search engine solution
Wait for a search engine to index the web, then ask the search engine which people are linking to you. Problems: - This will tend to be a bit slow, as a search engine optimised to search the whole web will need to be notified first, even if this is only of minor interest to them - It makes the search engine a core part of the communication between two individuals, taking on the role of the central database in closed social networks - It will not work when people deploy foaf+ssl profiles, where they access-control who can see their friends. Search engines will not have access to that information, and so will not be able to index it.

1. HTTP Referer header
The absolute simplest solution would be just to use the (mis-spelled) HTTP Referer header, which was designed to do this job. In a normal HTTP request, the location from which the requested URL was found can be placed in the header of the request. http://en.wikipedia.org/wiki/HTTP_referrer The server receiving the request and serving your foaf profile can then find the referrer in the web server logs. Perhaps that is all that is needed! When you make a friend request, do the following: 1. add the friend to your foaf profile: <http://bblfish.net/#hjs> foaf:knows <http://kingsley.idehen.name/dataspace/person/kidehen#this> . 2. Then just do a GET on their WebID with the Referer header set to your WebID. They will then find in their Apache logs something like this: 93.84.41.131 - - [31/Dec/2008:02:36:54 -0600] "GET /dataspace/person/kidehen HTTP/1.1" 200 19924 "http://bblfish.net/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5" This can then be analysed using incredibly simple scripts (such as the one described in [1], for example). 3. The server could then just verify that information by a. doing a GET on the Referer URL to find out if indeed it is linking to the user's WebID b. doing some basic trust analysis (is this WebID known by any of my friends?), in order to rank it before presenting it to the user. The nice thing about the above method is that it will work even when the initial linker's server does not have a ping service for WebIDs. If the pages linking are in HTML with RDFa, most browsers will send the referrer field. There is indeed a Wikipedia entry for this: it is called Refback. http://en.wikipedia.org/wiki/Refback Exactly why Refback is more prone to spam than the pingback or linkback solution is still a bit of a mystery to me.

2. Referer with foaf+ssl
In any case the SPAM problem can be reduced by using foaf+ssl [2]. If the WebID is an https WebID - which it really should be! - then the requestor will authenticate himself, at least on the protected portion of the foaf profile. So there are the following types of people who could be making the request on your WebID. P1. The person making the friend request. Here their WebID and the Referer field will match. (This can be useful, as this should be the first request you will receive - a person making a friend request should at least test the link!) P2. A friend of the person making the friend request. Perhaps a friend of P1 goes to his page, comes across your WebID, clicks on it to find out more, and authenticates himself on your page. If P2 is a friend of yours too, then your service would have something somewhat similar to a LinkedIn introduction! P3. Just someone on the web, a crawler... Then you know that he is making his friendship claim public. :-) The above is just some of the interesting information one could get from analysing the Referer field logs.

3. Pingback
For some reason the Referer header solution was not enough, and so the pingback protocol was invented. http://www.hixie.ch/specs/pingback/pingback I am still not quite clear what this solution brings in addition to the Refback one, other than that it declares the method of the pingback explicitly. If there is a pingback header, then it is clear that it can be used. The referer header is
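Step 2 of the Referer flow above is easy to try from a few lines of Python. A minimal sketch, assuming the requests library, using the two WebIDs from the message; the fragments are stripped because only the document part of a URI goes on the wire:

import requests

MY_WEBID = "http://bblfish.net/#hjs"
FRIEND_WEBID = "http://kingsley.idehen.name/dataspace/person/kidehen#this"

# Fragments never appear in HTTP requests or Referer headers,
# so strip them from both URIs before the request.
my_doc = MY_WEBID.split("#")[0]
friend_doc = FRIEND_WEBID.split("#")[0]

# Dereference the friend's WebID document while announcing our own
# profile document in the Referer header, so the link shows up in
# their server logs exactly as in the Apache example above.
resp = requests.get(friend_doc, headers={"Referer": my_doc})
print(resp.status_code, len(resp.content))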
UK Govt RDF Data Sets
Ian, While on the subject of mirrors and Linked Open Data in general: do you have any idea as to the whereabouts of the RDF data sets for the SPARQL endpoints associated with data.gov.uk? As you can imagine, I haven't opted to crawl your endpoints for the data, bearing in mind the LOD community ethos, i.e., publish dataset dump locations for SPARQL endpoints that host Linked Open Data. This best practice was devised with SPARQL endpoint crawling in mind. Example: http://data.gov.uk/sparql Where would I get the actual RDF datasets loaded into the endpoint above? Here is the RPI example re. data.gov: http://data-gov.tw.rpi.edu/wiki/Data.gov_Catalog_-_Complete . -- Regards, Kingsley Idehen President CEO OpenLink Software Web: http://www.openlinksw.com Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca: kidehen
Re: UK Govt RDF Data Sets
Kingsley, You should address your question directly to the project organisers; we're a technology provider and host some of the data, but it is not up to us when or where the dumps get shared. My understanding is that because this is officially sanctioned data they want to ensure that the provenance is built into the datasets properly. My hope and wish is that the commitment to making dumps available will be built into the guidelines the UK Government are working on. But those won't be issued during this month because of the election. Ian On Thu, Apr 15, 2010 at 11:19 PM, Kingsley Idehen kide...@openlinksw.com wrote: Ian, While on the subject of mirrors and Linked Open Data in general: do you have any idea as to the whereabouts of the RDF data sets for the SPARQL endpoints associated with data.gov.uk? As you can imagine, I haven't opted to crawl your endpoints for the data, bearing in mind the LOD community ethos, i.e., publish dataset dump locations for SPARQL endpoints that host Linked Open Data. This best practice was devised with SPARQL endpoint crawling in mind. Example: http://data.gov.uk/sparql Where would I get the actual RDF datasets loaded into the endpoint above? Here is the RPI example re. data.gov: http://data-gov.tw.rpi.edu/wiki/Data.gov_Catalog_-_Complete . -- Regards, Kingsley Idehen President CEO OpenLink Software Web: http://www.openlinksw.com Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca: kidehen
Re: UK Govt RDF Data Sets
Ian Davis wrote: Kingsley, You should address your question directly to the project organisers; we're a technology provider and host some of the data, but it is not up to us when or where the dumps get shared. My understanding is that because this is officially sanctioned data they want to ensure that the provenance is built into the datasets properly. My hope and wish is that the commitment to making dumps available will be built into the guidelines the UK Government are working on. But those won't be issued during this month because of the election. Okay, but the need for dumps is working its way into the fundamental guidelines for Linked Open Data. As you can imagine (and I have raised these concerns on the UK Govt mailing list a few times), this project is high profile and closely associated with Linked Open Data; thus, the lack of clarity about these RDF dumps is confusing, to say the very least. Anyway, I am set for now; will wait and see re. what happens post election etc.. Kingsley Ian On Thu, Apr 15, 2010 at 11:19 PM, Kingsley Idehen kide...@openlinksw.com wrote: Ian, While on the subject of mirrors and Linked Open Data in general: do you have any idea as to the whereabouts of the RDF data sets for the SPARQL endpoints associated with data.gov.uk? As you can imagine, I haven't opted to crawl your endpoints for the data, bearing in mind the LOD community ethos, i.e., publish dataset dump locations for SPARQL endpoints that host Linked Open Data. This best practice was devised with SPARQL endpoint crawling in mind. Example: http://data.gov.uk/sparql Where would I get the actual RDF datasets loaded into the endpoint above? Here is the RPI example re. data.gov: http://data-gov.tw.rpi.edu/wiki/Data.gov_Catalog_-_Complete . -- Regards, Kingsley Idehen President CEO OpenLink Software Web: http://www.openlinksw.com Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca: kidehen -- Regards, Kingsley Idehen President CEO OpenLink Software Web: http://www.openlinksw.com Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca: kidehen
Re: UK Govt RDF Data Sets
On Fri, Apr 16, 2010 at 12:09 AM, Kingsley Idehen kide...@openlinksw.com wrote: Ian Davis wrote: Kingsley, You should address your question directly to the project organisers; we're a technology provider and host some of the data, but it is not up to us when or where the dumps get shared. My understanding is that because this is officially sanctioned data they want to ensure that the provenance is built into the datasets properly. My hope and wish is that the commitment to making dumps available will be built into the guidelines the UK Government are working on. But those won't be issued during this month because of the election. Okay, but the need for dumps is working its way into the fundamental guidelines for Linked Open Data. As you can imagine (and I have raised these concerns on the UK Govt mailing list a few times), this project is high profile and closely associated with Linked Open Data; thus, the lack of clarity about these RDF dumps is confusing, to say the very least. Anyway, I am set for now; will wait and see re. what happens post election etc.. I should also add that some datasets do not have dumps, e.g. the reference time and dates: http://reference.data.gov.uk/doc/hour/2010-03-23T21 Ian
Re: UK Govt RDF Data Sets
Ian Davis wrote: On Fri, Apr 16, 2010 at 12:09 AM, Kingsley Idehen kide...@openlinksw.com wrote: Ian Davis wrote: Kingsley, You should address your question directly to the project organisers; we're a technology provider and host some of the data, but it is not up to us when or where the dumps get shared. My understanding is that because this is officially sanctioned data they want to ensure that the provenance is built into the datasets properly. My hope and wish is that the commitment to making dumps available will be built into the guidelines the UK Government are working on. But those won't be issued during this month because of the election. Okay, but the need for dumps is working its way into the fundamental guidelines for Linked Open Data. As you can imagine (and I have raised these concerns on the UK Govt mailing list a few times), this project is high profile and closely associated with Linked Open Data; thus, the lack of clarity about these RDF dumps is confusing, to say the very least. Anyway, I am set for now; will wait and see re. what happens post election etc.. I should also add that some datasets do not have dumps, e.g. the reference time and dates: http://reference.data.gov.uk/doc/hour/2010-03-23T21 Ian Yep, I have that: http://linkeddata.uriburner.com/about/html/http/reference.data.gov.uk/doc/hour/2010-03-23T21 :-) -- Regards, Kingsley Idehen President CEO OpenLink Software Web: http://www.openlinksw.com Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca: kidehen
twitter's annotation and metadata
Hopefully everybody has heard that Twitter will release an annotation feature which will allow adding metadata to each tweet. I just read this blog post http://scobleizer.com/2010/04/15/twitter-annotations/ and the following caught my attention: There aren’t any rules as to what can be in this metadata. YET. All the devs I’ve talked to say they expect Twitter to “bless” namespaces so the industry will have one common way to describe common things. I'm just wondering what people here think about this. Juan Sequeda +1-575-SEQ-UEDA www.juansequeda.com