Re: DBpedia hosting burden

2010-04-15 Thread Dan Brickley
On Wed, Apr 14, 2010 at 11:50 PM, Daniel Koller dakol...@googlemail.com wrote:
 Dan,
 ...I just set up some torrent files containing the current English and German
 DBpedia content (as a test/proof of concept; I was just curious to see how
 fast a network effect via p2p networks develops).
 To try, go to http://dakoller.net/dbpedia_torrents/dbpedia_torrents.html.
 I presume to get it working you need just the first people downloading (and
 keep spreading it around w/ their Torrent-Clients)... as long as the
 *.torrent-files are consistent. (layout of the link page courtesy of the
 dbpedia-people)

Thanks! OK, let's see if my laptop has enough disk space left ;)
could you post an 'ls -l' too, so we have an idea of the file sizes?

Transmission.app on OSX says Downloading from 1 of 1 peers now (for
a few of them), and from 0 of 0 peers for others. Perhaps you have
some limits/queue in place?

Now this is where my grip on the protocol is weak --- I'm behind NAT
currently, and I forget how this works - can other peers find my
machine via your public seeder?

I'll try this on an ubuntu box too. Would be nice if someone could
join with a single simple script...
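
Something along these lines might do it (an untested sketch: it assumes
aria2c is installed and that the page links to the .torrent files with
absolute http:// URLs):

#!/usr/bin/env python
"""Untested sketch: find every .torrent linked from the DBpedia torrents page
and hand each one to aria2c, which downloads the data and keeps seeding."""
import os
import re
import subprocess
import urllib.request

PAGE = "http://dakoller.net/dbpedia_torrents/dbpedia_torrents.html"

html = urllib.request.urlopen(PAGE).read().decode("utf-8", "replace")
# Naive link extraction; assumes absolute hrefs ending in .torrent.
torrents = sorted(set(re.findall(r'href="(http[^"]+\.torrent)"', html)))

os.makedirs("dbpedia", exist_ok=True)
# One aria2c per torrent so they all join the swarm at once;
# --seed-ratio=2.0 keeps seeding until we've uploaded twice what we downloaded.
procs = [subprocess.Popen(["aria2c", "--seed-ratio=2.0", "--dir=dbpedia", url])
         for url in torrents]
for p in procs:
    p.wait()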

cheers,

Dan
I was working my way down the list in
http://dakoller.net/dbpedia_torrents/dbpedia_torrents.html
although when I got to "Raw Infobox Property Definitions" the first two
links 404'd...



Re: DBpedia hosting burden

2010-04-15 Thread Malte Kiesel

Ivan Mikhailov wrote:


If I were The Emperor of LOD I'd ask all grand dukes of datasources to
put fresh dumps at some torrent with control of UL/DL ratio :)


Last time I checked (which was quite a while ago though), loading 
DBpedia in a normal triple store such as Jena TDB didn't work very well 
due to many issues with the DBpedia RDF (e.g., problems with the URIs of 
external links scraped from Wikipedia).


I don't know whether this is a bug in TDB or DBpedia but I guess this is 
one of the problems causing people to use DBpedia online only - even if, 
due to performance reasons, running it locally would be far better.


Regards
Malte



Re: DBpedia hosting burden

2010-04-15 Thread Ivan Mikhailov
 Last time I checked (which was quite a while ago though), loading 
 DBpedia in a normal triple store such as Jena TDB didn't work very well 
 due to many issues with the DBpedia RDF (e.g., problems with the URIs of 
 external links scraped from Wikipedia).

Agree. Common errors in LOD are:

-- single quoted and double quoted strings with newlines;
-- bnode predicates (but SPARQL processor may ignore them!);
-- variables, but triples with variables are ignored;
-- literal subjects, but triples with them are ignored;
-- '/', '#', '%' and '+' in the local part of a QName (QName with path);
-- invalid symbols between '<' and '>', i.e. in relative IRIs.

That's why my own TURTLE parser is configurable to selectively report or
ignore these errors. In addition I can relax TURTLE syntax to accept
popular violations like redundant delimiters, and/or try to recover from
lexical errors as much as possible, even if that means losing some ill
triples together with a limited number of proper triples around them
(GIGO mode, for Garbage In, Garbage Out).
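
Not the Virtuoso parser itself, of course, but the GIGO idea can be sketched
in a few lines of Python (illustrative only; assumes rdflib is installed, and
parsing each line in its own graph is slow but keeps the sketch simple):

#!/usr/bin/env python
"""GIGO-style N-Triples filter (sketch): keep the lines that parse cleanly,
report and drop the ones that don't."""
import sys
from rdflib import Graph

def filter_ntriples(infile, outfile, strict=False):
    kept = dropped = 0
    for lineno, line in enumerate(infile, 1):
        if not line.strip() or line.lstrip().startswith('#'):
            outfile.write(line)                    # pass blank/comment lines through
            continue
        try:
            Graph().parse(data=line, format="nt")  # raises on a malformed triple
            outfile.write(line)
            kept += 1
        except Exception as e:
            if strict:
                raise
            dropped += 1
            print("line %d: dropped (%s)" % (lineno, e), file=sys.stderr)
    print("kept %d, dropped %d" % (kept, dropped), file=sys.stderr)

if __name__ == "__main__":
    filter_ntriples(sys.stdin, sys.stdout)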

Best Regards,

Ivan Mikhailov
OpenLink Software
http://virtuoso.openlinksw.com





Re: DBpedia hosting burden

2010-04-15 Thread Andy Seaborne
I ran the DBpedia 3.5 dump files through an N-Triples parser with checking.


The report is here (it's 25K lines long):

http://www.openjena.org/~afs/DBPedia35-parse-log-2010-04-15.txt

It covers both strict errors and warnings of ill-advised forms.

A few examples:

Bad IRI: =?(''[[Nepenthes
Bad IRI: http://www.european-athletics.org‎

Bad lexical forms for the value space:
"1967-02-31"^^<http://www.w3.org/2001/XMLSchema#date>
(there is no February the 31st)


Warning of well known ports of other protocols:
http://stream1.securenetsystems.net:443

Warning about explicit port 80:

http://bibliotecadigitalhispanica.bne.es:80/

and use of "." and ".." in absolute URIs, which are all from the standard
list of IRI warnings.


Bad IRI: http://dbpedia.org/resource/.. Code: 
8/NON_INITIAL_DOT_SEGMENT in PATH: The path contains a segment /../ not 
at the beginning of a relative reference, or it contains a /./. These
should be removed.


Andy

Software used:

The IRI checker, by Jeremy Carroll, is available from
http://www.openjena.org/iri/ and Maven.

The lexical form checking is done by Apache Xerces.

The N-triples parser is the one from TDB v0.8.5 which bundles the above 
two together.
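
For anyone without the Jena toolchain, here is a rough, purely illustrative
Python script in the same spirit (it is not the checker used above, and it
only looks for a few of the problem classes mentioned: whitespace or
non-ASCII characters in IRIs, dot segments, and impossible xsd:date values):

"""Rough stand-in for the IRI / lexical-form checks described above (not the
Jena IRI checker or Xerces): warn about suspicious IRIs and bad xsd:date
literals in an N-Triples file read from stdin."""
import re
import sys
from datetime import date

IRI_RE = re.compile(r'<([^>]*)>')                 # anything between < and >
DATE_RE = re.compile(r'"(\d{4})-(\d{2})-(\d{2})"\^\^'
                     r'<http://www\.w3\.org/2001/XMLSchema#date>')

def check_line(lineno, line):
    for iri in IRI_RE.findall(line):
        # Coarse heuristics only: spaces, non-ASCII (e.g. invisible direction
        # marks), or "." / ".." segments inside an absolute IRI.
        if (' ' in iri or not iri.isascii()
                or re.search(r'/\.\.?(/|$)', iri)):
            print("line %d: suspicious IRI <%s>" % (lineno, iri),
                  file=sys.stderr)
    for y, m, d in DATE_RE.findall(line):
        try:
            date(int(y), int(m), int(d))          # rejects 1967-02-31
        except ValueError:
            print("line %d: bad xsd:date %s-%s-%s" % (lineno, y, m, d),
                  file=sys.stderr)

if __name__ == "__main__":
    for n, l in enumerate(sys.stdin, 1):
        check_line(n, l)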



On 15/04/2010 9:54 AM, Malte Kiesel wrote:

Ivan Mikhailov wrote:


If I were The Emperor of LOD I'd ask all grand dukes of datasources to
put fresh dumps at some torrent with control of UL/DL ratio :)


Last time I checked (which was quite a while ago though), loading
DBpedia in a normal triple store such as Jena TDB didn't work very well
due to many issues with the DBpedia RDF (e.g., problems with the URIs of
external links scraped from Wikipedia).

I don't know whether this is a bug in TDB or DBpedia but I guess this is
one of the problems causing people to use DBpedia online only - even if,
due to performance reasons, running it locally would be far better.

Regards
Malte





Re: DBpedia hosting burden

2010-04-15 Thread Kingsley Idehen

Andy Seaborne wrote:



On 15/04/2010 2:44 PM, Kingsley Idehen wrote:

Andy,

Great stuff, this is also why we are going to leave the current DBpedia
3.5 instance to stew for a while (until end of this week or a little
later).

DBpedia users:
Now is the time to identify problems with the DBpedia 3.5 dataset dumps.
We don't want to continue reloading DBpedia (Static Edition and then
recalibrating DBpedia-Live) based on faulty datasets related matters, we
do have other operational priorities etc..


Faulty is a bit strong.


Imperfect then, however subjective that might be :-)


Many of the warnings are legal RDF, but bad lexical forms for the 
datatype, or IRIs that trigger some of the standard warnings (but they 
are still legal IRIs).  Should they be included or not? Seems to me 
you can argue both for and against.


external_links_en.nt.bz2  is the largest source of broken IRIs.

DBpedia is a wonderful and important dataset, and being derived from 
elsewhere is unlikely to ever be perfect (for some definition of 
perfect).  Better to have the data than to wait for perfection.

That's been the approach thus far.

Anyway, as I said, we have a window of opportunity to identify current 
issues prior to performing a 3.5.1 reload. I just don't want to reduce 
the reload cycles due to other items on our todo etc..




Andy




--

Regards,

Kingsley Idehen	  
President & CEO
OpenLink Software 
Web: http://www.openlinksw.com

Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen 









Re: DBpedia hosting burden

2010-04-15 Thread Kingsley Idehen

Kingsley Idehen wrote:

Andy Seaborne wrote:



On 15/04/2010 2:44 PM, Kingsley Idehen wrote:

Andy,

Great stuff, this is also why we are going to leave the current DBpedia
3.5 instance to stew for a while (until end of this week or a little
later).

DBpedia users:
Now is the time to identify problems with the DBpedia 3.5 dataset 
dumps.

We don't want to continue reloading DBpedia (Static Edition and then
recalibrating DBpedia-Live) based on faulty datasets related 
matters, we

do have other operational priorities etc..


Faulty is a bit strong.


Imperfect then, however subjective that might be :-)


Many of the warnings are legal RDF, but bad lexical forms for the 
datatype, or IRIs that trigger some of the standard warnings (but 
they are still legal IRIs).  Should they be included or not? Seems to 
me you can argue both for and against.


external_links_en.nt.bz2  is the largest source of broken IRIs.

DBpedia is a wonderful and important dataset, and being derived from 
elsewhere is unlikely to ever be perfect (for some definition of 
perfect).  Better to have the data than to wait for perfection.

That's been the approach thus far.




Actually meant to say:


Anyway, as I said, we have a window of opportunity to identify current 
issues prior to performing a 3.5.1 reload. ** I just want to reduce the
reload cycles due to other items on our todo etc..  ***


:-)

--

Regards,

Kingsley Idehen	  
President & CEO
OpenLink Software 
Web: http://www.openlinksw.com

Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen 









Please report bugs to be fixed for the DBpedia 3.5.1 release

2010-04-15 Thread Chris Bizer
Hi all,

 Great stuff, this is also why we are going to leave the current DBpedia
 3.5 instance to stew for a while (until end of this week or a little later).
 
 DBpedia users:
 Now is the time to identify problems with the DBpedia 3.5 dataset dumps.
 We don't want to continue reloading DBpedia (Static Edition and then
 recalibrating DBpedia-Live) based on faulty datasets related matters, we
 do have other operational priorities etc..

Yes, the testing by the community has exposed enough small and medium bugs in
the datasets that we are going to extract a new, fixed 3.5.1 release next
week.

In my opinion the bugs do not impair Robert's and Anja's great achievement of
porting the extraction framework from PHP to Scala. If you rewrite more than
10,000 lines of code for something as complex as a multilingual Wikipedia
extraction, I think it is normal that some minor bugs remain even after their
tough testing.

So, if you have discovered additional bugs and want them fixed:

Please report them to the DBpedia bug tracker by Friday EOB.

http://sourceforge.net/tracker/?group_id=190976


Cheers,

Chris
 

 -----Original Message-----
 From: public-lod-requ...@w3.org [mailto:public-lod-requ...@w3.org] On behalf
 of Kingsley Idehen
 Sent: Thursday, 15 April 2010 15:44
 To: Andy Seaborne
 Cc: public-lod@w3.org; dbpedia-discussion
 Subject: Re: DBpedia hosting burden
 
 Andy Seaborne wrote:
  I ran the DBpedia 3.5 dump files through an N-Triples parser with checking:
 
  The report is here (it's 25K lines long):
 
  http://www.openjena.org/~afs/DBPedia35-parse-log-2010-04-15.txt
 
  It covers both strict errors and warnings of ill-advised forms.
 
  A few examples:
 
  Bad IRI: =?(''[[Nepenthes
  Bad IRI: http://www.european-athletics.org‎
 
  Bad lexical forms for the value space:
  "1967-02-31"^^<http://www.w3.org/2001/XMLSchema#date>
  (there is no February the 31st)
 
 
  Warning of well known ports of other protocols:
  http://stream1.securenetsystems.net:443
 
  Warning about explicit port 80:
 
  http://bibliotecadigitalhispanica.bne.es:80/
 
  and use of . and .. in absolute URIs which are all from the standard
  list of IRI warnings.
 
  Bad IRI: http://dbpedia.org/resource/.. Code:
  8/NON_INITIAL_DOT_SEGMENT in PATH: The path contains a segment /../
  not at the beginning of a relative reference, or it contains a /./
  These should be removed.
 
  Andy
 
  Software used:
 
  The IRI checker, by Jeremy Carroll, is available from
  http://www.openjena.org/iri/ and Maven.
 
  The lexical form checking is done by Apache Xerces.
 
  The N-triples parser is the one from TDB v0.8.5 which bundles the
  above two together.
 
 
  On 15/04/2010 9:54 AM, Malte Kiesel wrote:
  Ivan Mikhailov wrote:
 
  If I were The Emperor of LOD I'd ask all grand dukes of datasources to
  put fresh dumps at some torrent with control of UL/DL ratio :)
 
  Last time I checked (which was quite a while ago though), loading
  DBpedia in a normal triple store such as Jena TDB didn't work very well
  due to many issues with the DBpedia RDF (e.g., problems with the URIs of
  external links scraped from Wikipedia).
 
  I don't know whether this is a bug in TDB or DBpedia but I guess this is
  one of the problems causing people to use DBpedia online only - even if,
  due to performance reasons, running it locally would be far better.
 
  Regards
  Malte
 
 
 
 Andy,
 
 Great stuff, this is also why we are going to leave the current DBpedia
 3.5 instance to stew for a while (until end of this week or a little later).
 
 DBpedia users:
 Now is the time to identify problems with the DBpedia 3.5 dataset dumps.
 We don't want to continue reloading DBpedia (Static Edition and then
 recalibrating DBpedia-Live) based on faulty datasets related matters, we
 do have other operational priorities etc..
 
 
 --
 
 Regards,
 
 Kingsley Idehen
 President & CEO
 OpenLink Software
 Web: http://www.openlinksw.com
 Weblog: http://www.openlinksw.com/blog/~kidehen
 Twitter/Identi.ca: kidehen
 
 
 
 





Infobox information goes into abstracts

2010-04-15 Thread Richard Light


A colleague has just pointed out that the English abstract in dbPedia 
now contains a set of infobox templated data.  At least it does for the 
two Roman emperors I have tried ... [1]  Other language abstracts are 
unaffected.


Is this accident or design?

Richard

[1] http://dbpedia.org/resource/Caligula
--
Richard Light



Re: DBpedia hosting burden

2010-04-15 Thread Ian Davis
On Wed, Apr 14, 2010 at 8:04 PM, Dan Brickley dan...@danbri.org wrote:

 "Bills" is the major operative word in a world where the Bill Payer and
 Database Maintainer is a footnote (at best) re. perception of what
 constitutes the DBpedia Project.


If dbpedia.org linked to the sparql endpoints of mirrors then that
would be a way of sharing the burden.

Ian



Re: DBpedia hosting burden

2010-04-15 Thread Kingsley Idehen

Ian Davis wrote:

On Wed, Apr 14, 2010 at 8:04 PM, Dan Brickley dan...@danbri.org wrote:
  

"Bills" is the major operative word in a world where the Bill Payer and
Database Maintainer is a footnote (at best) re. perception of what
constitutes the DBpedia Project.
  


If dbpedia.org linked to the sparql endpoints of mirrors then that
would be a way of sharing the burden.

Ian


  

Ian,

When you use the term: SPARQL Mirror (note: Leigh's comments yesterday 
re. not orienting towards this), you open up a different set of issues. 
I don't want to revisit the SPARQL and SPARQL extensions debate etc., esp.
as Virtuoso's SPARQL extensions are an integral part of what makes the
DBpedia SPARQL endpoint viable, amongst other things.


The burden issue is basically veering away from the key points, which are:

1. Use the DBpedia instance properly
2. When the instance enforces restrictions, understand that this is a 
Virtuoso *feature* not a bug or server shortcoming.


Beyond the dbpedia.org instance, there are other locations for:

1. Data Sets
2. SPARQL endpoints (like yours and a few others, where functionality 
mirroring isn't an expectation).


Descriptor Resource handling via mirrors, BitTorrents, Reverse Proxies,
Cache directives, and some 303 heuristics etc. are the real issues of
interest.


Note: I can send wild SPARQL CONSTRUCTs, DESCRIBES, and HTTP GETs for 
Resource Descriptors to a zillion mirrors (maybe next year's April 
Fool's joke re. beauty of Linked Data crawling) and it will only
broaden the scope of my dysfunctional behavior. The behavior itself has 
to be handled (one or a zillion mirrors).


Anyway, we will publish our guide for working with DBpedia very soon. I 
believe this will add immense clarity to this matter.


--

Regards,

Kingsley Idehen	  
President  CEO 
OpenLink Software 
Web: http://www.openlinksw.com

Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen 









Re: DBpedia hosting burden

2010-04-15 Thread Dan Brickley
On Thu, Apr 15, 2010 at 9:57 PM, Kingsley Idehen kide...@openlinksw.com wrote:
 Ian Davis wrote:

 When you use the term: SPARQL Mirror (note: Leigh's comments yesterday re.
 not orienting towards this), you open up a different set of issues. I don't
 want to revisit SPARQL and SPARQL extensions debate etc.. Esp. as Virtuoso's
 SPARQL extensions are integral part of what makes the DBpedia SPARQL
 endpoint viable, amongst other things.

Having the same dataset available via different implementations of
SPARQL can only be healthy. If certain extensions are necessary, this
will only highlight their importance. If there are public services
offering SPARQL-based access to the DBpedia datasets (or subsets) out
there on the Web, it would be rather useful if we could have them
linked from a single easy to find page, along with information about
any restrictions, quirks, subsetting, or value-adding features special
to that service. I suggest using a section in
http://en.wikipedia.org/wiki/DBpedia for this, unless someone cares to
handle that on dbpedia.org.

 The burden issue is basically veering away from the key points, which are:

 1. Use the DBpedia instance properly
 2. When the instance enforces restrictions, understand that this is a
 Virtuoso *feature* not a bug or server shortcoming.

Yes, the showcase implementation needs to be used properly if it is
going to survive the increasing developer attention LOD is getting. It
is perfectly reasonable of you to make clear, when there are limits, that
they are for everyone's benefit.

 Beyond the dbpedia.org instance, there are other locations for:

 1. Data Sets
 2. SPARQL endpoints (like yours and a few others, where functionality
 mirroring isn't an expectation).

Is there a list somewhere of related SPARQL endpoints? (also other
Wikipedia-derived datasets in RDF)

 Descriptor Resource handling via mirrors, BitTorrents, Reverse Proxies,
 Cache directives, and some 303 heuristics etc. are the real issues of
 interest.

(am chatting with Daniel Koller in Skype now re the BitTorrent experiments...)

 Note: I can send wild SPARQL CONSTRUCTs, DESCRIBES, and HTTP GETs for
 Resource Descriptors to a zillion mirrors (maybe next year's April Fool's
 joke re. beauty of Linked Data crawling) and it will only broaden the
 scope of my dysfunctional behavior. The behavior itself has to be handled
 (one or a zillion mirrors).

Sure. But on balance, more mirrors rather than fewer should benefit
everyone, particularly if 'good behaviour' is documented and
enforced...

 Anyway, we will publish our guide for working with DBpedia very soon. I
 believe this will add immense clarity to this matter.

Great!

cheers,

Dan





Re: DBpedia hosting burden

2010-04-15 Thread Kingsley Idehen

Dan Brickley wrote:

On Thu, Apr 15, 2010 at 9:57 PM, Kingsley Idehen kide...@openlinksw.com wrote:
  

Ian Davis wrote:

When you use the term: SPARQL Mirror (note: Leigh's comments yesterday re.
not orienting towards this), you open up a different set of issues. I don't
want to revisit SPARQL and SPARQL extensions debate etc.. Esp. as Virtuoso's
SPARQL extensions are integral part of what makes the DBpedia SPARQL
endpoint viable, amongst other things.



Having the same dataset available via different implementations of
SPARQL can only be healthy. If certain extensions are necessary, this
will only highlight their importance. If there are public services
offering SPARQL-based access to the DBpedia datasets (or subsets) out
there on the Web, it would be rather useful if we could have them
linked from a single easy to find page, along with information about
any restrictions, quirks, subsetting, or value-adding features special
to that service. I suggest using a section in
http://en.wikipedia.org/wiki/DBpedia for this, unless someone cares to
handle that on dbpedia.org.
  

+1

  

The burden issue is basically veering away from the key points, which are:

1. Use the DBpedia instance properly
2. When the instance enforces restrictions, understand that this is a
Virtuoso *feature* not a bug or server shortcoming.



Yes, the showcase implementation needs to be used properly if it is
going to survive the increasing developer attention LOD is getting. It
is perfectly reasonable of you to make clear, when there are limits, that
they are for everyone's benefit.
  


Yep, and as promised we will publish a document; this is certainly a
missing piece of the puzzle right now.
  

Beyond the dbpedia.org instance, there are other locations for:

1. Data Sets
2. SPARQL endpoints (like yours and a few others, where functionality
mirroring isn't an expectation).



Is there a list somewhere of related SPARQL endpoints? (also other
Wikipedia-derived datasets in RDF)

  


See http://delicious.com/kidehen/sparql_endpoint; that's how I track
SPARQL endpoints at the current time.



Descriptor Resource handling via mirrors, BitTorrents, Reverse Proxies,
Cache directives, and some 303 heuristics etc. are the real issues of
interest.



(am chatting with Daniel Koller in Skype now re the BitTorrent experiments...)
  


Yes, seeing progress.
  

Note: I can send wild SPARQL CONSTRUCTs, DESCRIBES, and HTTP GETs for
Resource Descriptors to a zillion mirrors (maybe next year's April Fool's
joke re. beauty of Linked Data crawling) and it will only broaden the
scope of my dysfunctional behavior. The behavior itself has to be handled
(one or a zillion mirrors).



Sure. But on balance, more mirrors rather than fewer should benefit
everyone, particularly if 'good behaviour' is documented and
enforced...
  


Yes, LinkedData DNS remains a personal aspiration of mine, but no matter 
what we build, enforcement needs to be understood as a *feature* rather 
than a bug or deficiency etc..
  

Anyway, we will publish our guide for working with DBpedia very soon. I
believe this will add immense clarity to this matter.



Great!

cheers,

Dan

  



--

Regards,

Kingsley Idehen	  
President & CEO
OpenLink Software 
Web: http://www.openlinksw.com

Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen 









semantic pingback improvement request for foaf

2010-04-15 Thread Story Henry
Hi,

   I often get asked how one solves the friend request problem on open social
networks that use foaf in the hyperdata way. 

On the closed social networks when you want to make a friend, you send them a 
request which they can accept or refuse. It is easy to set up, because all the 
information is located in the same database, owned by the same company. In a 
distributed social foaf network anyone can link to you, from anywhere, and your 
acceptance can be expressed most clearly by linking back. The problem is: you 
need to find out when someone is linking to you.


So then the problem is how does one notify people that one is linking to 
them. Here are the solutions in order of simplicity.

   0. Search engine solution
   -

   Wait for a search engine to index the web, then ask the search engine which 
people are linking to you. 

 Problems:

   - This will tend to be a bit slow, as a search engine optimised to search 
the whole web will need to be notified first, even if this is only of minor 
interest to them
   - It makes the search engine a core part of the communication between two 
individuals, taking on the role of the central database in closed social 
networks
   - It will not work when people deploy foaf+ssl profiles, where they access 
control who can see their friends. Search engines will not have access to that 
information, and so will not be able to index it.

   1. HTTP Referer Header
   --

   The absolute simplest solution would be just to use the mis-spelled HTTP 
Referer Header, that was designed to do this job. In a normal HTTP request the 
location from which the requested URL was found can be placed in the header of 
the request.
 
http://en.wikipedia.org/wiki/HTTP_referrer

   The server receiving the request and serving your foaf profile can then
find the referrer in the web server logs.

Perhaps that is all that is needed! When you make a friend request, do the 
following:
  
   1. add the friend to your foaf profile

  <http://bblfish.net/#hjs> foaf:knows
  <http://kingsley.idehen.name/dataspace/person/kidehen#this> .

   2. Then just do a GET on their WebID with the Referer header set to your
WebID. They will then find in their Apache logs something like this:

93.84.41.131 - - [31/Dec/2008:02:36:54 -0600] "GET /dataspace/person/kidehen
HTTP/1.1" 200 19924 "http://bblfish.net/" "Mozilla/5.0 (Windows; U; Windows NT
5.1; ru; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5"

  This can then be analysed using incredibly simple scripts (such as the one
described in [1], for example).

   3. The server could then just verify that information by 
 
  a. doing a GET on the Referer URL to find out if indeed it is linking to the
user's WebID
  b. doing some basic trust analysis (is this WebID known by any of my
friends?), in order to rank it before presenting it to the user
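
   A minimal sketch of steps 2 and 3a (illustrative only, not a finished
protocol: the WebIDs are the ones from the example above, and the link-back
check is a naive substring match rather than proper RDF parsing):

"""Refback sketch: announce a link by dereferencing the friend's WebID with
your own WebID in the Referer header; on the receiving side, verify that a
Referer seen in the access log really links back to the WebID you serve."""
import urllib.request
from urllib.parse import urldefrag

MY_WEBID = "http://bblfish.net/#hjs"
FRIEND_WEBID = "http://kingsley.idehen.name/dataspace/person/kidehen#this"

def fetch(url, referer=None):
    # GET the document behind a WebID (fragment stripped), optionally
    # announcing who we are via the Referer header.
    doc_url, _frag = urldefrag(url)
    headers = {"Referer": referer} if referer else {}
    req = urllib.request.Request(doc_url, headers=headers)
    return urllib.request.urlopen(req, timeout=30).read().decode("utf-8", "replace")

def links_back(referer_url, served_webid):
    # Step 3a: fetch the Referer URL found in the log and check (naively)
    # that it mentions the WebID we serve.
    return served_webid in fetch(referer_url)

if __name__ == "__main__":
    fetch(FRIEND_WEBID, referer=MY_WEBID)       # step 2: leave a trace in their logs
    print(links_back("http://bblfish.net/", FRIEND_WEBID))  # step 3a, their side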

   The nice thing about the above method is that it will work even when the 
initial linker's server does not have a Ping service for WebIDs. If the pages 
linking are in html with RDFa most browsers will send the referrer field.

  There is indeed a Wikipedia entry for this: it is called Refback.
  http://en.wikipedia.org/wiki/Refback

  Exactly why Refback is more prone to spam than the pingback or linkback
solution is still a bit of a mystery to me.

  2. Referer with foaf+ssl
  

  In any case the SPAM problem can be reduced by using foaf+ssl [2]. If the 
WebId is an https WebId - which it really should be! - then the requestor will 
authenticate himself, at least on the protected portion of the foaf profile. So
there are the following types of people who could be making the request on your 
WebId.
 
  P1. the person making the friend request

   Here their WebId and the referer field will match.
   (this can be useful, as this should be the first request you will receive - 
a person making a friend request, should at least test the link!) 

  P2. A friend of the person making the friend request

   Perhaps a friend of P1 goes to his page, comes across your WebId, clicks on 
it to find out more, and authenticates himself on your page. If P2 is a friend
of yours too, then your service would have something somewhat similar to a 
LinkedIn introduction!

  P3. Just someone on the web, a crawler...

Then you know that he is making his friendship claim public. :-)

   The above seems to be just some of the interesting information one could get 
from analysing the Referer field logs.
   

  3. Pingback
  ---

  For some reason though the Referer Header solution was not enough, and so the 
pingback protocol was invented. 
 
http://www.hixie.ch/specs/pingback/pingback

I am still not quite clear what this solution brings in addition to the refback 
one, other than that 

 - it declares the method of the pingback declaratively. If there is a
pingback header, then it is clear that it can be used. The referer header is

UK Govt RDF Data Sets

2010-04-15 Thread Kingsley Idehen

Ian,

While on the subject of mirrors and Linked Open Data in general.

Do you have any idea as to the whereabouts of RDF data sets for the 
SPARQL endpoints associated with data.gov.uk? As you can imagine, I 
haven't opted to crawl your endpoints for the data, bearing in mind the LOD
community ethos, i.e., publish dataset dump locations for SPARQL
endpoints that host Linked Open Data. This best practice was devised with
SPARQL endpoint crawling in mind.


Example:
http://data.gov.uk/sparql

Where would I get the actual RDF datasets loaded into the endpoint above?

Here is the RPI example re. data.gov:

http://data-gov.tw.rpi.edu/wiki/Data.gov_Catalog_-_Complete .

--

Regards,

Kingsley Idehen	  
President & CEO
OpenLink Software 
Web: http://www.openlinksw.com

Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen 









Re: UK Govt RDF Data Sets

2010-04-15 Thread Ian Davis
Kingsley,

You should address your question directly to the project organisers;
we're a technology provider and host some of the data, but it is not up
to us when or where the dumps get shared. My understanding is that
because this is officially sanctioned data they want to ensure that
the provenance is built into the datasets properly. My hope and wish
is that the commitment to making dumps available will be built into
the guidelines the UK Government are working on. But those won't be
issued during this month because of the election.

Ian

On Thu, Apr 15, 2010 at 11:19 PM, Kingsley Idehen
kide...@openlinksw.com wrote:
 Ian,

 While on the subject of mirrors and Linked Open Data in general.

 Do you have any idea as to the whereabouts of RDF data sets for the SPARQL
 endpoints associated with data.gov.uk? As you can imagine, I haven't opted
 to crawl your endpoints for the data bearing in LOD community ethos i.e.,
  publish dataset dump locations for SPARQL endpoints that host Linked Open
 Data. This best practice was devised SPARQL endpoint crawling in mind.

 Example:
 http://data.gov.uk/sparql

 Where would I get the actual RDF datasets loaded into the endpoint above?

 Here is the RPI example re. data.gov:

 http://data-gov.tw.rpi.edu/wiki/Data.gov_Catalog_-_Complete .

 --

 Regards,

 Kingsley Idehen       President & CEO OpenLink Software     Web:
 http://www.openlinksw.com
 Weblog: http://www.openlinksw.com/blog/~kidehen
 Twitter/Identi.ca: kidehen








Re: UK Govt RDF Data Sets

2010-04-15 Thread Kingsley Idehen

Ian Davis wrote:

Kingsley,

You should address your question directly to the project organisers,
we're a technology provider and host some of the data but it is not up
to us when or where the dumps get shared. My understanding is that
because this is officially sanctioned data they want to ensure that
the provenance is built into the datasets properly. My hope and wish
is that the commitment to making dumps available will be built into
the guidelines the UK Government are working on. But those won't be
issued during this month because of the election.
  
Okay, but the need for dumps is working its way into the fundamental 
guidelines for Linked Open Data.


As you can imagine (and I have raised these concerns on the UK Govt 
mailing list a few times), this project is high profile and closely 
associated with Linked Open Data; thus, the lack of clarity about these RDF
dumps is confusing, to say the very least.


Anyway, I am set for now, will wait and see re. what happens post 
election etc..


Kingsley

Ian

On Thu, Apr 15, 2010 at 11:19 PM, Kingsley Idehen
kide...@openlinksw.com wrote:
  

Ian,

While on the subject of mirrors and Linked Open Data in general.

Do you have any idea as to the whereabouts of RDF data sets for the SPARQL
endpoints associated with data.gov.uk? As you can imagine, I haven't opted
to crawl your endpoints for the data bearing in LOD community ethos i.e.,
 publish dataset dump locations for SPARQL endpoints that host Linked Open
Data. This best practice was devised SPARQL endpoint crawling in mind.

Example:
http://data.gov.uk/sparql

Where would I get the actual RDF datasets loaded into the endpoint above?

Here is the RPI example re. data.gov:

http://data-gov.tw.rpi.edu/wiki/Data.gov_Catalog_-_Complete .

--

Regards,

Kingsley Idehen   President & CEO OpenLink Software Web:
http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen








  



--

Regards,

Kingsley Idehen	  
President & CEO
OpenLink Software 
Web: http://www.openlinksw.com

Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen 









Re: UK Govt RDF Data Sets

2010-04-15 Thread Ian Davis
On Fri, Apr 16, 2010 at 12:09 AM, Kingsley Idehen
kide...@openlinksw.com wrote:
 Ian Davis wrote:

 Kingsley,

 You should address your question directly to the project organisers,
 we're a technology provider and host some of the data but it is not up
 to us when or where the dumps get shared. My understanding is that
 because this is officially sanctioned data they want to ensure that
 the provenance is built into the datasets properly. My hope and wish
 is that the commitment to making dumps available will be built into
 the guidelines the UK Government are working on. But those won't be
 issued during this month because of the election.


 Okay, but the need for dumps is working its way into the fundamental
 guidelines for Linked Open Data.

 As you can imagine (and I have raised these concerns on the UK Govt mailing
 list a few times), this project is high profile and closely associated with
 Linked Open Data; thus, unclarity about these RDF dumps is confusing to say
 the very least.

 Anyway, I am set for now, will wait and see re. what happens post election
 etc..



I should also add that some datasets do not have dumps, e.g. the
reference time and dates

http://reference.data.gov.uk/doc/hour/2010-03-23T21

Ian



Re: UK Govt RDF Data Sets

2010-04-15 Thread Kingsley Idehen

Ian Davis wrote:

On Fri, Apr 16, 2010 at 12:09 AM, Kingsley Idehen
kide...@openlinksw.com wrote:
  

Ian Davis wrote:


Kingsley,

You should address your question directly to the project organisers,
we're a technology provider and host some of the data but it is not up
to us when or where the dumps get shared. My understanding is that
because this is officially sanctioned data they want to ensure that
the provenance is built into the datasets properly. My hope and wish
is that the commitment to making dumps available will be built into
the guidelines the UK Government are working on. But those won't be
issued during this month because of the election.

  

Okay, but the need for dumps is working its way into the fundamental
guidelines for Linked Open Data.

As you can imagine (and I have raised these concerns on the UK Govt mailing
list a few times), this project is high profile and closely associated with
Linked Open Data; thus, unclarity about these RDF dumps is confusing to say
the very least.

Anyway, I am set for now, will wait and see re. what happens post election
etc..





I should also add that some datasets do not have dumps, e.g. the
reference time and dates

http://reference.data.gov.uk/doc/hour/2010-03-23T21

Ian

  


Yep, I have that: 
http://linkeddata.uriburner.com/about/html/http/reference.data.gov.uk/doc/hour/2010-03-23T21  
:-)


--

Regards,

Kingsley Idehen	  
President & CEO
OpenLink Software 
Web: http://www.openlinksw.com

Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen 









twitter's annotation and metadata

2010-04-15 Thread Juan Sequeda
Hopefully everybody has heard that Twitter will release some annotation
feature which will allow adding metadata to each tweet.

I just read this blog post:
http://scobleizer.com/2010/04/15/twitter-annotations/

and the following caught my attention: "There aren’t any rules as to what can
be in this metadata. YET. All the devs I’ve talked to say they expect Twitter
to “bless” namespaces so the industry will have one common way to describe
common things."

I'm just wondering what people here think about this.


Juan Sequeda
+1-575-SEQ-UEDA
www.juansequeda.com