Re: Minting URIs: how to deal with unknown data structures

2011-04-15 Thread Michael Hausenblas


Frans,

Great to hear that you're interested in applying Linked Data and
promoting it in the Netherlands - certainly a very active area ;)


I would welcome any advice on this topic from people who have had  
some more experience with publishing Linked Data.


I find [1] a very useful page from a pragmatic perspective. If you're  
more into books and not only focusing on the data side (see 'REST and  
Linked Data: a match made for domain driven development?' [2] for more  
details on data vs. API), I can also recommend [3], which offers some  
more practical guidance in terms of URI space management.


Cheers,
Michael

[1] http://data.gov.uk/resources/uris
[2] http://ws-rest.org/2011/proc/a5-page.pdf
[3] http://oreilly.com/catalog/9780596529260
--
Dr. Michael Hausenblas, Research Fellow
LiDRC - Linked Data Research Centre
DERI - Digital Enterprise Research Institute
NUIG - National University of Ireland, Galway
Ireland, Europe
Tel. +353 91 495730
http://linkeddata.deri.ie/
http://sw-app.org/about.html

On 15 Apr 2011, at 13:48, Frans Knibbe wrote:


Hello,

Some newbie questions here...

I have recently come in contact with the concept of Linked Data and  
I have become enthusiastic. I would like to promote the idea within  
my company (we specialize in geographical data) and within my  
country. I have read the excellent Linked Data book (“Linked Data:  
Evolving the Web into a Global Data Space”) and I think I am almost  
ready to start publishing Linked Data. I understand that it is  
important to get the URIs right, and not have to change them later.  
That is what my questions are about.


I have acquired the first part (authority) of my URIs, let's say it  
is lod.mycompany.com. Now I am faced with the question: How do I  
come up with a URI scheme that will stand the test of time? I think  
I will start with publishing some FOAF data of myself and co-workers.
And then hopefully more and more data will follow. At this
moment I cannot possibly imagine which types of data we will
publish. They are likely to have some kind of geographical  
component, but that is true for a lot of data. I believe it is not  
possible to come up with any hierarchical structure that will  
accommodate all types of data that might ever be published.


So I think it is best to leave out any indication of data  
organization in the path element of the URI (i.e. http://lod.mycompany.com/people
is a bad idea). In my understanding, I could use base URIs like
http://lod.mycompany.com/resource, http://lod.mycompany.com/page and
http://lod.mycompany.com/data, and then use unique identifiers for all the
things I want to publish something about. If I understand correctly, I don't
need the URI to describe the hierarchy of my data because all Linked Data are
self-describing. Nice.


But then I am faced with the problem: What method do I use to mint  
my identifiers? Those identifiers need to be unique. Should I use a  
number sequence, or a hash function? In those cases the URIs would  
be uniform and give no indication of the type of data. But a number  
sequence seems unsafe, and in the case of a hash function I would  
still need to make some kind of structured choice of input values.


I would welcome any advice on this topic from people who have had  
some more experience with publishing Linked Data.


Regards,
Frans Knibbe










Minting URIs: how to deal with unknown data structures

2011-04-15 Thread Frans Knibbe

Hello,

Some newbie questions here...

I have recently come in contact with the concept of Linked Data and I 
have become enthusiastic. I would like to promote the idea within my 
company (we specialize in geographical data) and within my country. I 
have read the excellent Linked Data book (“Linked Data: Evolving the Web 
into a Global Data Space”) and I think I am almost ready to start 
publishing Linked Data. I understand that it is important to get the 
URIs right, and not have to change them later. That is what my questions 
are about.


I have acquired the first part (authority) of my URIs, let's say it is 
lod.mycompany.com. Now I am faced with the question: How do I come up 
with a URI scheme that will stand the test of time? I think I will start 
with publishing some FOAF data of myself and co-workers. And then 
hopefully more and more data will follow. At this moment I cannot 
possibly imagine which types of data we will publish. They are likely to 
have some kind of geographical component, but that is true for a lot of 
data. I believe it is not possible to come up with any hierarchical 
structure that will accommodate all types of data that might ever be 
published.


So I think it is best to leave out any indication of data organization 
in the path element of the URI (i.e. http://lod.mycompany.com/people is 
a bad idea). In my understanding, I could use base URIs like 
http://lod.mycompany.com/resource, http://lod.mycompany.com/page and 
http://lod.mycompany.com/data, and then use unique identifiers for all 
the things I want to publish something about. If I understand correctly, 
I don't need the URI to describe the hierarchy of my data because all 
Linked Data are self-describing. Nice.
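
The three base URIs mentioned above can be sketched as a small mapping from a resource URI to its HTML page and its data document. This is only an illustration under stated assumptions: the host lod.mycompany.com is the placeholder from the message, the function names are hypothetical, and the Accept-header check is deliberately crude rather than a full content negotiation implementation.

```python
from urllib.parse import urlsplit


def related_uris(resource_uri: str) -> dict:
    """Derive the page and data document URIs for a <base>/resource/<id> URI."""
    parts = urlsplit(resource_uri)
    prefix, _, local_id = parts.path.lstrip("/").partition("/")
    if prefix != "resource" or not local_id:
        raise ValueError("expected a URI of the form <base>/resource/<id>")
    origin = f"{parts.scheme}://{parts.netloc}"
    return {
        "resource": resource_uri,                # identifies the thing itself
        "page": f"{origin}/page/{local_id}",     # human-readable description
        "data": f"{origin}/data/{local_id}",     # machine-readable description
    }


def pick_document(resource_uri: str, accept_header: str) -> str:
    """Choose which document a request for the resource would be redirected to.

    Simplified assumption: any mention of an RDF media type selects the data
    document; everything else gets the HTML page.
    """
    uris = related_uris(resource_uri)
    rdf_types = ("application/rdf+xml", "text/turtle")
    wants_rdf = any(media_type in accept_header for media_type in rdf_types)
    return uris["data"] if wants_rdf else uris["page"]
```

Because the path carries only the role (resource, page, data) and an opaque identifier, nothing in the URI has to change when new kinds of data are published later.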


But then I am faced with the problem: What method do I use to mint my 
identifiers? Those identifiers need to be unique. Should I use a number 
sequence, or a hash function? In those cases the URIs would be uniform 
and give no indication of the type of data. But a number sequence seems 
unsafe, and in the case of a hash function I would still need to make 
some kind of structured choice of input values.
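
To make the options in this question concrete, here is a minimal sketch of three ways to mint an opaque identifier: a number sequence, a hash over chosen input values, and a random UUID. The base URI is the placeholder from the message; the function names and the choice of "source system plus local key" as hash input are assumptions for illustration, not a recommendation from the thread.

```python
import hashlib
import itertools
import uuid

BASE = "http://lod.mycompany.com/resource/"  # placeholder authority and path

_counter = itertools.count(1)  # in practice this state would have to be persisted


def mint_sequential() -> str:
    """Number sequence: short, but guessable and dependent on shared state."""
    return f"{BASE}{next(_counter)}"


def mint_hashed(source_system: str, local_key: str) -> str:
    """Hash of chosen inputs: the same inputs always yield the same URI.

    The inputs (hypothetical here) must themselves be stable and unique.
    """
    digest = hashlib.sha1(f"{source_system}|{local_key}".encode("utf-8")).hexdigest()
    return f"{BASE}{digest[:16]}"


def mint_random() -> str:
    """Random UUID: no input choice needed, but the URI must be stored."""
    return f"{BASE}{uuid.uuid4()}"
```

The hash variant shows the trade-off raised in the question: it is repeatable, but only as stable as the input values chosen for it.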


I would welcome any advice on this topic from people who have had some 
more experience with publishing Linked Data.


Regards,
Frans Knibbe







Re: Discussion meta-comment

2011-04-15 Thread Hugh Glaser

On 15 Apr 2011, at 01:02, Kingsley Idehen wrote:

> On 4/14/11 6:42 PM, Hugh Glaser wrote:
>> On 14 Apr 2011, at 21:35, Kingsley Idehen wrote:
>> 
>>> On 4/14/11 4:11 PM, Hugh Glaser wrote:
 On 14 Apr 2011, at 20:11, Kingsley Idehen wrote:
 
> On 4/14/11 2:55 PM, Hugh Glaser wrote:
>> On 14 Apr 2011, at 12:33, Kingsley Idehen wrote:
>> 
>>> On 4/14/11 7:10 AM, Hugh Glaser wrote:
 Hi Kingsley,
 
 On 12 Apr 2011, at 22:33, Kingsley Idehen wrote:
 
> On 4/12/11 4:33 PM, David. Huynh wrote:
>>>  I, as well as others I know, find the culture that has developed 
>>> on this list of responses saying "Well this is how I do it" 
>>> alienating, and thus sometimes a barrier to posting and genuine 
>>> responses, and so actually stifles discussion.
> David/Hugh,
> 
> I get the point, but don't know the comment target, so I'll respond 
> with regards to myself as one participant in today's extensive debate 
> with Glenn.
> 
> I hope I haven't said or inferred "this is how we/I do it" without 
> providing at the very least a link to what I am talking about etc?
 I think that is true.
 But that is exactly the issue I was raising.
 As I said, I don't think this is usually the best way to respond to a 
 post.
>>> Hugh,
>>> 
>>> I am a little confused.
>>> 
>>> Is the problem providing a link to accentuate a point, or not doing so?
>>> Put differently, are you talking about more text and fewer links, or more
>>> links and less text?
>> Hi,
>> Essentially different text and no links.
> Hugh,
> 
> In this thread lies a simple example. If you gave me a link to the thread 
> in question I am just a click away from truly comprehending your point. 
> This forum is about Linked Data, right ? :-)
> 
> Please (seriously now) give me a link that exemplifies your point. 
> Context is inherently subjective, public forums accentuate this reality.
> 
> FWIW -- I prefer to show rather than tell, I am also an instinctive 
> dog-fooder so I link. The power of hyperlinks (links) continues to exceed 
> my personal imagination. The productivity gains that I enjoy courtesy of 
> links are something I still can't quite quantify. That's why I haven't 
> made a movie, post, or presentation (yet) with regards to the utter power 
> of links.
> 
> I await your link, I do want to be much clearer about your point.
> 
> Kingsley
 Thanks for asking, Kingsley.
 Yes, I don't really want to just repeat the posting.
 So I looked at the latest thread on this list: "15 Ways to Think About 
 Data Quality (Just for a Start)", to try to illustrate what I mean.
 Briefly:
 It started with a message with a proposal to try to quantify quality.
 It was followed by 4 people who seemed very interested and engaged, and 
 who began to discuss the details.
 But that was quickly followed by a transition to an interaction that 
 centred around discussing your demos, numbering more than 30 messages.
>>> But you are overlooking the opening paragraph of the post.
>>> 
>>> You are overlooking the history of the post, including the fact that I 
>>> asked Glenn to make the post.
>>> 
>>> I asked Glenn to make the post because we've had a reoccurring debate, and 
>>> I've always suspected a fundamental disconnect.
>>> 
>>> If you could, please juxtapose the start with the final post.
>>> 
 In my opinion, after the initial phase, the discussion then made little 
 progress towards what might have resulted in an interesting consensus with 
 a number of people contributing.
 An opportunity lost.
>>> How can it be a lost opportunity? The conversation is threaded. And if for 
>>> whatever reasons the conversation is deemed linear to you, what stops you 
>>> opening a new thread with a tweak to the opening paragraph which had a 
>>> direct reference to opinions I expressed about the inherent subjectivity of 
>>> data quality, courtesy of context fluidity? Start a new thread, don't make 
>>> a reference to me, and my silence will be utterly deafening, no joke.
>>> 
>>> As for the links, the purpose (as per usual with me) was to back up my 
>>> point with live linked data. Basically, if I believe data quality 
>>> subjectivity is a function of context fluidity, why not show the very point 
>>> via a Linked Data page that accentuates the loose coupling of information 
>>> and data that's vital to addressing the conundrum in question?
>>> 
>>> Kingsley
>> Hi,
>> I am looking at the process and outcomes I observe, rather than delving into 
>> the details.
>> It is not about whether people could have acted differently - it is about 
>> how people actually did act.
>> A lost opportunity? Clearly there were a number of people who had opinions

Re: Take2: 15 Ways to Think About Data Quality (Just for a Start)

2011-04-15 Thread Bob Ferris

Hi Glenn,

Thanks a lot for your insightful thoughts. I think I can fully agree with 
them. This topic reminds me a bit of a question I asked some time ago 
on SemanticOverflow (now answers.semanticweb.com):


"When should I use explicit/anonymous defined inverse properties?" [1]

(btw, this question is still not marked as "answered" ;) )

Cheers,


Bob


[1] 
http://answers.semanticweb.com/questions/1126/when-should-i-use-explicitanonymous-defined-inverse-properties 



On 4/15/2011 3:47 PM, glenn mcdonald wrote:

This reminds me to come back to the point about what I initially
called Directionality, and Dave improved to Modeling Consistency.

Dave is right, I think, that in terms of data quality, it is
consistency that matters, not directionality. That is, as long as we
know that a president was involved in a presidency, it doesn't matter
whether we know that because the president linked to the presidency,
or the presidency linked to the president. In fact, in a relational
database the president and the presidency and the link might even be
in three separate tables. From a data-mathematical perspective, it
doesn't matter. All of these are ways of expressing the same logical
construct. We just want it to be done the same way for all
presidents/presidencies/links.

But although directionality is immaterial for data *quality*, it
matters quite a bit for the usability of the system in which the data
reaches people. We know, for example, that in the real world
presidents have presidencies, and vice versa. But think about what it
takes to find out whether this information is represented in a given
dataset:

- In a classic SQL-style relational database we probably have to just
know the schema, as there's usually no exploratory way to find this
kind of thing out. The RDBMS formalism doesn't usually represent the
relationships between tables. You not only have to know it from
external sources, but you have to restate it in each SQL join-query.
This may be acceptable in a database with only a few tables, where the
field-headings are kept consistent by convention, but it's extremely
problematic when you're trying to combine formerly-separate datasets
into large ones with multiple dimensions and purposes. If the LOD
cloud were in relational tables, it would be awful. Arguably the main
point of the cloud is to get the data out of relational tables (where
most of it probably originates) into a graph where the connections are
actually represented instead of implied.

- But even in RDF, directionality poses a significant discovery
problem. In a minimal graph (let's say "minimal graph" means that each
relationship is asserted in only one direction, so there's no
relationship redundancy), you can't actually explore the data
navigationally. You can't go to a single known point of interest, like
a given president, and explore to find out everything the data holds
and how it connects. You can explore the *outward* relationships from
any given point, but to find out about the *inward* relationships you
have to keep doing new queries over the entire dataset. The same basic
issue applies to an XML representation of the data as a tree: you can
squirrel your way down, but only in the direction the original modeler
decided was "down". If you need a different direction, you have to
hire a hypersquirrel.

- Of course, most RDF-presenting systems recognize this as a usability
problem, and address it by turning the minimal graph into a redundant
graph for UI purposes. Thus in a data-browser UI you usually see, for
a given node, lists of both outward and inward relationships. This is
better, but if this abstraction is done at the UI layer, you still
lose it once you drop down into the SPARQL realm. This makes the
SPARQL queries harder to write, because you can't write them the way
you logically think about the question, you have to write them the way
the data thinks about the question. And this skew from real logic to
directional logic can make them *much* harder to understand or
maintain, because the directionality obscures the purpose and reduces
the self-documenting nature of the query.


All of this is *much* better, in usability terms, if the data is
redundantly, bi-directionally connected all the way down to the level
of abstraction at which you're working. Now you can explore to figure
out what's there, and you can write your queries in the way that makes
the most human sense. The artificial skew between the logical
structure and the representational structure has been removed. This is
perfectly possible in an RDF-based system, of course, if the software
either generates or infers the missing inverses. We incur extra
machine overhead to reduce the human cognitive burden. I contend this
should be considered a nearly-mandatory best-practice for linked data,
and that propagating inverses around the LOD cloud ought to be one of
the things that makes the LOD cloud *a thing*, rather than just a
collection of logical silos.





Re: How many instances of foaf:Person are there in the LOD Cloud?

2011-04-15 Thread Kingsley Idehen

On 4/15/11 9:49 AM, Michael Brunnbauer wrote:

re

On Fri, Apr 15, 2011 at 09:01:52AM -0400, Kingsley Idehen wrote:

http://www.foaf-search.net/TopDomains

Any chance of a CSV dump too? You can expose the CSV dump URL via
<link> in <head> of the HTML doc :-)

I don't think there are many people who actually need those numbers as
CSV but I have added a CSV version anyway. Changing the <head> of that page
is too much hassle with my CMS right now.

Regards,

Michael Brunnbauer


Michael,

That works fine! I should have said: make the CSV resource human and 
machine discoverable :-)
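
One common way to make an alternate representation machine discoverable is a link element with rel="alternate" in the HTML head, next to an ordinary visible hyperlink for humans. The sketch below shows how a client could find such a link using only the Python standard library. The sample markup, the CSV path and the class name are hypothetical; they are not taken from the actual foaf-search.net page.

```python
from html.parser import HTMLParser


class AlternateLinkFinder(HTMLParser):
    """Collect (type, href) pairs from <link rel="alternate"> inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head:
            attributes = dict(attrs)
            rel_values = (attributes.get("rel") or "").lower().split()
            if "alternate" in rel_values and attributes.get("href"):
                self.alternates.append((attributes.get("type"), attributes["href"]))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_alternates(html_text: str) -> list:
    """Return all alternate representations advertised in the document head."""
    finder = AlternateLinkFinder()
    finder.feed(html_text)
    return finder.alternates


# Hypothetical page: the link element serves machines, the anchor serves humans.
SAMPLE = """<html><head>
<title>Top domains</title>
<link rel="alternate" type="text/csv" href="/TopDomains.csv">
</head><body><a href="/TopDomains.csv">CSV version</a></body></html>"""
```

Calling find_alternates(SAMPLE) yields the media type and URL of the CSV resource without the client having to guess where it lives.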


--

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen








Re: How many instances of foaf:Person are there in the LOD Cloud?

2011-04-15 Thread Michael Brunnbauer

re

On Fri, Apr 15, 2011 at 09:01:52AM -0400, Kingsley Idehen wrote:
> >http://www.foaf-search.net/TopDomains
> Any chance of a CSV dump too? You can expose the CSV dump URL via 
> <link> in <head> of the HTML doc :-)

I don't think there are many people who actually need those numbers as
CSV but I have added a CSV version anyway. Changing the <head> of that page
is too much hassle with my CMS right now.

Regards,

Michael Brunnbauer

-- 
++  Michael Brunnbauer
++  netEstate GmbH
++  Geisenhausener Straße 11a
++  81379 München
++  Tel +49 89 32 19 77 80
++  Fax +49 89 32 19 77 89 
++  E-Mail bru...@netestate.de
++  http://www.netestate.de/
++
++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
++  USt-IdNr. DE221033342
++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel



Re: Take2: 15 Ways to Think About Data Quality (Just for a Start)

2011-04-15 Thread glenn mcdonald
This reminds me to come back to the point about what I initially
called Directionality, and Dave improved to Modeling Consistency.

Dave is right, I think, that in terms of data quality, it is
consistency that matters, not directionality. That is, as long as we
know that a president was involved in a presidency, it doesn't matter
whether we know that because the president linked to the presidency,
or the presidency linked to the president. In fact, in a relational
database the president and the presidency and the link might even be
in three separate tables. From a data-mathematical perspective, it
doesn't matter. All of these are ways of expressing the same logical
construct. We just want it to be done the same way for all
presidents/presidencies/links.

But although directionality is immaterial for data *quality*, it
matters quite a bit for the usability of the system in which the data
reaches people. We know, for example, that in the real world
presidents have presidencies, and vice versa. But think about what it
takes to find out whether this information is represented in a given
dataset:

- In a classic SQL-style relational database we probably have to just
know the schema, as there's usually no exploratory way to find this
kind of thing out. The RDBMS formalism doesn't usually represent the
relationships between tables. You not only have to know it from
external sources, but you have to restate it in each SQL join-query.
This may be acceptable in a database with only a few tables, where the
field-headings are kept consistent by convention, but it's extremely
problematic when you're trying to combine formerly-separate datasets
into large ones with multiple dimensions and purposes. If the LOD
cloud were in relational tables, it would be awful. Arguably the main
point of the cloud is to get the data out of relational tables (where
most of it probably originates) into a graph where the connections are
actually represented instead of implied.

- But even in RDF, directionality poses a significant discovery
problem. In a minimal graph (let's say "minimal graph" means that each
relationship is asserted in only one direction, so there's no
relationship redundancy), you can't actually explore the data
navigationally. You can't go to a single known point of interest, like
a given president, and explore to find out everything the data holds
and how it connects. You can explore the *outward* relationships from
any given point, but to find out about the *inward* relationships you
have to keep doing new queries over the entire dataset. The same basic
issue applies to an XML representation of the data as a tree: you can
squirrel your way down, but only in the direction the original modeler
decided was "down". If you need a different direction, you have to
hire a hypersquirrel.

- Of course, most RDF-presenting systems recognize this as a usability
problem, and address it by turning the minimal graph into a redundant
graph for UI purposes. Thus in a data-browser UI you usually see, for
a given node, lists of both outward and inward relationships. This is
better, but if this abstraction is done at the UI layer, you still
lose it once you drop down into the SPARQL realm. This makes the
SPARQL queries harder to write, because you can't write them the way
you logically think about the question, you have to write them the way
the data thinks about the question. And this skew from real logic to
directional logic can make them *much* harder to understand or
maintain, because the directionality obscures the purpose and reduces
the self-documenting nature of the query.
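
The discovery problem in a minimal graph can be sketched with plain Python tuples standing in for triples. The data and predicate names below are invented for illustration and a real RDF store would index things differently; the point is only that outward relationships fall out of a subject index, while inward relationships need a pass over every triple.

```python
from collections import defaultdict

# Hypothetical minimal graph: each relationship is asserted in one direction only.
TRIPLES = [
    ("george_washington", "held", "presidency_1"),
    ("john_adams", "held", "presidency_2"),
    ("presidency_1", "of_country", "us"),
    ("presidency_2", "of_country", "us"),
]


def index_by_subject(triples):
    """Build the only index a subject-oriented description naturally gives you."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def outward(index, node):
    """Outward relationships: a single lookup from the node of interest."""
    return index.get(node, [])


def inward(triples, node):
    """Inward relationships: a scan over the entire dataset."""
    return [(subject, predicate) for subject, predicate, obj in triples if obj == node]
```

Starting from "us", outward() finds nothing at all, even though two presidencies point at it; only the full scan in inward() reveals them.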


All of this is *much* better, in usability terms, if the data is
redundantly, bi-directionally connected all the way down to the level
of abstraction at which you're working. Now you can explore to figure
out what's there, and you can write your queries in the way that makes
the most human sense. The artificial skew between the logical
structure and the representational structure has been removed. This is
perfectly possible in an RDF-based system, of course, if the software
either generates or infers the missing inverses. We incur extra
machine overhead to reduce the human cognitive burden. I contend this
should be considered a nearly-mandatory best-practice for linked data,
and that propagating inverses around the LOD cloud ought to be one of
the things that makes the LOD cloud *a thing*, rather than just a
collection of logical silos.
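
Generating the missing inverses can be sketched as a simple materialization step. In RDF this kind of mapping is typically declared with owl:inverseOf and applied by a reasoner; the version below is a plain-Python stand-in under stated assumptions, with invented data and hypothetical predicate names.

```python
# Hypothetical minimal graph and inverse declarations.
TRIPLES = [
    ("george_washington", "held", "presidency_1"),
    ("john_adams", "held", "presidency_2"),
    ("presidency_1", "of_country", "us"),
    ("presidency_2", "of_country", "us"),
]
INVERSE_OF = {"held": "held_by", "of_country": "has_presidency"}


def with_inverses(triples, inverse_of):
    """Return the graph plus one inverse triple for every declared predicate."""
    result = set(triples)
    for subject, predicate, obj in triples:
        inverse = inverse_of.get(predicate)
        if inverse is not None:
            result.add((obj, inverse, subject))
    return result


def walk(graph, start, *predicates):
    """Follow a chain of predicates outward from a single starting node."""
    frontier = {start}
    for predicate in predicates:
        frontier = {o for s, p, o in graph if s in frontier and p == predicate}
    return frontier
```

With the inverses in place, walk(with_inverses(TRIPLES, INVERSE_OF), "us", "has_presidency", "held_by") goes from the country to its presidencies to their presidents, which is the direction the minimal graph could not be navigated. The cost is the extra stored triples, i.e. the machine overhead mentioned above.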



Re: How many instances of foaf:Person are there in the LOD Cloud?

2011-04-15 Thread Kingsley Idehen

On 4/15/11 6:32 AM, Michael Brunnbauer wrote:

re

On Wed, Apr 13, 2011 at 05:17:10PM -0400, Kingsley Idehen wrote:

opera.com   247281
ecademy.com 224875

[...]

How about a publicly shared Google spreadsheet doc? From there to Linked Data
is a short journey :-)

Not a spreadsheet but HTML (Top 50, updated daily):

http://www.foaf-search.net/TopDomains

cu,
brunni

Any chance of a CSV dump too? You can expose the CSV dump URL via 
<link> in <head> of the HTML doc :-)


--

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen








Take2: 15 Ways to Think About Data Quality (Just for a Start)

2011-04-15 Thread Kingsley Idehen

All,

Here is a new thread re. topic above. I believe this matter is 
interesting to many, so I've basically taken the original post (modulo 
1st sentence in the original) and also appended comments from Dave 
Reynolds which he contributed as an extension of Glenn's original list. 
In addition, I've added a link to the answers.com thread [1] as that 
route may also be preferable to some of you.


Please feel free to discuss this important topic. We can do this without 
acrimony :-)



Data "beauty" might be subjective, and the same data may have different 
applicability to different tasks, but there are a lot of obvious and 
straightforward ways of thinking about the quality of a dataset 
independent of the particular preferences of individual beholders. Here 
are just some of them:


1. Accuracy: Are the individual nodes that refer to factual information 
factually and lexically correct? Like, is Chicago spelled "Chigaco" or 
does the dataset say its population is 2.7?


2. Intelligibility: Are there human-readable labels on things, so you 
can tell what a thing is when you're looking at it? Is there a model, so 
you can tell what questions you can ask? If a thing has multiple labels 
(or a set of owl:sameAs things have multiple labels), do you know which 
(or if) one is canonical?


3. Referential correspondence: If a set of data points represents some 
set of real-world referents, is there one and only one point per 
referent? If you have 9,780 data points representing cities, but 5 of 
them are "Chicago", "Chicago, IL", "Metro Chicago", "Metropolitain 
Chicago, Illinois" and "Chicagoland", that's bad.


4. Completeness: Where you have data representing a clear finite set of 
referents, do you have them all? All the countries, all the states, all 
the NHL teams, etc? And if you have things related to these sets, are 
those projections complete? Populations of every country? Addresses of 
arenas of all the hockey teams?


5. Boundedness: Where you have data representing a clear finite set of 
referents, is it unpolluted by other things? E.g., can you get a list of 
current real countries, not mixed with former states or fictional 
empires or administrative subdivisions?


6. Typing: Do you really have properly typed nodes for things, or do you 
just have literals? The first president of the US was not "George 
Washington"^^xsd:string, it was a person whose name-renderings include 
"George Washington". Your ability to ask questions will be constrained 
or crippled if your data doesn't know the difference.


7. Modeling correctness: Is the logical structure of the data properly 
represented? Graphs are relational databases without the crutch of 
"rows"; if you screw up the modeling, your queries will produce garbage.


8. Modeling granularity: Did you capture enough of the data to actually 
make use of it? ":us :president :george_washington" isn't exactly wrong, 
but it's pretty limiting. Model presidencies, with their dates, and 
you've got much more powerful data.


9. Connectedness: If you're bringing together datasets that used to be 
separate, are the join points represented properly? Is the US from your 
country list the same as (or owl:sameAs) the US from your list of 
presidencies and the US from your list of world cities and their 
populations?


10. Isomorphism: If you're bringing together datasets that used to be 
separate, are their models reconciled? Does an album contain songs, or 
does it contain tracks which are publications of recordings of songs, or 
something else? If each data point answers this question differently, 
even simple-seeming queries may be intractable.


11. Currency: Is the data up-to-date?

12. Directionality: Can you navigate the logical binary relationships in 
either direction? Can you get from a country to its presidencies to 
their presidents, or do you have to know to only ask about presidents' 
presidencies' countries? Or worse, do you have to ask every question in 
permutations of directions because some data asserts things one way and 
some asserts it only the other?


13. Attribution: If your data comes from multiple sources, or in 
multiple batches, can you tell which came from where?


14. History: If your data has been edited, can you tell how and by whom?

15. Internal consistency: Do the populations of your counties add up to 
the populations of your states? Do the substitutes going into your 
soccer matches balance the substitutes going out?
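
Several of the checks in the list above are mechanical enough to automate. As a rough sketch, here are two of them: internal consistency (item 15) and a crude test for referential correspondence (item 3). The function names, the row layout and the label normalisation rule are assumptions made for illustration; real datasets would need far more careful matching.

```python
from collections import defaultdict


def inconsistent_states(state_population, county_rows):
    """Item 15: report states whose county populations do not add up.

    state_population maps state -> declared population.
    county_rows is an iterable of (state, county, population) tuples.
    Returns state -> (declared, summed) for every mismatch.
    """
    totals = defaultdict(int)
    for state, _county, population in county_rows:
        totals[state] += population
    return {
        state: (declared, totals.get(state, 0))
        for state, declared in state_population.items()
        if totals.get(state, 0) != declared
    }


def normalise(label):
    """Crude label key: lowercase, and drop everything after the first comma."""
    return label.split(",")[0].strip().lower()


def possible_duplicates(labels):
    """Item 3: group labels that may denote the same real-world referent."""
    groups = defaultdict(list)
    for label in labels:
        groups[normalise(label)].append(label)
    return [group for group in groups.values() if len(group) > 1]
```

Note that this normalisation would catch "Chicago" versus "Chicago, IL" but not "Chicagoland" or "Metro Chicago", which is a reminder of how much of referential correspondence remains a judgment call.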



That's by no means an exhaustive list, and I didn't even start on the 
kinds of quality you can start talking about if you widen the scope of 
what you mean by "a dataset" to include the environment in which it's 
made available: performance, query repeatability, explorational 
fluidity, expressiveness of inquiry, analytic power, UI intelligibility, 
openness...


Plus these comments from Dave Reynolds:

That's a fantastic list and should be recorded on a wiki somewhere!

A minor quibble, not sure about Directionality. Yo

Re: Discussion meta-comment

2011-04-15 Thread Kingsley Idehen

On 4/15/11 4:58 AM, Keith Alexander wrote:

On Thu, Apr 14, 2011 at 11:42 PM, Hugh Glaser  wrote:


Hi,
I am looking at the process and outcomes I observe, rather than delving into 
the details.
It is not about whether people could have acted differently - it is about how 
people actually did act.
A lost opportunity? Clearly there were a number of people who had opinions, and 
seemed ready to engage in a discussion. I would have been interested to hear 
what they had to say. But the social dynamics (in my opinion) were such that 
they no longer chose to contribute.
In answer to your last question: Because the discussion then becomes about the 
page, rather than principle, or even original topic; but I begin to repeat 
myself.
Best
Hugh


+1
I had the same impression. The thread started off well, but diverged
and became acrimonious and I for one was put off following it because
of that.

Keith



Keith,

If acrimonious is your point, then yes. +1000 .

Healthy debate doesn't have to be acrimonious. In my world debates are a 
mechanism for building rather than burning bridges :-)


--

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen








Re: How many instances of foaf:Person are there in the LOD Cloud?

2011-04-15 Thread Michael Brunnbauer

re

On Wed, Apr 13, 2011 at 05:17:10PM -0400, Kingsley Idehen wrote:
> >opera.com    247281
> >ecademy.com  224875
[...]
> How about a publicly shared Google spreadsheet doc? From there to Linked Data 
> is a short journey :-)

Not a spreadsheet but HTML (Top 50, updated daily):

http://www.foaf-search.net/TopDomains

cu,
brunni

-- 
++  Michael Brunnbauer
++  netEstate GmbH
++  Geisenhausener Straße 11a
++  81379 München
++  Tel +49 89 32 19 77 80
++  Fax +49 89 32 19 77 89 
++  E-Mail bru...@netestate.de
++  http://www.netestate.de/
++
++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
++  USt-IdNr. DE221033342
++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel



Re: How many instances of foaf:Person are there in the LOD Cloud?

2011-04-15 Thread Varish Mulwad
A PhD student in the Ebiquity research lab in UMBC has been focusing on
learning whether "two FOAF instances are co-referent, i.e., denote the same
entity in the world."

See - Learning Co-reference Relations for FOAF Instances, Jennifer Sleeman,
and Tim Finin <
http://ebiquity.umbc.edu/paper/html/id/503/Learning-Co-reference-Relations-for-FOAF-Instances>

Regards,
Varish Mulwad
Ph.D. Student
Ebiquity Research Lab
Webpage: http://goo.gl/NVu8N



On Wed, Apr 13, 2011 at 6:40 PM, Kingsley Idehen wrote:

> On 4/13/11 5:49 PM, Bernard Vatant wrote:
>
>> So tonight I would turn my question otherwise : Among those millions of
>> FOAF profiles, how do I discover those of which primary source is their
>> primary topic, expressing herself natively in FOAF, vs the ocean of
>> second-hand remashed / remixed information, captured with or without clear
>> approbation of their subjects, and eventually released in FOAF syntax in the
>> Cloud ...
>>
>
> This is where URIBurner data should become a little more interesting :-)
>
>
> --
>
> Regards,
>
> Kingsley Idehen
> President & CEO
> OpenLink Software
> Web: http://www.openlinksw.com
> Weblog: http://www.openlinksw.com/blog/~kidehen
> Twitter/Identi.ca: kidehen
>
>
>
>
>
>
>


Re: Discussion meta-comment

2011-04-15 Thread Keith Alexander
On Thu, Apr 14, 2011 at 11:42 PM, Hugh Glaser  wrote:

> Hi,
> I am looking at the process and outcomes I observe, rather than delving into 
> the details.
> It is not about whether people could have acted differently - it is about how 
> people actually did act.
> A lost opportunity? Clearly there were a number of people who had opinions, 
> and seemed ready to engage in a discussion. I would have been interested to 
> hear what they had to say. But the social dynamics (in my opinion) were such 
> that they no longer chose to contribute.
> In answer to your last question: Because the discussion then becomes about 
> the page, rather than principle, or even original topic; but I begin to 
> repeat myself.
> Best
> Hugh
>

+1
I had the same impression. The thread started off well, but diverged
and became acrimonious and I for one was put off following it because
of that.

Keith