Re: Content negotiation for Turtle files

2013-02-10 Thread Chris Beer
Hi all

While I promised a response, time is never my friend despite best intentions.

+1 to Tim on "crispness", and on a protocol. I note that the
"content-negotiation" error which was at the core of this discussion
hasn't really been discussed, and is where I was planning to
comment.

So noting the latest in the discussion, I'll fast-track and suggest, as
an interim measure (note: this is a drive-by comment, possibly at the
risk of over-simplifying):

Couldn't a lot of this discussion be resolved if a server were correctly
configured to return a 300 response in these cases? ("Multiple Choices", or
"there is more than one format available, Mr. Client - please choose which
one you'd like".)

We can't assume that clients or users will ask for something we have, or
will ask in a correct manner, which is the reason that 300 and other
rarely used responses exist.
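A minimal sketch of that behaviour (the media-type table and paths are illustrative, not from any real server): when exactly one of the server's representations matches the client's Accept header, serve it; otherwise return 300 Multiple Choices listing the alternatives so the client can pick.

```python
# Sketch only: a toy content-negotiation step that answers 300 Multiple
# Choices instead of guessing when the client's preference is ambiguous.
AVAILABLE = {  # media type -> (hypothetical) representation path
    "text/turtle": "/data/resource.ttl",
    "application/rdf+xml": "/data/resource.rdf",
}

def negotiate(accept_header):
    """Return (status, payload) for a simple Accept header (q-values ignored)."""
    accepted = [part.split(";")[0].strip() for part in accept_header.split(",")]
    matches = [mt for mt in AVAILABLE if mt in accepted]
    if len(matches) == 1:
        return 200, AVAILABLE[matches[0]]
    # No usable preference, or several equally good ones: let the client choose.
    return 300, sorted(AVAILABLE)
```

A real server would also honour q-values and `*/*` before falling back to 300; this only illustrates the "don't guess, offer the choice" idea.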

Cheers

Chris

> I feel we should be crisp about these things.
> It's not a question of thinking about what kinds of things tend
> to enhance interoperability; it is defining a protocol
> which 100% guarantees interoperability.
>
> Here are three distinct protocols which work,
> i.e. guarantee each client can understand each server.
>
> A) Client accepts various formats including RDF/XML.
>   Server provides various formats including RDF/XML.
>
> B) Client accepts various formats including RDF/XML AND Turtle.
>   Server provides various formats including either RDF/XML OR Turtle.
>
> C) Client accepts various formats including Turtle.
>   Server provides various formats including Turtle.
>
> These may not ever have been named.
> The RDF world used A in fact for a while, but the
> Linked Data Platform at last count was using C.
> Obviously B has its own advantages but I think that
> we need lightweight clients more than we need lightweight
> servers and so being able to build a client without an
> XML parser is valuable.
>
> Obviously there is a conservative middle ground D in
> which all clients and servers support both formats,
> which could be defined as a practical best practice,
> but we should have a name for, say, C.
>
> We should see whether the LDP group will define
> a word for compliance with C.  I hope so, and then
> we can all provide that and test for it.
>
> Tim
>
> On 2013-02-06, at 11:38, Leigh Dodds wrote:
>
>> From an interoperability point of view, having a default format that
>> clients can rely on is reasonable. Until now, RDF/XML has been the
>> standardised format that we can all rely on, although shortly we may
>> all collectively decide to prefer Turtle. So ensuring that RDF/XML is
>> available seems like a reasonable thing for a validator to try and
>> test for.
>>
>> But there are several ways that test could have been carried out. E.g.
>> Vapour could have checked that there was an RDF/XML version and
>> provided you with some reasons why that would be useful. Perhaps as a
>> warning, rather than a fail.
>>
>> The explicit check for RDF/XML being available AND being the default
>> preference of the server is raising the bar slightly, but it's still
>> trying to aim for interop.
>>
>> Personally I think I'd implement this kind of check as "ensure there
>> is at least one valid RDF serialisation available, either RDF/XML or
>> Turtle". I wouldn't force a default on a server, particularly as we
>> know that many clients can consume multiple formats.
>>
>> This is where automated validation tools have to tread carefully:
>> while they play an excellent role in encouraging consistency, the
>> tests they perform and the feedback they give need to have some
>> nuance.
>
>





Re: Content negotiation for Turtle files

2013-02-10 Thread Tim Berners-Lee
I feel we should be crisp about these things.
It's not a question of thinking about what kinds of things tend
to enhance interoperability; it is defining a protocol
which 100% guarantees interoperability.

Here are three distinct protocols which work,
i.e. guarantee each client can understand each server.

A) Client accepts various formats including RDF/XML.
  Server provides various formats including RDF/XML.

B) Client accepts various formats including RDF/XML AND Turtle.
  Server provides various formats including either RDF/XML OR Turtle.

C) Client accepts various formats including Turtle.
  Server provides various formats including Turtle.

These may not ever have been named.
The RDF world used A in fact for a while, but the
Linked Data Platform at last count was using C.
Obviously B has its own advantages but I think that 
we need lightweight clients more than we need lightweight
servers and so being able to build a client without an
XML parser is valuable.

Obviously there is a conservative middle ground D in 
which all clients and servers support both formats,
which could be defined as a practical best practice, 
but we should have a name for, say, C.

We should see whether the LDP group will define
a word for compliance with C.  I hope so, and then
we can all provide that and test for it.
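Testing compliance with C could start from a sketch like this (the function names are illustrative, not from any spec): ask for Turtle and check that Turtle comes back.

```python
# Sketch: a protocol-C compliance probe. The pure helper below checks a
# Content-Type header value; a real probe would wrap it in an HTTP request
# sent with "Accept: text/turtle" and inspect the response headers.
def is_turtle_response(content_type_header):
    """True if the Content-Type header value names Turtle."""
    return content_type_header.split(";")[0].strip().lower() == "text/turtle"

def complies_with_c(fetch, url):
    """fetch(url, accept) -> Content-Type value; True if the server answers in Turtle."""
    return is_turtle_response(fetch(url, "text/turtle"))
```

Passing the fetch function in keeps the check itself testable without a live server.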

Tim

On 2013-02-06, at 11:38, Leigh Dodds wrote:

> From an interoperability point of view, having a default format that
> clients can rely on is reasonable. Until now, RDF/XML has been the
> standardised format that we can all rely on, although shortly we may
> all collectively decide to prefer Turtle. So ensuring that RDF/XML is
> available seems like a reasonable thing for a validator to try and
> test for.
> 
> But there are several ways that test could have been carried out. E.g.
> Vapour could have checked that there was an RDF/XML version and
> provided you with some reasons why that would be useful. Perhaps as a
> warning, rather than a fail.
> 
> The explicit check for RDF/XML being available AND being the default
> preference of the server is raising the bar slightly, but it's still
> trying to aim for interop.
> 
> Personally I think I'd implement this kind of check as "ensure there
> is at least one valid RDF serialisation available, either RDF/XML or
> Turtle". I wouldn't force a default on a server, particularly as we
> know that many clients can consume multiple formats.
> 
> This is where automated validation tools have to tread carefully:
> while they play an excellent role in encouraging consistency, the
> tests they perform and the feedback they give need to have some
> nuance.
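The softer validator check Leigh describes could be sketched like this (the verdict labels and media-type list are assumptions of mine, not Vapour's actual rules): pass if any RDF serialisation is served, warn rather than fail when RDF/XML specifically is missing, and never force a default format on the server.

```python
# Sketch: a nuanced serialisation check for an RDF validator.
RDF_TYPES = {"application/rdf+xml", "text/turtle"}

def check_serialisations(served_types):
    """Return (verdict, message) for the media types a server offers."""
    rdf = RDF_TYPES & set(served_types)
    if not rdf:
        return "fail", "no RDF serialisation available"
    if "application/rdf+xml" not in rdf:
        return "warn", "no RDF/XML; some older clients may not cope"
    return "pass", "RDF serialisation(s): " + ", ".join(sorted(rdf))
```

The three-way verdict is the point: it encourages interop without failing a server that has made a legitimate choice of formats.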



[ANN] 2nd CFP for WoLE2013 (abstracts due next week)

2013-02-10 Thread Pablo N. Mendes
[Apologies for cross-posting]

===
2nd International Workshop on Web of Linked Entities (WoLE2013)
http://wole2013.eurecom.fr

In conjunction with the 22nd International World Wide Web Conference
(WWW2013), Rio de Janeiro, 13 May 2013
===

The explosive growth in the amount of data created and shared on the “Web
of Data” is now firmly established. However, the majority of the available
Web content is still unstructured or semi-structured. As a general trend,
unstructured data seems to be growing faster than structured data.
According to a 2011 IDC study, unstructured data will account for 90
percent of all data created in the next decade. As the main goal of the Web
of Data is to stimulate the emergence of new dedicated applications giving
machines the ability to process information, there is a strong need to
extract, as far as possible, structured data from the unstructured content
available on the web. Furthermore, structured data sources provide
entity-to-entity interconnections, resulting in a Web of Linked Entities
spanning structured and unstructured data. Accordingly, there is also a
strong emerging need to study how to associate unstructured content (like
textual documents) and structured content (like RDF triples) using
existing information retrieval, extraction, integration and exploration
techniques. The WoLE workshop series envisions a Web of Linked Entities
(WoLE), which transparently connects the World Wide Web (WWW) and the Giant
Global Graph (GGG) using methods from the Information Retrieval (IR), Natural
Language Processing (NLP) and Database Systems (DB) research communities.
The previous edition of the WoLE workshop has shown that the different
methods involved in specific research tasks of those communities (like
co-reference detection in NLP or information extraction in IR) can provide
different ways to improve the Web of Linked Entities content and its
relation to unstructured documents. Accordingly, the primary goal of WoLE
is to bring together propositions from the IR, NLP and DB research
communities describing how their specific methods can improve the quality
of Web data, structure its documents and help to explore its content.

OBJECTIVES AND TOPICS (include, but are not limited to):
* Text and web mining
* Pattern and semantic analysis of natural language, reading the web,
learning by reading
* Large-scale information extraction
* Usage mining
* Entity resolution and automatic discovery of entities
* Frequent pattern analysis of entities and/or relationships
* Entity linking, named entity disambiguation, cross-document co-reference
resolution
* Ontology representation of natural language text
* Analysis of ontology models for natural language text
* Learning and refinement of ontologies
* Natural language taxonomies modeled to Semantic Web ontologies
* Disambiguation with the support of knowledge bases
* Multilingual information extraction
* Use cases of entity recognition for Linked Data applications
* Relationship extraction, slot filling
* Impact of entity linking on information retrieval, semantic search,
entity oriented search
* Semantic relatedness and similarity using entities and relations

SUBMISSION AND PUBLICATION:
We invite submissions of full papers (max. 8 pages) and short papers (max.
4 pages). All submissions will be reviewed by at least three program
committee members, and will be assessed based on their novelty, technical
quality, potential impact, and clarity of writing.
The workshop proceedings will be published through the ACM Digital Library.
Please submit in PDF format to:
https://www.easychair.org/conferences/?conf=wole2013. Submissions must:
- be written in English;
- contain author names, affiliations, and email addresses;
- use the ACM SIG Proceedings template with a font size no smaller than
9pt (http://www.acm.org/sigs/publications/proceedings-templates);
- be in PDF (make sure that the PDF can be viewed on any platform);
- be formatted for the US Letter size;
It is the authors’ responsibility to ensure that their submissions adhere
strictly to the required format. Submissions that do not comply with the
above guidelines will be rejected without review.

IMPORTANT DATES:
* February 18th 2013, 23:59 Hawaii Time: abstract submissions
* February 25th 2013, 23:59 Hawaii Time: paper submissions
* March 13th 2013, 23:59 Hawaii Time: paper notifications
* March 27th 2013, 23:59 Hawaii Time: camera-ready paper
* May 13th 2013 : WoLE2013 workshop day

WORKSHOP ORGANIZERS:
Pablo N. Mendes, OKFN, Germany;
Giuseppe Rizzo, EURECOM, France;
Eric Charton, CRIM, Centre de Recherche Informatique de Montréal, Canada.

PROGRAMME COMMITTEE
* Alan Akbik (Technische Universität Berlin)
* Krisztian Balog (Norwegian University of Science and Tech., Norway)
* Charalampos Bratsas (Aristotle University of Thessaloniki)
* Razvan Bunescu (Ohio University)
* Roi Blanco (Yahoo! Research Barcelona, Spain)
* Frédé