[ANN] Apache Jena Fuseki2

2015-03-22 Thread Andy Seaborne


Apache Jena is a framework for building linked data and semantic web
applications.

The team is pleased to announce a major revision of Fuseki, the Apache
Jena SPARQL server.

   http://jena.apache.org/documentation/fuseki2/

Fuseki2 can run as an operating system service, as a Java web application 
(WAR file), and as a standalone server. It provides security (using 
Apache Shiro) and has a user interface for server monitoring and 
administration. Fuseki implements the SPARQL 1.1 protocols for query and 
update as well as the SPARQL Graph Store protocol.
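
For a taste of the protocols (a sketch, not part of the announcement:
the dataset name /ds and the endpoint layout are assumptions based on
a default setup), a SPARQL 1.1 Update request such as

   PREFIX ex: <http://example/>
   INSERT DATA { ex:s ex:p "an object" }

is POSTed to the dataset's update endpoint, e.g.
http://localhost:3030/ds/update, with Content-Type
application/sparql-update.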


Fuseki is tightly integrated with Apache Jena's TDB to provide a
robust, transactional persistent storage layer, and incorporates Jena
text query and Jena spatial query. It can be used to provide the
protocol engine for other RDF query and storage systems.

The new UI provides administration of a running server, including
managing datasets. At the moment, it provides the ability to create
datasets, upload data, query the data, and back up the database on a
live server. This incorporates components and contributions from YASR
http://yasgui.org/ (with thanks to Laurens Rietveld).

Fuseki v1 continues to be available.  To ease transition, the Fuseki v2
standalone server can be run in the same way as Fuseki1 for existing
configurations. The Fuseki1 and Fuseki2 UIs are not compatible.

== Obtaining Fuseki 2

= As binary downloads

Apache Jena Fuseki is available as a binary distribution as well as via
Maven.

http://jena.apache.org/download/#apache-jena-fuseki

= Source code for the release

The signed source code of this release is available at:

http://www.apache.org/dist/jena/source/

and the signed master source for all Apache Jena releases is available
at: http://archive.apache.org/dist/jena/

Andy
on behalf of the Apache Jena developer community



Re: SPARUL & Turtle compatibility

2014-09-23 Thread Andy Seaborne
You can have \u in a string literal in SPARQL -- it is just handled 
differently than in Turtle.


\u processing is applied to the input char stream via
   http://www.w3.org/TR/sparql11-query/#codepointEscape

(it's this, not the Turtle way, for historical reasons i.e. 
compatibility with SPARQL 1.0)
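
For example (a sketch): because the escape processing happens before
tokenizing, a \u escape can appear anywhere in a SPARQL query string,
not just inside a quoted literal:

   # Becomes   SELECT ?abc WHERE { ?s ?p "café" }   before parsing.
   # In Turtle, \u is only legal inside string literals and IRIs
   # (UCHAR), so the escaped variable name has no Turtle counterpart.
   SELECT ?\u0061bc WHERE { ?s ?p "caf\u00E9" }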


Andy


On 23/09/14 08:45, Dimitris Kontokostas wrote:

Hello,

With the recent discussion on RDF/LD patching I looked at the SPARUL [1]
& Turtle [2] specs and was surprised to notice that they are not 100%
compatible.

The difference I found was in the escaping of literal values where, in
Turtle we are allowed to have Unicode escaping while in SPARUL we are not.

Turtle [26] UCHAR ::= '\u' HEX HEX HEX HEX | '\U' HEX HEX HEX HEX HEX HEX HEX HEX



This means we cannot just reuse Turtle blocks inside insert/delete
SPARUL blocks without pre-processing. Am I correct or did I overlook
something?
In addition, does the aforementioned rule add anything in Turtle besides
extra serialization options?

Best,
Dimitris

[1] http://www.w3.org/TR/sparql11-query/#rSTRING_LITERAL1
[2] http://www.w3.org/TR/turtle/#grammar-production-STRING_LITERAL_QUOTE


--
Dimitris Kontokostas
Department of Computer Science, University of Leipzig
Research Group: http://aksw.org
Homepage:http://aksw.org/DimitrisKontokostas




Re: Understanding datatypes in RDF 1.1 - was various things

2013-12-06 Thread Andy Seaborne

(and now, press "send" not "save")

On 02/12/13 12:35, Hugh Glaser wrote:

Hmm,
My head is spinning a bit now - I’m trying to understand something simple - 
"1"^^xsd:boolean.

So my reading says that is a valid lexical form (in the lexical space) for the 
value ’true’ (in the value space).
(http://www.w3.org/TR/rdf11-concepts/#dfn-lexical-space )
I think that ‘value space’ is where the other documents talk about 'RDF term’, 
but I’m not sure.


"RDF Term" covers IRI, blank Node or Literal.  So it's on the (abstract) 
syntax side, not the value side.




And I also I read:
"Literal term equality: Two literals are term-equal (the same RDF literal) if 
and only if the two lexical forms, the two datatype IRIs, and the two language tags 
(if any) compare equal, character by character.”
(http://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal )


Literal term equality = compare the three parts of a literal - lexical 
form, datatype and language tag.  Lang tags are lower case (yes, that's 
different to the RFC :-().




So the language processor will (must) take my lexical form
"1"^^xsd:boolean
and make it an RDF term
"true"


It may do, probably doesn't - but there is no requirement to work in 
"values" at all.



And then if I ask the store (sorry, I am rather engineering in this) if 2 terms 
are equal, it will always be comparing two similar terms (from the literal 
space), (probably, but see below):
"true"^^xsd:boolean

And I can expect a sensible querying engine to consider
"1"^^xsd:boolean
as a shorthand for
"true"


Definitely in a SPARQL FILTER, for "=", etc which are value based 
comparisons, not for sameTerm(...).


SPARQL graph pattern matching has to follow RDF (with or without D-entailment).
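
A sketch of the distinction (prefix assumed):

   PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

   # Value-based: true - both lexical forms denote the value true.
   ASK { FILTER ("1"^^xsd:boolean = "true"^^xsd:boolean) }

   # Term-based: false - the lexical forms differ, so the terms differ.
   ASK { FILTER (sameTerm("1"^^xsd:boolean, "true"^^xsd:boolean)) }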


It could be confusing, which it was for a bit for me, because the equality 
constraint says "the two lexical forms”, but in this case there is more than 
one lexical form for the value form.
So I think it means that a processor must always choose the same lexical form 
for any given value form.


It could, but it's not required.  The loading of the data might canonicalize 
the RDF terms.  Then comparing terms and comparing values is quite 
similar.  That's outside SPARQL Query.


Still not the same though:

"1"^^xsd:integer = "1"^^xsd:double

but the rules of XSD arithmetic mean you can't easily drop the type 
distinction.


xsd:long + integer -> integer
integer + double -> double

xsd:doubles are not xsd:decimals.  Sometimes the loss of precision is 
unacceptable (insert finance example).
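
A small illustration (a sketch; the integer is chosen to exceed double
precision):

   SELECT ?sum (datatype(?sum) AS ?dt) WHERE {
     # integer + double promotes to double; 2^53 + 1 is not
     # representable as a double, so precision is silently lost.
     BIND (9007199254740993 + 0.0e0 AS ?sum)
   }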


Losing the original datatype and/or the exact way the literal was 
written may sometimes matter to the app.



I am guessing that processors could consistently choose
"1"^^xsd:boolean
as the value form for
"true"
but that would be pretty perverse.

A little further confusion for me arises as to whether the datatype IRI is part 
of the value space.
I have taken off any ^^xsd:boolean from my rendering of the “true” in the value 
space because the documentation seems to leave it out.
(The table says: '<“true”, xsd:boolean>’ and ‘true’ are the literal and value.)
So I am left assuming that the datatype IRI is somewhere in the RDF term world, 
although we know it isn’t in the graph.
Not something I need to worry about as a consumer, as it is all an internal 
issue, I think, but I thought I would mention it.

Best
Hugh



Andy



An

2013-12-02 Thread Andy Seaborne



On 01/12/13 23:02, Hugh Glaser wrote:

Hi.
Thanks.
A bit of help please :-)
On 1 Dec 2013, at 17:36, Andy Seaborne  wrote:




On 01/12/13 12:25, Tim Berners-Lee wrote:


On 2013-11-23, at 12:21, Andy Seaborne wrote:




On 23/11/13 17:01, David Booth wrote:

[...]
This would have been fixed if the RDF model had been changed to
represent the language tag as an additional triple, but whether this
would have been a net benefit to the community is still an open
question, as it would add the complexity of additional triples.


Different.  Maybe better, maybe worse.


Do you want all your "abc" to be the same language?

   "abc" rdf:lang "en"

or multiple languages:

   "abc" rdf:lang "cy" .
   "abc" rdf:lang "en" .


?

Unlikely - so it's bnode time ...

:x :p [ rdf:value "abc" ; rdf:lang "en" ] .


The nice thing about this in an n3rules-like system (where FILTER and WHERE 
clauses are not distinct and some properties are just builtins) is that 
rdf:value and rdf:lang can be made builtins, so a datatyped literal can behave 
just like a bnode with two properties if you want to.

But I have always preferred it with not 2 extra triples, just one:

:x  :p [ lang:en "cat" ]

which allows you also to write things like

:x :p  [ lang:en "cat"] , [ lang:fr "chat" ].

or if you use the  ^  back-path syntax of N3 (which was not taken up in Turtle),

:x :p "cat"^lang:en,  "chat"^lang:fr .

You can do the same with datatypes:

:x :q   "2013-11-25"^xsd:date .

instead of

:x :q   "2013-11-25"^^xsd:date .


This seems to bring its own issues.  These bnodes seem to be like untidy 
literals as considered in the RDF 2004 WG.

:x  :p [ lang:en "cat" ]
:x  :p [ lang:en "cat" ]
:x  :p [ lang:en "cat" ]

is 6 triples.

:x :p :q .
:x :p :q .
:x :p :q .

is 1 triple.  Repeated read in same file - this already causes confusion.

:x :p "cat" .
:x :p "cat" .
:x :p "cat" .

is 1 triple or is it 3 triples because it's really

Is it not 1 triple if you take the first view or 6 triples if you take the 
second?
Or probably I don’t understand bnodes properly!?


:x :p [ xsd:string "cat" ].

:x :p 123 .
:x :p 123 .
:x :p 123 .

It makes it hard to ask "do X and Y have the same value for :p?" - it gets messy to 
consider all the cases of triple patterns that arise and I would not want to push that burden 
back onto the application writer. Why can't the app writer say "find me all things which 
have a property value less than 45"?

I see it makes it hard, but I don’t see it as any harder than what we have now, 
with multiple patterns that do and don’t have ^^xsd:String
As I said before, with the ^^xsd you need to consider a bunch of patterns to do 
the query - again, it is messy, but is it messier?

Actually I find
  { ?s1 ?p [ xsd:string ?str ] . ?s2 ?p [ xsd:string ?str ] . }
with a possible also
  { ?s1 ?p ?str . ?s2 ?p ?str . }


Let's talk numbers (strings have a lexical form that looks like the 
value) and have 123 as shorthand for [ xsd:integer "123" ].  And let's 
ignore rdf:langString.


{ ?s1 ?p ?x . ?s2 ?p ?x . }

does not care whether ?x is a URI or a literal at the moment.  Your 
example is a good one as it's "?p" so the engine does not know whether 
it's a datatype property or an object property.


With bnodes this may match, it probably doesn't.  It depends on the 
micro-detail of the data.


# No.
:x1 :p 123 .
:x2 :p 123 .

# Yes
:s1 :p _:a .
:s2 :p _:a .
_:a xsd:string "abc" .

Sure, if you know it's an integer
   ?s1 ?p [ xsd:integer ?str ]
or even:
{ ?s1 ?p [ ?dt ?str ] . ?s2 ?p [ ?dt ?str ] . }

though I think this is shifting unnecessary cognitive load onto the app 
writer.


I didn't say the access language was SPARQL :-)  I meant how people 
think about accessing the data.  Datatype properties are really very 
bizarre in this world.


And this is at the fine grain level.  Now apply to real queries that are 
10s of lines long.



{ ?s1 ?p [ xsd:integer "123" ] }
{ ?s1 ?p 123 }

it might be possible to make that bNode infer to the value 123 which 
would be a win.  Making literals value-centric not appearance/struct 
based would be very nice.



And counting.  Counting matters to people (e.g. faceted browse)

Andy

PS I started my first email draft with the argument that it was better 
to have the more triples form ... but the usability caused me to 
recreate the tidy literals thing, not that I was there at the time.



much easier to work with than something that has this stuff optionally tacked 
on the end of literals, that isn’t really part of the string but isn’t part of 
RDF either.
Or maybe it is pa

Re: Lang and dt in the graph. Was: Dumb SPARQL query problem

2013-12-01 Thread Andy Seaborne



On 01/12/13 12:25, Tim Berners-Lee wrote:


On 2013-11-23, at 12:21, Andy Seaborne wrote:




On 23/11/13 17:01, David Booth wrote:

[...]
This would have been fixed if the RDF model had been changed to
represent the language tag as an additional triple, but whether this
would have been a net benefit to the community is still an open
question, as it would add the complexity of additional triples.


Different.  Maybe better, maybe worse.


Do you want all your "abc" to be the same language?

   "abc" rdf:lang "en"

or multiple languages:

   "abc" rdf:lang "cy" .
   "abc" rdf:lang "en" .


?

Unlikely - so it's bnode time ...

:x :p [ rdf:value "abc" ; rdf:lang "en" ] .


The nice thing about this in an n3rules-like system (where FILTER and WHERE 
clauses are not distinct and some properties are just builtins) is that 
rdf:value and rdf:lang can be made builtins, so a datatyped literal can behave 
just like a bnode with two properties if you want to.

But I have always preferred it with not 2 extra triples, just one:

:x  :p [ lang:en "cat" ]

which allows you also to write things like

:x :p  [ lang:en "cat"] , [ lang:fr "chat" ].

or if you use the  ^  back-path syntax of N3 (which was not taken up in Turtle),

:x :p "cat"^lang:en,  "chat"^lang:fr .

You can do the same with datatypes:

:x :q   "2013-11-25"^xsd:date .

instead of

:x :q   "2013-11-25"^^xsd:date .


This seems to bring its own issues.  These bnodes seem to be like 
untidy literals as considered in the RDF 2004 WG.


:x  :p [ lang:en "cat" ]
:x  :p [ lang:en "cat" ]
:x  :p [ lang:en "cat" ]

is 6 triples.

:x :p :q .
:x :p :q .
:x :p :q .

is 1 triple.  Repeated read in same file - this already causes confusion.

:x :p "cat" .
:x :p "cat" .
:x :p "cat" .

is 1 triple or is it 3 triples because it's really

:x :p [ xsd:string "cat" ].

:x :p 123 .
:x :p 123 .
:x :p 123 .

It makes it hard to ask "do X and Y have the same value for :p?" - it 
gets messy to consider all the cases of triple patterns that arise and I 
would not want to push that burden back onto the application writer. 
Why can't the app writer say "find me all things which have a property 
value less than 45"?


To give that, if we add interpretation of bNodes used in this value form 
(datatype properties vs object properties?) so that you can ask about 
shared values, we have made them tidy again.  But then it is little 
different from structured literals with @lang and ^^datatype.


Having the data model and the access model different does not gain 
anything.  The data model should reflect the way the data is accessed.


Like RDF lists, or seq/alt/bag, encoding values in triples is attractive 
in its uniformity but the "triples" nature always shows through 
somewhere, making something else complicated.
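
To make the access-model point concrete, compare asking for an English 
label in both shapes (a sketch; rdf:lang is the hypothetical property 
from this thread, not a real RDF term):

   PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
   PREFIX :    <http://example/>

   # Standard RDF, data:   :x :p "cat"@en .
   SELECT ?label WHERE { :x :p ?label . FILTER (lang(?label) = "en") }

   # Bnode modelling, data:   :x :p [ rdf:value "cat" ; rdf:lang "en" ] .
   SELECT ?label WHERE { :x :p [ rdf:value ?label ; rdf:lang "en" ] }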


Andy

PS Graph leaning does not help because you can't add data incrementally 
if leaning is applied at each addition.



I suggested way back these properties as a way of putting the info into the 
graph
but my suggestion was not adopted.  I think it would have made the model
more complete which would have been a good thing, though
SPARQL would need to have language-independent query matching as a  special 
case -- but
it does now too really.

(These are interpretation properties.  I must really update
http://www.w3.org/DesignIssues/InterpretationProperties.html)

Units are fun as properties too. http://www.w3.org/2007/ont/unit

Tim



Andy








Re: Dumb SPARQL query problem

2013-11-23 Thread Andy Seaborne



On 23/11/13 17:01, David Booth wrote:

Hi Hugh,

A little correction and a further question . . .

On 11/23/2013 10:17 AM, Hugh Glaser wrote:

Pleasure.
Actually, I found this:
http://answers.semanticweb.com/questions/3530/sparql-query-filtering-by-string


I said it is a pig’s breakfast because you never know what the RDF
publisher has decided to do, and need to try everything.
So to match strings efficiently you need to do (at least) four queries:
“cat”
“cat”@en
“cat”^^xsd:string


Is that still true in SPARQL 1.1?  In Turtle "cat" means the exact same
thing as "cat"^^xsd:string:
http://www.w3.org/TR/turtle/#literals

But this section of SPARQL 1.1 Section 4.1.2 "Syntax for Literals" has
no mention of them being the same:
http://www.w3.org/TR/sparql11-query/#QSynLiterals

Anyone (Andy?) know whether this was fixed in SPARQL 1.1?  I thought
SPARQL 1.1 and Turtle had been pretty well aligned.


SPARQL 1.1 says nothing about it aside from (as in SPARQL 1.0) 
DATATYPE("abc") is xsd:string and DATATYPE("abc"@en) is rdf:langString 
(in 1.1).


What it should say, but does not because SPARQL 1.1 finished before RDF 
1.1 was sufficiently stable, is:


1/ parsing "abc" and "abc"^^xsd:string is the same thing.
2/ In results formats, it's "abc" or equivalent, and no ^^xsd:string.

For matching, it falls out in the matching over RDF but actually putting 
that in the text would be nice.
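
A sketch of 1/ in practice (example data and prefixes are assumptions):

   # Data (Turtle):   :x :p "cat" .
   PREFIX :    <http://example/>
   PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

   ASK { :x :p "cat" }               # true
   ASK { :x :p "cat"^^xsd:string }   # also true: under RDF 1.1 both
                                     # forms parse to the same term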




“cat”@en^^xsd:string or “cat”^^xsd:string@en - I can’t remember which
is right, but I think it’s only one of them :-)


Neither is allowed.  You can have *either* a language tag *or* a
datatype, but not both:
http://www.w3.org/TR/sparql11-query/#QSynLiterals
http://www.w3.org/TR/sparql11-query/#rRDFLiteral


Ditto in RDF syntax.
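
For reference, the three legal shapes (a sketch):

   "cat"                 # simple literal (xsd:string under RDF 1.1)
   "cat"@en              # language-tagged literal (rdf:langString)
   "cat"^^xsd:token      # datatyped literal
   # "cat"@en^^xsd:token  - a syntax error in both Turtle and SPARQL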



But dealing with the difference between "cat" and "cat"@en is still a
problem, as explained here:
http://www.w3.org/TR/sparql11-query/#matchLangTags

This would have been fixed if the RDF model had been changed to
represent the language tag as an additional triple, but whether this
would have been a net benefit to the community is still an open
question, as it would add the complexity of additional triples.


Different.  Maybe better, maybe worse.


Do you want all your "abc" to be the same language?

   "abc" rdf:lang "en" .

or multiple languages:

   "abc" rdf:lang "cy" .
   "abc" rdf:lang "en" .


?

Unlikely - so it's bnode time ...

:x :p [ rdf:value "abc" ; rdf:lang "en" ] .

Andy




David



Of course if you are matching in SPARQL you can use “… ?o . FILTER
(str(?o) = “cat”)…”, but that is likely to be much slower.

This means that you may need to do a lot of queries.
I built something to look for matching strings (of course! - finding
sameAs candidates) where the RDF had been gathered from different
sources.
Something like
SELECT ?a ?b WHERE { ?a ?p1 ?s . ?b ?p2 ?s }
would have been nice.
I’ll leave it as an exercise to the reader to work out how many
queries it takes to genuinely achieve the desired effect without using
FILTER and str.

Unfortunately it seems that recent developments have not been much
help here, but I may be wrong:
http://www.w3.org/TR/sparql11-query/#matchingRDFLiterals

I guess that the truth is that other people don’t actually build
systems that follow your nose to arbitrary Linked Data resources, so
they don’t worry about it?
Or am I missing something obvious, and people actually have a good way
around this?

To me the problem all comes because knowledge is being represented
outside the triple model.
And also because of the XML legacy of RDF, even though everyone keeps
saying that is only a serialisation of an abstract model.
Ah well, back in my box.

Cheers.

On 23 Nov 2013, at 11:00, Richard Light 
wrote:



On 23/11/2013 10:30, Hugh Glaser wrote:

It's the other bit of the pig's breakfast.
Try an @en


Magic!  Thanks.

Richard

On 23 Nov 2013, at 10:18, Richard Light 
  wrote:



Hi,

Sorry to bother the list, but I'm stumped by what should be a
simple SPARQL query.  When applied to the dbpedia end-point [1],
this search:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

SELECT *
WHERE {
 ?pers a foaf:Person .
 ?pers foaf:surname "Malik" .
 OPTIONAL {?pers dbpedia-owl:birthDate ?dob }
 OPTIONAL {?pers dbpedia-owl:deathDate ?dod }
 OPTIONAL {?pers dbpedia-owl:placeOfBirth ?pob }
 OPTIONAL {?pers dbpedia-owl:placeOfDeath ?pod }
}
LIMIT 100

yields no results. Yet if you drop the '?pers foaf:surname "Malik"
.' clause, you get a result set which includes a Malik with the
desired surname property.  I'm clearly being dumb, but in what way?
:-)

(I've tried adding ^^xsd:string to the literal, but no joy.)

Thanks,

Richard
[1]
http://dbpedia.org/sparql

--
Richard Light



--
Richard Light








Re: SPARQL results in RDF

2013-09-25 Thread Andy Seaborne



On 25/09/13 15:10, Sven R. Kunze wrote:

On 09/25/2013 04:02 PM, Dave Reynolds wrote:

On 25/09/13 14:57, Sven R. Kunze wrote:

On 09/25/2013 03:53 PM, Dave Reynolds wrote:

Hi Damian,

On 25/09/13 14:16, Damian Steer wrote:

On 25/09/13 12:03, Stuart Williams wrote:

On 25/09/2013 11:26, Hugh Glaser wrote:

You'll get me using CONSTRUCT soon :-)
(By the way, Tim's actual CONSTRUCT WHERE query isn't allowed
because
of the FILTER).


Good catch... yes - I've been bitten by that kind of thing too...
that
not all that's admissible in a WHERE 'body', is admissible in a
CONSTRUCT 'body'.


As far as I'm aware it is -- Tim's original simply misplaced a curly
brace. The filter ought to be in the WHERE body.

CONSTRUCT is essentially SELECT with a tabular data -> RDF system bolted
on at the end of the pipeline.


I think the point people were making is that the syntactic shortform
"CONSTRUCT WHERE" with implicit template only applies when you have a
simple basic graph pattern [1].

If the WHERE clause is more complex, e.g. with a FILTER, then you need
an explicit construct template.

Dave

[1] http://www.w3.org/TR/sparql11-query/#constructWhere




How did you come to that conclusion?


Based on the part of the specification given by link [1] above, which
says (my emphasis):

"A short form for the CONSTRUCT query form is provided for the case
where the template and the pattern are the same and the pattern is
just a basic graph pattern **(no FILTERs and no complex graph patterns
are allowed in the short form)**."

Dave



I see, Dave. However, my intention was to understand the reason behind
that decision. This quote is not a justification, just a description of
what's expected to work.

I can't see why restricting the set of where clauses is necessary.


You can consult the email archives and telecon meeting minutes for the 
discussions.  The spec is not going to have a justification for every 
decision; that's not a spec anymore!


It gets increasingly difficult to define what the implicit template 
should be.  What about OPTIONALS? UNION? Subquery?  There was a 
consideration of BGP+FILTER.


In the end, given the constraints of time and other features, the simple 
case (simple to explain, to implement and to specify)  where the 
template and the WHERE pattern are the same was put in the spec. 
Otherwise you need to give the relation of WHERE to template.
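
Concretely (a sketch):

   # Short form: legal only for a plain basic graph pattern;
   # the template is the pattern itself.
   CONSTRUCT WHERE { ?s ?p ?o }

   # With a FILTER (or OPTIONAL, UNION, ...), the template
   # must be given explicitly:
   CONSTRUCT { ?s ?p ?o }
   WHERE     { ?s ?p ?o FILTER (isLiteral(?o)) }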


Andy












Re: SPARQL results in RDF

2013-09-23 Thread Andy Seaborne
DAWG did at one time work with result sets encoded in RDF for the 
testing work.


As the WG progressed, it was clear that implementation of testing
was based on result set comparison, and an impl needed to grok the XML 
results encoding anyway.  Hence the need for the RDF form dwindled but 
it's still there:


http://www.w3.org/2001/sw/DataAccess/tests/result-set.n3

Apache Jena will still produce it if you ask it nicely.
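
From memory, the shape of it is roughly this (a sketch of one solution 
binding one variable; check result-set.n3 for the authoritative terms):

   @prefix rs: <http://www.w3.org/2001/sw/DataAccess/tests/result-set#> .

   [] a rs:ResultSet ;
      rs:resultVariable "s" ;
      rs:solution [
        rs:binding [ rs:variable "s" ;
                     rs:value    <http://example/x> ]
      ] .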

Andy



Re: Again on endpoint server limits [WAS Re: Public SPARQL endpoints:managing (mis)-use and communicating limits to users.]

2013-06-03 Thread Andy Seaborne
There's a practical tradeoff between streaming and whole-result processing in 
the server.


Streaming can give lower latency to the first result for the client, which 
can be a better user experience.  HTTP status codes go in the header, so 
to give the perfect answer the server needs to see the end of the 
query execution before the first bytes get sent.  This is orthogonal to Mark 
Baker's point about returning a URL.


What would be a useful improvement is a formally defined marker in the 
results saying "that's all folks - time's up - incomplete results" (i.e. 
in the standard results formats).  The client can then decide which mode 
is the better user experience - to see all results before doing any app 
work, or choose to work incrementally.


Andy

On 30/05/13 15:52, Kingsley Idehen wrote:

On 5/30/13 10:42 AM, Andrea Splendiani wrote:

Hi,

good.

A different http header would be good.
The problem is that, in a typical application (or at least some of
them) you don't really know which server is there, so specific headers
(non http) may not be known.
More in general, I think the wider public won't be so precise: either
you get results or not. Or perhaps you get some very visible thing
that results are partial (to the point you need to change something in
your code). Otherwise things go easily undetected.
Also, many times things are wrapped by libraries, so you just get the
data out of a query without knowing details of what happened.

I'll try the lod cloud cache, thanks for the link.

Overall, I think that for many users it would be much safer to get an X0X
"query exceeding resources" than some partial results. Would it be possible
to configure the preferred behavior at query time?

Maybe.

At this point, we are working to get more into HTTP standard headers
re., partial results. After that, we might consider a SPARQL pragma for
results behavior.

Kingsley


best,
Andrea

Il giorno 30/mag/2013, alle ore 15:33, Kingsley Idehen
 ha scritto:


On 5/30/13 9:13 AM, Andrea Splendiani wrote:

Hi,

let me get back to this thread for two reasons.
1) I was wondering whether the report on DBPedia queries cited below
was already published.
2) I have recently tried to use DBPedia for some simple computation
and I have a problem. Basically a query for all cities whose
population is larger than that of the countries they are in returns
a random number of results. I suspect this is due to hitting some
internal computation load limits, and there is not much I can do
with limits, I think, as results are no more than 20 or so.

Now, I discovered this by chance. If this is due to some limits, I
would much prefer an error message (query too expensive) to
partial results.
Is there a way to detect that these results are partial ?

Of course, via the response headers of the SPARQL query:

1. X-SQL-State:
2. X-SQL-Message:

We are also looking at using HTTP a bit more here i.e., not returning
200 OK if the resultset is partial.


Otherwise, there is a full range of use cases that gets problematic.
I know dbpedia is a best effort free resource, so I understand the
need for limits, and unpredictable results are good enough for many
demos. But being unable to tell if a result is complete or not is a
big constraint in many applications

Also remember, as I've indicated repeatedly, you can get the same
data from the LOD cloud cache instance which is supported by more
powerful computing resources:

1. http://lod.openlinksw.com/sparql -- the fastest instance since it's
a V7 cluster (even though it hosts 50 billion+ triples)
2. http://dbpedia-live.openlinksw.com -- still faster than
dbpedia.org since it's using V7.


Kingsley


best,
Andrea


Il giorno 19/apr/2013, alle ore 02:54, Kingsley Idehen
 ha scritto:


On 4/18/13 7:06 PM, Andrea Splendiani wrote:

Il giorno 18/apr/2013, alle ore 16:04, Kingsley Idehen
 ha scritto:


On 4/18/13 9:23 AM, Andrea Splendiani wrote:

Hi,

I think that some caching with a minimum of query rewriting
would get rid of 90% of the select {?s ?p ?o} where {?s ?p ?o}
queries.

Sorta.
Client queries are inherently unpredictable. That's always been
the case, and that predates SPARQL. These issues also exist in
the SQL RDBMS realm, which is why you don't have SQL endpoints
delivering what SPARQL endpoints provide.

I know, but I suspect that these days a lot of these "intensive"
queries are explorative, just to check what is in the dataset, and
may end up being very similar in structure.

Note, we have logs and recordings of queries that hit many of our
public endpoints. For instance, we are preparing a report on
DBpedia that will actually shed light on types and complexity of
queries that hit the DBpedia endpoint.


Jerven: can you report on your experience in this ? How much of
problematic queries are not really targeted, but more generic ?


 From a user perspective, I would rather have a clear result
code upfront telling me: your query is too heavy, not enough
resources and so on, than partial results + extra codes.

Yes, and yo

Re: Restpark - Minimal RESTful API for querying RDF triples

2013-04-18 Thread Andy Seaborne



On 18/04/13 11:56, Paul Groth wrote:

There's also a java implementation.


ELDA:

http://code.google.com/p/elda/

Andy



Re: Content negotiation for Turtle files

2013-02-06 Thread Andy Seaborne

I use the following .htaccess file:

AddType  text/turtle .ttl

AddType  application/rdf+xml .rdf
AddType  application/ld+json .jsonld
AddType  application/n-triples   .nt
AddType  application/owl+xml .owl

AddType  text/trig   .trig
AddType  application/n-quads .nq

Andy

On 05/02/13 23:49, Bernard Vatant wrote:

Hello all

Back in 2006, I thought I had understood, with the help of folks around
here, how to configure my server for content negotiation at lingvoj.org.
Both vocabulary and instances were published in RDF/XML.

I updated the ontology last week, and since after years of happy living
with RDF/XML people eventually convinced me that it was a bad, prehistoric
and ugly syntax, I decided to be trendy and published the new version in
Turtle at http://www.lingvoj.org/ontology_v2.0.ttl

The vocabulary URI is still the same : http://www.lingvoj.org/ontology,
and the namespace http://www.lingvoj.org/ontology# (cool URIs don't change)

Then I turned to Vapour to test this new publication, and found out that
to be happy with the vocabulary URI it has to find some answer when
requesting application/rdf+xml. But since I have no more RDF/XML file
for this version, what should I do?
I turned to best practices document at
http://www.w3.org/TR/swbp-vocab-pub, but it does not provide examples
with Turtle, only RDF/XML.

So I blindly put the following in the .htaccess : AddType
application/rdf+xml .ttl
I found it a completely stupid and dirty trick ... but amazingly it makes
Vapour happy.

But now Firefox chokes on http://www.lingvoj.org/ontology_v2.0.ttl
because it seems to expect an XML file. Chrome does not have this issue.
The LOV-Bot says there is a content negotiation issue and can't get the
file. So does Parrot.

I feel dumb, but I'm certainly not the only one, I've stumbled upon a
certain number of vocabularies published in Turtle for which the conneg
does not seem to be perfectly clear either.

What do I miss, folks? Should I forget about it, and switch back to good
ol' RDF/XML?

Bernard

--
*Bernard Vatant
*
Vocabularies & Data Engineering
Tel : + 33 (0)9 71 48 84 59
Skype : bernard.vatant
Blog : the wheel and the hub 

*Mondeca*
3 cité Nollez 75018 Paris, France
www.mondeca.com 
Follow us on Twitter : @mondecanews 
--

Meet us at Documation  in Paris, March 20-21





Re: Proposal: register /.well-known/sparql with IANA

2012-12-24 Thread Andy Seaborne



On 24/12/12 14:38, Kingsley Idehen wrote:

On 12/23/12 8:48 PM, David Wood wrote:

On Dec 22, 2012, at 17:23, Kingsley Idehen 
wrote:


On 12/22/12 11:25 AM, Melvin Carvalho wrote:


So I think anyone could register this; if there's interest, it would
probably just need to reopen the conversation on the SPARQL WG
mailing list, I think.

I suggest, you just do it :-)


Normally, I'd agree, but in this case I think the SPARQL WG thread is
worth reading; they have a good point.


Help me with a link to the specific point. I've taken a quick look and
haven't found anything that changes my position re. "just do it".

Discoverability remains something that's received too little attention
re. Linked Data, SPARQL, and RDF. It isn't something best handled (as a
prescription) by the W3C (IMHO). When best practices have been
established, the W3C can come in and formalize etc..


VoID descriptions and service descriptions provide searchable documents?

At least they provide an answer to what to put at a well-known URI.


We need to bootstrap first, and then formalize standardization later. The
alternative route never works, as you can see from the age of the
initial suggestion re. this matter :-)


A good argument ... for using sitemaps.

Andy




Happy Holidays to you and everyone else on the list !



Regards,
Dave



--

Regards,

Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen















Re: ANN: Sparallax! - Browse sets of things together (now those on your SPARQL endpoint)

2011-08-18 Thread Andy Seaborne



On 18/08/11 00:03, Danny Ayers wrote:
> Which aggregate functions are needed?
> ARQ has some support (very possibly more than) listed here:
> http://jena.sourceforge.net/ARQ/group-by.html
>
> [Andy, are there any more query examples around? I can't seem to get
> count(*) working here]
>

ARQ runs in standards mode (SPARQL 1.1) by default:

SELECT (count(*) AS ?C) { ?s ?p ?o }

Older style,

SELECT count(*) { ?s ?p ?o }

will work if you choose syntaxARQ (which is ticking "SPARQL extended 
syntax" in the validator [1], or the same service in the Fuseki download).


ARQ supports all the SPARQL 1.1 aggregates [2]:
COUNT, SUM, MIN, MAX, AVG, GROUP_CONCAT, and SAMPLE.
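
For example, with GROUP BY (a sketch):

   SELECT ?p (COUNT(*) AS ?n)
   WHERE    { ?s ?p ?o }
   GROUP BY ?p
   ORDER BY DESC(?n)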

Let me know if you are still having problems - the Jena users list is 
jena-users AT incubator.apache.org


Andy


[1]
http://www.sparql.org/query-validator.html
[2]
http://www.w3.org/TR/sparql11-query/#aggregates



Re: Why does rdf-sparql-protocol say to return 500 when refusing a query?

2011-05-04 Thread Andy Seaborne
Not dumb at all - it would be a good idea to include something in 
addition to the status code.  Agreeing on what exactly has been tricky.


In the case of a SELECT query, however, there is a wrinkle that the 
client may not have an RDF parser - the results (if the query worked) 
are not RDF so an RDF parser for just errors is a bit of a burden.


Andy

On 04/05/11 16:13, Whitley, Zachary C. wrote:

This might be a completely dumb idea but what about including some rdf in the 
response body giving a more detailed explanation of the problem?

"Except when responding to a HEAD request, the server should include an entity 
containing an explanation of the error situation, and indicate whether it is a temporary 
or permanent condition. Likewise, user agents should display any included entity to the 
user. These response codes are applicable to any request method."


-Original Message-
From: public-lod-requ...@w3.org [mailto:public-lod-requ...@w3.org] On Behalf Of 
Andy Seaborne
Sent: Wednesday, April 27, 2011 5:35 AM
To: Hugh Glaser
Cc:
Subject: Re: Why does rdf-sparql-protocol say to return 500 when refusing a 
query?


I agree with the sentiment.  The HTTP status codes just aren't up to
this sort of work, they come from a background of accessing HTML pages
by browsers.  They just don't capture resource-intensive requests very
well IMHO so pushing on the boundaries of HTTP status codes isn't going
to be easy.

Unrelated example:
Using 303 in httpRange-14 is a bit weird because:

"""
This method exists primarily to allow the output of a POST-activated
script to redirect the user agent to a selected resource.
"""
so the original purpose is quite clear, and nothing to do with all the
IR/NIR games which are pushing heavily on the next sentence:
"""
The new URI is not a substitute reference for the originally requested
resource.
"""


Using 4xx for something that the client should fix at the protocol level
(syntax error) and 5xx for something that isn't completely in its
control is the best there is.  Changing the query is much like asking
for a different web page (in HTTP terms it is asking for a different
resource - the URL has changed).

Maybe the standard text for 500 should be:

500 - please don't try that again.

:-)

A good ping query is "ASK{}"

  >  If after all that I get nothing but 500, then I can assume what?

Nothing - there isn't the variety of language in HTTP status codes to
express the possibilities, even if the server knows anything.  If it's
under load, the load may go away in the next second, or may not.

What you want is probably a set of HTTP status codes that are covering
the ins-and-outs of request-response of the characteristics of query
execution.

  >  (I might even check the sparql syntax - I should get a 400 if it
  >  is a syntax error, but I am not sure that is what all endpoints
  >  do - and I know one that does a 200.)

Tut, tut.  That case is very clear.

Andy

On 27/04/11 10:14, Hugh Glaser wrote:

Hi Andy,
Thanks, interesting ways of putting it, and moves the focus slightly, so I 
could put it this way (sort of!).
5xx is the server can't do what you want, and there isn't much point in the 
client having another go with the same request.
However, as you say, it may be that the server could perform the request at 
another time.
And it is actually arguable that the client is wrong if it asks for things like 
?s ?p ?o on a large store :-)

I would not presume to suggest what should happen in detail, as there has been 
much thought on this.
But I would like to reiterate that as a client it should be possible to 
distinguish useful responses (the responses are for the benefit of the client, 
not the server).

I am looking at things from the client's point of view, in terms of actionable 
information.
In practice, what I currently do is not good, but it is constrained by the 500 
given back.
If I get a 500, I might send the same request a couple of more times, to see if 
it is a server overload problem (sorry!).
Then I might try to simplify the query, LIMIT, split it, or whatever.
Then I might check I have an actual valid endpoint, and that I am invoking it 
in the right way, as best I can, for example by sending some simple queries.
(I might even check the sparql syntax - I should get a 400 if it is a syntax 
error, but I am not sure that is what all endpoints do - and I know one that 
does a 200.)

If after all that I get nothing but 500, then I can assume what?
If I have successfully got something out of this endpoint before, then I am 
guessing that Apache can't connect to the back end cluster, or whatever.
Otherwise, I sort of don't know much.

Somehow I think that the codes should enable me to distinguish between these, 
as each of them implies that the client should take different actions to 
compensate.

Re: Why does rdf-sparql-protocol say to return 500 when refusing a query?

2011-04-27 Thread Andy Seaborne


I agree with the sentiment.  The HTTP status codes just aren't up to 
this sort of work, they come from a background of accessing HTML pages 
by browsers.  They just don't capture resource-intensive requests very 
well IMHO so pushing on the boundaries of HTTP status codes isn't going 
to be easy.


Unrelated example:
Using 303 in httpRange-14 is a bit weird because:

"""
This method exists primarily to allow the output of a POST-activated 
script to redirect the user agent to a selected resource.

"""
so the original purpose is quite clear, and nothing to do with all the 
IR/NIR games which are pushing heavily on the next sentence:

"""
The new URI is not a substitute reference for the originally requested 
resource.

"""


Using 4xx for something that the client should fix at the protocol level 
(syntax error) and 5xx for something that isn't completely in its 
control is the best there is.  Changing the query is much like asking 
for a different web page (in HTTP terms it is asking for a different 
resource - the URL has changed).


Maybe the standard text for 500 should be:

500 - please don't try that again.

:-)

A good ping query is "ASK{}"

> If after all that I get nothing but 500, then I can assume what?

Nothing - there isn't the variety of language in HTTP status codes to 
express the possibilities, even if the server knows anything.  If it's 
under load, the load may go away in the next second, or may not.


What you want is probably a set of HTTP status codes that are covering 
the ins-and-outs of request-response of the characteristics of query 
execution.


> (I might even check the sparql syntax - I should get a 400 if it
> is a syntax error, but I am not sure that is what all endpoints
> do - and I know one that does a 200.)

Tut, tut.  That case is very clear.

Andy

On 27/04/11 10:14, Hugh Glaser wrote:

Hi Andy,
Thanks, interesting ways of putting it, and moves the focus slightly, so I 
could put it this way (sort of!).
5xx is the server can't do what you want, and there isn't much point in the 
client having another go with the same request.
However, as you say, it may be that the server could perform the request at 
another time.
And it is actually arguable that the client is wrong if it asks for things like 
?s ?p ?o on a large store :-)

I would not presume to suggest what should happen in detail, as there has been 
much thought on this.
But I would like to reiterate that as a client it should be possible to 
distinguish useful responses (the responses are for the benefit of the client, 
not the server).

I am looking at things from the client's point of view, in terms of actionable 
information.
In practice, what I currently do is not good, but it is constrained by the 500 
given back.
If I get a 500, I might send the same request a couple of more times, to see if 
it is a server overload problem (sorry!).
Then I might try to simplify the query, LIMIT, split it, or whatever.
Then I might check I have an actual valid endpoint, and that I am invoking it 
in the right way, as best I can, for example by sending some simple queries.
(I might even check the sparql syntax - I should get a 400 if it is a syntax 
error, but I am not sure that is what all endpoints do - and I know one that 
does a 200.)

If after all that I get nothing but 500, then I can assume what?
If I have successfully got something out of this endpoint before, then I am 
guessing that Apache can't connect to the back end cluster, or whatever.
Otherwise, I sort of don't know much.

Somehow I think that the codes should enable me to distinguish between these, 
as each of them implies that the client should take different actions to 
compensate.
And it is in the server's interest to do so, as otherwise I will start firing 
off stuff to investigate.

Best
Hugh

On 17 Apr 2011, at 21:32, Andy Seaborne wrote:


To quote RFC 2616:

"""
Response status codes beginning with the digit "5" indicate cases in which the 
server is aware that it has erred or is incapable of performing the request"
"""

and the server is incapable.  It may be the query or the given query at that 
point in time.

There aren't that many HTTP status codes defined and fitting "server refuses, but 
your request was valid" to 500 seems close, just not very near.

A server can (and should?) use any appropriate HTTP code.  503 "Service 
Unavailable" looks useful here but it notes:

"""
  Note: The existence of the 503 status code does not imply that a
  server must use it when becoming overloaded. Some servers may wish
  to simply refuse the connection.
"""

The 4xx's mean the client is wrong.  Query timeouts may be due to many things, 
like concurrent requests from other clients, so it's not simply a mistake by 
the client. In SPARQL protocol terms, the query request is valid (it's a legal 
query).

Re: Why does rdf-sparql-protocol say to return 500 when refusing a query?

2011-04-17 Thread Andy Seaborne

To quote RFC 2616:

"""
Response status codes beginning with the digit "5" indicate cases in 
which the server is aware that it has erred or is incapable of 
performing the request"

"""

and the server is incapable.  It may be the query or the given query at 
that point in time.


There aren't that many HTTP status codes defined and fitting "server 
refuses, but your request was valid" to 500 seems close, just not very 
near.


A server can (and should?) use any appropriate HTTP code.  503 "Service 
Unavailable" looks useful here but it notes:


"""
  Note: The existence of the 503 status code does not imply that a
  server must use it when becoming overloaded. Some servers may wish
  to simply refuse the connection.
"""

The 4xx's mean the client is wrong.  Query timeouts may be due to many 
things, like concurrent requests from other clients, so it's not simply 
a mistake by the client. In SPARQL protocol terms, the query request is 
valid (it's a legal query).


Andy


On 17/04/11 21:07, Hugh Glaser wrote:

Ah, thanks.
That explains it.
I was puzzled why I was getting 500.
I assumed the endpoint was returning 500 by mistake.
Never crossed my mind it might be the "correct" thing to do.
As a consumer I would like to be able to distinguish a refusal to answer from a 
failure of the web server to access the store, for example.
Best
Hugh

- Reply message -
From: "Alexander Dutton"
To: "public-lod"
Subject: Why does rdf-sparql-protocol say to return 500 when refusing a query?
Date: Sun, Apr 17, 2011 06:04





Hi all,

The SPARQL Protocol for RDF specification¹ says in §2.2 that
"QueryRequestRefused [is] bound to HTTP status code 500 Internal
Server Error", and should be used "when a client submits a request
that the service refuses to process". The HTTP 1.1 specification²
states that a status code of 500 means that "the server encountered an
unexpected condition which prevented it from fulfilling the request".

A server might reasonably expect that it will receive
resource-intensive requests, and respond to those by declining to
fulfil them. It is a client error, not a server error, as the
client is being overly demanding. As such, a 500 response seems — to
me, at least — inappropriate.

The SPARQL protocol spec also says in §2.1.4 that "the
|QueryRequestRefused| fault message [does not] constrain a conformant
SPARQL service
from returning other HTTP status codes or HTTP headers as appropriate
given the semantics of HTTP". Does this contradict §2.2, and the WSDL
definition?

I've heard a rumour that one or more implementations return a 509. To
me, a 403 seems somewhat appropriate (but isn't perfect). What do
other people think, and what is currently implemented?

Yours,

Alex


¹
²







Re: Show me the money - (was Subjects as Literals)

2010-07-05 Thread Andy Seaborne
The other economic-like argument is that there is only so much developer 
bandwidth in the world, whether open source or proprietary.  Do you 
think that bandwidth should be applied to changing current code to track 
changes, to making existing systems more usable, or (open source) on 
supporting users?


Andy

(Disclaimer: I'm sure some email somewhere makes the same point.  But.)

On 01/07/2010 4:38 PM, Jeremy Carroll wrote:


I am still not hearing any argument to justify the costs of literals as
subjects

I have loads and loads of code, both open source and commercial that
assumes throughout that a node in a subject position is not a literal,
and a node in a predicate position is a URI node.

Of course, the "correct" thing to do is to allow all three node types in
all three positions. (Well four if we take the graph name as well!)

But if we make a change, all of my code base will need to be checked for
this issue.
This costs my company maybe $100K (very roughly)
No one has even showed me $1K of advantage for this change.

It is a no brainer not to do the fix even if it is technically correct

Jeremy







Re: DBpedia hosting burden

2010-04-15 Thread Andy Seaborne



On 15/04/2010 2:44 PM, Kingsley Idehen wrote:

Andy,

Great stuff, this is also why we are going to leave the current DBpedia
3.5 instance to stew for a while (until end of this week or a little
later).

DBpedia users:
Now is the time to identify problems with the DBpedia 3.5 dataset dumps.
We don't want to continue reloading DBpedia (Static Edition and then
recalibrating DBpedia-Live) based on faulty-dataset related matters; we
do have other operational priorities etc.


"Faulty" is a bit strong.

Many of the warnings are legal RDF, but bad lexical forms for the 
datatype, or IRIs that trigger some of the standard warnings (but they 
are still legal IRIs).  Should they be included or not? Seems to me you 
can argue both for and against.


external_links_en.nt.bz2  is the largest source of broken IRIs.

DBpedia is a wonderful and important dataset, and being derived from 
elsewhere is unlikely to ever be "perfect" (for some definition of 
"perfect").  Better to have the data than to wait for perfection.


Andy



Re: DBpedia hosting burden

2010-04-15 Thread Andy Seaborne



On 15/04/2010 1:36 PM, Andy Seaborne wrote:

I ran the files from
http://www.openjena.org/~afs/DBPedia35-parse-log-2010-04-15.txt


Doh! Cut & paste error:

http://downloads.dbpedia.org/3.5/en/


through
an N-Triples parser with checking:

The report is here (it's 25K lines long):

http://www.openjena.org/~afs/DBPedia35-parse-log-2010-04-15.txt

It covers both strict errors and warnings of ill-advised forms.

A few examples:

Bad IRI: <=?(''[[Nepenthes>
Bad IRI: <http://www.european-athletics.org‎>

Bad lexical forms for the value space:
"1967-02-31"^^http://www.w3.org/2001/XMLSchema#date
(there is no February the 31st)


Warning of well known ports of other protocols:
http://stream1.securenetsystems.net:443

Warning about an explicit port 80:

http://bibliotecadigitalhispanica.bne.es:80/

and use of . and .. in absolute URIs which are all from the standard
list of IRI warnings.

Bad IRI: <http://dbpedia.org/resource/..> Code:
8/NON_INITIAL_DOT_SEGMENT in PATH: The path contains a segment /../ not
at the beginning of a relative reference, or it contains a /./ These
should be removed.

Andy

Software used:

The IRI checker, by Jeremy Carroll, is available from
http://www.openjena.org/iri/ and Maven.

The lexical form checking is done by Apache Xerces.

The N-triples parser is the one from TDB v0.8.5 which bundles the above
two together.


On 15/04/2010 9:54 AM, Malte Kiesel wrote:

Ivan Mikhailov wrote:


If I were The Emperor of LOD I'd ask all grand dukes of datasources to
put fresh dumps at some torrent with control of UL/DL ratio :)


Last time I checked (which was quite a while ago though), loading
DBpedia in a normal triple store such as Jena TDB didn't work very well
due to many issues with the DBpedia RDF (e.g., problems with the URIs of
external links scraped from Wikipedia).

I don't know whether this is a bug in TDB or DBpedia but I guess this is
one of the problems causing people to use DBpedia online only - even if,
due to performance reasons, running it locally would be far better.

Regards
Malte





Re: DBpedia hosting burden

2010-04-15 Thread Andy Seaborne
I ran the files from 
http://www.openjena.org/~afs/DBPedia35-parse-log-2010-04-15.txt through 
an N-Triples parser with checking:


The report is here (it's 25K lines long):

http://www.openjena.org/~afs/DBPedia35-parse-log-2010-04-15.txt

It covers both strict errors and warnings of ill-advised forms.

A few examples:

Bad IRI: <=?(''[[Nepenthes>
Bad IRI: <http://www.european-athletics.org‎>

Bad lexical forms for the value space:
"1967-02-31"^^http://www.w3.org/2001/XMLSchema#date
(there is no February the 31st)


Warning of well known ports of other protocols:
http://stream1.securenetsystems.net:443

Warning about an explicit port 80:

http://bibliotecadigitalhispanica.bne.es:80/

and use of . and .. in absolute URIs which are all from the standard 
list of IRI warnings.


Bad IRI: <http://dbpedia.org/resource/..> Code: 
8/NON_INITIAL_DOT_SEGMENT in PATH: The path contains a segment /../ not 
at the beginning of a relative reference, or it contains a /./ These 
should be removed.


Andy

Software used:

The IRI checker, by Jeremy Carroll, is available from
http://www.openjena.org/iri/ and Maven.

The lexical form checking is done by Apache Xerces.

The N-triples parser is the one from TDB v0.8.5 which bundles the above 
two together.



On 15/04/2010 9:54 AM, Malte Kiesel wrote:

Ivan Mikhailov wrote:


If I were The Emperor of LOD I'd ask all grand dukes of datasources to
put fresh dumps at some torrent with control of UL/DL ratio :)


Last time I checked (which was quite a while ago though), loading
DBpedia in a normal triple store such as Jena TDB didn't work very well
due to many issues with the DBpedia RDF (e.g., problems with the URIs of
external links scraped from Wikipedia).

I don't know whether this is a bug in TDB or DBpedia but I guess this is
one of the problems causing people to use DBpedia online only - even if,
due to performance reasons, running it locally would be far better.

Regards
Malte