Re: Lang and dt in the graph. Was: Dumb SPARQL query problem
On 2 Dec 2013, at 06:24, Ross Horne ross.ho...@gmail.com wrote:

> Andy is right (as usual!). With the proposed bnode encoding, the graph becomes fatter each time the same triple is loaded.

But how much fatter was the question.

> RDF 1.1 has just fixed the mess caused by blurring the roles of the lexer and the parser, as summarised by David recently:
> http://lists.w3.org/Archives/Public/public-lod/2013Nov/0093.html

Ah yes, I forgot that everything is rosy now with 1.1 - sorry.

> Please don't get back into mixing up the lexer and the parser. The lexical spaces of the basic datatypes are disjoint, so in any language we can just write:
> - 999 instead of "999"^^xsd:integer
> - 9.99 instead of "9.99"^^xsd:decimal
> - WWV instead of "WWV"^^xsd:string
> - 2013-06-6T11:00:00+01:00 instead of "2013-06-6T11:00:00+01:00"^^xsd:dateTime
>
> As part of a compiler [1], a lexer gobbles up characters, e.g. 999, and turns the characters into a token. A token consists of a string, called an attribute value, plus a token name, e.g. "999"^^xsd:integer. Only a relatively small handful of people writing compilers for languages should have to care about how tokens are represented, not end users of languages.

Well, personally I prefer the first version I used for my course on this when it came out in 1977, the Dragon Book - Principles of Compiler Design, before Sethi polluted it with all that type-checking stuff :-)

Actually, it wasn’t about blurring the lexer and parser - the graph semantics were different. It was closer to having two representations of zero in the machine (as some machines used to have), and having to write code to ensure that you coped with both of them.

Of course your examples do raise the issue of multiple representations for the same thing if the user is not careful. 23.4, 23.5, 23.0, 23.2, 23, 23.1, 023.0, 023: all of these are different RDF terms. Would a lexer/parser make 23.00 and 23.000 different RDF terms? I find myself thinking I should know, but don’t - my guess is it should. (RDF 1.1 doesn’t seem to give guidance on this.)

And I find myself getting strangely interested in your dateTime example. I think most lexers will reject it? Or friendly ones will treat it as the correct lexical form: 2013-06-06T11:00:00+01:00 (you need to pad the day).

So maybe we need to get a bit more explicit about the RDF term for dateTime (unless I have missed it)?
- That the RDF term is always in UTC? - This is what the XSD standard says.
- That the RDF term always has a fractional second part? - Good question.
- That the RDF term always has a timezone? - Better question.
(See http://www.w3.org/TR/xmlschema-2/#dateTime )

Or are we happy with many different representations of a given dateTime? (Of course xsd:dateTime does get into problems with year zero, but let’s not worry about that :-) )

But I guess my friendly RDF parser gnomes (all hail!) already have stories for all this.

Best
Hugh

> For language tags, a little simple conventional datatype subtyping (as opposed to rdfs:subClassOf) could help the programmer further [2]. E.g. a programmer who writes regex("WWV2013"@en, "WWV") clearly meant regex("WWV2013", "WWV") and shouldn't have to care about the distinction, unless I am mistaken.
>
> Regards,
> Ross
>
> [1] Ullman, Aho, Lam and Sethi. Compilers: principles, techniques and tools. 1986
> [2] Local Type Checking for Linked Data Consumers. http://dx.doi.org/10.4204/EPTCS.123.4

-- Hugh Glaser 20 Portchester Rise Eastleigh SO50 4QS Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
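The questions above about 23.00 versus 23.000 and about the unpadded dateTime can be made concrete with a small sketch. It assumes that term identity is decided on the lexical form character by character (which is how the literal term equality definition in RDF 1.1 Concepts reads), while value equality is a separate comparison in the value space. The function names and the simplified dateTime pattern are illustrative assumptions, not part of any RDF library.

```python
import re
from datetime import datetime
from decimal import Decimal

# Simplified pattern for the xsd:dateTime lexical form. Assumption: it
# ignores negative years and years with more than four digits.
DATETIME_LEXICAL = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?"
)


def is_datetime_lexical(text: str) -> bool:
    """Return True if text matches the simplified dateTime lexical form."""
    return DATETIME_LEXICAL.fullmatch(text) is not None


def same_term(lex_a: str, lex_b: str) -> bool:
    """Term equality for two literals of one datatype: compare lexical forms."""
    return lex_a == lex_b


def same_decimal_value(lex_a: str, lex_b: str) -> bool:
    """Value equality for two decimal lexical forms."""
    return Decimal(lex_a) == Decimal(lex_b)


def same_datetime_value(lex_a: str, lex_b: str) -> bool:
    """Value equality for two dateTime lexical forms with numeric offsets."""
    return datetime.fromisoformat(lex_a) == datetime.fromisoformat(lex_b)


# The unpadded day is not a valid lexical form; the padded one is.
print(is_datetime_lexical("2013-06-6T11:00:00+01:00"))   # False
print(is_datetime_lexical("2013-06-06T11:00:00+01:00"))  # True

# Different terms, same value.
print(same_term("23.00", "23.000"))           # False
print(same_decimal_value("23.00", "23.000"))  # True
print(same_datetime_value("2013-06-06T11:00:00+01:00",
                          "2013-06-06T10:00:00+00:00"))  # True
```

Under these assumptions the "many different representations of a given dateTime" are distinct terms that happen to denote one value, so any normalisation to UTC would have to be a choice made by the parser or the store.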
Lang and dt in the graph. Was: Dumb SPARQL query problem
On 2013-11-23, at 12:21, Andy Seaborne wrote:

> On 23/11/13 17:01, David Booth wrote:
>> [...]
>> This would have been fixed if the RDF model had been changed to represent the language tag as an additional triple, but whether this would have been a net benefit to the community is still an open question, as it would add the complexity of additional triples.
>
> Different. Maybe better, maybe worse.
>
> Do you want all your "abc" to be the same language?
>
> "abc" rdf:lang "en" .
>
> or multiple languages:
>
> "abc" rdf:lang "cy" .
> "abc" rdf:lang "en" .
>
> ? Unlikely - so it's bnode time ...
>
> :x :p [ rdf:value "abc" ; rdf:lang "en" ] .

The nice thing about this in an n3rules-like system (where FILTER and WHERE clauses are not distinct and some properties are just builtins) is that rdf:value and rdf:lang can be made builtins, so a datatyped literal can behave just like a bnode with two properties if you want it to.

But I have always preferred it with not 2 extra triples, just one:

:x :p [ lang:en "cat" ]

which also allows you to write things like

:x :p [ lang:en "cat" ] , [ lang:fr "chat" ] .

or, if you use the ^ back-path syntax of N3 (which was not taken up in Turtle),

:x :p "cat"^lang:en, "chat"^lang:fr .

You can do the same with datatypes:

:x :q "2013-11-25"^xsd:date .

instead of

:x :q "2013-11-25"^^xsd:date .

I suggested these properties way back as a way of putting the info into the graph, but my suggestion was not adopted. I think it would have made the model more complete, which would have been a good thing, though SPARQL would need to have language-independent query matching as a special case -- but it does now too really.

(These are interpretation properties. I must really update http://www.w3.org/DesignIssues/InterpretationProperties.html)

Units are fun as properties too. http://www.w3.org/2007/ont/unit

Tim

> Andy
Re: Lang and dt in the graph. Was: Dumb SPARQL query problem
On 01/12/13 12:25, Tim Berners-Lee wrote:

> On 2013-11-23, at 12:21, Andy Seaborne wrote:
>> [...]
>> ? Unlikely - so it's bnode time ...
>>
>> :x :p [ rdf:value "abc" ; rdf:lang "en" ] .
>
> The nice thing about this in a n3rules-like system (where FILTER and WHERE clauses are not distinct and some properties are just builtins) is that rdf:value and rdf:lang can be made builtins so a datatyped literal can behave just like a bnode with two properties if you want to.
>
> But I have always preferred it with not 2 extra triples, just one:
>
> :x :p [ lang:en "cat" ]
>
> which allows you also to write things like
>
> :x :p [ lang:en "cat" ] , [ lang:fr "chat" ] .
>
> or if you use the ^ back-path syntax of N3 (which was not taken up in Turtle),
>
> :x :p "cat"^lang:en, "chat"^lang:fr .
>
> You can do the same with datatypes:
>
> :x :q "2013-11-25"^xsd:date .
>
> instead of
>
> :x :q "2013-11-25"^^xsd:date .

This seems to bring its own issues.

These bnodes seem to be like untidy literals as considered in the RDF-2004 WG.

:x :p [ lang:en "cat" ] .
:x :p [ lang:en "cat" ] .
:x :p [ lang:en "cat" ] .

is 6 triples.

:x :p :q .
:x :p :q .
:x :p :q .

is 1 triple.

Repeated read in same file - this already causes confusion.

:x :p "cat" .
:x :p "cat" .
:x :p "cat" .

is 1 triple or is it 3 triples because it's really

:x :p [ xsd:string "cat" ] .

:x :p 123 .
:x :p 123 .
:x :p 123 .

It makes it hard to ask "do X and Y have the same value for :p?" - it gets messy to consider all the cases of triple patterns that arise and I would not want to push that burden back onto the application writer.

Why can't the app writer say "find me all things with a property value less than 45"?

To give that, if we add interpretation of bNodes used in this value form (datatype properties vs object properties ?), so you can ask about shared values, we have made them tidy again. But then it is little different from structured literals with @lang and ^^datatype.

Having the data model and the access model different does not gain anything. The data model should reflect the way the data is accessed.

Like RDF lists, or seq/alt/bag, encoding values in triples is attractive in its uniformity but the triples nature always shows through somewhere, making something else complicated.

Andy

PS Graph leaning does not help because you can't add data incrementally if leaning is applied at each addition.

> [...]
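The triple counts in this message can be reproduced with a minimal sketch that models a graph as a Python set of tuples. It assumes that every occurrence of the [ ... ] syntax mints a fresh blank node, which is why repeated loads of the bnode encoding keep growing the graph; the helper names are hypothetical and no RDF library is involved.

```python
from itertools import count

_fresh = count()


def new_bnode() -> str:
    """Return a fresh blank node label; every call yields a new node."""
    return f"_:b{next(_fresh)}"


def load_literal(graph: set, s: str, p: str, lexical: str) -> None:
    """Add s p "lexical" as a single triple with a literal object."""
    graph.add((s, p, ("literal", lexical)))


def load_bnode_encoded(graph: set, s: str, p: str,
                       lang_prop: str, lexical: str) -> None:
    """Add s p [ lang_prop "lexical" ]: two triples joined by a fresh bnode."""
    b = new_bnode()
    graph.add((s, p, b))
    graph.add((b, lang_prop, ("literal", lexical)))


literal_graph: set = set()
bnode_graph: set = set()
for _ in range(3):  # read the same statement three times
    load_literal(literal_graph, ":x", ":p", "cat")
    load_bnode_encoded(bnode_graph, ":x", ":p", "lang:en", "cat")

print(len(literal_graph))  # 1
print(len(bnode_graph))    # 6
```

The literal form collapses because a set holds identical triples once, while each bnode-encoded read contributes two new triples that nothing identifies with the earlier ones.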
Re: Lang and dt in the graph. Was: Dumb SPARQL query problem
Hi. Thanks. A bit of help please :-)

On 1 Dec 2013, at 17:36, Andy Seaborne andy.seabo...@epimorphics.com wrote:

> On 01/12/13 12:25, Tim Berners-Lee wrote:
>> [...]
>> But I have always preferred it with not 2 extra triples, just one:
>>
>> :x :p [ lang:en "cat" ]
>>
>> [...]
>
> This seems to bring its own issues.
>
> These bnodes seem to be like untidy literals as considered in the RDF-2004 WG.
>
> :x :p [ lang:en "cat" ] .
> :x :p [ lang:en "cat" ] .
> :x :p [ lang:en "cat" ] .
>
> is 6 triples.
>
> :x :p :q .
> :x :p :q .
> :x :p :q .
>
> is 1 triple.
>
> Repeated read in same file - this already causes confusion.
>
> :x :p "cat" .
> :x :p "cat" .
> :x :p "cat" .
>
> is 1 triple or is it 3 triples because it's really

Is it not 1 triple if you take the first view or 6 triples if you take the second? Or probably I don’t understand bnodes properly!?

> :x :p [ xsd:string "cat" ] .
>
> :x :p 123 .
> :x :p 123 .
> :x :p 123 .
>
> It makes it hard to ask "do X and Y have the same value for :p?" - it gets messy to consider all the cases of triple patterns that arise and I would not want to push that burden back onto the application writer.
>
> Why can't the app writer say "find me all things with a property value less than 45"?

I see it makes it hard, but I don’t see it as any harder than what we have now, with multiple patterns that do and don’t have ^^xsd:string.

As I said before, with the ^^xsd you need to consider a bunch of patterns to do the query - again, it is messy, but is it messier?

Actually I find

{ ?s1 ?p [ xsd:string ?str ] . ?s2 ?p [ xsd:string ?str ] . }

with a possible also

{ ?s1 ?p ?str . ?s2 ?p ?str . }

much easier to work with than something that has this stuff optionally tacked on the end of literals, that isn’t really part of the string but isn’t part of RDF either. Or maybe it is part of the literal but not the string? Surely that should be clear to me?

I just don’t see there is a difference in complexity for querying - it is just that the current situation is genuinely messier for consumers because there are two notations in play, whereas if RDF is so good we should have everything in RDF.

Not that I would say anything should change :-) it ain’t actually broken, but it could get fixed.

(Oh dear, Hugh showing his ignorance of the fancy stuff again)

Best
Hugh

> [...]

-- Hugh Glaser
Re: Lang and dt in the graph. Was: Dumb SPARQL query problem
Andy is right (as usual!). With the proposed bnode encoding, the graph becomes fatter each time the same triple is loaded.

RDF 1.1 has just fixed the mess caused by blurring the roles of the lexer and the parser, as summarised by David recently:
http://lists.w3.org/Archives/Public/public-lod/2013Nov/0093.html

Please don't get back into mixing up the lexer and the parser. The lexical spaces of the basic datatypes are disjoint, so in any language we can just write:
- 999 instead of "999"^^xsd:integer
- 9.99 instead of "9.99"^^xsd:decimal
- WWV instead of "WWV"^^xsd:string
- 2013-06-6T11:00:00+01:00 instead of "2013-06-6T11:00:00+01:00"^^xsd:dateTime

As part of a compiler [1], a lexer gobbles up characters, e.g. 999, and turns the characters into a token. A token consists of a string, called an attribute value, plus a token name, e.g. "999"^^xsd:integer. Only a relatively small handful of people writing compilers for languages should have to care about how tokens are represented, not end users of languages.

For language tags, a little simple conventional datatype subtyping (as opposed to rdfs:subClassOf) could help the programmer further [2]. E.g. a programmer who writes regex("WWV2013"@en, "WWV") clearly meant regex("WWV2013", "WWV") and shouldn't have to care about the distinction, unless I am mistaken.

Regards,
Ross

[1] Ullman, Aho, Lam and Sethi. Compilers: principles, techniques and tools. 1986
[2] Local Type Checking for Linked Data Consumers. http://dx.doi.org/10.4204/EPTCS.123.4
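The lexer idea described above can be sketched in a few lines: because the lexical spaces of these datatypes do not overlap, an ordered list of patterns is enough to turn a bare lexeme into a (attribute value, token name) pair. The patterns are simplified assumptions rather than the full XSD grammars, and `tokenize` is a hypothetical name, not a function from any existing parser.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Ordered (pattern, datatype) rules. The patterns are simplified
# assumptions and cover only the cases discussed in the message.
RULES = [
    (re.compile(r"[+-]?\d+"), XSD + "integer"),
    (re.compile(r"[+-]?\d*\.\d+"), XSD + "decimal"),
    (re.compile(
        r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?"
    ), XSD + "dateTime"),
]


def tokenize(lexeme: str) -> tuple:
    """Return (attribute value, token name) for one lexeme."""
    for pattern, datatype in RULES:
        if pattern.fullmatch(lexeme):
            return (lexeme, datatype)
    # Anything that matches no other rule is treated as a plain string.
    return (lexeme, XSD + "string")


for lexeme in ["999", "9.99", "WWV", "2013-06-06T11:00:00+01:00"]:
    print(tokenize(lexeme))
```

In this sketch the unpadded form 2013-06-6T11:00:00+01:00 matches none of the typed rules and falls through to xsd:string; a stricter lexer could reject it instead, which is a design choice the sketch does not settle.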
Re: Dumb SPARQL query problem
Relevant to RDF 1.1 support in SPARQL 1.1, I modified my local copy of the SPARQL 1.1 Query test suite for the result differences I noticed:

distinct-all.srx - Remove 3 literal results with datatype="http://www.w3.org/2001/XMLSchema#string"
distinct-str.srx - (same)
strdt03.srx - Remove datatype="http://www.w3.org/2001/XMLSchema#string" from literals
strlang03.srx - (same)

After that, I can continue to pass SPARQL 1.1 Query tests for my Ruby implementation.

Gregg Kellogg
gr...@greggkellogg.net

On Nov 23, 2013, at 11:25 AM, David Booth da...@dbooth.org wrote:

> On 11/23/2013 12:21 PM, Andy Seaborne wrote:
>> On 23/11/13 17:01, David Booth wrote:
>>> Hi Hugh,
>>>
>>> A little correction and a further question . . .
>>>
>>> On 11/23/2013 10:17 AM, Hugh Glaser wrote:
>>>> Pleasure.
>>>> Actually, I found this:
>>>> http://answers.semanticweb.com/questions/3530/sparql-query-filtering-by-string
>>>>
>>>> I said it is a pig’s breakfast because you never know what the RDF publisher has decided to do, and need to try everything. So to match strings efficiently you need to do (at least) four queries:
>>>> "cat"
>>>> "cat"@en
>>>> "cat"^^xsd:string
>>>
>>> Is that still true in SPARQL 1.1? In Turtle "cat" means the exact same thing as "cat"^^xsd:string:
>>> http://www.w3.org/TR/turtle/#literals
>>>
>>> But this section of SPARQL 1.1, Section 4.1.2 Syntax for Literals, has no mention of them being the same:
>>> http://www.w3.org/TR/sparql11-query/#QSynLiterals
>>>
>>> Anyone (Andy?) know whether this was fixed in SPARQL 1.1? I thought SPARQL 1.1 and Turtle had been pretty well aligned.
>>
>> SPARQL 1.1 says nothing about it aside from (as in SPARQL 1.0) DATATYPE("abc") is xsd:string and DATATYPE("abc"@en) is rdf:langString (in 1.1).
>>
>> What it should say, but does not because SPARQL 1.1 finished before RDF 1.1 got near sufficiently stable, is
>>
>> 1/ parsing "abc" and "abc"^^xsd:string is the same thing.
>> 2/ In results formats, it's "abc" or equivalent, and no ^^xsd:string.
>>
>> For matching, it falls out in the matching over RDF but actually putting that in the text would be nice.
>
> Ah yes, I see that in the RDF 1.1 draft now:
> http://www.w3.org/TR/rdf11-concepts/#h3_section-Graph-Literal
> [[
> Concrete syntaxes MAY support simple literals, consisting of only a lexical form without any datatype IRI or language tag. Simple literals only exist in concrete syntaxes, and are treated as syntactic sugar for abstract syntax literals with the datatype IRI http://www.w3.org/2001/XMLSchema#string.
> ]]
>
> So in effect, this was fixed at the RDF 1.1 abstract level, so even though the SPARQL 1.1 spec did not mention it, if a SPARQL 1.1 server is RDF 1.1 compliant, then it will treat "abc" and "abc"^^xsd:string as the same.
>
> Thanks!
> David
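The "syntactic sugar" rule quoted from RDF 1.1 Concepts can be sketched as a normalising constructor: a simple literal gets the datatype xsd:string, a language-tagged literal gets rdf:langString, and supplying both a language tag and a datatype is rejected, as noted elsewhere in this thread. The `Literal` type and `make_literal` function are hypothetical names for illustration, not the API of any RDF library.

```python
from typing import NamedTuple, Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


class Literal(NamedTuple):
    """Abstract-syntax literal: lexical form, datatype IRI, optional language tag."""
    lexical: str
    datatype: str
    lang: Optional[str]


def make_literal(lexical: str,
                 datatype: Optional[str] = None,
                 lang: Optional[str] = None) -> Literal:
    """Build a literal, filling in the implied datatype IRI."""
    if lang is not None and datatype is not None:
        raise ValueError("a literal takes a language tag or a datatype, not both")
    if lang is not None:
        return Literal(lexical, RDF_LANGSTRING, lang)
    # A simple literal is sugar for an xsd:string literal.
    return Literal(lexical, datatype or XSD_STRING, None)


print(make_literal("abc") == make_literal("abc", datatype=XSD_STRING))  # True
print(make_literal("abc") == make_literal("abc", lang="en"))            # False
```

With this normalisation "abc" and "abc"^^xsd:string are one term, while "abc"@en remains a different term that an exact-match query for "abc" will not find.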
Dumb SPARQL query problem
Hi,

Sorry to bother the list, but I'm stumped by what should be a simple SPARQL query. When applied to the dbpedia end-point [1], this search:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT *
WHERE {
    ?pers a foaf:Person .
    ?pers foaf:surname "Malik" .
    OPTIONAL {?pers dbpedia-owl:birthDate ?dob }
    OPTIONAL {?pers dbpedia-owl:deathDate ?dod }
    OPTIONAL {?pers dbpedia-owl:placeOfBirth ?pob }
    OPTIONAL {?pers dbpedia-owl:placeOfDeath ?pod }
}
LIMIT 100

yields no results. Yet if you drop the '?pers foaf:surname "Malik" .' clause, you get a result set which includes a Malik with the desired surname property.

I'm clearly being dumb, but in what way? :-)

(I've tried adding ^^xsd:string to the literal, but no joy.)

Thanks,
Richard

[1] http://dbpedia.org/sparql

-- *Richard Light*
Re: Dumb SPARQL query problem
It’s the other bit of the pig’s breakfast. Try an @en

On 23 Nov 2013, at 10:18, Richard Light rich...@light.demon.co.uk wrote:

> [...]

-- Hugh Glaser
Re: Dumb SPARQL query problem
On 23/11/2013 10:30, Hugh Glaser wrote:

> It’s the other bit of the pig’s breakfast. Try an @en

Magic! Thanks.

Richard

> On 23 Nov 2013, at 10:18, Richard Light rich...@light.demon.co.uk wrote:
>> [...]

-- *Richard Light*
Re: Dumb SPARQL query problem
Pleasure.

Actually, I found this:
http://answers.semanticweb.com/questions/3530/sparql-query-filtering-by-string

I said it is a pig’s breakfast because you never know what the RDF publisher has decided to do, and need to try everything. So to match strings efficiently you need to do (at least) four queries:

"cat"
"cat"@en
"cat"^^xsd:string
"cat"@en^^xsd:string or "cat"^^xsd:string@en - I can’t remember which is right, but I think it’s only one of them :-)

Of course if you are matching in SPARQL you can use

… ?o . FILTER (str(?o) = "cat") …

but that is likely to be much slower.

This means that you may need to do a lot of queries. I built something to look for matching strings (of course! - finding sameAs candidates) where the RDF had been gathered from different sources. Something like

SELECT ?a ?b WHERE { ?a ?p1 ?s . ?b ?p2 ?s }

would have been nice. I’ll leave it as an exercise to the reader to work out how many queries it takes to genuinely achieve the desired effect without using FILTER and str.

Unfortunately it seems that recent developments have not been much help here, but I may be wrong:
http://www.w3.org/TR/sparql11-query/#matchingRDFLiterals

I guess that the truth is that other people don’t actually build systems that follow your nose to arbitrary Linked Data resources, so they don’t worry about it? Or am I missing something obvious, and people actually have a good way around this?

To me the problem all comes because knowledge is being represented outside the triple model. And also because of the XML legacy of RDF, even though everyone keeps saying that is only a serialisation of an abstract model.

Ah well, back in my box.

Cheers.

On 23 Nov 2013, at 11:00, Richard Light rich...@light.demon.co.uk wrote:

> On 23/11/2013 10:30, Hugh Glaser wrote:
>> It’s the other bit of the pig’s breakfast. Try an @en
>
> Magic! Thanks.
>
> Richard
>
> [...]

-- Hugh Glaser
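The "try everything" approach in this message can be sketched as a helper that lists the literal forms worth trying for one string. It covers three forms only, since the replies in this thread point out that a literal cannot carry both a language tag and a datatype. The function names are hypothetical, the `xsd:` prefix is assumed to be declared in the surrounding query, and the list of language tags is something the caller has to guess; `VALUES` is the SPARQL 1.1 inline-data keyword.

```python
def literal_variants(text: str, langs=("en",)) -> list:
    """Return SPARQL literal forms to try for one string value."""
    # Escape backslashes and double quotes for a SPARQL string literal.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    quoted = f'"{escaped}"'
    variants = [quoted, f"{quoted}^^xsd:string"]
    variants += [f"{quoted}@{lang}" for lang in langs]
    return variants


def values_clause(var: str, text: str, langs=("en",)) -> str:
    """Build a VALUES clause binding ?var to every variant in one query."""
    return f"VALUES ?{var} {{ " + " ".join(literal_variants(text, langs)) + " }"


print(literal_variants("cat"))
# ['"cat"', '"cat"^^xsd:string', '"cat"@en']
print(values_clause("o", "cat"))
# VALUES ?o { "cat" "cat"^^xsd:string "cat"@en }
```

Binding all the variants in a single VALUES clause is one way to avoid issuing a separate query per form; whether it beats FILTER with str() on a given endpoint is not something this sketch can tell you.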
Re: Dumb SPARQL query problem
Not sure if this helps multilingual pigs as much as it should, but I'm not much good before coffee and expect there are many fellow mammals who share my plight ...

Language classification code reduction (in old fashioned SQL)
http://www.rustprivacy.org/faca/languages.php

On Sat, 11/23/13, Hugh Glaser h...@glasers.org wrote:

> Subject: Re: Dumb SPARQL query problem
> To: Richard Light rich...@light.demon.co.uk
> Cc: public-lod community public-lod@w3.org
> Date: Saturday, November 23, 2013, 9:17 AM
>
> [...]
Re: Dumb SPARQL query problem
Hi Hugh,

A little correction and a further question . . .

On 11/23/2013 10:17 AM, Hugh Glaser wrote:

> Pleasure.
> Actually, I found this:
> http://answers.semanticweb.com/questions/3530/sparql-query-filtering-by-string
>
> I said it is a pig’s breakfast because you never know what the RDF publisher has decided to do, and need to try everything. So to match strings efficiently you need to do (at least) four queries:
> "cat"
> "cat"@en
> "cat"^^xsd:string

Is that still true in SPARQL 1.1? In Turtle "cat" means the exact same thing as "cat"^^xsd:string:
http://www.w3.org/TR/turtle/#literals

But this section of SPARQL 1.1, Section 4.1.2 Syntax for Literals, has no mention of them being the same:
http://www.w3.org/TR/sparql11-query/#QSynLiterals

Anyone (Andy?) know whether this was fixed in SPARQL 1.1? I thought SPARQL 1.1 and Turtle had been pretty well aligned.

> "cat"@en^^xsd:string or "cat"^^xsd:string@en - I can’t remember which is right, but I think it’s only one of them :-)

Neither is allowed. You can have *either* a language tag *or* a datatype, but not both:
http://www.w3.org/TR/sparql11-query/#QSynLiterals
http://www.w3.org/TR/sparql11-query/#rRDFLiteral

But dealing with the difference between "cat" and "cat"@en is still a problem, as explained here:
http://www.w3.org/TR/sparql11-query/#matchLangTags

This would have been fixed if the RDF model had been changed to represent the language tag as an additional triple, but whether this would have been a net benefit to the community is still an open question, as it would add the complexity of additional triples.

David

> [...]
Re: Dumb SPARQL query problem
On 23/11/13 17:01, David Booth wrote: Hi Hugh, A little correction and a further question . . . On 11/23/2013 10:17 AM, Hugh Glaser wrote: Pleasure. Actually, I found this: http://answers.semanticweb.com/questions/3530/sparql-query-filtering-by-string I said it is a pig’s breakfast because you never know what the RDF publisher has decided to do, and need to try everything. So to match strings efficiently you need to do (at least) four queries: “cat” “cat”@en “cat”^^xsd:string Is that still true in SPARQL 1.1? In Turtle cat means the exact same thing as cat^^xsd:string: http://www.w3.org/TR/turtle/#literals But this section of SPARQL 1.1 Section 4.1.2 Syntax for Literals has no mention of them being the same: http://www.w3.org/TR/sparql11-query/#QSynLiterals Anyone (Andy?) know whether this was fixed in SPARQL 1.1? I thought SPARQL 1.1 and Turtle had been pretty well aligned. SPARQL 1.1 says nothing about it aside from (as in SPARQL 1.0) DATATYPE(abc) is xsd:string and DATATYPE(abc@en) is rdf:langString (in 1.1). What it should say, but does not because SPARQL 1.1 finished before RDF 1.1 got near sufficiently stable, is 1/ parsing abc and abc^^xsd:string is the same thing. 2/ In results formats, it's abc or equivalent, and no ^^xsd:String. For matching, it falls out in the matching over RDF but actually putting that in the text would be nice. “cat”@en^^xsd:string or “cat”^^xsd:string@en - I can’t remember which is right, but I think it’s only one of them :-) Neither is allowed. You can have *either* a language tag *or* a datatype, but not both: http://www.w3.org/TR/sparql11-query/#QSynLiterals http://www.w3.org/TR/sparql11-query/#rRDFLiteral Ditto in RDF syntax. 
But dealing with the difference between "cat" and "cat"@en is still a problem, as explained here: http://www.w3.org/TR/sparql11-query/#matchLangTags
This would have been fixed if the RDF model had been changed to represent the language tag as an additional triple, but whether this would have been a net benefit to the community is still an open question, as it would add the complexity of additional triples.

Different. Maybe better, maybe worse.
Do you want all your "abc" to be the same language?
"abc" rdf:lang "en" .
or multiple languages:
"abc" rdf:lang "cy" .
"abc" rdf:lang "en" .
? Unlikely - so it's bnode time ...
:x :p [ rdf:value "abc" ; rdf:lang "en" ] .

Andy

David

Of course if you are matching in SPARQL you can use “… ?o . FILTER (str(?o) = “cat”)…”, but that is likely to be much slower. This means that you may need to do a lot of queries.

I built something to look for matching strings (of course! - finding sameAs candidates) where the RDF had been gathered from different sources. Something like
SELECT ?a ?b WHERE { ?a ?p1 ?s . ?b ?p2 ?s }
would have been nice. I’ll leave it as an exercise to the reader to work out how many queries it takes to genuinely achieve the desired effect without using FILTER and str.

Unfortunately it seems that recent developments have not been much help here, but I may be wrong: http://www.w3.org/TR/sparql11-query/#matchingRDFLiterals

I guess that the truth is that other people don’t actually build systems that follow your nose to arbitrary Linked Data resources, so they don’t worry about it? Or am I missing something obvious, and people actually have a good way around this?

To me the problem all comes because knowledge is being represented outside the triple model. And also because of the XML legacy of RDF, even though everyone keeps saying that is only a serialisation of an abstract model.

Ah well, back in my box.
Cheers.
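For anyone wanting to see what the "SELECT ?a ?b" join above is after, here is a minimal in-memory sketch in Python rather than SPARQL. It is only an illustration: the triples, the `ex:` names and the function `same_string_candidates` are all invented, and comparing on the lexical form alone is the in-memory analogue of FILTER (str(?x) = str(?y)), not something a triple store does for you.

```python
from collections import defaultdict
from itertools import combinations

# Each object literal is (lexical_form, language_tag_or_None, datatype_or_None).
# These triples are hypothetical, made up purely for illustration.
TRIPLES = [
    ("ex:a", "ex:name", ("cat", None, None)),
    ("ex:b", "ex:label", ("cat", "en", None)),
    ("ex:c", "ex:title", ("cat", None, "xsd:string")),
    ("ex:d", "ex:name", ("dog", None, None)),
]


def same_string_candidates(triples):
    """Pair up subjects whose literals share a lexical form.

    The language tag and datatype are deliberately ignored, which is what
    comparing str(?o) values would do in a SPARQL FILTER.
    """
    by_lexical = defaultdict(set)
    for subject, _predicate, (lexical, _lang, _datatype) in triples:
        by_lexical[lexical].add(subject)

    pairs = []
    for subjects in by_lexical.values():
        # Every unordered pair of distinct subjects is a sameAs candidate.
        pairs.extend(combinations(sorted(subjects), 2))
    return sorted(pairs)


if __name__ == "__main__":
    for left, right in same_string_candidates(TRIPLES):
        print(left, right)
```

Grouping by lexical form first keeps this to one pass over the data plus the pairing step, instead of one query per literal spelling.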
On 23 Nov 2013, at 11:00, Richard Light rich...@light.demon.co.uk wrote:
On 23/11/2013 10:30, Hugh Glaser wrote:
It’s the other bit of the pig’s breakfast. Try an @en

Magic! Thanks.
Richard

On 23 Nov 2013, at 10:18, Richard Light rich...@light.demon.co.uk wrote:
Hi,
Sorry to bother the list, but I'm stumped by what should be a simple SPARQL query. When applied to the dbpedia end-point [1], this search:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT * WHERE {
  ?pers a foaf:Person .
  ?pers foaf:surname "Malik" .
  OPTIONAL {?pers dbpedia-owl:birthDate ?dob }
  OPTIONAL {?pers dbpedia-owl:deathDate ?dod }
  OPTIONAL {?pers dbpedia-owl:placeOfBirth ?pob }
  OPTIONAL {?pers dbpedia-owl:placeOfDeath ?pod }
}
LIMIT 100

yields no results. Yet if you drop the '?pers foaf:surname "Malik" .' clause, you get a result set which includes a Malik with the desired surname property.

I'm clearly being dumb, but in what way? :-)
(I've tried adding ^^xsd:string to the literal, but no joy.)

Thanks,
Richard

[1] http://dbpedia.org/sparql
-- Richard Light
-- Richard Light
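Since the fix for the query above was "try an @en", one way to avoid guessing is to offer the store every spelling of the literal at once. The sketch below builds such a query with a SPARQL 1.1 VALUES clause. It is a rough illustration only: the helper names (`escape_literal`, `literal_variants`, `surname_query`) are made up, the escaping covers just backslashes and double quotes, and nothing here checks what any particular endpoint actually stores.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def escape_literal(text):
    """Escape backslashes and double quotes (a minimal subset of SPARQL escapes)."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def literal_variants(text, langs=("en",)):
    """Return the spellings a publisher might have used for one string."""
    quoted = '"' + escape_literal(text) + '"'
    variants = [quoted, quoted + "^^<" + XSD_STRING + ">"]
    variants += [quoted + "@" + lang for lang in langs]
    return variants


def surname_query(surname, langs=("en",), limit=100):
    """Build a query that matches any of the literal variants via VALUES."""
    values = " ".join(literal_variants(surname, langs))
    return (
        "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
        "SELECT * WHERE {\n"
        f"  VALUES ?name {{ {values} }}\n"
        "  ?pers a foaf:Person .\n"
        "  ?pers foaf:surname ?name .\n"
        "}\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    print(surname_query("Malik"))
```

The language tags to try still have to be supplied by the caller, so this narrows the guessing rather than removing it.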
Re: Dumb SPARQL query problem
On 11/23/2013 12:21 PM, Andy Seaborne wrote:
On 23/11/13 17:01, David Booth wrote:
Hi Hugh,
A little correction and a further question . . .

On 11/23/2013 10:17 AM, Hugh Glaser wrote:
Pleasure. Actually, I found this: http://answers.semanticweb.com/questions/3530/sparql-query-filtering-by-string
I said it is a pig’s breakfast because you never know what the RDF publisher has decided to do, and need to try everything. So to match strings efficiently you need to do (at least) four queries:
“cat”
“cat”@en
“cat”^^xsd:string

Is that still true in SPARQL 1.1? In Turtle "cat" means the exact same thing as "cat"^^xsd:string: http://www.w3.org/TR/turtle/#literals
But this section of SPARQL 1.1 Section 4.1.2 Syntax for Literals has no mention of them being the same: http://www.w3.org/TR/sparql11-query/#QSynLiterals
Anyone (Andy?) know whether this was fixed in SPARQL 1.1? I thought SPARQL 1.1 and Turtle had been pretty well aligned.

SPARQL 1.1 says nothing about it aside from (as in SPARQL 1.0) DATATYPE("abc") is xsd:string and DATATYPE("abc"@en) is rdf:langString (in 1.1).

What it should say, but does not because SPARQL 1.1 finished before RDF 1.1 got near sufficiently stable, is
1/ parsing "abc" and "abc"^^xsd:string is the same thing.
2/ In results formats, it's "abc" or equivalent, and no ^^xsd:string.
For matching, it falls out in the matching over RDF but actually putting that in the text would be nice.

Ah yes, I see that in the RDF 1.1 draft now: http://www.w3.org/TR/rdf11-concepts/#h3_section-Graph-Literal
[[ Concrete syntaxes MAY support simple literals, consisting of only a lexical form without any datatype IRI or language tag. Simple literals only exist in concrete syntaxes, and are treated as syntactic sugar for abstract syntax literals with the datatype IRI http://www.w3.org/2001/XMLSchema#string.
]]
So in effect, this was fixed at the RDF 1.1 abstract level, so even though the SPARQL 1.1 spec did not mention it, if a SPARQL 1.1 server is RDF 1.1 compliant, then it will treat "abc" and "abc"^^xsd:string as the same.

Thanks!
David
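To make the conclusion concrete, here is a small Python sketch of how a parser might normalise literals under the RDF 1.1 reading quoted above: a simple literal becomes an xsd:string literal, a language-tagged literal gets rdf:langString, and supplying both a tag and a datatype is rejected, as the concrete syntaxes discussed in this thread do. The `Literal` class and `make_literal` function are hypothetical names for illustration, not the API of any RDF library.

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


@dataclass(frozen=True)
class Literal:
    """An abstract-syntax literal: every literal carries a datatype IRI."""
    lexical: str
    datatype: str
    lang: Optional[str] = None


def make_literal(lexical, datatype=None, lang=None):
    """Normalise the concrete-syntax forms into one abstract literal."""
    if lang is not None and datatype is not None:
        # Turtle and SPARQL allow a language tag or a datatype, never both.
        raise ValueError("a literal takes a language tag or a datatype, not both")
    if lang is not None:
        # Language tags compare case-insensitively, so lowercase them here.
        return Literal(lexical, RDF_LANGSTRING, lang.lower())
    # A simple literal is syntactic sugar for an xsd:string literal.
    return Literal(lexical, datatype or XSD_STRING)


if __name__ == "__main__":
    print(make_literal("abc") == make_literal("abc", datatype=XSD_STRING))
    print(make_literal("abc") == make_literal("abc", lang="en"))
```

Under this model "abc" and "abc"^^xsd:string are one term, while "abc"@en stays a different term, which is the part of the pig’s breakfast that RDF 1.1 leaves in place.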