future work: reasoning

2018-02-27 Thread Andrew U Frank
For my work, the most important feature would be "same-as"  (like 
Laura), with the same justification.
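
To make the wish concrete, a small sketch of the kind of "same-as" inference meant (the IRIs and the property are hypothetical):

@prefix owl: <http://www.w3.org/2002/07/owl#> .

# asserted:
<http://example.org/placeA> owl:sameAs <http://example.org/placeB> .
<http://example.org/placeA> <http://example.org/population> 1700000 .

# a reasoner with owl:sameAs support would additionally entail:
# <http://example.org/placeB> <http://example.org/population> 1700000 .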


Afterwards, I would use reasoning that is achievable with simple 
construct queries and then store the result.


An actual example:

construct { ?tok a ?pattern . }
from     # contains ?tok nlp:lemma3 ?lem
from     # contains ?lexentry lemon:writtenRep ?lem
where {
  ?lexentry ?kind ?pattern .
  ?lexentry lemon:writtenRep ?lem .

  ?tok nlp:lemma3 ?lem .
  ?tok nlp:pos ?cats .

  filter (not exists { wn:isNotClass ?kind ?lem })
}
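
to store the result directly, the same pattern could also be written as a SPARQL 1.1 update (a sketch, leaving aside the graph scoping of the query above; the target graph IRI is hypothetical, prefixes as in the query):

INSERT { GRAPH <http://example.org/inferred> { ?tok a ?pattern . } }
WHERE {
  ?lexentry ?kind ?pattern .
  ?lexentry lemon:writtenRep ?lem .
  ?tok nlp:lemma3 ?lem .
  ?tok nlp:pos ?cats .
  FILTER (NOT EXISTS { wn:isNotClass ?kind ?lem })
}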






Re: Transactions when using SPARQL update

2018-02-18 Thread Andrew U Frank

I have read the discussion on transactions, especially the hint
"Transactions are whatever the dataset provides - for TDB that's
multiple-reader and single-writer (MR+SW), all at the same time." (by Andy
Seaborne).


i use TDB but store data using the SPARQL Update operations (especially
INSERT DATA). is it correct that each such SPARQL HTTP call is a
transaction on the TDB database?
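
for concreteness, a sketch of the kind of update body sent in one such HTTP POST (the graph and property IRIs are hypothetical):

INSERT DATA {
  GRAPH <http://example.org/corpus> {
    <http://example.org/token1> <http://example.org/lemma> "word" .
  }
}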


thank you for a very powerful tool!
andrew


Re: performance measures

2017-12-24 Thread Andrew U. Frank

Thank you for the good advice!

The argument is to show that triple stores are fast enough for linguistic
applications. five years ago a comparison was published in which a
proprietary data structure excelled. i would like to show that
triple stores are fast enough today. I can perhaps get the same dataset
and the same queries (at the application level), but i have no idea how
caching was accounted for; it seems that results differed between runs.


i guess I could use some warmup queries, sort of similar to the
application queries for the test, and then run the test queries and
compare with the previously produced response times. If the response
time is of the same order of magnitude as before, it would show
that a triple store is fast enough.


Does this sound "good enough"?


On 12/24/2017 01:24 PM, ajs6f wrote:

Any measurements would be unreliable at best and probably worthless.
1/ Different data gives different answers to queries.
2/ Caching matters a lot for databases and a different setup will cache 
differently.

This is so true, and it's not even a complete list. It might be better to 
approach the problem from the application layer. Are you able to put together a 
good suite of test data, queries, and updates, accompanied by a good 
understanding of the kinds of load the triplestore will experience in 
production?

Adam Soroka


On Dec 24, 2017, at 1:21 PM, Andy Seaborne <a...@apache.org> wrote:

On 24/12/17 14:11, Andrew U. Frank wrote:

thank you for the information; i take it that using the indexes a one-variable
query would be (close to) linear in the number of triples found. i saw that TDB
does build indexes and assumed they use hashes.
i have still the following questions:
1. is performance different for a named or the default graph?

Query performance is approximately the same for GRAPH.
Update is slower.


2. can i simplify measurements by putting pieces of the dataset in different
graphs and then add more or less of these graphs to take a measure? say i have
5 named graphs, each with 10 million triples, do queries over 2, 3, 4 and 5
graphs give the same (or very similar) results as when i load 20, 30,
40 and 50 million triples into a single named graph?

Any measurements would be unreliable at best and probably worthless.

1/ Different data gives different answers to queries.

2/ Caching matters a lot for databases and a different setup will cache 
differently.

Andy


thank you for help!
andrew
On 12/23/2017 06:20 AM, ajs6f wrote:

For example, the TIM in-memory dataset impl uses 3 indexes on triples and 6 on quads to ensure that all one-variable queries (i.e. 
for triples ?s  ,  ?p ,   ?o) will be as direct as possible. The indexes are 
hashmaps (e.g. Map<Node, Map<Node, Set>>) and don't use the kind of node directory that TDB does.

There are lots of other ways to play that out, according to the balance of 
times costs and storage costs desired and the expected types of queries.

Adam


On Dec 23, 2017, at 2:56 AM, Lorenz Buehmann 
<buehm...@informatik.uni-leipzig.de> wrote:


On 23.12.2017 00:47, Andrew U. Frank wrote:

are there some rules for which queries are linear in the amount of data in
the graph? is it correct to assume that searching for triples based
on a single condition (?p a X) is logarithmic in the size of the data
collection?

Why should it be logarithmic? The complexity of matching a single BGP
depends on the implementation. I could search for matches by doing a
scan on the whole dataset - that would for sure be not logarithmic but
linear. Usually, if exists, a triple store would use the POS index in
order to find bindings for variable ?p.

Cheers,
Lorenz


--
em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
 +43 1 58801 12710 direct
Geoinformation, TU Wien  +43 1 58801 12700 office
Gusshausstr. 27-29   +43 1 55801 12799 fax
1040 Wien Austria+43 676 419 25 72 mobil



Re: performance measures

2017-12-24 Thread Andrew U. Frank
thank you for the information; i take it that using the indexes a
one-variable query would be (close to) linear in the number of triples
found. i saw that TDB does build indexes and assumed they use hashes.


i have still the following questions:

1. is performance different for a named or the default graph?

2. can i simplify measurements by putting pieces of the dataset in
different graphs and then add more or less of these graphs to take a
measure? say i have 5 named graphs, each with 10 million triples, do
queries over 2, 3, 4 and 5 graphs give the same (or very similar)
results as when i load 20, 30, 40 and 50 million triples into a
single named graph?


thank you for help!

andrew


On 12/23/2017 06:20 AM, ajs6f wrote:

For example, the TIM in-memory dataset impl uses 3 indexes on triples and 6 on quads to ensure that all one-variable queries (i.e. 
for triples ?s  ,  ?p ,   ?o) will be as direct as possible. The indexes are 
hashmaps (e.g. Map<Node, Map<Node, Set>>) and don't use the kind of node directory that TDB does.

There are lots of other ways to play that out, according to the balance of 
times costs and storage costs desired and the expected types of queries.

Adam


On Dec 23, 2017, at 2:56 AM, Lorenz Buehmann 
<buehm...@informatik.uni-leipzig.de> wrote:


On 23.12.2017 00:47, Andrew U. Frank wrote:

are there some rules for which queries are linear in the amount of data in
the graph? is it correct to assume that searching for triples based
on a single condition (?p a X) is logarithmic in the size of the data
collection?

Why should it be logarithmic? The complexity of matching a single BGP
depends on the implementation. I could search for matches by doing a
scan on the whole dataset - that would for sure be not logarithmic but
linear. Usually, if exists, a triple store would use the POS index in
order to find bindings for variable ?p.
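
For the pattern under discussion, a sketch (the class IRI is hypothetical):

# predicate (rdf:type) and object (:X) are fixed, so a POS-ordered index
# can enumerate the matching subjects directly instead of scanning
PREFIX : <http://example.org/>
SELECT ?p WHERE { ?p a :X . }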

Cheers,
Lorenz


--
em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
 +43 1 58801 12710 direct
Geoinformation, TU Wien  +43 1 58801 12700 office
Gusshausstr. 27-29   +43 1 55801 12799 fax
1040 Wien Austria+43 676 419 25 72 mobil



performance measures

2017-12-22 Thread Andrew U. Frank
i should do some comparison of a fuseki-store-based application with
others using relational or proprietary databases. i use fuseki with TDB
stored on an SSD or a hard disk.


can i simplify measurements by putting pieces of the dataset in
different graphs and then add more or less of these graphs to take a
measure? say i have 5 named graphs, each with 10 million triples, do
queries over 2, 3, 4 and 5 graphs give the same (or very similar)
results as when i load 20, 30, 40 and 50 million triples into a
single named graph?


is performance different for a named or the default graph?

are there some rules for which queries are linear in the amount of data in
the graph? is it correct to assume that searching for triples based on
a single condition (?p a X) is logarithmic in the size of the data
collection?


is there a document which gives some insight into the expected 
performance of queries?


thank you for any information!

andrew



On 12/22/2017 05:16 PM, Dick Murray wrote:

How big? How many?

On 22 Dec 2017 8:37 pm, "Dimov, Stefan" <stefan.di...@sap.com> wrote:


Hi all,

We have a project, which we’re trying to productize and we’re facing
certain operational issues with big size files. Especially with copying and
maintaining them on the productive cloud hardware (application nodes).

Did anybody have similar issues? How did you resolve them?

I will appreciate if someone shares their experience/problems/solutions.

Regards,
Stefan



--
em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
 +43 1 58801 12710 direct
Geoinformation, TU Wien  +43 1 58801 12700 office
Gusshausstr. 27-29   +43 1 55801 12799 fax
1040 Wien Austria+43 676 419 25 72 mobil



Re: TDB multiple locations

2017-12-14 Thread Andrew U. Frank
you can achieve this with a configuration file in your fuseki run directory
(../run/configuration). i found an example on the web which i used


from 
https://github.com/jfmunozf/Jena-Fuseki-Reasoner-Inference/wiki/Configuring-Apache-Jena-Fuseki-2.4.1-inference-and-reasoning-support-using-SPARQL-1.1:-Jena-inference-rules,-RDFS-Entailment-Regimes-and-OWL-reasoning


adapt the filenames to your names!

@prefix :  <http://base/#> .
@prefix tdb:   <http://jena.hpl.hp.com/2008/tdb#> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ja:    <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .

#This line is a comment

:service1    a    fuseki:Service ;
fuseki:dataset    :dataset ;
fuseki:name   "ElQuijote" ;
fuseki:serviceQuery   "query" , "sparql" ;
fuseki:serviceReadGraphStore  "get" ;
fuseki:serviceReadWriteGraphStore "data" ;
fuseki:serviceUpdate  "update" ;
fuseki:serviceUpload  "upload" .

:dataset rdf:type ja:RDFDataset ;
rdfs:label "ElQuijote" ;
ja:defaultGraph
[ rdfs:label "ElQuijote" ;
a ja:InfModel ;

    #Reference to model.ttl file
    ja:content [ja:externalContent <.../rdfsOntologyExample/model.ttl>  ] ;

    #Reference to data.ttl file
    ja:content [ja:externalContent <.../rdfsOntologyExample/data.ttl>  ] ;

    #Enable OWL-based reasoner
    ja:reasoner [ja:reasonerURL <http://jena.hpl.hp.com/2003/OWLFBRuleReasoner> ] ;


    #Disable RDFS-based reasoner
#    ja:reasoner [ja:reasonerURL 
<http://jena.hpl.hp.com/2003/RDFSExptRuleReasoner>] ;


    #Enable Jena Rules-based reasoner and point it at the location of the
    #myrules.rules file

#    ja:reasoner [
#        ja:reasonerURL <http://jena.hpl.hp.com/2003/GenericRuleReasoner> ;
#        ja:rulesFrom  ;
#    ] ;
  ] ;
 .



On 12/14/2017 10:21 AM, Robert Nielsen wrote:

Is it possible to start Fuseki with initial contents from a file (an OWL
ontology, not an existing TDB), and then allow updates?

When I start Fuseki with the following parameters:

./fuseki-server --file /Ontologies/MyOntology.owl --update /MyFuseki

I get a message that the resource /MyFuseki is running in read-only
mode.   It appears the --update parameter does nothing.   But there is no
message that says the parameter is ignored (and why).   I can, of course,
start the Fuseki server and then load the ontology file ... but it seems
like I should be able to do it in one step.   What am I missing?

Running Apache Jena Fuseki 3.5.0 with Java 1.8.0_144 on Mac OS X 10.12.6
x86_64.

Robert Nielsen



--
em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
 +43 1 58801 12710 direct
Geoinformation, TU Wien  +43 1 58801 12700 office
Gusshausstr. 27-29   +43 1 55801 12799 fax
1040 Wien Austria+43 676 419 25 72 mobil



Re: reasoner in default, data in other graph - no inferences?

2017-12-12 Thread Andrew U. Frank
I have a midsize dataset (55k triples) and a model of eventually a
hundred triples. what is the preferred way to manage these?


during development of the model: should i include the dataset as
ja:content and load the model after each change using the browser? I
understand there is no way for the reasoner to use data not stored in
the default graph. correct?


i appreciate any advice!

andrew



On 12/12/2017 04:47 AM, Dave Reynolds wrote:
The current reasoners are not graph aware and so do indeed only work 
over the default graph.


Dave

On 12/12/17 00:03, Andrew U. Frank wrote:
i try to use the OWL reasoner in fuseki and the browser and have
followed instructions on the web. I can make a reasoner work if the
reasoner and the data are in the same default graph. if i have the
data in a different graph (i tend to separate my data into various
graphs - perhaps this is not a good idea?) i have no reasoning.


i wish i could - at least - include in the reasoning one graph 
containing data. how to achieve this? is the reasoner only working on 
the data in the default graph?


i appreciate help!

andrew

my TDB file is:

@prefix :  <http://base/#> .
@prefix tdb:   <http://jena.hpl.hp.com/2008/tdb#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ja:    <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .

:service_tdb_all  a   fuseki:Service ;
 fuseki:dataset    :dataset ;
 fuseki:name   "animals" ;
 fuseki:serviceQuery   "query" , "sparql" ;
 fuseki:serviceReadGraphStore  "get" ;
 fuseki:serviceReadWriteGraphStore "data" ;
 fuseki:serviceUpdate  "update" ;
 fuseki:serviceUpload  "upload" .

:dataset a ja:RDFDataset ;
 ja:defaultGraph   <#model_inf> ;
  .

<#model_inf> a ja:InfModel ;
  ja:baseModel <#graph> ;
  ja:reasoner [
  ja:reasonerURL <http://jena.hpl.hp.com/2003/OWLFBRuleReasoner>
  ] .

<#graph> rdf:type tdb:GraphTDB ;
   tdb:dataset :tdb_dataset_readwrite .

:tdb_dataset_readwrite
 a tdb:DatasetTDB ;
 tdb:location  "/home/frank/corpusLocal/animalsTest"
 .




--
em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
 +43 1 58801 12710 direct
Geoinformation, TU Wien  +43 1 58801 12700 office
Gusshausstr. 27-29   +43 1 55801 12799 fax
1040 Wien Austria+43 676 419 25 72 mobil



Re: OWL reasoner not making deduction for class with value restriction

2017-12-12 Thread Andrew U. Frank
thank you barry! your explanation did the trick. (1) it works and (2) i 
understand it (perhaps more important).


:LexicalEntryNoun a owl:Class ;
    owl:equivalentClass  # not: rdfs:subClassOf
    [ a owl:Restriction ;
  owl:onProperty wn:part_of_speech ;
  owl:hasValue wn:noun
    ] .
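
with the equivalence in place, the query from the original post (repeated here) should now return the two noun entries under the OWL reasoner:

prefix : <http://gerastree.at/2017/litonto#>
SELECT * WHERE { ?s a :LexicalEntryNoun . }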

i think the explanation which i found on the web and copied dealt
with the case where more things satisfied the
restriction, so the newly defined class was a subset of these (and
somewhere else in the example the new class was defined, which i did
not notice).


thank you all for clarifying the point!

to barry: do you want the honor of adding the solution to stackoverflow? i
will then confirm it.


andrew


On 12/12/2017 01:12 PM, Nouwt, B. (Barry) wrote:

Hi Lorenz and Dave,

Thanks for the explanation! I did not realize the difference between 
equivalence and subclassof in this context.

Maybe this is also the solution to this question? Change the subclassof to an 
equivalence? Since both resources:

<http://wordnet-rdf.princeton.edu/wn31/%27s+Gravenhage-n>

<http://wordnet-rdf.princeton.edu/wn31/%27hood-n>

would fall inside the :LexicalEntryNoun class and show up in his SPARQL query. 
Or am I missing something again?

Regards, Barry

-Original Message-
From: Lorenz Buehmann [mailto:buehm...@informatik.uni-leipzig.de]
Sent: dinsdag 12 december 2017 19:01
To: users@jena.apache.org
Subject: Re: OWL reasoner not making deduction for class with value restriction

Hi Barry,


On 12.12.2017 18:43, Nouwt, B. (Barry) wrote:

Hi Lorenz,

You say:

"if some individual :x belongs to class :LexicalEntryNoun, then a
triple

:x wn:part_of_speech wn:noun .

can be inferred."

, but isn't it also the other way around?

"if some individual :x has the property wn:part_of_speech wn:noun,
then a triple

:x rdfs:type LexicalEntryNoun

can be inferred."

No, why do you think so? It's a SubClassOf axiom and the class LexicalEntryNoun 
is on the left-hand side (in Manchester OWL syntax to make it more readable):

Class: LexicalEntryNoun
     SubClassOf: wn:part_of_speech value wn:noun

OWL is based on Description Logics, the semantics is defined on set theory:

:A rdfs:subClassOf :B

implies the set of individuals in :A is a subset of the set of individuals in 
:B.

Or just see it as logical implication, "if A then B"

Having also the other direction of inference would need an EquivalentClass 
axiom, i.e. one has to use owl:equivalentClass instead of rdfs:subClassOf:

:LexicalEntryNoun a owl:Class ;
    owl:equivalentClass [ a owl:Restriction ;
        owl:onProperty wn:part_of_speech ;
        owl:hasValue wn:noun ] .


This means both class expressions contain exactly the same set of individuals. Indeed, as 
you probably already recognized, this is just "syntactic sugar" for two 
rdfs:subClassOf axioms for both directions.
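
Spelled out, the two subclass axioms hidden in the equivalence (a sketch, prefixes as above):

# direction 1: every LexicalEntryNoun has part_of_speech noun
:LexicalEntryNoun rdfs:subClassOf
    [ a owl:Restriction ;
      owl:onProperty wn:part_of_speech ;
      owl:hasValue wn:noun ] .

# direction 2: everything with part_of_speech noun is a LexicalEntryNoun
[ a owl:Restriction ;
  owl:onProperty wn:part_of_speech ;
  owl:hasValue wn:noun ] rdfs:subClassOf :LexicalEntryNoun .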

Hope this answer helps, if not, feel free to ask.

Cheers,
Lorenz


Regards, Barry



-Original Message-
From: Lorenz Buehmann [mailto:buehm...@informatik.uni-leipzig.de]
Sent: dinsdag 12 december 2017 16:15
To: users@jena.apache.org
Subject: Re: OWL reasoner not making deduction for class with value
restriction

Good evening!


I commented already on StackOverflow, but to make it consistent with the 
mailing list here, please read the comments inline:


On 12.12.2017 15:55, Andrew U. Frank wrote:

I have asked on stackoverflow, perhaps somebody here knows the answer

 


I try to add a bit of ontology to a (public) RDF dataset (wordnet),
specifically I need to differentiate between |LexicalEntries| for
Verbs and Nouns, separated as two subclasses. Following examples on
the web and in the OWL standard, I assumed that

:LexicalEntryNoun a owl:Class ;
    rdfs:subClassOf [ a owl:Restriction ;
        owl:onProperty wn:part_of_speech ;
        owl:hasValue wn:noun ] .

should build a class LexicalEntryNoun, but the query (in jena
fuseki)

What means "build a class"? This axiom states that

if some individual :x belongs to class :LexicalEntryNoun, then a
triple

:x wn:part_of_speech wn:noun .

can be inferred.


prefix : <http://gerastree.at/2017/litonto#>
SELECT * WHERE { ?s a :LexicalEntryNoun . }

gives an empty result. The two URIs which should be returned are
included in the class represented by a blank node, which stands for
the restriction, but are not reported as LexicalEntryNoun as
reported in other queries.

Your query is just looking for subjects of RDF triples that belong to
class :LexicalEntryNoun, i.e. that match the pattern

?s rdf:type :LexicalEntryNoun .

Neither your data nor the axiom above is "generating" such an individual, at 
least not by OWL inference.

Does this answer your question?


Lorenz


i am new to OWL and do not find ma

OWL reasoner not making deduction for class with value restriction

2017-12-12 Thread Andrew U. Frank

I have asked on stackoverflow, perhaps somebody here knows the answer



I try to add a bit of ontology to a (public) RDF dataset (wordnet), 
specifically I need to differentiate between |LexicalEntries| for Verbs 
and Nouns, separated as two subclasses. Following examples on the web 
and in the OWL standard, I assumed that


:LexicalEntryNoun a owl:Class ;
    rdfs:subClassOf [ a owl:Restriction ;
        owl:onProperty wn:part_of_speech ;
        owl:hasValue wn:noun ] .


should build a class LexicalEntryNoun, but the query (in jena fuseki)

prefix : <http://gerastree.at/2017/litonto#>
SELECT * WHERE { ?s a :LexicalEntryNoun . }


gives an empty result. The two URIs which should be returned are included
in the class represented by a blank node, which stands for the
restriction, but are not reported as LexicalEntryNoun as reported in
other queries.


i am new to OWL and do not find many examples of OWL in turtle syntax. 
Where is my error? Thank you for help!


I constructed a very small subset of data which is loaded together with
the OWL reasoner <http://jena.hpl.hp.com/2003/OWLFBRuleReasoner>:


@prefix wn31:  <http://wordnet-rdf.princeton.edu/wn31> .
@prefix lemon: <http://lemon-model.net/lemon#> .
@prefix nlp:   <http://gerastree.at/nlp_2015#> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix lit:   <http://gerastree.at/lit_2014#> .
@prefix wn:    <http://wordnet-rdf.princeton.edu/ontology#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ns:    <http://www.example.org/ns#> .
@prefix owl:   <http://www.w3.org/2002/07/owl#> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix :      <http://gerastree.at/2017/litonto#> .

<http://wordnet-rdf.princeton.edu/wn31/%27s+Gravenhage-n>
    a _:b0 , owl:Thing , rdfs:Resource , lemon:LexicalEntry ;
    lemon:canonicalForm <http://wordnet-rdf.princeton.edu/wn31/%27s+Gravenhage-n#CanonicalForm> ;
    lemon:sense <http://www.lexvo.org/page/wordnet/30/noun/%27s_gravenhage_1_15_00> ,
        <http://wordnet-rdf.princeton.edu/wn31/%27s+Gravenhage-n#1-n> ;
    wn:part_of_speech wn:noun ;
    owl:sameAs <http://wordnet-rdf.princeton.edu/wn31/%27s+Gravenhage-n> .

<http://wordnet-rdf.princeton.edu/wn31/%27hood-n>
    a _:b0 , owl:Thing , rdfs:Resource , lemon:LexicalEntry ;
    lemon:canonicalForm <http://wordnet-rdf.princeton.edu/wn31/%27hood-n#CanonicalForm> ;
    lemon:sense <http://www.lexvo.org/page/wordnet/30/noun/%27hood_1_15_00> ,
        <http://wordnet-rdf.princeton.edu/wn31/%27hood-n#1-n> ;
    wn:part_of_speech wn:noun ;
    owl:sameAs <http://wordnet-rdf.princeton.edu/wn31/%27hood-n> .

:LexicalEntryNoun a owl:Class ;
    rdfs:subClassOf [ a owl:Restriction ;
        owl:onProperty wn:part_of_speech ;
        owl:hasValue wn:noun ] .


--
em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
 +43 1 58801 12710 direct
Geoinformation, TU Wien  +43 1 58801 12700 office
Gusshausstr. 27-29   +43 1 55801 12799 fax
1040 Wien Austria+43 676 419 25 72 mobil



reasoner in default, data in other graph - no inferences?

2017-12-11 Thread Andrew U. Frank
i try to use the OWL reasoner in fuseki and the browser and have
followed instructions on the web. I can make a reasoner work if the
reasoner and the data are in the same default graph. if i have the data
in a different graph (i tend to separate my data into various graphs -
perhaps this is not a good idea?) i have no reasoning.


i wish i could - at least - include in the reasoning one graph 
containing data. how to achieve this? is the reasoner only working on 
the data in the default graph?


i appreciate help!

andrew

my TDB file is:

@prefix :  <http://base/#> .
@prefix tdb:   <http://jena.hpl.hp.com/2008/tdb#> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ja:    <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .

:service_tdb_all  a   fuseki:Service ;
    fuseki:dataset    :dataset ;
    fuseki:name   "animals" ;
    fuseki:serviceQuery   "query" , "sparql" ;
    fuseki:serviceReadGraphStore  "get" ;
    fuseki:serviceReadWriteGraphStore "data" ;
    fuseki:serviceUpdate  "update" ;
    fuseki:serviceUpload  "upload" .

:dataset a ja:RDFDataset ;
    ja:defaultGraph   <#model_inf> ;
 .

<#model_inf> a ja:InfModel ;
 ja:baseModel <#graph> ;
 ja:reasoner [
 ja:reasonerURL <http://jena.hpl.hp.com/2003/OWLFBRuleReasoner>
 ] .

<#graph> rdf:type tdb:GraphTDB ;
  tdb:dataset :tdb_dataset_readwrite .

:tdb_dataset_readwrite
    a tdb:DatasetTDB ;
    tdb:location  "/home/frank/corpusLocal/animalsTest"
    .
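
a possible variant of the <#graph> definition above that may give inference over one named graph instead of the TDB default graph (a sketch, assuming the tdb:graphName assembler property; the graph IRI is hypothetical):

# select one named graph from the TDB store as the base of the inference model
<#graph> rdf:type tdb:GraphTDB ;
    tdb:dataset :tdb_dataset_readwrite ;
    tdb:graphName <http://example.org/myDataGraph> .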


--
em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
 +43 1 58801 12710 direct
Geoinformation, TU Wien  +43 1 58801 12700 office
Gusshausstr. 27-29   +43 1 55801 12799 fax
1040 Wien Austria+43 676 419 25 72 mobil



Re: Fuseki all graphs into dataset vs separate graphs

2017-11-26 Thread Andrew U. Frank
can i ask for clarification (my application is perhaps somewhat similar 
to Laura's):


when i use the sparql update protocol to store data, am i not using
fuseki? I define my TDB in fuseki (in the run/configuration directory)
and i start a fuseki server which is the endpoint that receives the update
queries. i had the impression that s-put is essentially doing a wget to
the sparql endpoint.


is this (more or less) a correct understanding? how would one start the 
TDB or TDB2 server without fuseki?


andrew



On 11/26/2017 02:54 PM, ajs6f wrote:

"s-put" has nothing to do with TDB2-- it is entirely about SPARQL Graph Store 
protocol. It would work perfectly well with any implementation thereof, including 
non-Jena ones.

You will find in the bin/ directory of a Jena distribution a series of CLI 
tools for working with TDB2 databases, called tdb2.tdbquery, tdb2.tdbupdate 
etc. They work very much like their TDB1 counterparts. They will let you work 
against your TDB2 database without Fuseki.

If all you want to do is load data and query it yourself, you don't need 
Fuseki. You can just use the CLI tools. tdb2.tdbupdate will let you handle your 
graph replacement chore easily.

ajs6f




On Nov 26, 2017, at 2:46 PM, Laura Morales <laure...@mail.com> wrote:


Is this just for your own exploration

Yes


in which case you might want to avoid Fuseki entirely and just work with TDB

I can issue SPARQL queries directly at TDB2 without using Fuseki?

My original problem still stands tho :) Is "s-put" the only way to replace a 
graph in a TDB2 dataset? Is there any CLI tool (non http) that can manipulate a TDB2 
dataset to replace a graph with another (eg. replace wikidata with a new dump)?
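
For the non-HTTP route, the replacement itself can be written as a standard SPARQL 1.1 Update and run with the tdb2.tdbupdate tool mentioned above; a sketch (the graph IRI and file URL are hypothetical):

# drop the old graph, then load the new dump into it
DROP SILENT GRAPH <http://example.org/wikidata> ;
LOAD <file:///data/wikidata-new.nt> INTO GRAPH <http://example.org/wikidata>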


--
em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
 +43 1 58801 12710 direct
Geoinformation, TU Wien  +43 1 58801 12700 office
Gusshausstr. 27-29   +43 1 55801 12799 fax
1040 Wien Austria+43 676 419 25 72 mobil



Re: Estimating TDB2 size

2017-11-26 Thread Andrew U. Frank
of course, my comparison was naive - and I should have known better,
having worked with databases back in the 1980s. still it surprises when a 200
MB file becomes 13 GB - given that disk space and memory are inexpensive
compared to human time waiting for responses, the design choices are
amply justified. i will buy more memory (;-)


andrew


On 11/26/2017 11:20 AM, ajs6f wrote:

You have to start with the understanding that the indexes in a database are not 
the same thing nor for the same purpose as a simple file of triples or quads. 
TDB1 and 2 store the same triple several times in different orders (SPO, OPS, 
etc.) in order to be able to answer arbitrary queries with good performance, 
which is a common technique.

There is no reason to expect a database that is capable of answering arbitrary 
queries with good performance to be as small as a file, which is not.

ajs6f


On Nov 26, 2017, at 11:17 AM, Andrew U. Frank <fr...@geoinfo.tuwien.ac.at> 
wrote:

thank you for the explanations:

to laura: i guess HDT would reduce the size of my files considerably. where
could i find information on how to use fuseki with HDT? it might be worth trying
to see how the response time changes.

to andy: am i correct to understand that a triple (uri p literal) is translated
into two triples (uri p uriX) and a second one (uriX s literal) for some
properties p and s? is there any reuse of existing literals? that would give
for each literal triple approx. 60 bytes?

i still do not understand how a triple needs about 300 bytes of storage (or how
an nt.gz file of 219 MB gives a TDB database of 13 GB).

the size of the database is of concern to me and I think it influences performance
through IO time.

thank you all very much for the clarifications!

andrew



On 11/26/2017 07:30 AM, Andy Seaborne wrote:

Every RDFTerm gets a NodeId in TDB.  A triple is 3 NodeIds.

There is a big cache, NodeId->RDFTerm.

In TDB1 and TDB2, a NodeId is stored as 8 bytes. TDB2 design is an int and long 
(96 bits) - the current implementation is using 64 bits.

It is very common as a design to dictionary (intern) terms because joins can be
done by comparing integers, not testing whether two strings are the same,
which is much more expensive.

In addition TDBx inlines numbers (integers), date/times and some others.

https://jena.apache.org/documentation/tdb/architecture.html

TDBx could, but doesn't, store compressed data on disk. There are pros and cons 
of this.

 Andy

On 26/11/17 08:30, Laura Morales wrote:

Perhaps a bit tangential but this is somehow related to how HDT stores its data (I've run 
some tests with Fuseki + HDT store instead of TDB). Basically, they assign each subject, 
predicate, and object an integer value. It keeps an index to map integers with the 
corresponding string (of the original value), and then they store every triple using 
integers instead of strings (something like "1 2 9. 8 2 1 ." and so forth). The 
drawback I think is that they have to translate indices/strings back and forth at each 
query, nonetheless the response time is still impressive (milliseconds), and it 
compresses the original file *a lot*. By a lot I mean that for Wikidata (not the full 
file though, but one with about 2.3 billion triples) the HDT is more or less 40GB, and 
gz-compressed about 10GB. The problem is that their rdf2hdt tool is so inefficient that 
it does everything in RAM, so to convert something like wikidata you'd need at least a 
machine with 512GB of ram (or swap if you have a fast enough swap :D). Also the tool 
looks like it can't handle files with more than 2^32 triples, although HDT (the format) 
does handle them. So as long as you can handle the conversion, if you want to save space 
you could benefit from using a HDT store rather than using TDB.



Sent: Sunday, November 26, 2017 at 5:30 AM
From: "Andrew U. Frank" <fr...@geoinfo.tuwien.ac.at>
To: users@jena.apache.org
Subject: Re: Estimating TDB2 size
i have specific questions in relation to what ajs6f said:

i have a TDB store where 1/3 of the triples have very small literals (3-5 chars),
and the same sequence is often repeated. would i get a smaller store and
better performance if these were URIs for the character sequences (stored
once for each repeated case)? any guess how much I could improve?

does the size of the URI play a role in the amount of storage used? i
observe that i have for 33 M triples a TDB size (files) of 13 GB, which
means about 300 bytes per triple. the literals are all short (very seldom
more than 10 chars, mostly 5 - words from english text). it is a named
graph, if this makes a difference.

thank you!

andrew


--
em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
 +43 1 58801 12710 direct
Geoinformation, TU Wien  +43 1 58801 12700 office
Gusshausstr. 27-29   +43 1 55801 12799 fax
1040 Wien Austria+43 676 419 25 72 mobil



--
em.o.Univ.Pr

Re: Estimating TDB2 size

2017-11-26 Thread Andrew U. Frank

thank you for the explanations:

to laura: i guess HDT would reduce the size of my files considerably.
where could i find information on how to use fuseki with HDT? it might be
worth trying to see how the response time changes.


to andy: am i correct to understand that a triple (uri p literal) is
translated into two triples (uri p uriX) and a second one (uriX s literal)
for some properties p and s? is there any reuse of existing literals?
that would give for each literal triple approx. 60 bytes?


i still do not understand how a triple needs about 300 bytes of storage
(or how an nt.gz file of 219 MB gives a TDB database of 13 GB).


the size of the database is of concern to me and I think it influences
performance through IO time.


thank you all very much for the clarifications!

andrew



On 11/26/2017 07:30 AM, Andy Seaborne wrote:

Every RDFTerm gets a NodeId in TDB.  A triple is 3 NodeIds.

There is a big cache, NodeId->RDFTerm.

In TDB1 and TDB2, a NodeId is stored as 8 bytes. TDB2 design is an int 
and long (96 bits) - the current implementation is using 64 bits.


It is very common as a design to dictionary (intern) terms because
joins can be done by comparing integers, not testing whether two
strings are the same, which is much more expensive.


In addition TDBx inlines numbers (integers), date/times and some others.

https://jena.apache.org/documentation/tdb/architecture.html

TDBx could, but doesn't, store compressed data on disk. There are pros 
and cons of this.


    Andy

On 26/11/17 08:30, Laura Morales wrote:
Perhaps a bit tangential but this is somehow related to how HDT 
stores its data (I've run some tests with Fuseki + HDT store instead 
of TDB). Basically, they assign each subject, predicate, and object 
an integer value. It keeps an index to map integers with the 
corresponding string (of the original value), and then they store 
every triple using integers instead of strings (something like "1 2 
9. 8 2 1 ." and so forth). The drawback I think is that they have to 
translate indices/strings back and forth at each query, nonetheless 
the response time is still impressive (milliseconds), and it 
compresses the original file *a lot*. By a lot I mean that for 
Wikidata (not the full file though, but one with about 2.3 billion 
triples) the HDT is more or less 40GB, and gz-compressed about 10GB. 
The problem is that their rdf2hdt tool is so inefficient that it does 
everything in RAM, so to convert something like wikidata you'd need 
at least a machine with 512GB of ram (or swap if you have a fast 
enough swap :D). Also the tool looks like it can't handle files with 
more than 2^32 triples, although HDT (the format) does handle them. 
So as long as you can handle the conversion, if you want to save 
space you could benefit from using a HDT store rather than using TDB.




Sent: Sunday, November 26, 2017 at 5:30 AM
From: "Andrew U. Frank" <fr...@geoinfo.tuwien.ac.at>
To: users@jena.apache.org
Subject: Re: Estimating TDB2 size
i have specific questions in relation to what ajs6f said:

i have a TDB store where 1/3 of the triples have very small literals (3-5 chars),
and the same sequence is often repeated. would i get a smaller store and
better performance if these were URIs for the character sequences (stored
once for each repeated case)? any guess how much I could improve?

does the size of the URI play a role in the amount of storage used? i
observe that i have for 33 M triples a TDB size (files) of 13 GB, which
means about 300 bytes per triple. the literals are all short (very seldom
more than 10 chars, mostly 5 - words from english text). it is a named
graph, if this makes a difference.

thank you!

andrew



--
em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
 +43 1 58801 12710 direct
Geoinformation, TU Wien  +43 1 58801 12700 office
Gusshausstr. 27-29   +43 1 55801 12799 fax
1040 Wien Austria+43 676 419 25 72 mobil



Re: Estimating TDB2 size

2017-11-25 Thread Andrew U. Frank

i have specific questions in relation to what ajs6f said:

i have a TDB store where 1/3 of the triples have very small literals (3-5 chars),
and the same sequence is often repeated. would i get a smaller store and
better performance if these were URIs for the character sequences (stored
once for each repeated case)? any guess how much I could improve?


does the size of the URI play a role in the amount of storage used? i
observe that i have for 33 M triples a TDB size (files) of 13 GB, which
means about 300 bytes per triple. the literals are all short (very seldom
more than 10 chars, mostly 5 - words from english text). it is a named
graph, if this makes a difference.


thank you!

andrew


On 11/25/2017 06:42 AM, ajs6f wrote:

Andy may be able to be more precise, but I can tell you right away that it's not a 
straightforward function. How many literals are there "per triple"? How big are 
the literals, on average? How many unique bnodes and URIs? All of these things will 
change the eventual size of the database.

ajs6f


On Nov 25, 2017, at 6:40 AM, Laura Morales <laure...@mail.com> wrote:

Is it possible to estimate the size of a TDB2 store from one of nt/turtle/xml 
input file, without actually creating the store? Is there maybe a tool for this?


--
em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
 +43 1 58801 12710 direct
Geoinformation, TU Wien  +43 1 58801 12700 office
Gusshausstr. 27-29   +43 1 55801 12799 fax
1040 Wien Austria+43 676 419 25 72 mobil



fuseki 3.5

2017-11-25 Thread Andrew U. Frank
i tried 3.5 and got errors in loading data using the gui. loading with the
sparql update protocol gives the expected confirmation message, but the
gui does not show anything in the info page, nor do queries seem to work.
is the fuseki gui ready for 3.5? (if not, please put this on the info
page for downloading 3.5)


thank you - i am looking forward to using the improved TDB2 data structure!

andrew

--
em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
 +43 1 58801 12710 direct
Geoinformation, TU Wien  +43 1 58801 12700 office
Gusshausstr. 27-29   +43 1 55801 12799 fax
1040 Wien Austria+43 676 419 25 72 mobil



problems with uploading data in nt.gz format from browser

2017-10-16 Thread Andrew U. Frank

i experience a strange effect (replicated a few times):
i upload data in nt.gz format and get a success message, but only a part 
(sometimes less than 10%) is uploaded.
if i extract the nt file from the nt.gz and then convert it with rapper to 
turtle format, i get information on how many triples are in the nt.gz 
file, and when i then upload the ttl file all triples are loaded.

i use the browser upload.

any explanation? i use fuseki 3.4.0.

thank you!
andrew



Re: loading many small rdf/xml files

2017-10-07 Thread Andrew U. Frank

thank you again!

rereading your answers, i checked on the utilities xargs and riot, which 
i had never used before. then i understood your approach (thank you 
for putting the command line in!) and followed it. it indeed 
produces lots of warnings and i also had a hard error in the riot 
output, which i could fix with rapper. then it loaded.


still: why would project gutenberg select such a format?

andrew





On 10/07/2017 12:52 PM, Andy Seaborne wrote:



On 07/10/17 17:06, Andrew U. Frank wrote:
thank you - your link indicates why the solution with calling s-put 
for each individual file is so slow.


practically - i will just wait the 10 hours and then extract the 
triples from the store.


I admire your patience!

I've just downloaded the RDF, converted it to N-triples and loaded it 
into TDB. 55688 files converted to N-triples : 7,949,706 triples.


date ; ( find . -name \*.rdf | xargs riot ) >> data.nt ; date

(Load time was 83s / disk is an SSD)

Then I loaded it into Fuseki into a different, empty database and it 
took ~82 seconds (java had already started).


There are a few RDF warnings:

It uses mixed case host names sometimes:
  http://fr.Wikipedia.org

Some literals are in non-canonical UTF-8:
  "String not in Unicode Normal Form C"

Doesn't stop the process - they are only warnings.

    Andy

can you understand, why somebody would select this format? what is 
the advantage?


andrew



On 10/07/2017 10:52 AM, zPlus wrote:

Hello Andrew,

if I understand this correctly, I think I stumbled on the same problem
before. Concatenating XML files will not work indeed. My solution was
to convert all XML files to N-Triples, then concatenate all those
triples into a single file, and finally load only this file.
Ultimately, what I ended up with is this loop [1]. The idea is to call
RIOT with a list of files as input, instead of calling RIOT on every
file.

I hope this helps.


[1] https://notabug.org/metadb/pipeline/src/master/build.sh#L54

- Original Message -
From: users@jena.apache.org
To:"users@jena.apache.org" <users@jena.apache.org>
Cc:
Sent:Sat, 7 Oct 2017 10:17:18 -0400
Subject:loading many small rdf/xml files

  i have to load the Gutenberg projects catalog in rdf/xml format. this
is
  a collection of about 50,000 files, each containing a single record
as
  attached.

  if i try to concatenate these files into a single one the result is
not
  legal rdf/xml - there are xml doc headers:

  http://www.gutenberg.org/;>

  and similar, which can only occur once per file.

  i found a way to load each file individually with s-put and a loop,
but
  this runs extremely slowly - it has already been running for more than 10
  hours; each file takes half a second to load (fuseki running as
localhost).

  i am sure there is a better way?

  thank you for the help!

  andrew

  --
  em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
  +43 1 58801 12710 direct
  Geoinformation, TU Wien +43 1 58801 12700 office
  Gusshausstr. 27-29 +43 1 55801 12799 fax
  1040 Wien Austria +43 676 419 25 72 mobil







--
em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
 +43 1 58801 12710 direct
Geoinformation, TU Wien  +43 1 58801 12700 office
Gusshausstr. 27-29   +43 1 55801 12799 fax
1040 Wien Austria+43 676 419 25 72 mobil



Re: loading many small rdf/xml files

2017-10-07 Thread Andrew U. Frank
thank you - your link indicates why the solution with calling s-put for 
each individual file is so slow.


practically - i will just wait the 10 hours and then extract the triples 
from the store.


can you understand, why somebody would select this format? what is the 
advantage?


andrew



On 10/07/2017 10:52 AM, zPlus wrote:

Hello Andrew,

if I understand this correctly, I think I stumbled on the same problem
before. Concatenating XML files will not work indeed. My solution was
to convert all XML files to N-Triples, then concatenate all those
triples into a single file, and finally load only this file.
Ultimately, what I ended up with is this loop [1]. The idea is to call
RIOT with a list of files as input, instead of calling RIOT on every
file.

I hope this helps.


[1] https://notabug.org/metadb/pipeline/src/master/build.sh#L54

- Original Message -
From: users@jena.apache.org
To:"users@jena.apache.org" <users@jena.apache.org>
Cc:
Sent:Sat, 7 Oct 2017 10:17:18 -0400
Subject:loading many small rdf/xml files

  i have to load the Gutenberg projects catalog in rdf/xml format. this
is
  a collection of about 50,000 files, each containing a single record
as
  attached.

  if i try to concatenate these files into a single one the result is
not
  legal rdf/xml - there are xml doc headers:

  http://www.gutenberg.org/;>

  and similar, which can only occur once per file.

  i found a way to load each file individually with s-put and a loop,
but
  this runs extremely slowly - it has already been running for more than 10
  hours; each file takes half a second to load (fuseki running as
localhost).

  i am sure there is a better way?

  thank you for the help!

  andrew

  --
  em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
  +43 1 58801 12710 direct
  Geoinformation, TU Wien +43 1 58801 12700 office
  Gusshausstr. 27-29 +43 1 55801 12799 fax
  1040 Wien Austria +43 676 419 25 72 mobil





--
em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
 +43 1 58801 12710 direct
Geoinformation, TU Wien  +43 1 58801 12700 office
Gusshausstr. 27-29   +43 1 55801 12799 fax
1040 Wien Austria+43 676 419 25 72 mobil



loading many small rdf/xml files

2017-10-07 Thread Andrew U. Frank
i have to load the Gutenberg projects catalog in rdf/xml format. this is 
a collection of about 50,000 files, each containing a single record as 
attached.


if i try to concatenate these files into a single one the result is not 
legal rdf/xml - there are xml doc headers:


http://www.gutenberg.org/;>

and similar, which can only occur once per file.

i found a way to load each file individually with s-put and a loop, but 
this runs extremely slowly - it has already been running for more than 10 
hours; each file takes half a second to load (fuseki running as localhost).


i am sure there is a better way?

thank you for the help!

andrew



--
em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
 +43 1 58801 12710 direct
Geoinformation, TU Wien  +43 1 58801 12700 office
Gusshausstr. 27-29   +43 1 55801 12799 fax
1040 Wien Austria+43 676 419 25 72 mobil



pg9630.rdf
Description: application/rdf


moving graph between tdb

2017-08-30 Thread Andrew U Frank
i have a large tdb-stored dataset with multiple graphs. can i move one
graph from one dataset to another? what would be the command using the http
protocol?
thank you!
andrew





query against the default graph and a graph with a label

2017-06-01 Thread Andrew U Frank
i try to combine two queries where one goes against the default graph in
which data were loaded and the other against a specific graph.

SELECT *
from 
WHERE {
  ?s wn:translation ?o .
}
LIMIT 25

and

SELECT *
WHERE {
  ?s lit:hl1 ?o.
}
LIMIT 25

give both the expected results.
If i add the graph to the second query (in order to formulate a query
connecting the two graphs):

SELECT *
from 
WHERE {
  ?s lit:hl1 ?o.
}
LIMIT 25

i get nothing.
How to include the "default graph" in fuseki to combine with other graphs?
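
one form that may work, depending on how the dataset is set up (a sketch; the named-graph IRI is hypothetical, prefix IRIs taken from the data described in these mails):

PREFIX wn: <http://wordnet-rdf.princeton.edu/ontology#>
PREFIX lit: <http://gerastree.at/lit_2014#>
SELECT *
WHERE {
  ?s wn:translation ?o .                     # matched against the default graph
  GRAPH <http://example.org/litGraph> {      # the named graph
    ?s lit:hl1 ?o2 .
  }
}
LIMIT 25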

thank you!
andrew


Re: Why we need Fuseki

2017-04-05 Thread Andrew U Frank
what went wrong is the HUGE competition from google, financed by
advertising. everybody has learned that search for information is free
(except stock market and a few others). the investment in the "good
idea" is large, because you will have to compete with google's 24/7
service.

i run a sparql server reliably on small hardware for a few users - it
is very useful and, unless you have the "wide public" as users, it runs
easily with little effort. a very simplistic systemd automatic start...

when you go for dbpedia as a service for everybody you are in a serious
project, not a demonstration. i would not know how to make a business
case for such a thing, therefore i do not see your manager asking for it
either. - if it is a limited database for a limited community, the
cost is likely much smaller than any comparable solution.

andrew

-- 
em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
 +43 1 58801 12710 direct
Geoinformation, TU Wien  +43 1 58801 12700 office
Gusshausstr. 27-29   +43 1 55801 12799 fax
1040 Wien Austria+43 676 419 25 72 mobil 
 

On 04/04/2017 06:30 PM, baran...@gmail.com wrote:
>
>
>> In practical terms hosting a public endpoint is an expensive
>> business. To take DBPedia as an example it is billions of triples and
>> so needs appropriate hardware. Let’s assume you wanted to host this
>> in Amazon EC2 and wanted to use a r3.8xlarge instance (32 cores, 244
>> GiB RAM, 2x320GB SSD, 10 GigE network) as an example. The hourly rate
>> for this is $2.66 per hour which works out as approximately $23,000
>> per year, even if we were to use a reserve instance and pay up front
>> that would still cost approximately $12,000 per year. This is before
>> we even take into account bandwidth, Storage and ongoing support
>> costs.  As has already been pointed out everybody here is volunteers,
>> we do not have any large corporate sponsors like other high profile
>> Apache projects, so where do you expect that money to come from?
>>
>>  Rob
>
> I know about the  calculations since so many years, but 'out of the
> blue' if some managers wanted a presentation with my Sparql-Query-UI
> using the well-known Dbpedia public endpoint and they wanted to see
> very fluently working query-process a bit more intelligent than
> Google, and if you come then after 10 minutes with 'Excuse me, you know
> the costs of an endpoint...', they think, where a good idea is, there
> is also money for it in such a simple demonstration case, why is there
> no money for such a good idea? To make some simple live-experience
> with a permanently reliable public-endpoint accessible for each one to
> each time is much-much more important than harebrained SPARQL-queries.
>
> I have had always the feeling something went wrong in this story, the
> question is what?
>
> baran
>
> *
>



Re: Why we need Fuseki

2017-04-04 Thread Andrew U Frank
you can either let them access it with sparql and, for example, the
fuseki client (but other clients work as well) or you can write a
program in any language you like and use one of the HTTP clients in this
language to send a prefabricated query to the endpoint. i do this even
to load the data, because i like the sparql 1.1 update standard more
than the more or less idiosyncratic API. the query then says something
like

"INSERT DATA { GRAPH "   < graphname >
" {" < triples> "} }"

it is - at least for small applications (<1 G triples) and slow code to
produce the triples, fast enough.
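
a concrete instance of the pattern sketched above (all IRIs are hypothetical):

INSERT DATA {
  GRAPH <http://example.org/graph1> {
    <http://example.org/book1> <http://example.org/title> "A title" .
  }
}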

andrew

-- 
em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
 +43 1 58801 12710 direct
Geoinformation, TU Wien  +43 1 58801 12700 office
Gusshausstr. 27-29   +43 1 55801 12799 fax
1040 Wien Austria+43 676 419 25 72 mobil 
 

On 04/04/2017 07:41 AM, Lorenz B. wrote:
> Then let me ask you a question:
>
> Ok, so you said you have Jena and maybe loaded some data. How would you
> allow users on the web to access this data? How would you implement this?
>
>> Thank you Lorenz, I have read that website but unfortunately did not get
>> the concept. Let me try to read it again.
>>
>>
>>
>> On Mon, Apr 3, 2017 at 4:35 PM, Lorenz Buehmann <
>> buehm...@informatik.uni-leipzig.de> wrote:
>>
>>> Javed ...
>>>
>>> I'll simply cite the "slogan" from the web page [1] and recommend to
>>> read [2]
>>>
>>> "Fuseki: serving RDF data over HTTP"
>>>
>>>
>>>
>>> [1] https://jena.apache.org/documentation/serving_data/
>>>
>>> [2] https://jena.apache.org/documentation/fuseki2/
>>>
>>>
>>> On 03.04.2017 14:54, javed khan wrote:
>>>> Hi
>>>>
>>>> Why we need fuseki server in semantic web applications. We can run SPARQL
>>>> queries without it, like we do using Jena syntax.
>>>>



Re: Why we need Fuseki

2017-04-03 Thread Andrew U Frank
i like fuseki because it closely follows the sparql 1.1 standard to
allow update over https, which makes my programs more portable (to other
sparql servers which follow 1.1 - few so far).

fuseki can easily be run as a service on a host, ready for clients to
use it...


-- 
em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
 +43 1 58801 12710 direct
Geoinformation, TU Wien  +43 1 58801 12700 office
Gusshausstr. 27-29   +43 1 55801 12799 fax
1040 Wien Austria+43 676 419 25 72 mobil 
 

On 04/03/2017 02:54 PM, javed khan wrote:
> Hi
>
> Why we need fuseki server in semantic web applications. We can run SPARQL
> queries without it, like we do using Jena syntax.
>



Re: fuseki silently ignores insert data requests with a BOM character

2017-03-29 Thread Andrew U Frank
dear andy

thank you again for your lasting help. i changed some aspects of the
encoding and the sending of bytes and i had the impression that this
cleaned up my problem - unfortunately, i cannot test at the moment (for
some other reasons). if this is not enough, then i will use wireshark,
which i have never used but is probably a good thing to learn...

i can follow why fuseki cannot produce better error messages, so
other tools must be learned (the problem is often that the additional
tools contribute new ways of making errors...)

i hope i learned something with your help!

andrew


-- 
em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
 +43 1 58801 12710 direct
Geoinformation, TU Wien  +43 1 58801 12700 office
Gusshausstr. 27-29   +43 1 55801 12799 fax
1040 Wien Austria+43 676 419 25 72 mobil 
 

On 03/29/2017 01:46 PM, Andy Seaborne wrote:
> Probably your code puts a x00 into the bytes.  x00 is illegal in
> unicode (but not java strings!).
>
> Fuseki is logging what it receives. To print it needs to be a string,
> not bytes, so it creates a string .. and goes bang.  All I can do is
> change the decoder setup to put an "illegal char" marker in the log. 
> As I said, the exact error point is not available to Java in the
> check-fail case.
>
> URL-encoding is not related - this is an HTTP POST with the data in
> the HTTP body.
>
> Try a tool that allows you to look at the on-the-wire action (e.g.
> wireshark). Capturing inside Jetty-Fuseki has had too many places
> where the bytes have been touched. Capturing in the client or wire is
> reliable.
>
> Sorry - don't know Haskell network code.
>
> Andy
>
> On 29/03/17 08:36, Andrew U Frank wrote:
>> thank you - i am aware of this, but still wonder where the encoding on
>> my end goes wrong. it would be very helpful if the fuseki server would
>> log the input it receives for errors in the 'insert data' case. it does
>> show my input (as it is received) if i url-encode it (which is an error
>> with message and produces a copy of what is received). it does not show
>> this in the case of incorrect utf8 characters but this would be very
>> helpful to under stand where in the stack the problem lies. i will
>> experiment more.
>>
>> do you have a suggestion for a simple "web sink  to log"? could ntop be
>> used to capture the request and identify what is sent my end? any
>> suggestions on details how to?
>>
>> than you a lot for your help!
>>
>> andrew
>>



Re: fuseki silently ignores insert data requests with a BOM character

2017-03-29 Thread Andrew U Frank
thank you - i am aware of this, but still wonder where the encoding on
my end goes wrong. it would be very helpful if the fuseki server would
log the input it receives for errors in the 'insert data' case. it does
show my input (as it is received) if i url-encode it (which is an error
with message and produces a copy of what is received). it does not show
this in the case of incorrect utf8 characters but this would be very
helpful to understand where in the stack the problem lies. i will
experiment more.

do you have a suggestion for a simple "web sink to log"? could ntop be
used to capture the request and identify what is sent from my end? any
suggestions on details of how to?

than you a lot for your help!

andrew

-- 
em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
 +43 1 58801 12710 direct
Geoinformation, TU Wien  +43 1 58801 12700 office
Gusshausstr. 27-29   +43 1 55801 12799 fax
1040 Wien Austria+43 676 419 25 72 mobil 
 

On 03/28/2017 11:48 PM, Andy Seaborne wrote:
>
>
> On 28/03/17 22:05, Andrew U Frank wrote:
>> i found that encoding the literals in the requests as latin1 i do not
>> see errors and the triples are stored.
>>
>> is this intended behaviour? for now, i have a work around.
>>
>> i look forward to your analysis of the code? when i look at the java
>> error message, i sense that there is a encoding selected? is it UTF8 or
>> latin1?
>>
>> thank you for maintaining fuseki!
>>
>> andrew
>>
>>
> For some (not all) iso-8859-1/latin1 characters, sending and reading
> back as if it were UTF-8 works.  Its the wrong character in the
> database ("�") but the reverse undoes the damage. This is not true of
> all of latin-1.
>
> Andy



Re: fuseki silently ignores insert data requests with a BOM character

2017-03-28 Thread Andrew U Frank
thank you for the hints - i use haskell and assume that the strings
which i see are converted into what is sent 'on the wire'. i am
not familiar with your comment about the difference between utf8
encoding and utf8 on the wire. in the material that you pointed to i do
not see such a conversion mentioned. can you give me another pointer?

i will read more about what haskell does in encoding utf8. what i
understand is that an umlaut (U+00E4) is encoded in three bytes...

i assume you will fix the differences in the decoders to assure that the
return code and the store action correspond.

thank you for the help!

andrew


-- 
em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
 +43 1 58801 12710 direct
Geoinformation, TU Wien  +43 1 58801 12700 office
Gusshausstr. 27-29   +43 1 55801 12799 fax
1040 Wien Austria+43 676 419 25 72 mobil 
 

On 03/28/2017 10:57 PM, Andy Seaborne wrote:
>
>
> On 28/03/17 21:35, Andrew U Frank wrote:
>> the problem/bug is not related to the BOM character but seemingly to
>> many UTF-8.
>>
>> i get (consistently) a return code of 204 when the fuseki server is
>> running without -v and 500 when running with -v if any of the literals
>> contains a "strange" (nonASCII?) UTF-8. the current problem is the
>> character ä (code point 228 - character a with diaeresis, german umlaut).
>> if i remove the character, the triples (all of the request) are stored,
>> if it is in the literal, none is stored.
>
> (can we stick to hex please?)
>
> 228 = U+00E4
>
> I suspect that codepoints are not being encoded into UTF-8 correctly.
> That is what the java-based decoder that you hit via "-v" is saying.
>
> For example, U+00E4 is 3 bytes : c3 a4 0a : in UTF-8 on the wire.
>
> What is definitely wrong is sending the codepoint as a byte directly :
> xE4 or two bytes 00 E4.
>
>>
>> i understand that a request sent as application/sparql-update must be
>> encoded as UTF8, which my literal is - or is there some special encoding
>> necessary for the german a umlaut? i do not think that the triples
>> should be encoded as latin1 or similar?
>
> Can you confirm that on the wire it is c3 a4 0a?
>
>>
>> i tried to POST with curl or wget, but did not succeed (i have not much
>> experience with these outside of the simplest cases).
>>
>> in any case, it is likely a bug that the response differs depending on
>> whether the fuseki server was started with or without -v?
>
> Hitting different decoders.
>
> Strictly, it is an error and it should be 500.  javacc
> bytes-to-character seems to be too lax.
>
>>
>> thank you for the help!
>>
>> andrew
>>
>>



Re: fuseki silently ignores insert data requests with a BOM character

2017-03-28 Thread Andrew U Frank
i found that when i encode the literals in the requests as latin1, i do
not see errors and the triples are stored.

is this intended behaviour? for now, i have a work around.

i look forward to your analysis of the code. when i look at the java
error message, i sense that there is an encoding selected somewhere - is
it UTF8 or latin1?

thank you for maintaining fuseki!

andrew


-- 
em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
 +43 1 58801 12710 direct
Geoinformation, TU Wien  +43 1 58801 12700 office
Gusshausstr. 27-29   +43 1 55801 12799 fax
1040 Wien Austria+43 676 419 25 72 mobil 
 

On 03/28/2017 03:35 PM, Andy Seaborne wrote:
> What storage is the Fuseki server using?  I can't reproduce the
> restart effect.
>
> The BOM (codepoint 65279, U+FEFF) is not bytes xFE xFF in a SPARQL Update
> request, it's bytes xEF xBB xBF.
>
> We are talking about what is on-the-wire which means UTF-8 encoded
> unicode, and codepoint 65279 (U+FEFF) is 3 bytes in UTF-8: xEF xBB xBF
>
> http://unicode.org/faq/utf_bom.html#bom4
>
> The bytes xFE xFF are illegal as UTF-8 hence the message you see.
>
> $ echo -n $'\uFEFF' | od -t x1
> ==>
> 000 ef bb bf
> 003
>
> $ echo -n $'\xFE\xFF' | od -t x1
> ==>
> 000 fe ff
> 002
>
> The fact that the 500 does not say where the error in the input stream
> occurs is an unfortunate effect of efficient decoding by java and by
> javacc.  It processes large blocks of bytes and does not say where in
> the block the error occurred.  This is a nuisance.
>
> What is legal is to put the unicode encoding  "\uFEFF" into the SPARQL
> Update.
>
> Andy
>
>
>
> On 28/03/17 12:07, Andrew U Frank wrote:
>> thank you for your information. starting fuseki with -v gives indeed
>> more information. in this case i get
>>
>> [2017-03-28 12:45:07] Fuseki INFO  [49] POST
>> http://127.0.0.1:3030/memDB/update
>> [2017-03-28 12:45:07] Fuseki INFO  [49]   => Connection: 
>> close
>> [2017-03-28 12:45:07] Fuseki INFO  [49]   => User-Agent:
>> haskell-HTTP/4000.3.5
>> [2017-03-28 12:45:07] Fuseki INFO  [49]   => Host:
>> 127.0.0.1:3030
>> [2017-03-28 12:45:07] Fuseki INFO  [49]   => Accept: 
>> */*
>> [2017-03-28 12:45:07] Fuseki INFO  [49]   => Content-Length: 
>> 1062
>> [2017-03-28 12:45:07] Fuseki INFO  [49]   => Content-Type:
>> application/sparql-update
>> [2017-03-28 12:45:07] Fuseki INFO  [49] POST /memDB :: 'update' ::
>> [application/sparql-update] ?
>> [2017-03-28 12:45:07] Fuseki WARN  [49] Runtime IO Exception (client
>> left?) RC = 500 : java.nio.charset.MalformedInputException: Input
>> length = 1
>> org.apache.jena.atlas.RuntimeIOException:
>> java.nio.charset.MalformedInputException: Input length = 1
>> at org.apache.jena.atlas.io.IO.exception(IO.java:233)
>> at
>> org.apache.jena.fuseki.servlets.SPARQL_Update.executeBody(SPARQL_Update.java:183)
>>
>> at
>> org.apache.jena.fuseki.servlets.SPARQL_Update.perform(SPARQL_Update.java:108)
>>
>> at
>> org.apache.jena.fuseki.servlets.ActionSPARQL.executeLifecycle(ActionSPARQL.java:134)
>>
>> at
>> org.apache.jena.fuseki.servlets.SPARQL_UberServlet.executeRequest(SPARQL_UberServlet.java:356)
>>
>> at
>> org.apache.jena.fuseki.servlets.SPARQL_UberServlet.serviceDispatch(SPARQL_UberServlet.java:317)
>>
>> at
>> org.apache.jena.fuseki.servlets.SPARQL_UberServlet.executeAction(SPARQL_UberServlet.java:272)
>>
>> at
>> org.apache.jena.fuseki.servlets.ActionSPARQL.execCommonWorker(ActionSPARQL.java:85)
>>
>> at
>> org.apache.jena.fuseki.servlets.ActionBase.doCommon(ActionBase.java:81)
>> at
>> org.apache.jena.fuseki.servlets.FusekiFilter.doFilter(FusekiFilter.java:73)
>>
>> at
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669)
>>
>> at
>> org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:61)
>>
>> at
>> org.apache.shiro.web.servlet.AdviceFilter.executeChain(AdviceFilter.java:108)
>>
>> at
>> org.apache.shiro.web.servlet.AdviceFilter.doFilterInternal(AdviceFilter.java:137)
>>
>> at
>> org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
>>
>> at
>> org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:66)
>>
>> at
>> org.apache.shiro.web.servlet.Ab

Re: fuseki silently ignores insert data requests with a BOM character

2017-03-28 Thread Andrew U Frank
the problem/bug is not related to the BOM character but seemingly to
many UTF-8 characters.

i get (consistently) a return code of 204 when the fuseki server is
running without -v and 500 when running with -v if any of the literals
contains a "strange" (non-ASCII?) UTF-8 character. the current problem is
the character ä (code point 228 - character a with diaeresis, german
umlaut). if i remove the character, the triples (all of the request) are
stored; if it is in the literal, none are stored.

i understand that a request sent as application/sparql-update must be
encoded as UTF8, which my literal is - or is there some special encoding
necessary for the german a umlaut? i do not think that the triples
should be encoded as latin1 or similar?

i tried to POST with curl or wget, but did not succeed (i have not much
experience with these outside of the simplest cases).
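
(i assume the right incantation would be something along the lines of

  curl -v -H 'Content-Type: application/sparql-update' \
       --data-binary @update.ru http://127.0.0.1:3030/memDB/update

with the update text in a UTF-8 encoded file update.ru - please correct
me if that is wrong.)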

in any case, it is likely a bug that the response differs depending on
whether the fuseki server was started with or without -v?

thank you for the help!

andrew


-- 
em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
 +43 1 58801 12710 direct
Geoinformation, TU Wien  +43 1 58801 12700 office
Gusshausstr. 27-29   +43 1 55801 12799 fax
1040 Wien Austria+43 676 419 25 72 mobil 
 

On 03/28/2017 03:35 PM, Andy Seaborne wrote:
> What storage is the Fuseki server using?  I can't reproduce the
> restart effect.
>
> The BOM (codepoint 65279, U+FEFF) is not bytes xFE xFF in a SPARQL Update
> request, it's bytes xEF xBB xBF.
>
> We are talking about what is on-the-wire which means UTF-8 encoded
> unicode, and codepoint 65279 (U+FEFF) is 3 bytes in UTF-8: xEF xBB xBF
>
> http://unicode.org/faq/utf_bom.html#bom4
>
> The bytes xFE xFF are illegal as UTF-8 hence the message you see.
>
> $ echo -n $'\uFEFF' | od -t x1
> ==>
> 000 ef bb bf
> 003
>
> $ echo -n $'\xFE\xFF' | od -t x1
> ==>
> 000 fe ff
> 002
>
> The fact that the 500 does not say where the error in the input stream
> occurs is an unfortunate effect of efficient decoding by java and by
> javacc.  It processes large blocks of bytes and does not say where in
> the block the error occurred.  This is a nuisance.
>
> What is legal is to put the unicode encoding  "\uFEFF" into the SPARQL
> Update.
>
> Andy
>
>
>
> On 28/03/17 12:07, Andrew U Frank wrote:
>> thank you for your information. starting fuseki with -v gives indeed
>> more information. in this case i get
>>
>> [2017-03-28 12:45:07] Fuseki INFO  [49] POST
>> http://127.0.0.1:3030/memDB/update
>> [2017-03-28 12:45:07] Fuseki INFO  [49]   => Connection: 
>> close
>> [2017-03-28 12:45:07] Fuseki INFO  [49]   => User-Agent:
>> haskell-HTTP/4000.3.5
>> [2017-03-28 12:45:07] Fuseki INFO  [49]   => Host:
>> 127.0.0.1:3030
>> [2017-03-28 12:45:07] Fuseki INFO  [49]   => Accept: 
>> */*
>> [2017-03-28 12:45:07] Fuseki INFO  [49]   => Content-Length: 
>> 1062
>> [2017-03-28 12:45:07] Fuseki INFO  [49]   => Content-Type:
>> application/sparql-update
>> [2017-03-28 12:45:07] Fuseki INFO  [49] POST /memDB :: 'update' ::
>> [application/sparql-update] ?
>> [2017-03-28 12:45:07] Fuseki WARN  [49] Runtime IO Exception (client
>> left?) RC = 500 : java.nio.charset.MalformedInputException: Input
>> length = 1
>> org.apache.jena.atlas.RuntimeIOException:
>> java.nio.charset.MalformedInputException: Input length = 1
>> at org.apache.jena.atlas.io.IO.exception(IO.java:233)
>> at
>> org.apache.jena.fuseki.servlets.SPARQL_Update.executeBody(SPARQL_Update.java:183)
>>
>> at
>> org.apache.jena.fuseki.servlets.SPARQL_Update.perform(SPARQL_Update.java:108)
>>
>> at
>> org.apache.jena.fuseki.servlets.ActionSPARQL.executeLifecycle(ActionSPARQL.java:134)
>>
>> at
>> org.apache.jena.fuseki.servlets.SPARQL_UberServlet.executeRequest(SPARQL_UberServlet.java:356)
>>
>> at
>> org.apache.jena.fuseki.servlets.SPARQL_UberServlet.serviceDispatch(SPARQL_UberServlet.java:317)
>>
>> at
>> org.apache.jena.fuseki.servlets.SPARQL_UberServlet.executeAction(SPARQL_UberServlet.java:272)
>>
>> at
>> org.apache.jena.fuseki.servlets.ActionSPARQL.execCommonWorker(ActionSPARQL.java:85)
>>
>> at
>> org.apache.jena.fuseki.servlets.ActionBase.doCommon(ActionBase.java:81)
>> at
>> org.apache.jena.fuseki.servlets.FusekiFilter.doFilter(FusekiFilter.java:73)
>>
>> at
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669)
>>
>> at
>

Re: fuseki silently ignores insert data requests with a BOM character

2017-03-28 Thread Andrew U Frank
the server was started with

exec /home/frank/jena/apache-jena-fuseki-2.5.0/fuseki-server -v --update
--loc=/home/frank/march19 /marchDB

and then with

exec /home/frank/jena/apache-jena-fuseki-2.5.0/fuseki-server  --update
--mem /memDB

(first with -v, which gave the error message; then i removed the -v and
got 204). the 204 return with no insertion happens in both cases
(TDB-backed and in-memory).


i do think it would be sufficient to inform the sender that the request
is not ok and was rejected (if the position of the error can be
indicated, all the better).

the "restart effect" is not produced by the restart, just the difference
between starting fuseki server with -v or not. with -v, the error 500 is
returned, without -v an ok return is returned.

my problem is only that the error 500 (which is internally produced) is
not sent back when -v is not present. (i am, at least at the moment, not
interested in sending a BOM character; it is rather an annoying problem
caused by a file i probably received from somewhere and which i now
routinely filter out. nevertheless, thank you for the information you
pointed me to; i understand now that sending a raw BOM character is not
legal in a literal.) however, i am ONLY concerned when i see a 204
return and the triples are not inserted.
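
(side note: if i ever do need the BOM inside a literal, i take it from
your message that the escaped form, i.e. writing the literal as
"\uFEFF the BOM "@xx, is the legal way to put it into a SPARQL update.)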

if you cannot reproduce the effect, i could try to see if i can produce
it using curl/wget - so far i have tested with a program which inserts
the triples.

does your system insert the triples when an (illegal) BOM character is
sent unencoded in a literal (and the server is not running with the -v
flag)?

thank you for your effort and time!

andrew



-- 
em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
 +43 1 58801 12710 direct
Geoinformation, TU Wien  +43 1 58801 12700 office
Gusshausstr. 27-29   +43 1 55801 12799 fax
1040 Wien Austria+43 676 419 25 72 mobil 
 

On 03/28/2017 03:35 PM, Andy Seaborne wrote:
> What storage is the Fuseki server using?  I can't reproduce the
> restart effect.
>
> The BOM (codepoint 65279, U+FEFF) is not bytes xFE xFF in a SPARQL Update
> request, it's bytes xEF xBB xBF.
>
> We are talking about what is on-the-wire which means UTF-8 encoded
> unicode, and codepoint 65279 (U+FEFF) is 3 bytes in UTF-8: xEF xBB xBF
>
> http://unicode.org/faq/utf_bom.html#bom4
>
> The bytes xFE xFF are illegal as UTF-8 hence the message you see.
>
> $ echo -n $'\uFEFF' | od -t x1
> ==>
> 000 ef bb bf
> 003
>
> $ echo -n $'\xFE\xFF' | od -t x1
> ==>
> 000 fe ff
> 002
>
> The fact that the 500 does not say where the error in the input stream
> occurs is an unfortunate effect of efficient decoding by java and by
> javacc.  It processes large blocks of bytes and does not say where in
> the block the error occurred.  This is a nuisance.
>
> What is legal is to put the unicode encoding  "\uFEFF" into the SPARQL
> Update.
>
> Andy
>
>
>
> On 28/03/17 12:07, Andrew U Frank wrote:
>> thank you for your information. starting fuseki with -v gives indeed
>> more information. in this case i get
>>
>> [2017-03-28 12:45:07] Fuseki INFO  [49] POST
>> http://127.0.0.1:3030/memDB/update
>> [2017-03-28 12:45:07] Fuseki INFO  [49]   => Connection: 
>> close
>> [2017-03-28 12:45:07] Fuseki INFO  [49]   => User-Agent:
>> haskell-HTTP/4000.3.5
>> [2017-03-28 12:45:07] Fuseki INFO  [49]   => Host:
>> 127.0.0.1:3030
>> [2017-03-28 12:45:07] Fuseki INFO  [49]   => Accept: 
>> */*
>> [2017-03-28 12:45:07] Fuseki INFO  [49]   => Content-Length: 
>> 1062
>> [2017-03-28 12:45:07] Fuseki INFO  [49]   => Content-Type:
>> application/sparql-update
>> [2017-03-28 12:45:07] Fuseki INFO  [49] POST /memDB :: 'update' ::
>> [application/sparql-update] ?
>> [2017-03-28 12:45:07] Fuseki WARN  [49] Runtime IO Exception (client
>> left?) RC = 500 : java.nio.charset.MalformedInputException: Input
>> length = 1
>> org.apache.jena.atlas.RuntimeIOException:
>> java.nio.charset.MalformedInputException: Input length = 1
>> at org.apache.jena.atlas.io.IO.exception(IO.java:233)
>> at
>> org.apache.jena.fuseki.servlets.SPARQL_Update.executeBody(SPARQL_Update.java:183)
>>
>> at
>> org.apache.jena.fuseki.servlets.SPARQL_Update.perform(SPARQL_Update.java:108)
>>
>> at
>> org.apache.jena.fuseki.servlets.ActionSPARQL.executeLifecycle(ActionSPARQL.java:134)
>>
>> at
>> org.apache.jena.fuseki.servlets.SPARQL_UberServlet.executeRequest(SPARQL_UberServlet.java:356)
>>
>> at
>> org.apache.jena.fuseki.servlets.SPARQL_UberServlet.serviceDispatch(SPARQL_UberServlet.java:317)

Re: fuseki silently ignores insert data requests with a BOM character

2017-03-28 Thread Andrew U Frank
Fuseki-Request-ID: 28
Connection: close

i hope this is enough information that you can identify a fix to allow
the 500 response to pass through.

to reproduce the problem it seems to be enough to have a BOM ("\65279")
character in a literal of one triple (perhaps at the front of the
literal, but seemingly any triple in the request containing it triggers
the error response).

thank you for your effort - i like fuseki a lot!

andrew


-- 
em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
 +43 1 58801 12710 direct
Geoinformation, TU Wien  +43 1 58801 12700 office
Gusshausstr. 27-29   +43 1 55801 12799 fax
1040 Wien Austria+43 676 419 25 72 mobil 
 

On 03/27/2017 08:48 PM, Andy Seaborne wrote:
> andrew,
>
> Which version of Fuseki is this?
>
> You can launch with "-v" to get more runtime info.
>
> Also - how are you sending the request to Fuseki?
>
> If you are parsing the string and then sending the parsed form, the
> BOM it might be that the BOM is lost because of handling (by java) of
> BOM in the middle of text:
>
> http://unicode.org/faq/utf_bom.html#bom6
>
> What exactly ends up in the Fuseki server?
>
> Andy
>
> On 26/03/17 11:52, Andrew U Frank wrote:
>> i use fuseki with the SPARQL update "INSERT DATA {...}" command, sent as
>> an HTTP POST to a fuseki server.
>> this works very well except when a triple contains a BOM (65279)
>> character in a literal. Then the confirmation is still positive (204) but
>> the triples are NOT inserted.
>>
>> the issue is not that the request with the BOM is ignored - this is
>> probably a good thing, but that a 204 confirmation is produced; some
>> information pointing to a syntax error in the SPARQL request or similar
>> is necessary.
>>
>> i cannot see if the request arrives at the fuseki server ok - is there a
>> flag i can set when starting the fuseki server to show the request as it
>> is received? i can only see that the server is receiving the POST.
>>
>> here the protocol of the sender:
>>
>> callHTTP5 :
>> request POST http://t:3030/march25/update HTTP/1.1
>> Accept: */*
>> Content-Length: 586
>> Content-Type: application/sparql-update
>>
>>
>> requestbody INSERT DATA { GRAPH <http://gerastree.at/g12>
>> {<http://gerastree.at/waterhouse-kw#>
>> <http://gerastree.at/lit_2014#titel> " the BOM "@xx  .
>> 
>> } }
>> callHTTP5 result is is Right HTTP/1.1 204 No Content
>> Date: Sun, 26 Mar 2017 10:32:08 GMT
>> Fuseki-Request-ID: 39
>> Connection: close
>>
>> the literal is  "\65279 the BOM "  - if i remove the BOM mark, the
>> contents are stored, but the response from the server is exactly the
>> same!
>>
>> please produce an appropriate error message!
>>
>> andrew
>>



fuseki silently ignores insert data requests with a BOM character

2017-03-26 Thread Andrew U Frank
i use fuseki with the SPARQL update "INSERT DATA {...}" command, sent as
an HTTP POST to a fuseki server.
this works very well except when a triple contains a BOM (65279)
character in a literal. Then the confirmation is still positive (204) but
the triples are NOT inserted.


the issue is not that the request with the BOM is ignored - this is 
probably a good thing, but that a 204 confirmation is produced; some 
information pointing to a syntax error in the SPARQL request or similar 
is necessary.


i cannot see if the request arrives at the fuseki server ok - is there a 
flag i can set when starting the fuseki server to show the request as it 
is received? i can only see that the server is receiving the POST.


here the protocol of the sender:

callHTTP5 :
request POST http://t:3030/march25/update HTTP/1.1
Accept: */*
Content-Length: 586
Content-Type: application/sparql-update


requestbody INSERT DATA { GRAPH <http://gerastree.at/g12> 
{<http://gerastree.at/waterhouse-kw#> 
<http://gerastree.at/lit_2014#titel> " the BOM "@xx  .


} }
callHTTP5 result is is Right HTTP/1.1 204 No Content
Date: Sun, 26 Mar 2017 10:32:08 GMT
Fuseki-Request-ID: 39
Connection: close

the literal is  "\65279 the BOM "  - if i remove the BOM mark, the 
contents are stored, but the response from the server is exactly the same!


please produce an appropriate error message!

andrew

--
em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
 +43 1 58801 12710 direct
Geoinformation, TU Wien  +43 1 58801 12700 office
Gusshausstr. 27-29   +43 1 55801 12799 fax
1040 Wien Austria+43 676 419 25 72 mobil