[jira] [Updated] (JENA-2309) Enhancing Riot for Big Data

2022-03-14 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne updated JENA-2309:

Issue Type: Brainstorming  (was: Improvement)

> Enhancing Riot for Big Data
> ---
>
> Key: JENA-2309
> URL: https://issues.apache.org/jira/browse/JENA-2309
> Project: Apache Jena
>  Issue Type: Brainstorming
>  Components: RIOT
>Affects Versions: Jena 4.5.0
>Reporter: Claus Stadler
>Priority: Major
>
> We have successfully managed to adapt Jena RIOT to work quite efficiently 
> within Apache Spark; however, we needed to make certain adaptations that rely 
> on brittle reflection hacks and APIs that are marked for removal (namely 
> PipedRDFIterator):
> In principle, for writing RDF data out, we implemented a mapPartitions 
> operation that maps the input RDF to lines of text via StreamRDF, which is 
> understood by Apache Spark's RDD.saveAsTextFile().
> However, for use with Big Data we need to
>  * disable blank node relabeling
>  * preconfigure the StreamRDF with a given set of prefixes (that is 
> broadcast to each node)
> Furthermore
>  * The default PrefixMapping implementation is very inefficient when it comes 
> to handling a dump of prefix.cc. I am using 2500 prefixes, and each RDF term in 
> the output results in a scan of the full prefix map.
>  * Even if the PrefixMapping is optimized, the recently added PrefixMap 
> adapter again does scanning - and it's a final class, so there is no easy override.
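The scanning cost described in the two bullets above can be avoided with longest-match lookup over a character trie, making abbreviation proportional to the length of the IRI rather than the number of registered prefixes. A minimal illustrative sketch in plain Java - this is not Jena's PrefixMapping/PrefixMap API, and all names are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal character-level trie mapping namespace IRIs to prefix names.
// Lookup walks at most iri.length() nodes instead of scanning every
// registered prefix, which is the optimization the bullets above ask for.
final class PrefixTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        String prefix; // non-null if a registered namespace ends at this node
    }

    private final Node root = new Node();

    void put(String namespaceIri, String prefix) {
        Node n = root;
        for (char c : namespaceIri.toCharArray())
            n = n.children.computeIfAbsent(c, k -> new Node());
        n.prefix = prefix;
    }

    /** Abbreviate iri against the longest registered namespace, or return null. */
    String abbreviate(String iri) {
        Node n = root;
        String best = null;
        int bestLen = -1;
        for (int i = 0; i < iri.length(); i++) {
            if (n.prefix != null) { best = n.prefix; bestLen = i; }
            n = n.children.get(iri.charAt(i));
            if (n == null) break;
        }
        if (n != null && n.prefix != null) { best = n.prefix; bestLen = iri.length(); }
        return best == null ? null : best + ":" + iri.substring(bestLen);
    }
}
```

With a prefix.cc dump of ~2,500 namespaces, each term then costs one trie walk rather than a 2,500-entry scan.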
> And finally, we have a use case to allow for relative IRIs in the RDF: we are 
> creating DCAT catalogs from directory content, as in this file:
> DCAT catalog with relative IRIs over directory content: [work-in-progress 
> example|https://hobbitdata.informatik.uni-leipzig.de/lsqv2/dumps/dcat.trig]
> If you retrieve the file with a semantic web client (riot, rapper, etc.) it 
> will automatically use the download location as the base URL and thus give 
> absolute URLs to the published artifacts - regardless of which URL the 
> directory is hosted under.
>  * IRIxResolver: We rely on IRIProviderJDK, which states "do not use in 
> production"; however, it is the only one that lets us achieve the goal. [our 
> code|https://github.com/Scaseco/jenax/blob/dd51ef9a39013d4ddbb4806fcad36b03a4dbaa7c/jenax-arq-parent/jenax-arq-utils/src/main/java/org/aksw/jenax/arq/util/irixresolver/IRIxResolverUtils.java#L30]
>  * Prologue: We use reflection to set the resolver and would like a public 
> setResolver method. [our 
> code|https://github.com/Scaseco/jenax/blob/dd51ef9a39013d4ddbb4806fcad36b03a4dbaa7c/jenax-arq-parent/jenax-arq-utils/src/main/java/org/aksw/jenax/arq/util/prologue/PrologueUtils.java#L65]
>  * WriterStreamRDFBase: We need to be able to create instances of 
> WriterStreamRDF classes which we can configure with our own PrefixMap 
> instance (e.g. trie-backed), and our own LabelToNode strategy ("asGiven") - 
> [our 
> code|https://github.com/SANSA-Stack/SANSA-Stack/blob/40fa6f89f421eee22c9789973ec828ec3f970c33/sansa-spark-jena-java/src/main/java/net/sansa_stack/spark/io/rdf/output/RddRdfWriter.java#L387]
>  * PrefixMapAdapter: We need an adapter that inherits the performance 
> characteristics of the backing PrefixMapping. [our 
> code|https://github.com/Scaseco/jenax/blob/dd51ef9a39013d4ddbb4806fcad36b03a4dbaa7c/jenax-arq-parent/jenax-arq-utils/src/main/java/org/aksw/jenax/arq/util/prefix/PrefixMapAdapter.java#L57]
>  * PrefixMapping: We need a trie-based implementation for efficiency. We 
> created one based on the trie class in Jena, which in initial experiments was 
> sufficiently fast, though we did not benchmark whether e.g. PatriciaTrie from 
> Commons Collections would be faster. [our 
> code|https://github.com/Scaseco/jenax/blob/dd51ef9a39013d4ddbb4806fcad36b03a4dbaa7c/jenax-arq-parent/jenax-arq-utils/src/main/java/org/aksw/jenax/arq/util/prefix/PrefixMappingTrie.java#L27]
> With PrefixMapTrie the profiler showed that the amount of time spent on 
> abbreviate went from ~100% to 1% - though we are not totally sure about 
> standards conformance here.
>  * PipedRDFIterator / AsyncParser: We can read TriG as a splittable format 
> (which is pretty cool) - however, this requires being able to start and stop 
> the RDF parser at will for probing. In other words, AsyncParser needs to 
> return ClosableIterators whose close method actually stops the parsing 
> thread. Also, when scanning for prefixes, we want to be able to create rules 
> such as "as long as the parser emits a prefix with fewer than e.g. 100 
> non-prefix events in between, keep looking for prefixes" - AsyncParser has the 
> API for this with EltStreamRDF, but it is private.
> For future-proofing, we'd like these use cases to be reflected in Jena.
> Because we have mostly sorted out all the above issues, I'd prefer to address 
> these things with only one or a few PRs (maybe the Closab
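The AsyncParser requirement in the description above - a ClosableIterator whose close() really stops the producing thread - can be sketched independently of Jena with a bounded queue and a poison pill: interrupting a producer that blocks on the full queue is what makes close() effective. All names here are illustrative, not the AsyncParser API:

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Iterator over values produced by a background thread. close() interrupts
// the producer (which blocks on a bounded queue), so probing a few elements
// and stopping does not leave a runaway parsing thread behind.
final class StoppableIterator implements Iterator<Integer>, AutoCloseable {
    private static final Integer END = Integer.MIN_VALUE; // poison pill (sketch only)
    private final BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(4);
    private final Thread producer;
    private Integer pending; // next element, if already taken from the queue

    StoppableIterator(int count) { // stand-in for a real parser emitting events
        producer = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++)
                    queue.put(i); // blocks when the consumer falls behind
            } catch (InterruptedException stopped) {
                // close() interrupted us: stop producing immediately
            } finally {
                queue.offer(END); // best-effort: unblock a waiting consumer
            }
        });
        producer.start();
    }

    @Override public boolean hasNext() {
        if (pending == null) {
            try { pending = queue.take(); }
            catch (InterruptedException e) { throw new IllegalStateException(e); }
        }
        return !END.equals(pending);
    }

    @Override public Integer next() {
        if (!hasNext()) throw new NoSuchElementException();
        Integer v = pending;
        pending = null;
        return v;
    }

    @Override public void close() {
        producer.interrupt();                      // stop the producer thread
        try { producer.join(); } catch (InterruptedException ignored) {}
        queue.clear();
    }

    boolean producerRunning() { return producer.isAlive(); }
}
```

The "probe for prefixes, then stop" use case is then: consume a few elements, call close(), and the producer thread is guaranteed to have exited.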

[jira] [Updated] (JENA-2313) Support parsing fragments of Turtle and TriG

2022-03-14 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne updated JENA-2313:

Description: 
An operation needs to be added to the parser builder to allow a prefix map to be 
set for parsing a fragment of Turtle or TriG.

The other environmental factor, the base, is already settable.

This only applies to Turtle/TriG.

N-Triples/N-Quads fragments are self-contained anyway.

XML and JSON formats have the concept of a complete document nested by syntax 
so any fragment needs to be incorporated into a complete document.

When not used, this feature should impose zero cost on the parsing step.
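Until such a builder operation exists, one workaround for Turtle/TriG fragments is to prepend the prefix map as @prefix directives and parse the result as a complete document. A minimal sketch in plain Java - the class and method names are made up for illustration, not Jena API:

```java
import java.util.Map;

// Turn a prefix map plus a Turtle/TriG fragment into a self-contained
// document by prepending @prefix directives. This reproduces textually
// what a builder-level prefix map would provide as initial parser state.
final class TurtleFragments {
    static String withPrefixes(Map<String, String> prefixes, String fragment) {
        StringBuilder sb = new StringBuilder();
        prefixes.forEach((prefix, namespace) ->
            sb.append("@prefix ").append(prefix)
              .append(": <").append(namespace).append("> .\n"));
        return sb.append(fragment).toString();
    }
}
```

The result can be handed to any Turtle/TriG parser, and the zero-cost requirement is met trivially: with an empty map, nothing is prepended.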

  was:
An operation needs to be added to the parser builder to allow a prefix map to be 
set for parsing a fragment of Turtle or TriG.

The other environmental factor, the base, is already settable.

This only applies to Turtle/TriG.

N-Triples/N-Quads fragments are self-contained anyway.

XML and JSON formats have the concept of a complete document nested by syntax 
so any fragment needs to be incorporated into a complete document.



> Support parsing fragments of Turtle and TriG
> 
>
> Key: JENA-2313
> URL: https://issues.apache.org/jira/browse/JENA-2313
> Project: Apache Jena
>  Issue Type: Improvement
>  Components: RIOT
>Affects Versions: Jena 4.4.0
>Reporter: Andy Seaborne
>Priority: Minor
> Fix For: Jena 4.5.0
>
>
> An operation needs to be added to the parser builder to allow a prefix map to be 
> set for parsing a fragment of Turtle or TriG.
> The other environmental factor, the base, is already settable.
> This only applies to Turtle/TriG.
> N-Triples/N-Quads fragments are self-contained anyway.
> XML and JSON formats have the concept of a complete document nested by syntax 
> so any fragment needs to be incorporated into a complete document.
> When not used, this feature should impose zero cost on the parsing step.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (JENA-2313) Support parsing fragments of Turtle and TriG

2022-03-14 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne updated JENA-2313:

Summary: Support parsing fragments of Turtle and TriG  (was: Support 
parsing fraements of Trurle and TriG.)





[jira] [Created] (JENA-2313) Support parsing fraements of Trurle and TriG.

2022-03-14 Thread Andy Seaborne (Jira)
Andy Seaborne created JENA-2313:
---

 Summary: Support parsing fraements of Trurle and TriG.
 Key: JENA-2313
 URL: https://issues.apache.org/jira/browse/JENA-2313
 Project: Apache Jena
  Issue Type: Improvement
  Components: RIOT
Affects Versions: Jena 4.4.0
Reporter: Andy Seaborne
 Fix For: Jena 4.5.0


An operation needs to be added to the parser builder to allow a prefix map to be 
set for parsing a fragment of Turtle or TriG.

The other environmental factor, the base, is already settable.

This only applies to Turtle/TriG.

N-Triples/N-Quads fragments are self-contained anyway.

XML and JSON formats have the concept of a complete document nested by syntax 
so any fragment needs to be incorporated into a complete document.






[jira] [Commented] (JENA-2312) UI must use the service endpoint URL

2022-03-14 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506250#comment-17506250
 ] 

Andy Seaborne commented on JENA-2312:
-

The clock-tick for the 4.5.0 release would be late April. 4.4.0 was end of 
January.

> UI must use the service endpoint URL
> 
>
> Key: JENA-2312
> URL: https://issues.apache.org/jira/browse/JENA-2312
> Project: Apache Jena
>  Issue Type: Bug
>  Components: Fuseki UI
>Affects Versions: Jena 4.4.0
>Reporter: Bruno P. Kinoshita
>Assignee: Bruno P. Kinoshita
>Priority: Major
> Fix For: Jena 4.5.0
>
>
> Not sure if 4.5.0 or 4.6.0, feel free to bump up the version if a release is 
> in progress, please.
> I decided to test the PR for GeoSparql - 
> [https://github.com/apache/jena/pull/1204]
> And used the example Assembler file from 
> [https://jena.apache.org/documentation/geosparql/geosparql-assembler.html]. 
> Fuseki loaded the Assembler fine. But when I tried to query it, I realized it 
> was getting 404 errors.
> The `/sparql` endpoint is hard-coded in the UI HTTP requests, but the GeoSparql 
> example doesn't define an endpoint. The UI must be able to derive the URL from 
> the given configuration, instead of using a hard-coded value.
> Already fixed the "Info" tab. Now going through the rest of the code to fix 
> it :)





[jira] [Comment Edited] (JENA-2309) Enhancing Riot for Big Data

2022-03-13 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17505930#comment-17505930
 ] 

Andy Seaborne edited comment on JENA-2309 at 3/13/22, 10:57 PM:


{{RDFParserBuilder.resolver(IRIxResolver resolver)}} exists.

It is possible to add {{RDFParserBuilder.prefixMap(pmap)}}, but it will be 
copied to the initial prefix map in {{ParserProfile.makeParserProfile}}. General 
style: the code avoids trusting mutable external data structures.

{code:java}
RDFParser.create().resolver(myResolver).prefixes(pmap).parse(dest);
{code}

There is also {{RDFFactory}} to look at.


was (Author: andy.seaborne):
{{RDFParserBuilder.resolver(IRIxResolver resolver)}}

Parsing with a predefined set of prefixes is a matter of sending them to the 
destination first.

(Passing them any other way will not work.)

{code:java}
StreamRDF dest = ...
prefixes.forEach((prefix, iri) -> dest.prefix(prefix, iri));
RDFParser.create().resolver(myResolver).parse(dest);
{code}

There is also {{RDFFactory}}.


[jira] [Commented] (JENA-2309) Enhancing Riot for Big Data

2022-03-13 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17505930#comment-17505930
 ] 

Andy Seaborne commented on JENA-2309:
-

{{RDFParserBuilder.resolver(IRIxResolver resolver)}}

Parsing with a predefined set of prefixes is a matter of sending them to the 
destination first.

(Passing them any other way will not work.)

{code:java}
StreamRDF dest = ...
prefixes.forEach((prefix, iri) -> dest.prefix(prefix, iri));
RDFParser.create().resolver(myResolver).parse(dest);
{code}

There is also {{RDFFactory}}.


[jira] [Resolved] (JENA-2308) tdb2.tdbloader cannot read from stdin when graph is specified

2022-03-13 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne resolved JENA-2308.
-
Fix Version/s: Jena 4.5.0
 Assignee: Andy Seaborne
   Resolution: Fixed

> tdb2.tdbloader cannot read from stdin when graph is specified
> -
>
> Key: JENA-2308
> URL: https://issues.apache.org/jira/browse/JENA-2308
> Project: Apache Jena
>  Issue Type: Bug
>  Components: TDB2
>Affects Versions: Jena 4.4.0
> Environment: Mac OS X 12.2.1
> apache jena 4.4.0
> loading turtle file
>Reporter: Gilles Sérasset
>Assignee: Andy Seaborne
>Priority: Major
> Fix For: Jena 4.5.0
>
>
> When specifying a graph on the tdb2.tdbloader command line, the data cannot be 
> read from stdin anymore:
> {code:sh}
> $ cat it_dbnary_ontolex.ttl | tdb2.tdbloader --loc DB2 --syntax Turtle
> 18:03:45 INFO  loader  :: Loader = LoaderPhased
> 18:03:55 INFO  loader  :: Finish - index SPO
> 18:03:55 INFO  loader  :: Start replay index SPO
> 18:03:55 INFO  loader  :: Index set:  SPO => SPO->POS, SPO->OSP
> 18:03:55 INFO  loader  :: Add: 1,000,000 Index (Batch: 10,101,010 / 
> Avg: 10,101,010)
> 18:03:56 INFO  loader  :: Add: 2,000,000 Index (Batch: 782,472 / Avg: 
> 1,452,432)
> 18:03:57 INFO  loader  :: Index set:  SPO => SPO->POS, SPO->OSP 
> [2,371,919 items, 2.2 seconds]
> 18:03:58 INFO  loader  :: Finish - index OSP
> 18:03:59 INFO  loader  :: Finish - index POS
> 18:03:59 INFO  loader  :: Time = 13.153 seconds : Triples = 2,371,919 
> : Rate = 180,333 /s
> $ cat it_dbnary_ontolex.ttl | tdb2.tdbloader --loc DB2 --syntax Turtle 
> --graph \#g1
> 18:03:38 INFO  loader  :: Loader = LoaderPhased
> 18:03:38 INFO  loader  :: No files to load
> {code}
> Expected behaviour: both commands should load the data.
>  





[jira] (JENA-2309) Enhancing Riot for Big Data

2022-03-13 Thread Andy Seaborne (Jira)


[ https://issues.apache.org/jira/browse/JENA-2309 ]


Andy Seaborne deleted comment on JENA-2309:
-

was (Author: andy.seaborne):
Reformat my first comment because JIRA \{quote} has changed to something rather 
unhelpful.


[jira] [Commented] (JENA-2309) Enhancing Riot for Big Data

2022-03-13 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17505461#comment-17505461
 ] 

Andy Seaborne commented on JENA-2309:
-

Reformat my first comment because JITA \{quote} has changed to something rather 
unhelpful.

> Enhancing Riot for Big Data
> ---
>
> Key: JENA-2309
> URL: https://issues.apache.org/jira/browse/JENA-2309
> Project: Apache Jena
>  Issue Type: Improvement
>  Components: RIOT
>Affects Versions: Jena 4.5.0
>Reporter: Claus Stadler
>Priority: Major
>
> We have successfully managed to adapt Jena Riot to quite efficiently work 
> within Apache Spark, however we needed to make certain adaption that rely on 
> brittle reflection hacks and APIs that are marked for removal (namely 
> PipedRDFIterator):
> In principle, for writing RDF data out, we implemented a mapPartition 
> operation that maps the input RDF to lines of text via StreamRDF which is 
> understood by apache spark's RDD.saveAsText();
> However, for use with Big Data we need to
>  * disable blank node relabeling
>  * preconfigure the StreamRDF with a given set of prefixes (that is 
> broadcasted to each node)
> Furthermore
>  * The default PrefixMapping implementation is very inefficient when it comes 
> to handling a dump of prefix.cc. I am using 2500 prefixes. Each RDF term in 
> the output results in a scan of the full prefix map
>  * Even if the PrefixMapping is optimized, the recently added PrefixMap 
> adapter again does scanning - and its a final class so no easy override.
> And finally, we have a use case to allow for relative IRIs in the RDF: We are 
> creating DCAT catalogs from directory content as in this file:
> DCAT catalog with relative IRIs over directory content: [work-in-progress 
> example|https://hobbitdata.informatik.uni-leipzig.de/lsqv2/dumps/dcat.trig]
> If you retrieve the file with a semantic web client (riot, rapper, etc) it 
> will automatically use the download location as the base url and thus giving 
> absolute URLs to the published artifacts - regardless under which URL that 
> directory is hosted.
> *IRIxResolver: We rely on IRIProviderJDK which states "do not use in 
> production" however it is the only one the let us achieve the goal. [our 
> code|https://github.com/Scaseco/jenax/blob/dd51ef9a39013d4ddbb4806fcad36b03a4dbaa7c/jenax-arq-parent/jenax-arq-utils/src/main/java/org/aksw/jenax/arq/util/irixresolver/IRIxResolverUtils.java#L30]
>  * Prologue: We use reflection to set the resolver and would like the 
> setResolver method [our 
> code|https://github.com/Scaseco/jenax/blob/dd51ef9a39013d4ddbb4806fcad36b03a4dbaa7c/jenax-arq-parent/jenax-arq-utils/src/main/java/org/aksw/jenax/arq/util/prologue/PrologueUtils.java#L65]
>  * WriterStreamRDFBase: We need to be able to create instances of 
> WriterStreamRDF classes which we can configure with our own PrefixMap 
> instance (e.g. trie-backed), and our own LabelToNode stragegy ("asGiven") - 
> [our 
> code|https://github.com/SANSA-Stack/SANSA-Stack/blob/40fa6f89f421eee22c9789973ec828ec3f970c33/sansa-spark-jena-java/src/main/java/net/sansa_stack/spark/io/rdf/output/RddRdfWriter.java#L387]
>  * PrefixMapAdapter: We need an adapter that inherits the performance 
> characteristics of the backing PrefixMapping [our 
> code|https://github.com/Scaseco/jenax/blob/dd51ef9a39013d4ddbb4806fcad36b03a4dbaa7c/jenax-arq-parent/jenax-arq-utils/src/main/java/org/aksw/jenax/arq/util/prefix/PrefixMapAdapter.java#L57]
>  * PrefixMapping: We need a trie-based implementation for efficiency. We 
> created one based on the trie class in jena which on initial experiments was 
> sufficiently fast. Though we did not benchmark whether e.g. PatriciaTrie from 
> commons collection would be faster. [our 
> code|https://github.com/Scaseco/jenax/blob/dd51ef9a39013d4ddbb4806fcad36b03a4dbaa7c/jenax-arq-parent/jenax-arq-utils/src/main/java/org/aksw/jenax/arq/util/prefix/PrefixMappingTrie.java#L27]
> With PrefixMapTrie the profiler showed that the amout of time spent on 
> abbreviate went from ~100% to 1% - though not totally sure about standard 
> conformance here.
>  * PipedRDFIterator / AsyncParser: We can read trig as a Splittable format 
> (which is pretty cool) - however this requires being able to start and stop 
> the RDF parser at will for probing. In other words, AsyncParser needs to 
> return ClosableIterators whose close method actually stops the parsing 
> thread. Also when scanning for prefixes we want to be able to create rules 
> such as "as long as the parser emits a prefix with less than e.g. 100 
> non-prefix events in between keep looking for prefixes" - AsyncParser has the 
> API for it with EltStreamRDF but it is private.
> For future-proofing we'd like these use cases to be reflected in Jena.
> Because we have sorted all the above is
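The trie-based abbreviation bullet above can be illustrated even without a trie: a sorted map already avoids scanning every prefix per term. This is a hedged sketch (a hypothetical `PrefixLookup` class, not Jena's `PrefixMapping` nor the `PrefixMappingTrie` linked above); it finds the longest registered namespace by probing a `TreeMap` instead of iterating over all 2500 entries.

```java
import java.util.TreeMap;

public class PrefixLookup {
    // Namespace IRI -> prefix name, kept sorted for floorKey() probing.
    private final TreeMap<String, String> nsToPrefix = new TreeMap<>();

    public void add(String prefix, String namespace) {
        nsToPrefix.put(namespace, prefix);
    }

    /** Return "prefix:local" for the longest matching namespace, or null. */
    public String abbreviate(String iri) {
        String candidate = iri;                 // always a prefix of iri
        while (!candidate.isEmpty()) {
            String ns = nsToPrefix.floorKey(candidate);
            if (ns == null)
                return null;
            if (candidate.startsWith(ns))       // ns is a prefix of iri too
                return nsToPrefix.get(ns) + ":" + iri.substring(ns.length());
            // Shrink candidate to the common prefix of ns and candidate, retry.
            int i = 0;
            int max = Math.min(ns.length(), candidate.length());
            while (i < max && ns.charAt(i) == candidate.charAt(i)) i++;
            candidate = candidate.substring(0, i);
        }
        return null;
    }

    public static void main(String[] args) {
        PrefixLookup pm = new PrefixLookup();
        pm.add("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
        pm.add("foaf", "http://xmlns.com/foaf/0.1/");
        System.out.println(pm.abbreviate("http://xmlns.com/foaf/0.1/name"));
    }
}
```

Each lookup costs a handful of map probes rather than a scan of the full prefix map, which is the behavior the bullet above asks for.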

[jira] [Comment Edited] (JENA-2309) Enhancing Riot for Big Data

2022-03-13 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17505461#comment-17505461
 ] 

Andy Seaborne edited comment on JENA-2309 at 3/13/22, 1:55 PM:
---

Reformat my first comment because JIRA \{quote} has changed to something rather 
unhelpful.


was (Author: andy.seaborne):
Reformat my first comment because JITA \{quote} has changed to something rather 
unhelpful.


[jira] [Comment Edited] (JENA-2309) Enhancing Riot for Big Data

2022-03-13 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17505442#comment-17505442
 ] 

Andy Seaborne edited comment on JENA-2309 at 3/13/22, 1:53 PM:
---

It is difficult to understand the details here out of context of your big data 
processing stack.

bq. However, for use with Big Data we need to disable blank node relabeling

Already possible:

{code:java}
RDFParser.source(...).labelToNode(LabelToNode.createUseLabelAsGiven()). ...
{code}

But also, the internal system id for bnodes in the default configuration uses a
consistent algorithm.

See {{LabelToNode.createScopeByDocumentHash}}. 

At the start of parsing, a large random number is created (a UUID, of which
122 bits are random). Blank node labels in the parser stream are combined with
the UUID bits (using MurmurHash3). There is an LRU cache for this; being a
cache, the consistent calculation is critical.

You are configuring each processing node anyway. Pass the same UUID to all of
them during set up.
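The scope-by-document-hash idea above can be illustrated with a self-contained sketch (this is not Jena's implementation: SHA-256 stands in for MurmurHash3 and the id format is invented). The point is that every worker handed the same UUID mints the same identifier for the same label.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

public class StableBlankNodeIds {
    // Combine a shared per-run seed with the parsed label; same inputs on
    // every node give the same blank node id.
    static String stableId(UUID seed, String label) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(seed.toString().getBytes(StandardCharsets.UTF_8));
            md.update(label.getBytes(StandardCharsets.UTF_8));
            byte[] h = md.digest();
            StringBuilder sb = new StringBuilder("_:b");
            for (int i = 0; i < 8; i++)
                sb.append(String.format("%02x", h[i]));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        UUID shared = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");
        // Two "workers" given the same seed agree on the id for label "b0".
        System.out.println(stableId(shared, "b0").equals(stableId(shared, "b0")));
    }
}
```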

bq. preconfigure the StreamRDF with a given set of prefixes (that is 
broadcasted to each node)

From the description in this ticket, this isn't a Jena issue.

bq. IRIxResolver

The JDK implementation of URIs is buggy for semantic web usage. What aspect of 
{{IRIProviderJDK}} are you
relying on? Do you normalize relative IRIs? Why does {{IRIProviderJenaIRI}} not 
work for you?

See also [iri4ld|https://github.com/afs/x4ld/tree/main/iri4ld]. There is a 
provider (not in the Jena code base).

bq. Prologue: We use reflection to set the resolver

Prologue is historical - see the graph and datasetgraph writers - they don't 
use prologues.

Why do you wish to modify one in place?

Create a new one or use the constructor {{Prologue(PrefixMapping pmap, 
IRIxResolver resolver)}} and have a switchable {{IRIxResolver}}.

Prologues can be shared and that includes with parsed queries (historical).

bq. The default PrefixMapping implementation is very inefficient when it comes 
to handling a dump of prefix.cc. I am using 2500 prefixes. Each RDF term in the 
output results in a scan of the full prefix map


Don't use {{PrefixMapping}}, use {{PrefixMap}}.

We have looked at this issue in the past for 500+ prefixes. 

There was, for a while, a trie-based {{PrefixMap}}. After experimentation, 
tuning {{PrefixMapStd}} with a URI-to-prefix cache was as
fast. A cache approach means it adapts to the case of large prefix sets with 
small data.

This is abstracted in {{PrefixMapFactory.createForOutput}}. Jena writers build 
a per-output prefix map to ensure that they get {{PrefixMapStd}} and not a 
projection of, say, a TDB2-backed prefix map. See {{RDFWriter.prefixMap}}. 
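The cache approach described above can be sketched with a stdlib LRU (a `LinkedHashMap` in access order); the class and method names here are hypothetical, not Jena's API. Caching full IRI to abbreviated form means the slow prefix search runs only once per distinct IRI.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AbbrevCache {
    static final int MAX = 1000;
    // Tiny LRU: full IRI -> abbreviated form; eldest entry evicted past MAX.
    static final LinkedHashMap<String, String> CACHE =
            new LinkedHashMap<>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, String> e) {
                    return size() > MAX;
                }
            };

    static String abbreviate(String iri) {
        // The expensive longest-prefix search runs once per distinct IRI.
        return CACHE.computeIfAbsent(iri, AbbrevCache::slowAbbreviate);
    }

    // Stand-in for the real prefix-map search.
    static String slowAbbreviate(String iri) {
        return iri.replace("http://xmlns.com/foaf/0.1/", "foaf:");
    }

    public static void main(String[] args) {
        abbreviate("http://xmlns.com/foaf/0.1/name");
        System.out.println(CACHE.get("http://xmlns.com/foaf/0.1/name"));
    }
}
```

This is why a cache "adapts to the case of large prefix sets with small data": only the IRIs actually seen pay the search cost.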

bq. AsyncParser

{{AsyncParser}} reads ahead and sends blocks of work to the receiver. If you 
want to synchronously control the parser, you probably want
{{PipedRDFIterator}}. If you want receiver control we can expose 
{{AsyncParser.asyncParseIterator}} and {{EltStreamRDF}} (or some variant of it) 
to give receiver-side control of the incoming stream. It would also be possible 
to do this with logic in the receiving {{StreamRDF}}.
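The receiver-side probing rule quoted earlier ("keep looking for prefixes while fewer than N non-prefix events arrive in between") can be sketched against a hypothetical event stream; `Kind` and the entry shape are invented stand-ins for the private `EltStreamRDF` elements.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class PrefixProbe {
    enum Kind { PREFIX, TRIPLE }

    // Keep reading while each prefix arrives within maxGap non-prefix events.
    static List<String> scanPrefixes(Iterator<Map.Entry<Kind, String>> events,
                                     int maxGap) {
        List<String> prefixes = new ArrayList<>();
        int gap = 0;
        while (events.hasNext() && gap < maxGap) {
            Map.Entry<Kind, String> e = events.next();
            if (e.getKey() == Kind.PREFIX) {
                prefixes.add(e.getValue());
                gap = 0;                // a prefix resets the counter
            } else {
                gap++;                  // a non-prefix event widens the gap
            }
        }
        return prefixes;                // caller closes the parser here
    }

    public static void main(String[] args) {
        List<Map.Entry<Kind, String>> stream = List.of(
                new SimpleEntry<>(Kind.PREFIX, "rdf"),
                new SimpleEntry<>(Kind.TRIPLE, "t1"),
                new SimpleEntry<>(Kind.PREFIX, "foaf"),
                new SimpleEntry<>(Kind.TRIPLE, "t2"),
                new SimpleEntry<>(Kind.TRIPLE, "t3"));
        System.out.println(scanPrefixes(stream.iterator(), 2));
    }
}
```

For this to work against a real parser, the returned iterator's close() must actually stop the parsing thread, which is exactly the requirement raised in the ticket.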

bq. I'd prefer to address these things with only one or a few PRs

As above - some of them are already addressed, some may be addressed in 
different ways.

There are several independent changes being suggested. They have different 
timescales.

Composite PRs make it hard to review now, and hard to track back feature 
changes. Better to have a history that can be looked
back at in 1-5 years time.




[jira] [Commented] (JENA-2309) Enhancing Riot for Big Data

2022-03-13 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17505459#comment-17505459
 ] 

Andy Seaborne commented on JENA-2309:
-

Have you looked at the now-retired 
[jena-elephas|https://jena.apache.org/documentation/archive/hadoop/] for ideas?




[jira] [Commented] (JENA-601) Provide better support for compressed input formats

2022-03-13 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17505446#comment-17505446
 ] 

Andy Seaborne commented on JENA-601:


I was pointing out my experience with Wikidata.

bz2 is still there. It has its uses when file size is much less than gz. 


> Provide better support for compressed input formats
> ---
>
> Key: JENA-601
> URL: https://issues.apache.org/jira/browse/JENA-601
> Project: Apache Jena
>  Issue Type: Improvement
>  Components: RIOT
>Affects Versions: Jena 2.11.0
>Reporter: Rob Vesse
>Priority: Major
>
> Currently Jena has little or no support for compressed input formats.  There 
> are the odd cases where some consideration is given e.g.
> - {{RDFLanguages.filenameToLang()}} strips off {{.gz}} extensions to help it 
> correctly detect file types
> - HTTP responses can deal with compressed responses by virtue of Apache 
> HttpClient
> What would be nice is to have a better strategy for handling compressed 
> inputs.  For example having a registry of known compression extensions e.g. 
> {{.gz}}, {{.bz2}}, {{.deflate}} which ARQ would strip off when trying to 
> deduce format from the filename.
> It would also be useful if the various locator implementations took 
> compression into account when opening input streams as I'm fairly sure if you 
> asked ARQ to open a {{foo.nt.gz}} file it would just open a raw input stream 
> and then the reading would fail.
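The extension-registry idea in the description above can be sketched with the JDK's built-in codecs (the shape is illustrative only, not Jena's API; bz2 and zstd would come from Commons Compress):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.zip.GZIPInputStream;
import java.util.zip.InflaterInputStream;

public class CompressionRegistry {
    // Extension -> stream wrapper. Locators consult this before opening files.
    static final Map<String, UnaryOperator<InputStream>> CODECS = new LinkedHashMap<>();
    static {
        CODECS.put(".gz", in -> {
            try { return new GZIPInputStream(in); }
            catch (IOException e) { throw new UncheckedIOException(e); }
        });
        CODECS.put(".deflate", InflaterInputStream::new);
        // .bz2 / .zst would plug in here via Commons Compress.
    }

    /** Strip a registered compression extension so format detection sees "foo.nt". */
    static String stripCompressionExt(String filename) {
        for (String ext : CODECS.keySet())
            if (filename.endsWith(ext))
                return filename.substring(0, filename.length() - ext.length());
        return filename;
    }

    public static void main(String[] args) {
        System.out.println(stripCompressionExt("data.nt.gz"));
    }
}
```

A single registry covers both concerns in the ticket: format detection (strip the extension) and the locators (wrap the raw stream with the matching decompressor).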



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (JENA-2309) Enhancing Riot for Big Data

2022-03-13 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17505442#comment-17505442
 ] 

Andy Seaborne commented on JENA-2309:
-

It is difficult to understand the details here out of context of your big data 
processing stack.

{quote}However, for use with Big Data we need to 
 * disable blank node relabeling
{quote}

Already possible:

{code:java}
RDFParser.source(...).labelToNode(LabelToNode.createUseLabelAsGiven()). ...
{code}

But also the internal system id for bnode in default configuration uses a 
consistent
algorithm.

See {{LabelToNode.createScopeByDocumentHash}}. 

At the start of parsing, a large random number is created (a UUID, of which
122 bits are random). Blank node labels in the parser stream are combined with
the UUID bits (using MurmurHash3). There is an LRU cache for this; being a
cache, the consistent calculation is critical.

You are configuring each processing node anyway. Pass the same UUID to all of
them during set up.

{quote}
 * preconfigure the StreamRDF with a given set of prefixes (that is broadcasted 
to each node)
{quote}
From the description in this ticket, this isn't a Jena issue.

{quote}
IRIxResolver
{quote}

The JDK implementation of URIs is buggy for semantic web usage. What aspect of 
{{IRIProviderJDK}} are you
relying on? Do you normalize relative IRIs? Why does {{IRIProviderJenaIRI}} not 
work for you?

See also [iri4ld|https://github.com/afs/x4ld/tree/main/iri4ld]. There is a 
provider (not in the Jena code base).

{quote}
Prologue: We use reflection to set the resolver
{quote}
Prologue is historical - see the graph and datasetgraph writers - they don't 
use prologues.

Why do you wish to modify one in place?

Create a new one or use the constructor {{Prologue(PrefixMapping pmap, 
IRIxResolver resolver)}} and have a switchable {{IRIxResolver}}.

Prologues can be shared and that includes with parsed queries (historical).

{quote} * The default PrefixMapping implementation is very inefficient when it 
comes to handling a dump of prefix.cc. I am using 2500 prefixes. Each RDF term 
in the output results in a scan of the full prefix map
{quote}

Don't use {{PrefixMapping}}, use {{PrefixMap}}.

We have looked at this issue in the past for 500+ prefixes. 

There was, for a while, a trie-based {{PrefixMap}}. After experimentation, 
tuning {{PrefixMapStd}} with a URI-to-prefix cache was as
fast. A cache approach means it adapts to the case of large prefix sets with 
small data.

This is abstracted in {{PrefixMapFactory.createForOutput}}. Jena writers build 
a per-output prefix map to ensure that they get {{PrefixMapStd}} and not a 
projection of, say, a TDB2-backed prefix map. See {{RDFWriter.prefixMap}}. 

{quote}
AsyncParser
{quote}

{{AsyncParser}} reads ahead and sends blocks of work to the receiver. If you 
want to synchronously control the parser, you probably want
{{PipedRDFIterator}}. If you want receiver control we can expose 
{{AsyncParser.asyncParseIterator}} and {{EltStreamRDF}} (or some variant of it) 
to give receiver-side control of the incoming stream. It would also be possible 
to do this with logic in the receiving {{StreamRDF}}.

{quote}
I'd prefer to address these things with only one or a few PRs
{quote}

As above - some of them are already addressed, some may be addressed in 
different ways.

There are several independent changes being suggested. They have different 
timescales.

Composite PRs make it hard to review now, and hard to track back feature 
changes. Better to have a history that can be looked
back at in 1-5 years time.



[jira] [Commented] (JENA-2294) tdb2.xloader creates invalid database - later update causes wrong answers.

2022-03-12 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17505305#comment-17505305
 ] 

Andy Seaborne commented on JENA-2294:
-

This affects updates to the database after tdb2.xloader has run. 

The database produced by tdb2.xloader is valid for query but not for subsequent 
updates.




> tdb2.xloader creates invalid database - later update causes wrong answers.
> --
>
> Key: JENA-2294
> URL: https://issues.apache.org/jira/browse/JENA-2294
> Project: Apache Jena
>  Issue Type: Bug
>  Components: TDB2
>Affects Versions: Jena 4.4.0
>Reporter: Andy Seaborne
>Assignee: Andy Seaborne
>Priority: Major
>
> [Report from 
> users@|https://lists.apache.org/thread/lxwcolfowh29nbc79cq867jq051sf2nh].
> Recreate with: 
> {noformat}
> rm -rf BSBM
> xloader --loc BSBM ~/Datasets/BSBM/bsbm-50k.nt.gz
> tdb2.tdbquery --loc BSBM/ --file T.rq
> tdb2.tdbloader --loader=basic --loc BSBM/ X.nt 
> tdb2.tdbquery --loc BSBM/ --file T.rq
> {noformat}
> where
> {noformat}
> ==> X.nt <==
>    .
> ==> T.rq <==
> SELECT (count(?x) AS ?C) {
>   ?x a ?T .
> }
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (JENA-601) Provide better support for compressed input formats

2022-03-11 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17505108#comment-17505108
 ] 

Andy Seaborne commented on JENA-601:


bz2 is supported already :: JENA-1554.

But beware - decompressing bz2 is significantly slower than gz in a JVM 
(possibly because the bz2 codec is pure Java). So much so that downloading a 
bigger gz file and using that can be faster for loading data.

An external decompressor will be running in parallel and may well be faster.

Also see the discussion zstd :: JENA-2181.

Apache Commons Compress uses an external JNI library for zstd decompression, 
which requires an external native library (so we'd need multi-arch jars).

What other compression formats are common? 

JENA-2181 and the PR discussion show where in the code to add them if you want 
to put in a pull request.




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (JENA-2306) Prepare for switching to JSON-LD 1.1 as the default.

2022-03-11 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne resolved JENA-2306.
-
Resolution: Done

> Prepare for switching to JSON-LD 1.1 as the default.
> 
>
> Key: JENA-2306
> URL: https://issues.apache.org/jira/browse/JENA-2306
> Project: Apache Jena
>  Issue Type: Improvement
>  Components: RIOT
>Affects Versions: Jena 4.4.0
>Reporter: Andy Seaborne
>Assignee: Andy Seaborne
>Priority: Major
> Fix For: Jena 4.5.0
>
>
> Prepare for switching to JSON-LD 1.1 as the default.
> This is complicated by the fact that, with Java11, java.net.http fails to 
> set up the HTTP/2 connection to schema.org.
> Works with Java17.
> The java.net.http handling is within Titanium; it is not the switchable RIOT 
> default HttpClient.
> It is not a titanium bug - it is reproducible without Titanium.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (JENA-2303) Resource leak in RDFConnectionAdapter

2022-03-10 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne resolved JENA-2303.
-
Fix Version/s: Jena 4.5.0
   Resolution: Fixed

> Resource leak in RDFConnectionAdapter
> -
>
> Key: JENA-2303
> URL: https://issues.apache.org/jira/browse/JENA-2303
> Project: Apache Jena
>  Issue Type: Bug
>  Components: ARQ
>Affects Versions: Jena 4.5.0
>Reporter: Claus Stadler
>Assignee: Andy Seaborne
>Priority: Minor
> Fix For: Jena 4.5.0
>
>
> * There is a resource leak in RDFConnectionAdapter:
> {code:java}
> public class RDFConnectionAdapter implements RDFConnection {
> @Override
> public QueryExecution query(Query query) {
> QueryExec queryExec = get().query(query); // This line leaks resources
> return adapt(get().query(query));
> }
> }
> {code}
> * Also, please make access to the underlying links in RDFLinkModular public 
> in order to reduce the amount of re-wrapping needed when, e.g., only 
> wrapping the LinkSparqlQuery with custom query rewriting.
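The leak in the snippet above follows a general shape: a closeable resource is created twice and only one instance is ever closed. A stdlib-only illustration (`AutoCloseable` stands in for `QueryExec`; the counter is just instrumentation):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LeakDemo {
    static final AtomicInteger OPEN = new AtomicInteger();

    // A stand-in for QueryExec: something that must be closed exactly once.
    static AutoCloseable open() {
        OPEN.incrementAndGet();
        return OPEN::decrementAndGet;
    }

    public static void main(String[] args) throws Exception {
        // Buggy shape from the snippet above: the resource is created twice
        // but only the second instance is ever closed.
        AutoCloseable leaked = open();           // never closed
        try (AutoCloseable used = open()) { }    // closed by try-with-resources
        System.out.println("still open: " + OPEN.get());
    }
}
```

The fix is the obvious one: create the resource once and adapt that single instance.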



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (JENA-2306) Prepare for switching to JSON-LD 1.1 as the default.

2022-03-09 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503889#comment-17503889
 ] 

Andy Seaborne commented on JENA-2306:
-

{code:java}
    public static void main(String...args) throws Exception {
        try {
            HttpClient hc = HttpClient.newHttpClient();
            HttpRequest req = HttpRequest.newBuilder(new 
URI("https://schema.org/"))
                    //.version(Version.HTTP_1_1)
                    .build();
            HttpResponse<String> res = hc.send(req, BodyHandlers.ofString());
            if ( res.statusCode() != 200 ) {
                System.out.println(res.statusCode());
            }
            else {
                String x = res.body();
                System.out.println("OK");
            }
        } catch (Exception ex) {
            System.err.println(ex.getMessage());
        }
 }
{code}




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (JENA-2306) Prepare for switching to JSON-LD 1.1 as the default.

2022-03-09 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne updated JENA-2306:

Description: 
Prepare for switching to JSON-LD 1.1 as the default.

This is complicated by the fact that, with Java11, java.net.http fails to setup 
the HTTP/2 connection to schema.org.

Works with Java17.

The java.net.http handling is within Titanium; it is not the switchable RIOT 
default HttpClient.

It is not a titanium bug - it is reproducible without Titanium.

 

  was:
Prepare for switching to JSON-LD 1.1 as the default.

This is complicated by the fact that, with Java11, java.net.http fails to setup 
the HTTP/2 connection to schema.org.

Works with Java17.

The java.net.http handling is within Titanium; it is not the switchable RIOT 
default HttpClient.

 





--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (JENA-2306) Prepare for switching to JSON-LD 1.1 as the default.

2022-03-09 Thread Andy Seaborne (Jira)
Andy Seaborne created JENA-2306:
---

 Summary: Prepare for switching to JSON-LD 1.1 as the default.
 Key: JENA-2306
 URL: https://issues.apache.org/jira/browse/JENA-2306
 Project: Apache Jena
  Issue Type: Improvement
  Components: RIOT
Affects Versions: Jena 4.4.0
Reporter: Andy Seaborne
Assignee: Andy Seaborne
 Fix For: Jena 4.5.0


Prepare for switching to JSON-LD 1.1 as the default.

This is complicated by the fact that, with Java11, java.net.http fails to setup 
the HTTP/2 connection to schema.org.

Works with Java17.

The java.net.http handling is within Titanium; it is not the switchable RIOT 
default HttpClient.

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (JENA-2305) Add GraalVM instructions to the documentation for jena-fuseki-docker

2022-03-09 Thread Andy Seaborne (Jira)
Andy Seaborne created JENA-2305:
---

 Summary: Add GraalVM instructions to the documentation for 
jena-fuseki-docker
 Key: JENA-2305
 URL: https://issues.apache.org/jira/browse/JENA-2305
 Project: Apache Jena
  Issue Type: Improvement
Reporter: Andy Seaborne








[jira] [Created] (JENA-2304) NT,NQ : Colon in blank node labels.

2022-03-09 Thread Andy Seaborne (Jira)
Andy Seaborne created JENA-2304:
---

 Summary: NT,NQ : Colon in blank node labels.
 Key: JENA-2304
 URL: https://issues.apache.org/jira/browse/JENA-2304
 Project: Apache Jena
  Issue Type: Improvement
  Components: RIOT
Affects Versions: Jena 4.4.0
Reporter: Andy Seaborne
Assignee: Andy Seaborne


This started with [PR#1217|https://github.com/apache/jena/pull/1217] but that 
change breaks Turtle.

It turns out there is a mistake in the N-triples and N-quads specs.

[https://lists.w3.org/Archives/Public/public-rdf-comments/2022Mar/.html]

{{:}} was not intended to be in the blank node label definition. It is not in 
Turtle or SPARQL, and N-Triples is supposed to be a subset of Turtle.

A fix needs to change {{TokenizerText.readBlankNodeLabel}} to pass in a flag 
saying whether it is in N-Triples mode or not.
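The shape of such a flag can be illustrated with a small, self-contained sketch. This is plain Java for illustration only, not the actual TokenizerText code: the accepted character set is simplified, and `allowColon` is a hypothetical parameter standing in for the per-syntax mode flag; which syntax sets it is the policy question discussed above.

```java
// Illustrative only: not TokenizerText. The allowColon flag is the
// hypothetical per-syntax switch; the character set is a simplified
// approximation of the real blank node label grammar.
public class BlankNodeLabelCheck {
    public static boolean validLabel(String label, boolean allowColon) {
        if (label.isEmpty())
            return false;
        for (int i = 0; i < label.length(); i++) {
            char ch = label.charAt(i);
            if (ch == ':') {
                if (!allowColon)
                    return false;     // ':' rejected in strict mode
                continue;
            }
            if (!Character.isLetterOrDigit(ch) && ch != '_' && ch != '-' && ch != '.')
                return false;         // simplified character check
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(validLabel("b0", false));  // true
        System.out.println(validLabel("a:b", false)); // false: ':' not allowed
        System.out.println(validLabel("a:b", true));  // true
    }
}
```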

 





[jira] [Updated] (JENA-2302) RowSetReaderJSON is not streaming

2022-03-08 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne updated JENA-2302:

Issue Type: Improvement  (was: Bug)

> RowSetReaderJSON is not streaming
> -
>
> Key: JENA-2302
> URL: https://issues.apache.org/jira/browse/JENA-2302
> Project: Apache Jena
>  Issue Type: Improvement
>  Components: ARQ
>Affects Versions: Jena 4.5.0
>Reporter: Claus Stadler
>Priority: Major
>
> Retrieving all data from our TDB2 endpoint with Jena 4.5.0-SNAPSHOT is no 
> longer streaming for the JSON format. I tracked the issue to RowSetReaderJson, 
> which reads everything into memory (and then checks whether it is a SPARQL 
> ASK result):
> {code:java}
> public class RowSetReaderJson {
>     private void parse(InputStream in) {
>         JsonObject obj = JSON.parse(in); // !!! Loads everything !!!
>         // Boolean?
>         if ( obj.hasKey(kBoolean) ) { ... }
>     }
> }
> {code}
> Streaming works when switching to RS_XML in the example below:
> {code:java}
> public class Main {
>     public static void main(String[] args) {
>         System.out.println("Test Started");
>         try (QueryExecution qe = QueryExecutionHTTP.create()
>                 .acceptHeader(ResultSetLang.RS_JSON.getContentType().getContentTypeStr())
>                 .endpoint("http://moin.aksw.org/sparql")
>                 .queryString("SELECT * { ?s ?p ?o }")
>                 .build()) {
>             qe.execSelect().forEachRemaining(System.out::println);
>         }
>         System.out.println("Done");
>     }
> }
> {code}
> For completeness, I can rule out any problem with TDB2 because streaming of 
> JSON works just fine with:
> {code:bash}
> curl --data-urlencode "query=select * { ?s ?p ?o }" "http://moin.aksw.org/sparql"
> {code}





[jira] [Commented] (JENA-2302) RowSetReaderJSON is not streaming

2022-03-08 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502941#comment-17502941
 ] 

Andy Seaborne commented on JENA-2302:
-

A quick glance and it looks good.

It needs to cope with the results-then-head case, not necessarily be as 
performant. If that situation occurs in the wild, I would expect it is more 
likely with small results rather than large. It is possible to determine the 
variables from the query itself; there is no need to scan the results.
 # Is results-then-head tested? It probably isn't in the ARQ test suite (please 
add!)
 # I don't see ASK results covered. Maybe I missed that.
 # Do you have performance measurements?
 # Formally {{DataBag}} is not order preserving (IIRC).
 # When does it spill? Or even in-memory only for the delayed results.
 # Please use constants for keywords: {{JSONResultsKW}}
 # I didn't see coverage of the legacy case {{kTypedLiteral}} "typed-literal"
 # No author tags please.
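The spill behaviour asked about in points 4-5 can be pictured with a toy bag (assumed behaviour for illustration only, not Jena's DataBag): items stay in memory up to a threshold and overflow to a temp file past it. In this toy, insertion order happens to survive the spill, but as point 4 notes, a real bag need not preserve it.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Toy spillable bag: memory up to a threshold, then append to a temp file.
public class SpillBag {
    private final int threshold;
    private final List<String> memory = new ArrayList<>();
    private Path spillFile;
    private BufferedWriter spillOut;

    public SpillBag(int threshold) { this.threshold = threshold; }

    public void add(String item) throws IOException {
        if (memory.size() < threshold) {
            memory.add(item);
            return;
        }
        if (spillOut == null) {               // first overflow: open temp file
            spillFile = Files.createTempFile("spillbag", ".txt");
            spillOut = Files.newBufferedWriter(spillFile);
        }
        spillOut.write(item);
        spillOut.newLine();
    }

    public List<String> contents() throws IOException {
        List<String> all = new ArrayList<>(memory);
        if (spillOut != null) {
            spillOut.flush();                 // make spilled items readable
            all.addAll(Files.readAllLines(spillFile));
        }
        return all;
    }

    public static void main(String[] args) throws IOException {
        SpillBag bag = new SpillBag(2);
        for (String s : new String[] { "a", "b", "c", "d" })
            bag.add(s);
        System.out.println(bag.contents()); // [a, b, c, d]
    }
}
```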

GSON:
 # How big is GSON (with dependencies, i.e. impact on Fuseki)?
 # Is the proposed code only using the parser, with no data mapping? (Data 
mapping is the risk for injection attacks that Jackson went through.)

Two things next:
 * PR
 * Could you please email users@ to say the work is in-progress? See if we can 
identify any corner cases.

The JSON results are quite important so we have to take care that a release is 
not going to cause problems.

 

> RowSetReaderJSON is not streaming
> -
>
> Key: JENA-2302
> URL: https://issues.apache.org/jira/browse/JENA-2302
> Project: Apache Jena
>  Issue Type: Bug
>  Components: ARQ
>Affects Versions: Jena 4.5.0
>Reporter: Claus Stadler
>Priority: Major
>
> Retrieving all data from our TDB2 endpoint with jena 4.5.0-SNAPSHOT is no 
> longer streaming for the JSON format. I tracked the issue to RowSetReaderJson 
> which reads everything into in memory (and then checks whether it is a SPARQL 
> ASK result)
> {code:java}
> public class RowSetReaderJson {
> private void parse(InputStream in) {
> JsonObject obj = JSON.parse(in); // !!! Loads everything !!!
> // Boolean?
> if ( obj.hasKey(kBoolean) ) { ... }
> }
> }
> {code}
> Streaming works when switching the to RS_XML in the example below:
> {code:java}
> public class Main {
> public static void main(String[] args) {
> System.out.println("Test Started");
> try (QueryExecution qe = QueryExecutionHTTP.create()
> 
> .acceptHeader(ResultSetLang.RS_JSON.getContentType().getContentTypeStr())
> .endpoint("http://moin.aksw.org/sparql";).queryString("SELECT 
> * { ?s ?p ?o }").build()) {
> qe.execSelect().forEachRemaining(System.out::println);
> }
> System.out.println("Done");
> }
> }
> {code}
> For completeness, I can rule out any problem with TDB2 because streaming of 
> JSON works just fine with: 
> {code:bash}
> curl --data-urlencode "query=select * { ?s ?p ?o }"  
> "http://moin.aksw.org/sparql";
> {code}





[jira] [Commented] (JENA-2303) Resource leak in RDFConnectionAdapter

2022-03-08 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502843#comment-17502843
 ] 

Andy Seaborne commented on JENA-2303:
-

OK - fix and expose those components.

(not exactly that functionality of RDFLinkModular; the components are used for 
operations and grumble if the link component is null: 
UnsupportedOperationException, not NPE)
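The leak reported in this issue reduces to a small pattern, sketched here with toy classes (not Jena's actual API): the factory is called twice, the first result is abandoned unclosed, and only the second is returned and managed.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy reconstruction of the double-call leak; `open` counts live resources.
public class LeakDemo {
    static final AtomicInteger open = new AtomicInteger();

    static class Exec implements AutoCloseable {
        Exec() { open.incrementAndGet(); }
        @Override public void close() { open.decrementAndGet(); }
    }

    static Exec create() { return new Exec(); }

    // Buggy shape: the first create() result is abandoned and never closed.
    static Exec leaky() {
        Exec leaked = create();
        return create();
    }

    // Fixed shape: create once and return ("adapt") that same instance.
    static Exec fixed() {
        return create();
    }

    public static void main(String[] args) throws Exception {
        try (Exec e = leaky()) { }
        System.out.println("open after leaky: " + open.get()); // 1: one leaked
        open.set(0);
        try (Exec e = fixed()) { }
        System.out.println("open after fixed: " + open.get()); // 0
    }
}
```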

> Resource leak in RDFConnectionAdapter
> -
>
> Key: JENA-2303
> URL: https://issues.apache.org/jira/browse/JENA-2303
> Project: Apache Jena
>  Issue Type: Bug
>  Components: ARQ
>Affects Versions: Jena 4.5.0
>Reporter: Claus Stadler
>Assignee: Andy Seaborne
>Priority: Minor
>
> * There is a resource leak in RDFConnectionAdapter:
> {code:java}
> public class RDFConnectionAdapter implements RDFConnection {
>     @Override
>     public QueryExecution query(Query query) {
>         QueryExec queryExec = get().query(query); // This line leaks resources
>         return adapt(get().query(query));
>     }
> }
> {code}
> * Also, please make access to the underlying links in RDFLinkModular public 
> in order to reduce the amount of re-wrapping needed when e.g. only 
> wrapping the LinkSparqlQuery with custom query rewriting.





[jira] [Assigned] (JENA-2303) Resource leak in RDFConnectionAdapter

2022-03-08 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne reassigned JENA-2303:
---

Assignee: Andy Seaborne

> Resource leak in RDFConnectionAdapter
> -
>
> Key: JENA-2303
> URL: https://issues.apache.org/jira/browse/JENA-2303
> Project: Apache Jena
>  Issue Type: Bug
>  Components: ARQ
>Affects Versions: Jena 4.5.0
>Reporter: Claus Stadler
>Assignee: Andy Seaborne
>Priority: Minor
>
> * There is a resource leak in RDFConnectionAdapter:
> {code:java}
> public class RDFConnectionAdapter implements RDFConnection {
>     @Override
>     public QueryExecution query(Query query) {
>         QueryExec queryExec = get().query(query); // This line leaks resources
>         return adapt(get().query(query));
>     }
> }
> {code}
> * Also, please make access to the underlying links in RDFLinkModular public 
> in order to reduce the amount of re-wrapping needed when e.g. only 
> wrapping the LinkSparqlQuery with custom query rewriting.





[jira] [Commented] (JENA-2303) Resource leak in RDFConnectionAdapter

2022-03-07 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502522#comment-17502522
 ] 

Andy Seaborne commented on JENA-2303:
-

Please explain.

 

> Resource leak in RDFConnectionAdapter
> -
>
> Key: JENA-2303
> URL: https://issues.apache.org/jira/browse/JENA-2303
> Project: Apache Jena
>  Issue Type: Bug
>  Components: ARQ
>Affects Versions: Jena 4.5.0
>Reporter: Claus Stadler
>Priority: Minor
>
> * There is a resource leak in RDFConnectionAdapter:
> {code:java}
> public class RDFConnectionAdapter implements RDFConnection {
>     @Override
>     public QueryExecution query(Query query) {
>         QueryExec queryExec = get().query(query); // This line leaks resources
>         return adapt(get().query(query));
>     }
> }
> {code}
> * Also, please make access to the underlying links in RDFLinkModular public 
> in order to reduce the amount of re-wrapping needed when e.g. only 
> wrapping the LinkSparqlQuery with custom query rewriting.





[jira] [Comment Edited] (JENA-2302) RowSetReaderJSON is not streaming

2022-03-07 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502517#comment-17502517
 ] 

Andy Seaborne edited comment on JENA-2302 at 3/7/22, 7:14 PM:
--

It's quite unlikely to have repeated headers but on the web, "things happen". 
As the application may not have control of the data coming in, robustness is 
nice. But if the community prefers streaming, we can have both, so corner cases 
can be handled by the app.

"header after the body" - if the sender assembles some JSON (i.e. a complete 
result set, correct order) and then serialises it, it is at the mercy of the 
JSON serializer used as to the output order. Many serializers sort keys, and 
"head" < "results".
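The ordering point can be seen with plain Java maps standing in for the serializer's internal representation: a sorting serializer happens to emit "head" before "results" ("h" < "r"), while an insertion-ordered one emits whatever order the sender built.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// The key order a serializer emits depends on the map behind it:
// a sorted map puts "head" before "results"; an insertion-ordered
// map preserves whatever order the sender used.
public class KeyOrder {
    public static void main(String[] args) {
        Map<String, String> sorted = new TreeMap<>();
        sorted.put("results", "...");
        sorted.put("head", "...");
        System.out.println(sorted.keySet());    // [head, results]

        Map<String, String> insertion = new LinkedHashMap<>();
        insertion.put("results", "...");
        insertion.put("head", "...");
        System.out.println(insertion.keySet()); // [results, head]
    }
}
```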

The Jena JSON parser is streaming. The streaming part isn't accessible in the 
release. (Why our own parser? When it was written, the state of parsers was 
different from what it is today. All we want is a parser, not an ORM, which 
leads to security problems, as shown by Jackson and GSON.)

We also have use of jakarta.json in the code base.

And Jackson, but that is (IIRC) just for jsonld-java, which may get replaced as 
JSON-LD 1.1 becomes dominant in the wild.
{quote}Would that be of interest?
{quote}
Yes.
{quote}when XML was the default result set format
{quote}
Content negotiation is settable!


was (Author: andy.seaborne):
It's quite unlikely to have repeated headers but on the web, "things happen". 
As the application may not have control of the data coming in, robustness is 
nice.

"header after the body" - if the sender assembles some JSON (i.e. a complete 
result set, correct order) and then serialises it, it is at the mercy of the 
JSON serializer used.

The Jena JSON parser is streaming. The streaming part isn't accessible in the 
release. (Why our own parser? When it was written, the state of parsers was 
different from what it is today. All we want is a parser, not an ORM, which 
leads to security problems.)

We also have use of jakarta.json in the code base.

And Jackson, but that is (IIRC) just for jsonld-java, which may get replaced as 
JSON-LD 1.1 becomes dominant in the wild.
{quote}Would that be of interest?
{quote}
Yes.
{quote}when XML was the default result set format
{quote}
Content negotiation is settable!

> RowSetReaderJSON is not streaming
> -
>
> Key: JENA-2302
> URL: https://issues.apache.org/jira/browse/JENA-2302
> Project: Apache Jena
>  Issue Type: Bug
>  Components: ARQ
>Affects Versions: Jena 4.5.0
>Reporter: Claus Stadler
>Priority: Major
>
> Retrieving all data from our TDB2 endpoint with Jena 4.5.0-SNAPSHOT is no 
> longer streaming for the JSON format. I tracked the issue to RowSetReaderJson, 
> which reads everything into memory (and then checks whether it is a SPARQL 
> ASK result):
> {code:java}
> public class RowSetReaderJson {
>     private void parse(InputStream in) {
>         JsonObject obj = JSON.parse(in); // !!! Loads everything !!!
>         // Boolean?
>         if ( obj.hasKey(kBoolean) ) { ... }
>     }
> }
> {code}
> Streaming works when switching to RS_XML in the example below:
> {code:java}
> public class Main {
>     public static void main(String[] args) {
>         System.out.println("Test Started");
>         try (QueryExecution qe = QueryExecutionHTTP.create()
>                 .acceptHeader(ResultSetLang.RS_JSON.getContentType().getContentTypeStr())
>                 .endpoint("http://moin.aksw.org/sparql")
>                 .queryString("SELECT * { ?s ?p ?o }")
>                 .build()) {
>             qe.execSelect().forEachRemaining(System.out::println);
>         }
>         System.out.println("Done");
>     }
> }
> {code}
> For completeness, I can rule out any problem with TDB2 because streaming of 
> JSON works just fine with:
> {code:bash}
> curl --data-urlencode "query=select * { ?s ?p ?o }" "http://moin.aksw.org/sparql"
> {code}





[jira] [Commented] (JENA-2302) RowSetReaderJSON is not streaming

2022-03-07 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502517#comment-17502517
 ] 

Andy Seaborne commented on JENA-2302:
-

It's quite unlikely to have repeated headers but on the web, "things happen". 
As the application may not have control of the data coming in, robustness is 
nice.

"header after the body" - if the sender assembles some JSON (i.e. a complete 
result set, correct order) and then serialises it, it is at the mercy of the 
JSON serializer used.

The Jena JSON parser is streaming. The streaming part isn't accessible in the 
release. (Why our own parser? When it was written, the state of parsers was 
different from what it is today. All we want is a parser, not an ORM, which 
leads to security problems.)

We also have use of jakarta.json in the code base.

And Jackson, but that is (IIRC) just for jsonld-java, which may get replaced as 
JSON-LD 1.1 becomes dominant in the wild.
{quote}Would that be of interest?
{quote}
Yes.
{quote}when XML was the default result set format
{quote}
Content negotiation is settable!

> RowSetReaderJSON is not streaming
> -
>
> Key: JENA-2302
> URL: https://issues.apache.org/jira/browse/JENA-2302
> Project: Apache Jena
>  Issue Type: Bug
>  Components: ARQ
>Affects Versions: Jena 4.5.0
>Reporter: Claus Stadler
>Priority: Major
>
> Retrieving all data from our TDB2 endpoint with Jena 4.5.0-SNAPSHOT is no 
> longer streaming for the JSON format. I tracked the issue to RowSetReaderJson, 
> which reads everything into memory (and then checks whether it is a SPARQL 
> ASK result):
> {code:java}
> public class RowSetReaderJson {
>     private void parse(InputStream in) {
>         JsonObject obj = JSON.parse(in); // !!! Loads everything !!!
>         // Boolean?
>         if ( obj.hasKey(kBoolean) ) { ... }
>     }
> }
> {code}
> Streaming works when switching to RS_XML in the example below:
> {code:java}
> public class Main {
>     public static void main(String[] args) {
>         System.out.println("Test Started");
>         try (QueryExecution qe = QueryExecutionHTTP.create()
>                 .acceptHeader(ResultSetLang.RS_JSON.getContentType().getContentTypeStr())
>                 .endpoint("http://moin.aksw.org/sparql")
>                 .queryString("SELECT * { ?s ?p ?o }")
>                 .build()) {
>             qe.execSelect().forEachRemaining(System.out::println);
>         }
>         System.out.println("Done");
>     }
> }
> {code}
> For completeness, I can rule out any problem with TDB2 because streaming of 
> JSON works just fine with:
> {code:bash}
> curl --data-urlencode "query=select * { ?s ?p ?o }" "http://moin.aksw.org/sparql"
> {code}





[jira] [Commented] (JENA-2302) RowSetReaderJSON is not streaming

2022-03-07 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502348#comment-17502348
 ] 

Andy Seaborne commented on JENA-2302:
-

{quote}is no longer streaming{quote}

I'm not clear here - what version was streaming?

The JSON result reader has always been non-streaming, at least since Jena 3.6.0: 
RowSetReaderJSON is the previous ResultSetReaderJSON, ported.

In XML, the order of elements is prescribed by the XML schema. The {{<head>}} 
tag comes before {{<results>}}, and each tag appears once. Streaming is possible 
with StAX.
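A minimal StAX sketch shows why the fixed head-before-results order allows streaming: the pull parser can report the variables as soon as {{<head>}} ends, then emit each binding as it arrives. Namespaces are omitted here for brevity; the real SPARQL XML results format uses one.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Pull-parsing sketch: variables first, then bindings, in document order.
public class StaxSketch {
    static List<String> events(String xml) throws Exception {
        List<String> out = new ArrayList<>();
        XMLStreamReader r = XMLInputFactory.newFactory()
                .createXMLStreamReader(new StringReader(xml));
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT) {
                if (r.getLocalName().equals("variable"))
                    out.add("var:" + r.getAttributeValue(null, "name"));
                else if (r.getLocalName().equals("uri"))
                    out.add("value:" + r.getElementText());
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<sparql><head><variable name='s'/></head>"
                + "<results><result><binding name='s'>"
                + "<uri>http://example/x</uri></binding></result></results></sparql>";
        System.out.println(events(xml)); // [var:s, value:http://example/x]
    }
}
```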

JSON offers no such guarantee. What is more, in JSON a key can appear twice; 
conventionally, the second key takes precedence.

It would be possible to parse optimistically, but handing a partially read, 
buffered stream to a streaming parser once the input is known to be 
stream-suitable is not a simple matter.
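The optimistic approach can be sketched with mark/reset: peek at the first object key and stream only if "head" arrives first, otherwise fall back to buffering the whole body. This is a deliberately naive scanner for illustration; it ignores string escapes and nesting, which a real scanner must not.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Peek at the first JSON object key without consuming the stream.
public class PeekFirstKey {
    public static String firstKey(InputStream in) throws IOException {
        in.mark(1024);                    // remember the stream start
        StringBuilder key = new StringBuilder();
        boolean inKey = false;
        int c;
        while ((c = in.read()) != -1) {
            if (c == '"') {
                if (inKey)
                    break;                // closing quote of the first key
                inKey = true;
            } else if (inKey) {
                key.append((char) c);
            }
        }
        in.reset();                       // rewind: nothing is consumed
        return key.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] body = "{ \"head\": { \"vars\": [\"s\"] }, \"results\": {} }"
                .getBytes(StandardCharsets.UTF_8);
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(body));
        System.out.println(firstKey(in).equals("head")); // true: safe to stream
    }
}
```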

Parsing the results as JSON then processing the JSON data structure is robust. 
It is also robust against partial failures.

Fuseki will write in stream order but the parser is general. A separate 
Fuseki-specific parser is possible.

The fastest stream choice for Fuseki is the binary 
{{application/sparql-results+thrift}}.


> RowSetReaderJSON is not streaming
> -
>
> Key: JENA-2302
> URL: https://issues.apache.org/jira/browse/JENA-2302
> Project: Apache Jena
>  Issue Type: Bug
>  Components: ARQ
>Affects Versions: Jena 4.5.0
>Reporter: Claus Stadler
>Priority: Major
>
> Retrieving all data from our TDB2 endpoint with Jena 4.5.0-SNAPSHOT is no 
> longer streaming for the JSON format. I tracked the issue to RowSetReaderJson, 
> which reads everything into memory (and then checks whether it is a SPARQL 
> ASK result):
> {code:java}
> public class RowSetReaderJson {
>     private void parse(InputStream in) {
>         JsonObject obj = JSON.parse(in); // !!! Loads everything !!!
>         // Boolean?
>         if ( obj.hasKey(kBoolean) ) { ... }
>     }
> }
> {code}
> Streaming works when switching to RS_XML in the example below:
> {code:java}
> public class Main {
>     public static void main(String[] args) {
>         System.out.println("Test Started");
>         try (QueryExecution qe = QueryExecutionHTTP.create()
>                 .acceptHeader(ResultSetLang.RS_JSON.getContentType().getContentTypeStr())
>                 .endpoint("http://moin.aksw.org/sparql")
>                 .queryString("SELECT * { ?s ?p ?o }")
>                 .build()) {
>             qe.execSelect().forEachRemaining(System.out::println);
>         }
>         System.out.println("Done");
>     }
> }
> {code}
> For completeness, I can rule out any problem with TDB2 because streaming of 
> JSON works just fine with:
> {code:bash}
> curl --data-urlencode "query=select * { ?s ?p ?o }" "http://moin.aksw.org/sparql"
> {code}





[jira] [Comment Edited] (JENA-2301) Allow to activate/deactivate shapes

2022-03-06 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17501949#comment-17501949
 ] 

Andy Seaborne edited comment on JENA-2301 at 3/6/22, 12:41 PM:
---

Thanks for the background; the principle that a collection of shapes serves 
multiple purposes is not unreasonable. Splitting by shape-imports would work 
but assumes the shapes were designed for that in the first place.

Are we talking about target shapes, or node/property shapes that are not 
targets but are part of a target shape?

I do prefer a solution that is a long-term basis for the system (lessons 
learned from the past!). There may be use cases for otherwise varying the 
shapes: maybe closed, maybe changing the severity or message, or things I can't 
imagine. As above, setters/mutable shapes bring quite a few assumptions.

A variation of (1) is to add a parameter to validation that is a 
{{java.util.Predicate}} - a test of whether to evaluate a shape (target or 
non-target). Maybe that's more than a boolean test, more a function with a 
"yes, no-and-don't-recurse, no-but-do-recurse" result (recurse is follow 
property shapes and sh:node). It would enable deactivating a property shape 
traversed by one part of the validation that is also accessible by other 
routes.
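That predicate-with-three-way-result idea can be sketched as follows. The names here are hypothetical illustrations, not a Jena API: a decision function per shape controls both whether the shape is evaluated and whether validation recurses into the shapes it points to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Sketch: a three-way filter applied during shape traversal.
public class ShapeFilterSketch {
    enum Decision { EVALUATE, SKIP_AND_PRUNE, SKIP_BUT_RECURSE }

    record Shape(String name, List<Shape> children) {}

    static void validate(Shape shape, Function<Shape, Decision> filter,
                         List<String> evaluated) {
        Decision d = filter.apply(shape);
        if (d == Decision.EVALUATE)
            evaluated.add(shape.name());   // stands in for real constraint checking
        if (d != Decision.SKIP_AND_PRUNE)
            for (Shape child : shape.children())
                validate(child, filter, evaluated);
    }

    public static void main(String[] args) {
        Shape leaf = new Shape("leaf", List.of());
        Shape mid = new Shape("mid", List.of(leaf));
        Shape root = new Shape("root", List.of(mid));
        List<String> out = new ArrayList<>();
        // Deactivate "mid" but still validate what it points to.
        validate(root, s -> s.name().equals("mid")
                ? Decision.SKIP_BUT_RECURSE : Decision.EVALUATE, out);
        System.out.println(out);           // [root, leaf]
    }
}
```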

Any solution that doesn't modify the shapes graph is Jena-specific, so add (4):

A graph that is a view of another graph with modifications (also called a 
"buffered graph"): the layer over the top records adds and deletes and makes 
{{Graph.find}} give the right answers, while the base graph is never modified.
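Option (4) can be sketched with triples as plain strings; a real version would wrap Graph.find and return iterators, but the overlay bookkeeping is the same.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of a buffered graph: a read-only base plus add/delete overlays.
public class BufferedGraphSketch {
    static class BufferedGraph {
        final Set<String> base;                     // never modified
        final Set<String> added = new HashSet<>();
        final Set<String> deleted = new HashSet<>();

        BufferedGraph(Set<String> base) { this.base = base; }

        void add(String t)    { deleted.remove(t); added.add(t); }
        void delete(String t) { added.remove(t); deleted.add(t); }

        // "find" for a single triple: overlay first, then base minus deletions.
        boolean contains(String t) {
            return added.contains(t) || (base.contains(t) && !deleted.contains(t));
        }
    }

    public static void main(String[] args) {
        Set<String> base = Set.of(":s1 sh:deactivated false", ":s1 a sh:NodeShape");
        BufferedGraph g = new BufferedGraph(base);
        g.delete(":s1 sh:deactivated false");
        g.add(":s1 sh:deactivated true");
        System.out.println(g.contains(":s1 sh:deactivated true"));   // true
        System.out.println(g.contains(":s1 sh:deactivated false"));  // false
        System.out.println(base.size());                             // 2: base untouched
    }
}
```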


was (Author: andy.seaborne):
Thanks for the background; the principle that a collection of shapes serves 
multiple purposes is not unreasonable. Splitting by shape-imports would work 
but assumes the shapes were designed for that in the first place.

Are we talking about target shapes, or node/property shapes that are not 
targets but are part of a target shape?

I do prefer a solution that is a long-term basis for the system (lessons 
learned from the past!). There may be use cases for otherwise varying the 
shapes: maybe closed, maybe changing the severity or message, or things I can't 
imagine. As above, setters/mutable shapes bring quite a few assumptions.

A variation of (1) is to add a parameter to validation that is a 
{{java.util.Predicate}} - a test of whether to evaluate a shape (target or 
non-target). Maybe that's more than a boolean test, more a function with a 
"yes, no-and-don't-recurse, no-but-do-recurse"  result (recurse is follow 
property shapes and sh:node).

Any solution that doesn't modify the shapes graph is Jena-specific, so add (4):

A graph that is a view of another graph with modifications (also called a 
"buffered graph"): the layer over the top records adds and deletes and makes 
{{Graph.find}} give the right answers, while the base graph is never modified.

> Allow to activate/deactivate shapes
> ---
>
> Key: JENA-2301
> URL: https://issues.apache.org/jira/browse/JENA-2301
> Project: Apache Jena
>  Issue Type: Improvement
>  Components: SHACL
>Reporter: Florian Kleedorfer
>Priority: Trivial
>
> I have a use case in which I need to toggle a shape's `deactivated` flag 
> programmatically.
> The current workaround I'm using is to extract the shape's subgraph, set the 
> flag in the extracted graph, and then parse it again. If there is a simpler 
> workaround, I'll be happy to use that.
> Otherwise, here are my suggestions (for Shape.java):
> * Make the `deactivated` flag non-final and add a setter.
> * Provide a way to clone a shape and change the flag in the process.
> Cheers!





[jira] [Commented] (JENA-2301) Allow to activate/deactivate shapes

2022-03-06 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17501949#comment-17501949
 ] 

Andy Seaborne commented on JENA-2301:
-

Thanks for the background; the principle that a collection of shapes serves 
multiple purposes is not unreasonable. Splitting by shape-imports would work 
but assumes the shapes were designed for that in the first place.

Are we talking about target shapes, or node/property shapes that are not 
targets but are part of a target shape?

I do prefer a solution that is a long-term basis for the system (lessons 
learned from the past!). There may be use cases for otherwise varying the 
shapes: maybe closed, maybe changing the severity or message, or things I can't 
imagine. As above, setters/mutable shapes bring quite a few assumptions.

A variation of (1) is to add a parameter to validation that is a 
{{java.util.Predicate}} - a test of whether to evaluate a shape (target or 
non-target). Maybe that's more than a boolean test, more a function with a 
"yes, no-and-don't-recurse, no-but-do-recurse"  result (recurse is follow 
property shapes and sh:node).

Any solution that doesn't modify the shapes graph is Jena-specific, so add (4):

A graph that is a view of another graph with modifications (also called a 
"buffered graph"): the layer over the top records adds and deletes and makes 
{{Graph.find}} give the right answers, while the base graph is never modified.

> Allow to activate/deactivate shapes
> ---
>
> Key: JENA-2301
> URL: https://issues.apache.org/jira/browse/JENA-2301
> Project: Apache Jena
>  Issue Type: Improvement
>  Components: SHACL
>Reporter: Florian Kleedorfer
>Priority: Trivial
>
> I have a use case in which I need to toggle a shape's `deactivated` flag 
> programmatically.
> The current workaround I'm using is to extract the shape's subgraph, set the 
> flag in the extracted graph, and then parse it again. If there is a simpler 
> workaround, I'll be happy to use that.
> Otherwise, here are my suggestions (for Shape.java):
> * Make the `deactivated` flag non-final and add a setter.
> * Provide a way to clone a shape and change the flag in the process.
> Cheers!





[jira] [Commented] (JENA-2301) Allow to activate/deactivate shapes

2022-03-05 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17501800#comment-17501800
 ] 

Andy Seaborne commented on JENA-2301:
-

A bit more context would be great - what's the reason behind this?

There are benefits to having immutable-once-built data structures (sharing, 
concurrency, putting in maps and sets).

I can think of several ways:
* Provide the validator with a set of "not this time" shapes.
* If it's many shapes, copy/change the graph and reparse. Parsing the graph is 
fast.
* The other general approach is to provide "shape transforms". Shapes have the 
visitor pattern already; adding transformers, as for query syntax or query 
algebra, is doable. It is a copy-on-write ... which works properly if the 
elements are immutable once built.
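The copy-on-write transform can be sketched with a hypothetical immutable shape (not Jena's Shape class): the transform produces a copy with one field changed, and the original is never touched, so sharing between threads, maps, and sets stays safe.

```java
// Sketch of a copy-on-write shape transform over an immutable record.
public class ShapeTransformSketch {
    record Shape(String name, boolean deactivated) {
        Shape withDeactivated(boolean value) {
            return new Shape(name, value);   // new copy; original untouched
        }
    }

    public static void main(String[] args) {
        Shape original = new Shape("PersonShape", false);
        Shape toggled = original.withDeactivated(true);
        System.out.println(original.deactivated()); // false
        System.out.println(toggled.deactivated());  // true
    }
}
```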


> Allow to activate/deactivate shapes
> ---
>
> Key: JENA-2301
> URL: https://issues.apache.org/jira/browse/JENA-2301
> Project: Apache Jena
>  Issue Type: Improvement
>  Components: SHACL
>Reporter: Florian Kleedorfer
>Priority: Trivial
>
> I have a use case in which I need to toggle a shape's `deactivated` flag 
> programmatically.
> The current workaround I'm using is to extract the shape's subgraph, set the 
> flag in the extracted graph, and then parse it again. If there is a simpler 
> workaround, I'll be happy to use that.
> Otherwise, here are my suggestions (for Shape.java):
> * Make the `deactivated` flag non-final and add a setter.
> * Provide a way to clone a shape and change the flag in the process.
> Cheers!





[jira] [Resolved] (JENA-674) add arguments for user & password in arq.rsparql command line

2022-03-05 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne resolved JENA-674.

Resolution: Won't Do

No contribution received; unclear level of usage.

Workaround: Use the {{--service}} argument.


> add arguments for user & password in arq.rsparql command line
> -
>
> Key: JENA-674
> URL: https://issues.apache.org/jira/browse/JENA-674
> Project: Apache Jena
>  Issue Type: Improvement
>  Components: Cmd line tools
>Reporter: Jean-Marc Vanel
>Priority: Minor
>
> Here is a (Scala) Jena code sample that works for authentication in SPARQL :
> {noformat}
>   val queryExecution: QueryExecution =
> QueryExecutionFactory.sparqlService(service, query)
>   setContext(queryExecution)
>   val r = queryExecution.execSelect()
>   private def setContext(queryExecution: QueryExecution) {
> queryExecution match {
>   case qe: QueryEngineHTTP => 
> qe.setBasicAuthentication("me_myself_an_eye", "".toCharArray())
> }
>   }
> {noformat}
> Also, it would be useful to add an argument for giving the SPARQL query as a 
> string.
> Finally, it seems that the documentation for the command line tools is no 
> longer on the Apache site. I found this doc elsewhere:
> http://richard.cyganiak.de/blog/wp-content/uploads/2013/09/jena-sparql-cli-v1.pdf





[jira] [Closed] (JENA-674) add arguments for user & password in arq.rsparql command line

2022-03-05 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne closed JENA-674.
--

> add arguments for user & password in arq.rsparql command line
> -
>
> Key: JENA-674
> URL: https://issues.apache.org/jira/browse/JENA-674
> Project: Apache Jena
>  Issue Type: Improvement
>  Components: Cmd line tools
>Reporter: Jean-Marc Vanel
>Priority: Minor
>
> Here is a (Scala) Jena code sample that works for authentication in SPARQL :
> {noformat}
>   val queryExecution: QueryExecution =
> QueryExecutionFactory.sparqlService(service, query)
>   setContext(queryExecution)
>   val r = queryExecution.execSelect()
>   private def setContext(queryExecution: QueryExecution) {
> queryExecution match {
>   case qe: QueryEngineHTTP => 
> qe.setBasicAuthentication("me_myself_an_eye", "".toCharArray())
> }
>   }
> {noformat}
> Also, it would be useful to add an argument for giving the SPARQL query as a 
> string.
> Finally, it seems that the documentation for the command line tools is no 
> longer on the Apache site. I found this doc elsewhere:
> http://richard.cyganiak.de/blog/wp-content/uploads/2013/09/jena-sparql-cli-v1.pdf





[jira] [Closed] (JENA-331) XSDDatatype does not properly handle URIs

2022-03-05 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne closed JENA-331.
--

> XSDDatatype does not properly handle URIs
> -
>
> Key: JENA-331
> URL: https://issues.apache.org/jira/browse/JENA-331
> Project: Apache Jena
>  Issue Type: Improvement
>  Components: Jena
>Affects Versions: Jena 2.7.3
>Reporter: Claude Warren
>Priority: Minor
> Attachments: JENA-331.patch
>
>
> Using the TypeMapper to convert a URI to a literal and back again fails, as a 
> String is returned, not a URI.
> Test code:
> import com.hp.hpl.jena.datatypes.RDFDatatype;
> import com.hp.hpl.jena.datatypes.TypeMapper;
> import com.hp.hpl.jena.rdf.model.Literal;
> import com.hp.hpl.jena.rdf.model.ResourceFactory;
> import java.net.URI;
> import org.junit.Assert;
> import org.junit.Test;
> public class XSDDatatypeTest
> {
>   @Test
>   public void testURIConversion() throws Exception
>   {
>     TypeMapper typeMapper = TypeMapper.getInstance();
>     RDFDatatype dt = typeMapper.getTypeByClass( java.net.URI.class );
>     URI uri = new URI("http://example.com");
>     String lexicalForm = dt.unparse( uri );
>     Literal l = ResourceFactory.createTypedLiteral( lexicalForm, dt );
>     Object o = dt.parse( l.getLexicalForm() );
>     Assert.assertEquals( uri, o );
>   }
> }





[jira] [Resolved] (JENA-331) XSDDatatype does not properly handle URIs

2022-03-05 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne resolved JENA-331.

Resolution: Won't Do

No response.

> XSDDatatype does not properly handle URIs
> -
>
> Key: JENA-331
> URL: https://issues.apache.org/jira/browse/JENA-331
> Project: Apache Jena
>  Issue Type: Improvement
>  Components: Jena
>Affects Versions: Jena 2.7.3
>Reporter: Claude Warren
>Priority: Minor
> Attachments: JENA-331.patch
>
>
> Using the TypeMapper to convert a URI to a literal and back again fails, as a 
> String is returned rather than a URI.
> Test code:
> import com.hp.hpl.jena.datatypes.RDFDatatype;
> import com.hp.hpl.jena.datatypes.TypeMapper;
> import com.hp.hpl.jena.rdf.model.Literal;
> import com.hp.hpl.jena.rdf.model.ResourceFactory;
> import java.net.URI;
> import org.junit.Assert;
> import org.junit.Test;
> public class XSDDatatypeTest
> {
>  @Test
>  public void testURIConversion() throws Exception
>  {
>TypeMapper typeMapper = TypeMapper.getInstance();
>RDFDatatype dt = typeMapper.getTypeByClass( java.net.URI.class );
>URI uri = new URI("http://example.com");
>String lexicalForm = dt.unparse( uri );
>Literal l = ResourceFactory.createTypedLiteral( lexicalForm, dt );
>   
>Object o = dt.parse( l.getLexicalForm()  );
>Assert.assertEquals( uri, o );
>  }
> }





[jira] [Resolved] (JENA-169) Add a search box to the Jena website

2022-03-05 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne resolved JENA-169.

Resolution: Won't Do

> Add a search box to the Jena website
> 
>
> Key: JENA-169
> URL: https://issues.apache.org/jira/browse/JENA-169
> Project: Apache Jena
>  Issue Type: Improvement
>  Components: Web site
>Reporter: Paolo Castagna
>Priority: Minor
> Attachments: Screen Shot 2018-12-29 at 19.13.14-fullpage.png, 
> image-2018-12-29-19-17-20-592.png, image-2018-12-29-19-19-55-983.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> It would be good to have a search box on the Jena website.
> With Google, users can use "site:" to restrict a query to a specific website, 
> for example: "something site:incubator.apache.org/jena".
>  However, this is limited to a single website; you cannot use site: more than 
> once.
>  We could make this easier for people by having:
> {code:java}
> http://www.google.com/search"; method="get">
> 
>  onclick="if(this.value == '...')
> {this.value = ''}
> "/>
> 
> {code}
> Another option (IMHO more interesting) is to use Google Customised Search: 
> [http://www.google.com/cse/]
>  For example: 
> [http://www.google.co.uk/cse/home?cx=009507611290970701536:-nmuokitb-0]
> You can then configure the websites you want to be used for a search, in this 
> case (for example):
>  [http://incubator.apache.org/jena/*]
>  [http://openjena.org/*]
>  [http://markmail.org/*]
>  [http://svn.apache.org/repos/asf/incubator/jena/Experimental/*]
>  [http://svn.apache.org/repos/asf/incubator/jena/Scratch/*]
>  [http://svn.apache.org/repos/asf/incubator/jena/Jena2/*]
>  https://issues.apache.org/jira/browse/JENA-*
>  ...
> You can have this on your website:
> {code:java}
> Loading
> http://www.google.co.uk/jsapi"; type="text/javascript">
> 
> google.load('search', '1',
> {language : 'en'}
> );
> google.setOnLoadCallback(function()
> { var customSearchControl = new google.search.CustomSearchControl( 
> '009507611290970701536:-nmuokitb-0'); 
> customSearchControl.setResultSetSize(google.search.Search.FILTERED_CSE_RESULTSET);
>  customSearchControl.draw('cse'); }
> , true);
> 
>  href="http://www.google.com/cse/style/look/default.css"; type="text/css" 
> />{code}
> Some useful (old!) quotes from Jakob Nielsen on "search":
> "Search is an important part of any big website. When users want to search, 
> they typically scan the homepage looking for "the little box where I can 
> type," so your search should be a box. [Make your search box at least 25 
> characters wide,] so it can accommodate multiple words without obscuring 
> parts of the user's query.
> (Update: Based on more recent findings, my recommendation is now to make the 
> search box 27 characters wide. This and other new guidelines are covered in 
> my tutorial on Fundamental Guidelines for Web Usability at the annual 
> Usability Week conference.)"
>  – [http://www.useit.com/alertbox/20020512.html] (2002)
> "This is a small point, but there's no reason to label the search box if 
> there's a "Search" button right next to it. Interaction design's less is more 
> principle tells us that extra elements in a dialogue distract users from the 
> salient points and reduce their ability to understand an interface."
>  – [http://www.useit.com/alertbox/20031110.html] (2003)
> Jakob Nielsen put the search box in the bottom-right corner of his website. I 
> find that a good choice, but I am not sure it fits in the current layout. 
>  If that is not possible, having the search box elsewhere IMHO is better than 
> not having it.





[jira] [Closed] (JENA-169) Add a search box to the Jena website

2022-03-05 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne closed JENA-169.
--

> Add a search box to the Jena website
> 
>
> Key: JENA-169
> URL: https://issues.apache.org/jira/browse/JENA-169
> Project: Apache Jena
>  Issue Type: Improvement
>  Components: Web site
>Reporter: Paolo Castagna
>Priority: Minor
> Attachments: Screen Shot 2018-12-29 at 19.13.14-fullpage.png, 
> image-2018-12-29-19-17-20-592.png, image-2018-12-29-19-19-55-983.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> It would be good to have a search box on the Jena website.
> With Google, users can use "site:" to restrict a query to a specific website, 
> for example: "something site:incubator.apache.org/jena".
>  However, this is limited to a single website; you cannot use site: more than 
> once.
>  We could make this easier for people by having:
> {code:java}
> http://www.google.com/search"; method="get">
> 
>  onclick="if(this.value == '...')
> {this.value = ''}
> "/>
> 
> {code}
> Another option (IMHO more interesting) is to use Google Customised Search: 
> [http://www.google.com/cse/]
>  For example: 
> [http://www.google.co.uk/cse/home?cx=009507611290970701536:-nmuokitb-0]
> You can then configure the websites you want to be used for a search, in this 
> case (for example):
>  [http://incubator.apache.org/jena/*]
>  [http://openjena.org/*]
>  [http://markmail.org/*]
>  [http://svn.apache.org/repos/asf/incubator/jena/Experimental/*]
>  [http://svn.apache.org/repos/asf/incubator/jena/Scratch/*]
>  [http://svn.apache.org/repos/asf/incubator/jena/Jena2/*]
>  https://issues.apache.org/jira/browse/JENA-*
>  ...
> You can have this on your website:
> {code:java}
> Loading
> http://www.google.co.uk/jsapi"; type="text/javascript">
> 
> google.load('search', '1',
> {language : 'en'}
> );
> google.setOnLoadCallback(function()
> { var customSearchControl = new google.search.CustomSearchControl( 
> '009507611290970701536:-nmuokitb-0'); 
> customSearchControl.setResultSetSize(google.search.Search.FILTERED_CSE_RESULTSET);
>  customSearchControl.draw('cse'); }
> , true);
> 
>  href="http://www.google.com/cse/style/look/default.css"; type="text/css" 
> />{code}
> Some useful (old!) quotes from Jakob Nielsen on "search":
> "Search is an important part of any big website. When users want to search, 
> they typically scan the homepage looking for "the little box where I can 
> type," so your search should be a box. [Make your search box at least 25 
> characters wide,] so it can accommodate multiple words without obscuring 
> parts of the user's query.
> (Update: Based on more recent findings, my recommendation is now to make the 
> search box 27 characters wide. This and other new guidelines are covered in 
> my tutorial on Fundamental Guidelines for Web Usability at the annual 
> Usability Week conference.)"
>  – [http://www.useit.com/alertbox/20020512.html] (2002)
> "This is a small point, but there's no reason to label the search box if 
> there's a "Search" button right next to it. Interaction design's less is more 
> principle tells us that extra elements in a dialogue distract users from the 
> salient points and reduce their ability to understand an interface."
>  – [http://www.useit.com/alertbox/20031110.html] (2003)
> Jakob Nielsen put the search box in the bottom-right corner of his website. I 
> find that a good choice, but I am not sure it fits in the current layout. 
>  If that is not possible, having the search box elsewhere IMHO is better than 
> not having it.





[jira] [Resolved] (JENA-84) Clearing the cache in an OntModel should clear the ModelMaker cache as well

2022-03-05 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-84?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne resolved JENA-84.
---
Resolution: Won't Fix

> Clearing the cache in an OntModel should clear the ModelMaker cache as well
> ---
>
> Key: JENA-84
> URL: https://issues.apache.org/jira/browse/JENA-84
> Project: Apache Jena
>  Issue Type: Bug
>  Components: Ontology API
>Reporter: Ian Dickinson
>Assignee: Ian Dickinson
>Priority: Minor
>
> There is a questionable design feature in which ModelMakers retain references 
> to models they have opened, making it difficult to re-read a file into the 
> same model. This particularly causes user confusion with respect to OntModel 
> imports. As a workaround in the short term, OntModel could provide a robust 
> clearCache operation which also clears the ModelMaker cache. Medium term, we 
> should review whether ModelMaker's memory is desirable or not.





[jira] [Closed] (JENA-84) Clearing the cache in an OntModel should clear the ModelMaker cache as well

2022-03-05 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-84?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne closed JENA-84.
-

> Clearing the cache in an OntModel should clear the ModelMaker cache as well
> ---
>
> Key: JENA-84
> URL: https://issues.apache.org/jira/browse/JENA-84
> Project: Apache Jena
>  Issue Type: Bug
>  Components: Ontology API
>Reporter: Ian Dickinson
>Assignee: Ian Dickinson
>Priority: Minor
>
> There is a questionable design feature in which ModelMakers retain references 
> to models they have opened, making it difficult to re-read a file into the 
> same model. This particularly causes user confusion with respect to OntModel 
> imports. As a workaround in the short term, OntModel could provide a robust 
> clearCache operation which also clears the ModelMaker cache. Medium term, we 
> should review whether ModelMaker's memory is desirable or not.





[jira] [Closed] (JENA-31) Add possibility to connect serializers to prefix service

2022-03-05 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-31?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne closed JENA-31.
-

> Add possibility to connect serializers to prefix service
> 
>
> Key: JENA-31
> URL: https://issues.apache.org/jira/browse/JENA-31
> Project: Apache Jena
>  Issue Type: Improvement
>  Components: Jena
>Reporter: Reto Gmür
>Priority: Major
>
> Currently the serializers want to access the whole set of available 
> prefix mappings; instead they should only ask for a prefix suggestion for URIs 
> actually in the model. That way it is possible to connect services which can 
> suggest prefixes for a huge number of URIs, which is not reasonably feasible 
> now.





[jira] [Resolved] (JENA-31) Add possibility to connect serializers to prefix service

2022-03-05 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-31?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne resolved JENA-31.
---
Resolution: Done

{{PrefixMappingUtils.calcInUsePrefixMapping}}
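As a rough illustration of that idea in plain Java (no Jena types; the class and method names below are invented for the sketch), a serializer can restrict itself to the prefixes whose namespaces actually occur among the URIs being written:

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Keep only the prefix entries whose namespace actually occurs among the
// URIs to be serialized, instead of emitting the whole prefix map.
public class InUsePrefixes {
    public static Map<String, String> inUse(Map<String, String> prefixToNs,
                                            Collection<String> uris) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> e : prefixToNs.entrySet()) {
            String ns = e.getValue();
            for (String uri : uris) {
                if (uri.startsWith(ns)) {       // namespace is in use
                    result.put(e.getKey(), ns);
                    break;
                }
            }
        }
        return result;
    }
}
```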

> Add possibility to connect serializers to prefix service
> 
>
> Key: JENA-31
> URL: https://issues.apache.org/jira/browse/JENA-31
> Project: Apache Jena
>  Issue Type: Improvement
>  Components: Jena
>Reporter: Reto Gmür
>Priority: Major
>
> Currently the serializers want to access the whole set of available 
> prefix mappings; instead they should only ask for a prefix suggestion for URIs 
> actually in the model. That way it is possible to connect services which can 
> suggest prefixes for a huge number of URIs, which is not reasonably feasible 
> now.





[jira] [Resolved] (JENA-2225) TDB/TDB2 dataset size stat serialized incorrectly for large datasets

2022-03-05 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne resolved JENA-2225.
-
Fix Version/s: Jena 4.5.0
   (was: Jena 4.4.0)
   Resolution: Fixed

> TDB/TDB2 dataset size stat serialized incorrectly for large datasets
> 
>
> Key: JENA-2225
> URL: https://issues.apache.org/jira/browse/JENA-2225
> Project: Apache Jena
>  Issue Type: Bug
>  Components: TDB, TDB2
>Affects Versions: Jena 4.3.1
>Reporter: Lorenz Bühmann
>Assignee: Andy Seaborne
>Priority: Minor
> Fix For: Jena 4.5.0
>
> Attachments: stats.opt.gz
>
>
> When computing the TDB/TDB2 stats via CLI the size will be serialized 
> incorrectly for large datasets.
> For example for latest Wikidata Truthy we get
> {noformat}
> (count -1983667112)){noformat}
> This happens because, for both, the corresponding `Stats.java` class 
> forces the node to an Integer type even though the value is a long:
> {code:java}
> if ( count >= 0 )
> addPair(meta.getList(), StatsMatcher.COUNT, 
> NodeFactoryExtra.intToNode((int)count)) ; {code}
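The effect of the `(int)` cast can be shown without any Jena dependency; the count below is a hypothetical value chosen to reproduce the negative number from the report:

```java
// Casting a long count to int wraps modulo 2^32, which is how a large
// dataset size comes out negative in the serialized stats.
public class CountOverflow {
    static int asIntCast(long count) { return (int) count; } // the buggy path
    static long asLong(long count)   { return count; }       // the fix: keep it a long

    public static void main(String[] args) {
        long count = 6_606_267_480L; // hypothetical count > Integer.MAX_VALUE
        System.out.println(asIntCast(count)); // prints -1983667112
        System.out.println(asLong(count));    // prints 6606267480
    }
}
```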





[jira] [Resolved] (JENA-2285) Java Heap error when there is an optional in service block

2022-03-04 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne resolved JENA-2285.
-
Resolution: Information Provided

No response - presumed finished.

Short of seeing the real query, not much more to say.


> Java Heap error when there is an optional in service block
> --
>
> Key: JENA-2285
> URL: https://issues.apache.org/jira/browse/JENA-2285
> Project: Apache Jena
>  Issue Type: Bug
>Affects Versions: Jena 4.4.0
>Reporter: Dmitry Zhelobanov
>Priority: Major
>
> Here is the query:
> {code:java}
> PREFIX owl: 
> PREFIX rdf: 
> PREFIX rdfs: 
> PREFIX wdt: 
> PREFIX p: 
> PREFIX pq: 
> PREFIX ps: 
> PREFIX psv: 
> PREFIX wikibase: 
> SELECT ?wikidata_city_iri ?website
> WHERE {
>   BIND(IRI("http://www.wikidata.org/entity/Q15757") as ?id)
>   BIND(IRI(?id) AS ?wikidata_city_iri) .  
>   SERVICE  {
> #Querying website   
>     OPTIONAL {
>       ?wikidata_city_iri wdt:P856 ?website.
>     } . 
>   }.
> } {code}
> When the query is executed I get a "Java Heap Error" after, I guess, Java 
> runs out of memory in the pool. The trace of the error is below.
> When OPTIONAL is commented out, I get "no data" as expected.
> {code:java}
>  09:16:08 INFO  Fuseki          :: [5] POST 
> http://127.0.0.1:3030/WattTour/sparql
> 09:16:08 INFO  Fuseki          :: [5] Query = PREFIX owl: 
>  PREFIX rdf: 
>  PREFIX rdfs: 
>  PREFIX wdt: 
>  PREFIX p: 
>  PREFIX pq: 
>  PREFIX ps: 
>  PREFIX psv: 
>  PREFIX wikibase: 
>  SELECT ?wikidata_city_iri ?website WHERE {   
> BIND(IRI("http://www.wikidata.org/entity/Q15757") as ?id)   BIND(IRI(?id) AS 
> ?wikidata_city_iri) .    SERVICE  { 
> #Querying website    OPTIONAL {       ?wikidata_city_iri wdt:P856 ?website.   
>   } .    }. }
> 09:18:43 WARN  HttpChannel     :: /$/ping
> javax.servlet.ServletException: Filtered request failed.
>         at 
> org.apache.shiro.web.servlet.AbstractShiroFilter.doFilterInternal(AbstractShiroFilter.java:384)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:202) 
> ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1594)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.apache.jena.fuseki.servlets.CrossOriginFilter.handle(CrossOriginFilter.java:284)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.apache.jena.fuseki.servlets.CrossOriginFilter.doFilter(CrossOriginFilter.java:247)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:210) 
> ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1594)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:506) 
> ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:131) 
> ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:578) 
> ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1571)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:221)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1378)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:176)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.j

[jira] [Closed] (JENA-2285) Java Heap error when there is an optional in service block

2022-03-04 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne closed JENA-2285.
---

> Java Heap error when there is an optional in service block
> --
>
> Key: JENA-2285
> URL: https://issues.apache.org/jira/browse/JENA-2285
> Project: Apache Jena
>  Issue Type: Bug
>Affects Versions: Jena 4.4.0
>Reporter: Dmitry Zhelobanov
>Priority: Major
>
> Here is the query:
> {code:java}
> PREFIX owl: 
> PREFIX rdf: 
> PREFIX rdfs: 
> PREFIX wdt: 
> PREFIX p: 
> PREFIX pq: 
> PREFIX ps: 
> PREFIX psv: 
> PREFIX wikibase: 
> SELECT ?wikidata_city_iri ?website
> WHERE {
>   BIND(IRI("http://www.wikidata.org/entity/Q15757") as ?id)
>   BIND(IRI(?id) AS ?wikidata_city_iri) .  
>   SERVICE  {
> #Querying website   
>     OPTIONAL {
>       ?wikidata_city_iri wdt:P856 ?website.
>     } . 
>   }.
> } {code}
> When the query is executed I get a "Java Heap Error" after, I guess, Java 
> runs out of memory in the pool. The trace of the error is below.
> When OPTIONAL is commented out, I get "no data" as expected.
> {code:java}
>  09:16:08 INFO  Fuseki          :: [5] POST 
> http://127.0.0.1:3030/WattTour/sparql
> 09:16:08 INFO  Fuseki          :: [5] Query = PREFIX owl: 
>  PREFIX rdf: 
>  PREFIX rdfs: 
>  PREFIX wdt: 
>  PREFIX p: 
>  PREFIX pq: 
>  PREFIX ps: 
>  PREFIX psv: 
>  PREFIX wikibase: 
>  SELECT ?wikidata_city_iri ?website WHERE {   
> BIND(IRI("http://www.wikidata.org/entity/Q15757") as ?id)   BIND(IRI(?id) AS 
> ?wikidata_city_iri) .    SERVICE  { 
> #Querying website    OPTIONAL {       ?wikidata_city_iri wdt:P856 ?website.   
>   } .    }. }
> 09:18:43 WARN  HttpChannel     :: /$/ping
> javax.servlet.ServletException: Filtered request failed.
>         at 
> org.apache.shiro.web.servlet.AbstractShiroFilter.doFilterInternal(AbstractShiroFilter.java:384)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:202) 
> ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1594)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.apache.jena.fuseki.servlets.CrossOriginFilter.handle(CrossOriginFilter.java:284)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.apache.jena.fuseki.servlets.CrossOriginFilter.doFilter(CrossOriginFilter.java:247)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:210) 
> ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1594)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:506) 
> ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:131) 
> ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:578) 
> ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1571)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:221)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1378)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:176)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:463) 
> ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.server

[jira] [Resolved] (JENA-2293) SPARQL Update: DatasetGraphOne: NPE when COPY or MOVE to named graph

2022-03-04 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne resolved JENA-2293.
-
Fix Version/s: Jena 4.5.0
   Resolution: Fixed

> SPARQL Update: DatasetGraphOne: NPE when COPY or MOVE to named graph
> 
>
> Key: JENA-2293
> URL: https://issues.apache.org/jira/browse/JENA-2293
> Project: Apache Jena
>  Issue Type: Bug
>Affects Versions: Jena 4.4.0
>Reporter: Richard Cyganiak
>Assignee: Andy Seaborne
>Priority: Major
> Fix For: Jena 4.5.0
>
> Attachments: SPARQLUpdateTest.java
>
>
> When executing SPARQL Update requests against a dataset that does not 
> automatically create new graphs, COPY or MOVE operations with a non-existing 
> graph as the target will result in a NullPointerException. The same happens 
> when adding SILENT.
> I would expect these requests to result in UpdateExceptions, with a message 
> pointing out the non-existing graph. Or nothing in the case of SILENT.
> The attached JUnit 4 test demonstrates this by running queries against a 
> DatasetGraphOne:
> COPY DEFAULT TO 
> COPY SILENT DEFAULT TO 
> MOVE DEFAULT TO 
> MOVE SILENT DEFAULT TO 
> Each request produces an NPE.





[jira] [Commented] (JENA-2293) SPARQL Update: DatasetGraphOne: NPE when COPY or MOVE to named graph

2022-03-04 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17501275#comment-17501275
 ] 

Andy Seaborne commented on JENA-2293:
-

Done with:
Commit 1e075c9c52b74b0479a474813bed5b274c9ea2e8 in jena's branch 
refs/heads/main from Andy Seaborne
[ https://gitbox.apache.org/repos/asf?p=jena.git;h=1e075c9 ]

Commit b8736221e0057439ee7714637b193baaeb518c0e in jena's branch 
refs/heads/main from Andy Seaborne
[ https://gitbox.apache.org/repos/asf?p=jena.git;h=b873622 ]


> SPARQL Update: DatasetGraphOne: NPE when COPY or MOVE to named graph
> 
>
> Key: JENA-2293
> URL: https://issues.apache.org/jira/browse/JENA-2293
> Project: Apache Jena
>  Issue Type: Bug
>Affects Versions: Jena 4.4.0
>Reporter: Richard Cyganiak
>Assignee: Andy Seaborne
>Priority: Major
> Attachments: SPARQLUpdateTest.java
>
>
> When executing SPARQL Update requests against a dataset that does not 
> automatically create new graphs, COPY or MOVE operations with a non-existing 
> graph as the target will result in a NullPointerException. The same happens 
> when adding SILENT.
> I would expect these requests to result in UpdateExceptions, with a message 
> pointing out the non-existing graph. Or nothing in the case of SILENT.
> The attached JUnit 4 test demonstrates this by running queries against a 
> DatasetGraphOne:
> COPY DEFAULT TO 
> COPY SILENT DEFAULT TO 
> MOVE DEFAULT TO 
> MOVE SILENT DEFAULT TO 
> Each request produces an NPE.





[jira] [Commented] (JENA-2294) tdb2.xloader creates invalid database - later update causes wrong answers.

2022-03-04 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17501274#comment-17501274
 ] 

Andy Seaborne commented on JENA-2294:
-

(ignore the commits - they have the wrong ticket id)

> tdb2.xloader creates invalid database - later update causes wrong answers.
> --
>
> Key: JENA-2294
> URL: https://issues.apache.org/jira/browse/JENA-2294
> Project: Apache Jena
>  Issue Type: Bug
>  Components: TDB2
>Affects Versions: Jena 4.4.0
>Reporter: Andy Seaborne
>Assignee: Andy Seaborne
>Priority: Major
>
> [Report from 
> users@|https://lists.apache.org/thread/lxwcolfowh29nbc79cq867jq051sf2nh].
> Recreate with: 
> {noformat}
> rm -rf BSBM
> xloader --loc BSBM ~/Datasets/BSBM/bsbm-50k.nt.gz
> tdb2.tdbquery --loc BSBM/ --file T.rq
> tdb2.tdbloader --loader=basic --loc BSBM/ X.nt 
> tdb2.tdbquery --loc BSBM/ --file T.rq
> {noformat}
> where
> {noformat}
> ==> X.nt <==
>    .
> ==> T.rq <==
> SELECT (count(?x) AS ?C) {
>   ?x a ?T .
> }
> {noformat}





[jira] [Resolved] (JENA-2298) Add convenience methods to RDFParser to directly read into a fresh model or dataset

2022-03-04 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne resolved JENA-2298.
-
Resolution: Done

> Add convenience methods to RDFParser to directly read into a fresh model or 
> dataset
> ---
>
> Key: JENA-2298
> URL: https://issues.apache.org/jira/browse/JENA-2298
> Project: Apache Jena
>  Issue Type: Improvement
>  Components: RIOT
>Affects Versions: Jena 4.4.0
>Reporter: Andy Seaborne
>Assignee: Andy Seaborne
>Priority: Major
> Fix For: Jena 4.5.0
>
>






[jira] [Resolved] (JENA-2299) RDFWriter: Rename/migrate from create(...) to source(...)

2022-03-04 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne resolved JENA-2299.
-
Resolution: Done

> RDFWriter: Rename/migrate from create(...) to source(...)
> -
>
> Key: JENA-2299
> URL: https://issues.apache.org/jira/browse/JENA-2299
> Project: Apache Jena
>  Issue Type: Improvement
>  Components: RIOT
>Affects Versions: Jena 4.4.0
>Reporter: Andy Seaborne
>Assignee: Andy Seaborne
>Priority: Major
> Fix For: Jena 4.5.0
>
>






[jira] [Commented] (JENA-2225) TDB/TDB2 dataset size stat serialized incorrectly for large datasets

2022-03-03 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500776#comment-17500776
 ] 

Andy Seaborne commented on JENA-2225:
-

Full stats (gz)? It will be useful for looking for other places where this 
occurs, even just highlighting where large numbers appear.

8 byte integers are no big deal these days! Might as well ensure it's all longs.

 

> TDB/TDB2 dataset size stat serialized incorrectly for large datasets
> 
>
> Key: JENA-2225
> URL: https://issues.apache.org/jira/browse/JENA-2225
> Project: Apache Jena
>  Issue Type: Bug
>  Components: TDB, TDB2
>Affects Versions: Jena 4.3.1
>Reporter: Lorenz Bühmann
>Assignee: Andy Seaborne
>Priority: Minor
> Fix For: Jena 4.4.0
>
> Attachments: stats.opt
>
>
> When computing the TDB/TDB2 stats via CLI the size will be serialized 
> incorrectly for large datasets.
> For example for latest Wikidata Truthy we get
> {noformat}
> (count -1983667112)){noformat}
> This happens because, for both, the corresponding `Stats.java` class 
> forces the node to an Integer type even though the value is a long:
> {code:java}
> if ( count >= 0 )
> addPair(meta.getList(), StatsMatcher.COUNT, 
> NodeFactoryExtra.intToNode((int)count)) ; {code}





[jira] [Commented] (JENA-2225) TDB/TDB2 dataset size stat serialized incorrectly for large datasets

2022-03-03 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500739#comment-17500739
 ] 

Andy Seaborne commented on JENA-2225:
-

[~LorenzB] – Please could you attach the stats file to the ticket to use for 
testing?

> TDB/TDB2 dataset size stat serialized incorrectly for large datasets
> 
>
> Key: JENA-2225
> URL: https://issues.apache.org/jira/browse/JENA-2225
> Project: Apache Jena
>  Issue Type: Bug
>  Components: TDB, TDB2
>Affects Versions: Jena 4.3.1
>Reporter: Lorenz Bühmann
>Assignee: Andy Seaborne
>Priority: Minor
> Fix For: Jena 4.4.0
>
>
> When computing the TDB/TDB2 stats via CLI the size will be serialized 
> incorrectly for large datasets.
> For example for latest Wikidata Truthy we get
> {noformat}
> (count -1983667112)){noformat}
> This happens because, for both, the corresponding `Stats.java` class 
> forces the node to an Integer type even though the value is a long:
> {code:java}
> if ( count >= 0 )
> addPair(meta.getList(), StatsMatcher.COUNT, 
> NodeFactoryExtra.intToNode((int)count)) ; {code}





[jira] [Comment Edited] (JENA-2225) TDB/TDB2 dataset size stat serialized incorrectly for large datasets

2022-03-03 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500711#comment-17500711
 ] 

Andy Seaborne edited comment on JENA-2225 at 3/3/22, 1:05 PM:
--

Not sure if opening a new issue would be better, but I guess we're not done 
here. We didn't recognize this because apparently I didn't know TDB2 expects the 
stats file in TDB2_LOCATION/DataXXX:

Now that the stats are being loaded, the change to long values leads to 
additional parse errors during the reordering setup/application because there 
are still integer values assumed:

{noformat}
java.lang.NumberFormatException: For input string: "1525095"
at 
java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.base/java.lang.Integer.parseInt(Integer.java:652)
at java.base/java.lang.Integer.parseInt(Integer.java:770)
at org.apache.jena.sparql.sse.Item.asInteger(Item.java:275)
at 
org.apache.jena.sparql.engine.optimizer.StatsMatcher.init(StatsMatcher.java:123)
at 
org.apache.jena.sparql.engine.optimizer.StatsMatcher.(StatsMatcher.java:97)
at 
org.apache.jena.sparql.engine.optimizer.reorder.ReorderLib.weighted(ReorderLib.java:84)
at 
org.apache.jena.tdb2.store.TDB2StorageBuilder.chooseReorderTransformation(TDB2StorageBuilder.java:352)
at 
org.apache.jena.tdb2.store.TDB2StorageBuilder.build(TDB2StorageBuilder.java:112)
at 
org.apache.jena.tdb2.sys.StoreConnection.make(StoreConnection.java:91)
at 
org.apache.jena.tdb2.sys.StoreConnection.connectCreate(StoreConnection.java:59)
at 
org.apache.jena.tdb2.sys.DatabaseOps.createSwitchable(DatabaseOps.java:100)
at org.apache.jena.tdb2.sys.DatabaseOps.create(DatabaseOps.java:81)
at 
org.apache.jena.tdb2.sys.DatabaseConnection.build(DatabaseConnection.java:101)
at 
org.apache.jena.tdb2.sys.DatabaseConnection.lambda$make$0(DatabaseConnection.java:72)
at 
java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1705)
at 
org.apache.jena.tdb2.sys.DatabaseConnection.make(DatabaseConnection.java:72)
at 
org.apache.jena.tdb2.sys.DatabaseConnection.connectCreate(DatabaseConnection.java:61)
at 
org.apache.jena.tdb2.sys.DatabaseConnection.connectCreate(DatabaseConnection.java:52)
at 
org.apache.jena.tdb2.DatabaseMgr.DB_ConnectCreate(DatabaseMgr.java:41)
at 
org.apache.jena.tdb2.DatabaseMgr.connectDatasetGraph(DatabaseMgr.java:46)
at org.apache.jena.tdb2.TDB2Factory.connectDataset(TDB2Factory.java:40)
at tdb2.cmdline.ModTDBDataset.createDataset(ModTDBDataset.java:105)
at arq.cmdline.ModDataset.getDataset(ModDataset.java:35)
at arq.query.getDataset(query.java:179)
at arq.query.queryExec(query.java:226)
at arq.query.exec(query.java:157)
at org.apache.jena.cmd.CmdMain.mainMethod(CmdMain.java:87)
at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:56)
at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:43)
at tdb2.tdbquery.main(tdbquery.java:30)
{noformat}
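The stack above shows {{Item.asInteger}} going through {{Integer.parseInt}}. A minimal JDK-only sketch of why long-sized stat values cannot take an int-width parse path (the value below is hypothetical, chosen to exceed Integer.MAX_VALUE):

```java
public class StatParseDemo {
    public static void main(String[] args) {
        // A count larger than Integer.MAX_VALUE, as Stats.java now writes.
        String stat = "2311300184";
        boolean intParseFailed = false;
        try {
            Integer.parseInt(stat);        // int-width parse: out of range
        } catch (NumberFormatException e) {
            intParseFailed = true;
        }
        long value = Long.parseLong(stat); // long-width parse succeeds
        System.out.println(intParseFailed + " " + value);  // prints: true 2311300184
    }
}
```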



[jira] [Reopened] (JENA-2225) TDB/TDB2 dataset size stat serialized incorrectly for large datasets

2022-03-03 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne reopened JENA-2225:
-

> TDB/TDB2 dataset size stat serialized incorrectly for large datasets
> 
>
> Key: JENA-2225
> URL: https://issues.apache.org/jira/browse/JENA-2225
> Project: Apache Jena
>  Issue Type: Bug
>  Components: TDB, TDB2
>Affects Versions: Jena 4.3.1
>Reporter: Lorenz Bühmann
>Assignee: Andy Seaborne
>Priority: Minor
> Fix For: Jena 4.4.0
>
>
> When computing the TDB/TDB2 stats via CLI the size will be serialized 
> incorrectly for large datasets.
> For example for latest Wikidata Truthy we get
> {noformat}
> (count -1983667112)){noformat}
> This happens because, in both cases, the corresponding `Stats.java` class 
> forces an Integer-typed Node even though the value is a long:
> {code:java}
> if ( count >= 0 )
> addPair(meta.getList(), StatsMatcher.COUNT, 
> NodeFactoryExtra.intToNode((int)count)) ; {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (JENA-2297) Change configuration files to YAML or other similarly simple syntax

2022-03-03 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500650#comment-17500650
 ] 

Andy Seaborne commented on JENA-2297:
-

Turtle has comments.

The configuration file can be in any RDF syntax - including JSON-LD (which, 
since YAML 1.2 is a superset of JSON, is technically also YAML). An 
{{@context}} would be nice to have _if_ someone wishes to propose and 
contribute one.

There are some RDF-in-YAML systems, so you can write in YAML and translate to 
another RDF syntax. Try one of those.

There is a lot of RDF-based code behind configuration - thousands of lines.

You can even have the server write configuration files for you, so you are 
never starting from scratch (see your Stack Overflow answer).

A YAML overlay that translates to RDF for interpretation would be practical, 
and it could cover a subset of the expressivity, if someone wishes to make a 
concrete proposal and contribute it.

It will have to address the needs of Fuseki configuration - multiple 
independent subsystems such as jena-text, inference, and data access control. 
These rely on RDF's ability to combine data from different places without 
clashes.

YAML has its own peculiarities.
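For reference, Turtle configuration files do carry line comments. An illustrative fragment (the service name and dataset location are hypothetical, and this is a sketch of the Fuseki configuration vocabulary, not an official example):

```turtle
## Fuseki service configuration - hypothetical example, comments throughout.
PREFIX fuseki: <http://jena.apache.org/fuseki#>
PREFIX tdb2:   <http://jena.apache.org/2016/tdb#>

## One query-only service over a TDB2 database.
[] a fuseki:Service ;
    fuseki:name     "ds" ;                         # endpoint is /ds
    fuseki:endpoint [ fuseki:operation fuseki:query ] ;
    fuseki:dataset  [ a tdb2:DatasetTDB2 ;
                      tdb2:location "DB2" ] .      # on-disk location
```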

 

> Change configuration files to YAML or other similarly simple syntax
> ---
>
> Key: JENA-2297
> URL: https://issues.apache.org/jira/browse/JENA-2297
> Project: Apache Jena
>  Issue Type: Improvement
>  Components: Fuseki
>Reporter: Wolfgang Fahl
>Priority: Minor
>
> As outlined in 
> [https://stackoverflow.com/questions/63874908/fuseki-configuration], I never 
> understood the Turtle configuration files. RDF syntax is just too 
> otherworldly for beginners.
> I suggest not making ttl files the only option for the configuration, since 
> these files are not self-explanatory. In YAML, for example, you can have 
> proper comments. Converting the YAML files to ttl for internal use is IMHO 
> fine. It's just for better readability.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (JENA-2299) RDFWriter: Rename/migrate from create(...) to source(...)

2022-03-02 Thread Andy Seaborne (Jira)
Andy Seaborne created JENA-2299:
---

 Summary: RDFWriter: Rename/migrate from create(...) to source(...)
 Key: JENA-2299
 URL: https://issues.apache.org/jira/browse/JENA-2299
 Project: Apache Jena
  Issue Type: Improvement
  Components: RIOT
Affects Versions: Jena 4.4.0
Reporter: Andy Seaborne
Assignee: Andy Seaborne
 Fix For: Jena 4.5.0






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (JENA-2298) Add convenience methods to RDFParser to directly read into a fresh model or dataset

2022-03-02 Thread Andy Seaborne (Jira)
Andy Seaborne created JENA-2298:
---

 Summary: Add convenience methods to RDFParser to directly read 
into a fresh model or dataset
 Key: JENA-2298
 URL: https://issues.apache.org/jira/browse/JENA-2298
 Project: Apache Jena
  Issue Type: Improvement
  Components: RIOT
Affects Versions: Jena 4.4.0
Reporter: Andy Seaborne
Assignee: Andy Seaborne
 Fix For: Jena 4.5.0






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (JENA-2290) GraphRDFS doesn't implement contains

2022-03-01 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne resolved JENA-2290.
-
Resolution: Fixed

> GraphRDFS doesn't implement contains
> 
>
> Key: JENA-2290
> URL: https://issues.apache.org/jira/browse/JENA-2290
> Project: Apache Jena
>  Issue Type: Bug
>  Components: Core
>Affects Versions: Jena 4.4.0
>Reporter: Lorenz Bühmann
>Assignee: Andy Seaborne
>Priority: Major
> Fix For: Jena 4.5.0
>
>
> While trying to use the RDFS dataset for light-weight reasoning, I noticed 
> that the contains method isn't implemented properly.
> I can't say whether this holds for all contains calls, as there is also a 
> contains method directly on the dataset.
> But the following path is where I ran into trouble:
> Given {{D}}, a {{DatasetGraphRDFS}}, getting the named model {{M}} for a 
> particular graph yields a model backed by a {{GraphRDFS}} instance {{G}}; 
> this {{G}} doesn't seem to make use of the inferred triples when we call 
> {{{}contains{}}}. The method is still only implemented in the 
> {{GraphWrapper}} superclass and doesn't make use of the overridden {{find}} 
> method.
> I don't know what the best place to implement it would be. Sure, we could 
> make use of {{find}} directly in the {{GraphRDFS}} class, e.g.
> {code:java}
> @Override
> public boolean contains(Node s, Node p, Node o) {
> return find(s, p, o).hasNext();
> }
> @Override
> public boolean contains(Triple t) {
> return contains(t.getSubject(), t.getPredicate(), t.getObject());
> } {code}
> But I'm wondering about efficiency, as I don't know how efficiently the 
> inference streams are built. We could maybe terminate much earlier in 
> {{MatchRDFS}}, but that would of course mean more lines of code.
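On the early-termination question: a {{contains}} implemented as {{find(...).hasNext()}} only needs the stream to produce its first element, provided the underlying stream is lazy. A stand-alone sketch (plain JDK streams standing in for the inference stream; no Jena classes involved):

```java
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class LazyContainsDemo {
    public static void main(String[] args) {
        AtomicInteger materialized = new AtomicInteger();
        // Stand-in for an inference stream: unbounded and lazy,
        // counting how many "triples" are actually produced.
        Iterator<Integer> inferred = Stream.iterate(0, i -> i + 1)
                .peek(i -> materialized.incrementAndGet())
                .iterator();
        // contains(s, p, o) expressed as find(s, p, o).hasNext():
        boolean contains = inferred.hasNext();
        System.out.println(contains + " after " + materialized.get() + " element(s)");
    }
}
```

If the Jena-side inference streams are built eagerly rather than lazily, this early exit would not apply, which is exactly the efficiency question raised above.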



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (JENA-2293) SPARQL Update: DatasetGraphOne: NPE when COPY or MOVE to named graph

2022-02-28 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17498881#comment-17498881
 ] 

Andy Seaborne commented on JENA-2293:
-

Are there any in the Jena codebase? (I tried others and didn't find any.)

A related contract changed to "auto-create" a long time ago, and 
{{DatasetGraphOne}}, which is an adapter, is an outlier. The right answer may 
be {{unsupportedMethod}}, as in {{DatasetGraphOne.addGraph}}.

Maybe the javadoc of {{DatasetGraph.getGraph}} is out of date.

(yes, update should be made robust but the original description claimed a 
universal problem.)

> SPARQL Update: DatasetGraphOne: NPE when COPY or MOVE to named graph
> 
>
> Key: JENA-2293
> URL: https://issues.apache.org/jira/browse/JENA-2293
> Project: Apache Jena
>  Issue Type: Bug
>Affects Versions: Jena 4.4.0
>Reporter: Richard Cyganiak
>Assignee: Andy Seaborne
>Priority: Major
> Attachments: SPARQLUpdateTest.java
>
>
> When executing SPARQL Update requests against a dataset that does not 
> automatically create new graphs, COPY or MOVE operations with a non-existing 
> graph as the target will result in a NullPointerException. The same happens 
> when adding SILENT.
> I would expect these requests to result in UpdateExceptions, with a message 
> pointing out the non-existing graph. Or nothing in the case of SILENT.
> The attached JUnit 4 test demonstrates this by running queries against a 
> DatasetGraphOne:
> COPY DEFAULT TO 
> COPY SILENT DEFAULT TO 
> MOVE DEFAULT TO 
> MOVE SILENT DEFAULT TO 
> Each request produces an NPE.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (JENA-2293) SPARQL Update: DatasetGraphOne: NPE when COPY or MOVE to named graph

2022-02-27 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17498590#comment-17498590
 ] 

Andy Seaborne commented on JENA-2293:
-

The original title:
"SPARQL Update: NPE when COPY or MOVE to non-existing graph"
changed to 
"SPARQL Update: DatasetGraphOne: NPE when COPY or MOVE to named graph"

To anyone reading this:

This is specific to {{DatasetGraphOne}}, and its javadoc states that named 
graphs cannot be added.

"CREATE" will fail with an UnsupportedOperationException on {{DatasetGraphOne}}.

A dataset created with {{DatasetFactory.create(graph)}} works and 
automatically creates graphs as needed.




> SPARQL Update: DatasetGraphOne: NPE when COPY or MOVE to named graph
> 
>
> Key: JENA-2293
> URL: https://issues.apache.org/jira/browse/JENA-2293
> Project: Apache Jena
>  Issue Type: Bug
>Affects Versions: Jena 4.4.0
>Reporter: Richard Cyganiak
>Assignee: Andy Seaborne
>Priority: Major
> Attachments: SPARQLUpdateTest.java
>
>
> When executing SPARQL Update requests against a dataset that does not 
> automatically create new graphs, COPY or MOVE operations with a non-existing 
> graph as the target will result in a NullPointerException. The same happens 
> when adding SILENT.
> I would expect these requests to result in UpdateExceptions, with a message 
> pointing out the non-existing graph. Or nothing in the case of SILENT.
> The attached JUnit 4 test demonstrates this by running queries against a 
> DatasetGraphOne:
> COPY DEFAULT TO 
> COPY SILENT DEFAULT TO 
> MOVE DEFAULT TO 
> MOVE SILENT DEFAULT TO 
> Each request produces an NPE.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (JENA-2293) SPARQL Update: DatasetGraphOne: NPE when COPY or MOVE to named graph

2022-02-27 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne updated JENA-2293:

Summary: SPARQL Update: DatasetGraphOne: NPE when COPY or MOVE to named 
graph  (was: SPARQL Update: NPE when COPY or MOVE to non-existing graph)

> SPARQL Update: DatasetGraphOne: NPE when COPY or MOVE to named graph
> 
>
> Key: JENA-2293
> URL: https://issues.apache.org/jira/browse/JENA-2293
> Project: Apache Jena
>  Issue Type: Bug
>Affects Versions: Jena 4.4.0
>Reporter: Richard Cyganiak
>Assignee: Andy Seaborne
>Priority: Major
> Attachments: SPARQLUpdateTest.java
>
>
> When executing SPARQL Update requests against a dataset that does not 
> automatically create new graphs, COPY or MOVE operations with a non-existing 
> graph as the target will result in a NullPointerException. The same happens 
> when adding SILENT.
> I would expect these requests to result in UpdateExceptions, with a message 
> pointing out the non-existing graph. Or nothing in the case of SILENT.
> The attached JUnit 4 test demonstrates this by running queries against a 
> DatasetGraphOne:
> COPY DEFAULT TO 
> COPY SILENT DEFAULT TO 
> MOVE DEFAULT TO 
> MOVE SILENT DEFAULT TO 
> Each request produces an NPE.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (JENA-2293) SPARQL Update: NPE when COPY or MOVE to non-existing graph

2022-02-27 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne reassigned JENA-2293:
---

Assignee: Andy Seaborne

> SPARQL Update: NPE when COPY or MOVE to non-existing graph
> --
>
> Key: JENA-2293
> URL: https://issues.apache.org/jira/browse/JENA-2293
> Project: Apache Jena
>  Issue Type: Bug
>Affects Versions: Jena 4.4.0
>Reporter: Richard Cyganiak
>Assignee: Andy Seaborne
>Priority: Major
> Attachments: SPARQLUpdateTest.java
>
>
> When executing SPARQL Update requests against a dataset that does not 
> automatically create new graphs, COPY or MOVE operations with a non-existing 
> graph as the target will result in a NullPointerException. The same happens 
> when adding SILENT.
> I would expect these requests to result in UpdateExceptions, with a message 
> pointing out the non-existing graph. Or nothing in the case of SILENT.
> The attached JUnit 4 test demonstrates this by running queries against a 
> DatasetGraphOne:
> COPY DEFAULT TO 
> COPY SILENT DEFAULT TO 
> MOVE DEFAULT TO 
> MOVE SILENT DEFAULT TO 
> Each request produces an NPE.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (JENA-2294) tdb2.xloader creates invalid database - later update causes wrong answersfails.

2022-02-27 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne updated JENA-2294:

Summary: tdb2.xloader creates invalid database - later update causes wrong 
answersfails.  (was: tdb2.xloader creates invalid database - later update 
fails.)

> tdb2.xloader creates invalid database - later update causes wrong 
> answersfails.
> ---
>
> Key: JENA-2294
> URL: https://issues.apache.org/jira/browse/JENA-2294
> Project: Apache Jena
>  Issue Type: Bug
>  Components: TDB2
>Affects Versions: Jena 4.4.0
>Reporter: Andy Seaborne
>Assignee: Andy Seaborne
>Priority: Major
>
> [Report from 
> users@|https://lists.apache.org/thread/lxwcolfowh29nbc79cq867jq051sf2nh].
> Recreate with: 
> {noformat}
> rm -rf BSBM
> xloader --loc BSBM ~/Datasets/BSBM/bsbm-50k.nt.gz
> tdb2.tdbquery --loc BSBM/ --file T.rq
> tdb2.tdbloader --loader=basic --loc BSBM/ X.nt 
> tdb2.tdbquery --loc BSBM/ --file T.rq
> {noformat}
> where
> {noformat}
> ==> X.nt <==
>    .
> ==> T.rq <==
> SELECT (count(?x) AS ?C) {
>   ?x a ?T .
> }
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (JENA-2294) tdb2.xloader creates invalid database - later update causes wrong answers.

2022-02-27 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne updated JENA-2294:

Summary: tdb2.xloader creates invalid database - later update causes wrong 
answers.  (was: tdb2.xloader creates invalid database - later update causes 
wrong answersfails.)

> tdb2.xloader creates invalid database - later update causes wrong answers.
> --
>
> Key: JENA-2294
> URL: https://issues.apache.org/jira/browse/JENA-2294
> Project: Apache Jena
>  Issue Type: Bug
>  Components: TDB2
>Affects Versions: Jena 4.4.0
>Reporter: Andy Seaborne
>Assignee: Andy Seaborne
>Priority: Major
>
> [Report from 
> users@|https://lists.apache.org/thread/lxwcolfowh29nbc79cq867jq051sf2nh].
> Recreate with: 
> {noformat}
> rm -rf BSBM
> xloader --loc BSBM ~/Datasets/BSBM/bsbm-50k.nt.gz
> tdb2.tdbquery --loc BSBM/ --file T.rq
> tdb2.tdbloader --loader=basic --loc BSBM/ X.nt 
> tdb2.tdbquery --loc BSBM/ --file T.rq
> {noformat}
> where
> {noformat}
> ==> X.nt <==
>    .
> ==> T.rq <==
> SELECT (count(?x) AS ?C) {
>   ?x a ?T .
> }
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (JENA-2294) tdb2.xloader creates invalid database - later update fails.

2022-02-27 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17498580#comment-17498580
 ] 

Andy Seaborne commented on JENA-2294:
-

It is also possible to cause a stack overflow with different data added to a 
250k load at the {{tdb2.tdbloader --loader=basic}} step:

{noformat}
Exception in thread "main" java.lang.StackOverflowError
    at java.base/java.nio.IntBuffer.limit(IntBuffer.java:1529)
    at java.base/java.nio.IntBuffer.limit(IntBuffer.java:267)
    at java.base/java.nio.Buffer.(Buffer.java:245)
    at java.base/java.nio.IntBuffer.(IntBuffer.java:288)
    at java.base/java.nio.IntBuffer.(IntBuffer.java:296)
    at java.base/java.nio.DirectIntBufferS.(DirectIntBufferS.java:208)
    at 
java.base/java.nio.DirectByteBuffer.asIntBuffer(DirectByteBuffer.java:761)
    at org.apache.jena.dboe.base.buffer.PtrBuffer.(PtrBuffer.java:41)
    at 
org.apache.jena.dboe.trans.bplustree.BPTreeNodeMgr.formatBPTreeNode(BPTreeNodeMgr.java:209)
    at 
org.apache.jena.dboe.trans.bplustree.BPTreeNodeMgr.overlay(BPTreeNodeMgr.java:159)
    at 
org.apache.jena.dboe.trans.bplustree.BPTreeNodeMgr$Block2BPTreeNode.fromBlock(BPTreeNodeMgr.java:104)
    at 
org.apache.jena.dboe.trans.bplustree.BPTreeNodeMgr$Block2BPTreeNode.fromBlock(BPTreeNodeMgr.java:1)
    at 
org.apache.jena.dboe.base.page.PageBlockMgr.getRead$(PageBlockMgr.java:116)
    at 
org.apache.jena.dboe.trans.bplustree.BPTreeNodeMgr.getRead(BPTreeNodeMgr.java:66)
    at 
org.apache.jena.dboe.trans.bplustree.BPTreeNodeMgr.getRead(BPTreeNodeMgr.java:1)
    at org.apache.jena.dboe.trans.bplustree.BPTreeNode.get(BPTreeNode.java:160)
    at 
org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:501)
    at 
org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:522)
    at 
org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:522)
    at 
org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:522)
    at 
org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:522)
    at 
org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:522)
    ...
{noformat}

> tdb2.xloader creates invalid database - later update fails.
> ---
>
> Key: JENA-2294
> URL: https://issues.apache.org/jira/browse/JENA-2294
> Project: Apache Jena
>  Issue Type: Bug
>  Components: TDB2
>Affects Versions: Jena 4.4.0
>Reporter: Andy Seaborne
>Assignee: Andy Seaborne
>Priority: Major
>
> [Report from 
> users@|https://lists.apache.org/thread/lxwcolfowh29nbc79cq867jq051sf2nh].
> Recreate with: 
> {noformat}
> rm -rf BSBM
> xloader --loc BSBM ~/Datasets/BSBM/bsbm-50k.nt.gz
> tdb2.tdbquery --loc BSBM/ --file T.rq
> tdb2.tdbloader --loader=basic --loc BSBM/ X.nt 
> tdb2.tdbquery --loc BSBM/ --file T.rq
> {noformat}
> where
> {noformat}
> ==> X.nt <==
>    .
> ==> T.rq <==
> SELECT (count(?x) AS ?C) {
>   ?x a ?T .
> }
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (JENA-2294) tdb2.xloader creates invalid database - later update fails.

2022-02-27 Thread Andy Seaborne (Jira)
Andy Seaborne created JENA-2294:
---

 Summary: tdb2.xloader creates invalid database - later update 
fails.
 Key: JENA-2294
 URL: https://issues.apache.org/jira/browse/JENA-2294
 Project: Apache Jena
  Issue Type: Bug
  Components: TDB2
Affects Versions: Jena 4.4.0
Reporter: Andy Seaborne
Assignee: Andy Seaborne


[Report from 
users@|https://lists.apache.org/thread/lxwcolfowh29nbc79cq867jq051sf2nh].

Recreate with: 
{noformat}
rm -rf BSBM
xloader --loc BSBM ~/Datasets/BSBM/bsbm-50k.nt.gz
tdb2.tdbquery --loc BSBM/ --file T.rq
tdb2.tdbloader --loader=basic --loc BSBM/ X.nt 
tdb2.tdbquery --loc BSBM/ --file T.rq
{noformat}
where
{noformat}
==> X.nt <==
   .

==> T.rq <==
SELECT (count(?x) AS ?C) {
  ?x a ?T .
}
{noformat}




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (JENA-2291) Avoid core FileManager/LocationMapper initialization if replaced

2022-02-25 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne resolved JENA-2291.
-
Resolution: Done

> Avoid core FileManager/LocationMapper initialization if replaced
> 
>
> Key: JENA-2291
> URL: https://issues.apache.org/jira/browse/JENA-2291
> Project: Apache Jena
>  Issue Type: Improvement
>  Components: Ontology API, RIOT
>Affects Versions: Jena 4.4.0
>Reporter: Andy Seaborne
>Assignee: Andy Seaborne
>Priority: Major
> Fix For: Jena 4.5.0
>
>
> Records:
> https://stackoverflow.com/questions/71157639/block-implicit-location-mapping-loading-in-jena



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (JENA-2292) WKT literal (in)equality check fails

2022-02-25 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne closed JENA-2292.
---

> WKT literal (in)equality check fails 
> -
>
> Key: JENA-2292
> URL: https://issues.apache.org/jira/browse/JENA-2292
> Project: Apache Jena
>  Issue Type: Bug
>Affects Versions: Jena 4.4.0
>Reporter: Lorenz Bühmann
>Priority: Major
>
> The inequality check of WKT literals fails in filter expression:
> {code:sql}
> PREFIX geo: 
> SELECT * {
>   VALUES ?wkt1 {"Point(11.4167 53.6333)"^^geo:wktLiteral 
> "Point(11.575 48.1375)"^^geo:wktLiteral}
>   VALUES ?wkt2 {"Point(11.4167 53.6333)"^^geo:wktLiteral 
> "Point(11.575 48.1375)"^^geo:wktLiteral}
>   FILTER(?wkt1 != ?wkt2)
> } 
> {code}
> Equality check on the other hand works:
> {code:sql}
> PREFIX geo: 
> SELECT * {
>   VALUES ?wkt1 {"Point(11.4167 53.6333)"^^geo:wktLiteral 
> "Point(11.575 48.1375)"^^geo:wktLiteral}
>   VALUES ?wkt2 {"Point(11.4167 53.6333)"^^geo:wktLiteral 
> "Point(11.575 48.1375)"^^geo:wktLiteral}
>   FILTER(?wkt1 = ?wkt2)
> } 
> {code}
> I don't know if this is intended by SPARQL itself for non-standard 
> datatypes; while checking the code, evaluation goes into
> {code:java}
> case VSPACE_UNKNOWN:
> {
> // One or two unknown value spaces, or one has a lang tag (but not both).
> Node node1 = nv1.asNode() ;
> Node node2 = nv2.asNode() ;
> if ( ! SystemARQ.ValueExtensions )
> // No value extensions => raw rdfTermEquals
> return NodeFunctions.rdfTermEquals(node1, node2) ;
> // Some "value spaces" are know to be not equal (no overlap).
> // Like one literal with a language tag, and one without can't be sameAs.
> if ( ! node1.isLiteral() || ! node2.isLiteral() )
> // Can't both be non-literals - that's VSPACE_NODE
> // One or other not a literal => not sameAs
> return false ;
> // Two literals at this point.
> if ( NodeFunctions.sameTerm(node1, node2) )
> return true ;
> if ( ! node1.getLiteralLanguage().equals("") ||
>  ! node2.getLiteralLanguage().equals("") )
> // One had lang tag but weren't sameNode => not equals
> return false ;
> raise(new ExprEvalException("Unknown equality test: "+nv1+" and "+nv2)) ;
> throw new ARQInternalErrorException("raise returned (sameValueAs)") ;
> } {code}
> and here {{NodeFunctions.sameTerm(node1, node2)}} indeed returns false, but 
> this isn't forwarded as the return value; instead an exception is thrown. 
> Thus an inequality check in a filter always leads to an error, and hence 
> false.
> It will of course also fail for an equality check of non-equal WKT literals; 
> all cases here:
> {code:sql}
> PREFIX geo: 
> SELECT * {
>   BIND("Point(11.4167 53.6333)"^^geo:wktLiteral = 
> "Point(11.4167 53.6333)"^^geo:wktLiteral AS ?equal_true)
>   BIND("Point(11.4167 53.6333)"^^geo:wktLiteral = 
> "Point(11.575 48.1375)"^^geo:wktLiteral AS ?equal_false)
>   
>   BIND("Point(11.4167 53.6333)"^^geo:wktLiteral != 
> "Point(11.575 48.1375)"^^geo:wktLiteral AS ?not_equal_true)
>   BIND("Point(11.4167 53.6333)"^^geo:wktLiteral != 
> "Point(11.4167 53.6333)"^^geo:wktLiteral AS ?not_equal_false)
> } 
> {code}
> Result:
> {noformat}
> +------------+-------------+----------------+-----------------+
> | equal_true | equal_false | not_equal_true | not_equal_false |
> +------------+-------------+----------------+-----------------+
> | true       |             |                | false           |
> +------------+-------------+----------------+-----------------+{noformat}
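A possible workaround (an assumption on my part, not an officially sanctioned idiom) is to compare the terms rather than their values via {{sameTerm}}, which avoids the unknown-value-space path entirely. The {{geo:}} prefix URI, stripped in the archived text above, is assumed here to be the standard GeoSPARQL namespace:

```sparql
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
SELECT * {
  VALUES ?wkt1 { "Point(11.4167 53.6333)"^^geo:wktLiteral
                 "Point(11.575 48.1375)"^^geo:wktLiteral }
  VALUES ?wkt2 { "Point(11.4167 53.6333)"^^geo:wktLiteral
                 "Point(11.575 48.1375)"^^geo:wktLiteral }
  FILTER(!sameTerm(?wkt1, ?wkt2))   # term inequality, no value-space extension needed
}
```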



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (JENA-2292) WKT literal (in)equality check fails

2022-02-25 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne resolved JENA-2292.
-
Resolution: Information Provided

> WKT literal (in)equality check fails 
> -
>
> Key: JENA-2292
> URL: https://issues.apache.org/jira/browse/JENA-2292
> Project: Apache Jena
>  Issue Type: Bug
>Affects Versions: Jena 4.4.0
>Reporter: Lorenz Bühmann
>Priority: Major
>
> The inequality check of WKT literals fails in filter expression:
> {code:sql}
> PREFIX geo: 
> SELECT * {
>   VALUES ?wkt1 {"Point(11.4167 53.6333)"^^geo:wktLiteral 
> "Point(11.575 48.1375)"^^geo:wktLiteral}
>   VALUES ?wkt2 {"Point(11.4167 53.6333)"^^geo:wktLiteral 
> "Point(11.575 48.1375)"^^geo:wktLiteral}
>   FILTER(?wkt1 != ?wkt2)
> } 
> {code}
> Equality check on the other hand works:
> {code:sql}
> PREFIX geo: 
> SELECT * {
>   VALUES ?wkt1 {"Point(11.4167 53.6333)"^^geo:wktLiteral 
> "Point(11.575 48.1375)"^^geo:wktLiteral}
>   VALUES ?wkt2 {"Point(11.4167 53.6333)"^^geo:wktLiteral 
> "Point(11.575 48.1375)"^^geo:wktLiteral}
>   FILTER(?wkt1 = ?wkt2)
> } 
> {code}
> I don't know if this is intended by SPARQL itself for non-standard 
> datatypes; while checking the code, evaluation goes into
> {code:java}
> case VSPACE_UNKNOWN:
> {
> // One or two unknown value spaces, or one has a lang tag (but not both).
> Node node1 = nv1.asNode() ;
> Node node2 = nv2.asNode() ;
> if ( ! SystemARQ.ValueExtensions )
> // No value extensions => raw rdfTermEquals
> return NodeFunctions.rdfTermEquals(node1, node2) ;
> // Some "value spaces" are know to be not equal (no overlap).
> // Like one literal with a language tag, and one without can't be sameAs.
> if ( ! node1.isLiteral() || ! node2.isLiteral() )
> // Can't both be non-literals - that's VSPACE_NODE
> // One or other not a literal => not sameAs
> return false ;
> // Two literals at this point.
> if ( NodeFunctions.sameTerm(node1, node2) )
> return true ;
> if ( ! node1.getLiteralLanguage().equals("") ||
>  ! node2.getLiteralLanguage().equals("") )
> // One had lang tag but weren't sameNode => not equals
> return false ;
> raise(new ExprEvalException("Unknown equality test: "+nv1+" and "+nv2)) ;
> throw new ARQInternalErrorException("raise returned (sameValueAs)") ;
> } {code}
> and here {{NodeFunctions.sameTerm(node1, node2)}} indeed returns false, but 
> this isn't forwarded as the return value; instead an exception is thrown. 
> Thus an inequality check in a filter always leads to an error, and hence 
> false.
> It will of course also fail for an equality check of non-equal WKT literals; 
> all cases here:
> {code:sql}
> PREFIX geo: 
> SELECT * {
>   BIND("Point(11.4167 53.6333)"^^geo:wktLiteral = 
> "Point(11.4167 53.6333)"^^geo:wktLiteral AS ?equal_true)
>   BIND("Point(11.4167 53.6333)"^^geo:wktLiteral = 
> "Point(11.575 48.1375)"^^geo:wktLiteral AS ?equal_false)
>   
>   BIND("Point(11.4167 53.6333)"^^geo:wktLiteral != 
> "Point(11.575 48.1375)"^^geo:wktLiteral AS ?not_equal_true)
>   BIND("Point(11.4167 53.6333)"^^geo:wktLiteral != 
> "Point(11.4167 53.6333)"^^geo:wktLiteral AS ?not_equal_false)
> } 
> {code}
> Result:
> {noformat}
> +------------+-------------+----------------+-----------------+
> | equal_true | equal_false | not_equal_true | not_equal_false |
> +------------+-------------+----------------+-----------------+
> | true       |             |                | false           |
> +------------+-------------+----------------+-----------------+{noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (JENA-2292) WKT literal (in)equality check fails

2022-02-25 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17498277#comment-17498277
 ] 

Andy Seaborne commented on JENA-2292:
-

{quote}Minor question: an extension like GeoSPARQL could introduce a value 
space 
{quote}
ARQ does not have an easy way to add value spaces. Geo is particularly 
difficult because it's 2D.

Adding custom functions to SPARQL such as accessors for the coordinates of a 
point is one way to go.

> WKT literal (in)equality check fails 
> -
>
> Key: JENA-2292
> URL: https://issues.apache.org/jira/browse/JENA-2292
> Project: Apache Jena
>  Issue Type: Bug
>Affects Versions: Jena 4.4.0
>Reporter: Lorenz Bühmann
>Priority: Major
>
> The inequality check of WKT literals fails in filter expression:
> {code:sql}
> PREFIX geo: <http://www.opengis.net/ont/geosparql#>
> SELECT * {
>   VALUES ?wkt1 {"Point(11.4167 53.6333)"^^geo:wktLiteral 
> "Point(11.575 48.1375)"^^geo:wktLiteral}
>   VALUES ?wkt2 {"Point(11.4167 53.6333)"^^geo:wktLiteral 
> "Point(11.575 48.1375)"^^geo:wktLiteral}
>   FILTER(?wkt1 != ?wkt2)
> } 
> {code}
> Equality check on the other hand works:
> {code:sql}
> PREFIX geo: <http://www.opengis.net/ont/geosparql#>
> SELECT * {
>   VALUES ?wkt1 {"Point(11.4167 53.6333)"^^geo:wktLiteral 
> "Point(11.575 48.1375)"^^geo:wktLiteral}
>   VALUES ?wkt2 {"Point(11.4167 53.6333)"^^geo:wktLiteral 
> "Point(11.575 48.1375)"^^geo:wktLiteral}
>   FILTER(?wkt1 = ?wkt2)
> } 
> {code}
> I don't know if this is intended by SPARQL itself for non-standard datatypes, 
> but while checking the code it goes into
> {code:java}
> case VSPACE_UNKNOWN:
> {
> // One or two unknown value spaces, or one has a lang tag (but not both).
> Node node1 = nv1.asNode() ;
> Node node2 = nv2.asNode() ;
> if ( ! SystemARQ.ValueExtensions )
> // No value extensions => raw rdfTermEquals
> return NodeFunctions.rdfTermEquals(node1, node2) ;
> // Some "value spaces" are known to be not equal (no overlap).
> // Like one literal with a language tag, and one without can't be sameAs.
> if ( ! node1.isLiteral() || ! node2.isLiteral() )
> // Can't both be non-literals - that's VSPACE_NODE
> // One or other not a literal => not sameAs
> return false ;
> // Two literals at this point.
> if ( NodeFunctions.sameTerm(node1, node2) )
> return true ;
> if ( ! node1.getLiteralLanguage().equals("") ||
>  ! node2.getLiteralLanguage().equals("") )
> // One had lang tag but weren't sameNode => not equals
> return false ;
> raise(new ExprEvalException("Unknown equality test: "+nv1+" and "+nv2)) ;
> throw new ARQInternalErrorException("raise returned (sameValueAs)") ;
> } {code}
> and here {{NodeFunctions.sameTerm(node1, node2)}} indeed returns false, but 
> this is not forwarded as the return value; instead an exception is thrown. 
> Thus an inequality check in a filter always leads to an error and therefore 
> to false.
> It will of course also fail for the equality check of non-equal WKT literals; 
> all cases here:
> {code:sql}
> PREFIX geo: <http://www.opengis.net/ont/geosparql#>
> SELECT * {
>   BIND("Point(11.4167 53.6333)"^^geo:wktLiteral = 
> "Point(11.4167 53.6333)"^^geo:wktLiteral AS ?equal_true)
>   BIND("Point(11.4167 53.6333)"^^geo:wktLiteral = 
> "Point(11.575 48.1375)"^^geo:wktLiteral AS ?equal_false)
>   
>   BIND("Point(11.4167 53.6333)"^^geo:wktLiteral != 
> "Point(11.575 48.1375)"^^geo:wktLiteral AS ?not_equal_true)
>   BIND("Point(11.4167 53.6333)"^^geo:wktLiteral != 
> "Point(11.4167 53.6333)"^^geo:wktLiteral AS ?not_equal_false)
> } 
> {code}
> Result:
> {noformat}
> +------------+-------------+----------------+-----------------+
> | equal_true | equal_false | not_equal_true | not_equal_false |
> +------------+-------------+----------------+-----------------+
> | true       |             |                | false           |
> +------------+-------------+----------------+-----------------+{noformat}





[jira] [Updated] (JENA-2292) WKT literal (in)equality check fails

2022-02-25 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne updated JENA-2292:

Component/s: (was: GeoSPARQL)






[jira] [Commented] (JENA-2292) WKT literal (in)equality check fails

2022-02-25 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17498046#comment-17498046
 ] 

Andy Seaborne commented on JENA-2292:
-

The results are correct. It is a matter of what the engine can know merely 
from the fact that something has a given datatype.

This is nothing to do with {{geo:wktLiteral}} so let's use datatype 
{{{}:dtype{}}}.

A datatype is a mapping function from the lexical space to a value space.

If the SPARQL evaluator doesn't know something to be definitely true, it raises 
an error for "unknown", and does not return false. (RDF open world assumption).

For a datatype we don't understand, we can still know that two things are 
(representations of) the same value if the lexical forms are the same.

But returning "false" for different lexical forms may be wrong: "123" and 
"+000123" are the same value as integers, yet different lexical forms.

The rule is that if the LHS and RHS are the same string, we know they must be 
the same value, so "=" is true and "!=" is false. Otherwise we don't know, and 
it is an error for "unknown" because the whole expression is now uncertain. 
(And before anyone mentions it: not all things in SPARQL expressions are proper 
functions. {{COALESCE}}, {{IF}} and {{BOUND}} aren't functions because they 
don't evaluate their arguments before calling the function itself; they operate 
on un-evaluated expressions.)

 
{noformat}
"abc"^^:dtype = "abc"^^:dtype
{noformat}
True. Same lexical forms mean same value.

 
{noformat}
"abc"^^:dtype = "def"^^:dtype
{noformat}
Unknown. Different lexical forms, no idea.

 
{noformat}
"abc"^^:dtype != "abc"^^:dtype
{noformat}
False. Same lexical forms mean same value, so "!=" is false.

 
{noformat}
"abc"^^:dtype != "def"^^:dtype
{noformat}
Unknown. Different lexical forms, no idea.

 

c.f.
{noformat}
PREFIX : 

SELECT * {
   BIND ("abc"^^:dtype AS ?a)
   BIND ("def"^^:dtype AS ?b)
   BIND (COALESCE(?a != ?b, false) AS ?X)
}
{noformat}
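The rule above can be modelled in plain Java as three-valued logic, with {{Optional.empty()}} standing in for the "unknown"/error case (a simplified illustration of the principle, not ARQ's actual evaluation code):
{code:java}
import java.util.Optional;

// Simplified model of SPARQL "=" over literals of an unknown datatype:
// same lexical form => definitely the same value; different lexical forms
// => unknown (modelled as Optional.empty(), standing in for ExprEvalException).
public class UnknownValueSpaceEquals {
    static Optional<Boolean> equalsUnknownDatatype(String lex1, String lex2) {
        if (lex1.equals(lex2))
            return Optional.of(true);   // same term => same value
        return Optional.empty();        // different lexical forms: no idea
    }

    static Optional<Boolean> notEqualsUnknownDatatype(String lex1, String lex2) {
        // "!=" is the negation of "=", and unknown stays unknown.
        return equalsUnknownDatatype(lex1, lex2).map(b -> !b);
    }

    public static void main(String[] args) {
        System.out.println(equalsUnknownDatatype("abc", "abc"));    // Optional[true]
        System.out.println(equalsUnknownDatatype("abc", "def"));    // Optional.empty
        System.out.println(notEqualsUnknownDatatype("abc", "abc")); // Optional[false]
        System.out.println(notEqualsUnknownDatatype("abc", "def")); // Optional.empty
    }
}
{code}
In a FILTER, the two empty (unknown) cases behave like false, which is exactly what the query in the report observed.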


[jira] [Updated] (JENA-2290) GraphRDFS doesn't implement contains

2022-02-24 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne updated JENA-2290:

Fix Version/s: Jena 4.5.0

> GraphRDFS doesn't implement contains
> 
>
> Key: JENA-2290
> URL: https://issues.apache.org/jira/browse/JENA-2290
> Project: Apache Jena
>  Issue Type: Bug
>  Components: Core
>Affects Versions: Jena 4.4.0
>Reporter: Lorenz Bühmann
>Assignee: Andy Seaborne
>Priority: Major
> Fix For: Jena 4.5.0
>
>
> While trying to use the RDFS dataset for light-weight reasoning, I noticed 
> that the contains method isn't implemented properly.
> I can't say if this holds for all contains calls, as there is a contains 
> method directly on the dataset.
> But the following path is the one I ran into trouble with:
> Given {{D}} being a {{DatasetGraphRDFS}}, and then getting the named model 
> {{M}} for a particular graph, which is in fact backed by a {{GraphRDFS}} 
> instance {{G}}: this {{G}} doesn't seem to make use of the inferred triples 
> when we call {{contains}}. This method is still only implemented in the 
> {{GraphWrapper}} superclass and doesn't make use of the overridden {{find}} 
> method.
> I don't know what would be the best place to implement it. Sure, we could 
> make use of {{find}} directly in the {{GraphRDFS}} class, e.g.
> {code:java}
> @Override
> public boolean contains(Node s, Node p, Node o) {
> return find(s, p, o).hasNext();
> }
> @Override
> public boolean contains(Triple t) {
> return contains(t.getSubject(), t.getPredicate(), t.getObject());
> } {code}
> But I'm wondering about efficiency, as I don't know how efficiently the 
> inference streams are built. I mean, we could maybe terminate much earlier 
> in {{MatchRDFS}}, but that would of course lead to some more lines of code.





[jira] [Commented] (JENA-2279) tdb2.xloader fails with "Can't find gzip program" error

2022-02-24 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497781#comment-17497781
 ] 

Andy Seaborne commented on JENA-2279:
-

[~LorenzB] -- fixed (I hope!) + a fix for very small files (less than 250K 
triples).

> tdb2.xloader fails with "Can't find gzip program" error
> ---
>
> Key: JENA-2279
> URL: https://issues.apache.org/jira/browse/JENA-2279
> Project: Apache Jena
>  Issue Type: Bug
>Reporter: Sivaram Kalidas
>Assignee: Andy Seaborne
>Priority: Critical
> Fix For: Jena 4.5.0
>
>
> The function 
> [https://github.com/apache/jena/blob/6d15fee37b3639e723b9e6da0a173879a36e853b/jena-db/jena-tdb2/src/main/java/org/apache/jena/tdb2/xloader/BulkLoaderX.java#L76]
> always returns false.
>  





[jira] [Closed] (JENA-2288) Counting aggregation inside SERVICE provides wrong result

2022-02-24 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne closed JENA-2288.
---

> Counting aggregation inside SERVICE provides wrong result
> -
>
> Key: JENA-2288
> URL: https://issues.apache.org/jira/browse/JENA-2288
> Project: Apache Jena
>  Issue Type: Bug
>Affects Versions: Jena 4.4.0
>Reporter: Dmitry Zhelobanov
>Assignee: Andy Seaborne
>Priority: Major
>
> Here is a query which retrieves museums in the specific city:
> {code:java}
> PREFIX wd: <http://www.wikidata.org/entity/>
> PREFIX wdt: <http://www.wikidata.org/prop/direct/>
> SELECT ?wikidata_iri ?museum
> WHERE {
>   VALUES (?wikidata_iri) { () } .
>     
>   SERVICE  {
>     {
>       select ?wikidata_iri ?museum
>       where {
>         OPTIONAL {
>           ?museum (wdt:P131)+ ?wikidata_iri ;
>                    wdt:P31/(wdt:P279)* wd:Q33506 .
>         }
>       }
>     }
>   }
> } {code}
> This query returns 3 results:
> |||
> |||
> |||
> And here is a query which is supposed to count the number of the same museums 
> in the same city:
> {code:java}
> PREFIX wd: <http://www.wikidata.org/entity/>
> PREFIX wdt: <http://www.wikidata.org/prop/direct/>
> SELECT ?wikidata_iri ?museum_count_in_city
> WHERE {
>   VALUES (?wikidata_iri) { () } .
>   
>   SERVICE  {
>     {
>       select ?wikidata_iri (COUNT(?museum) as ?museum_count_in_city)
>       where {
>         OPTIONAL {
>           ?museum (wdt:P131)+ ?wikidata_iri ;
>                    wdt:P31/(wdt:P279)* wd:Q33506 .
>         }
>       } group by ?wikidata_iri
>     }
>   }
> }{code}
> But the count value produced by the query is wrong:
> |<http://www.wikidata.org/entity/Q612>|"201"^^<http://www.w3.org/2001/XMLSchema#integer>|
> It outputs *201* instead of the expected *3*.





[jira] [Comment Edited] (JENA-2288) Counting aggregation inside SERVICE provides wrong result

2022-02-24 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497039#comment-17497039
 ] 

Andy Seaborne edited comment on JENA-2288 at 2/24/22, 1:45 PM:
---

[~LorenzB]'s analysis looks right.

I can't reproduce this using Fuseki as the SERVICE target, and looking at the 
execution, I don't see a way for the grand total count to leak back to the 
outer query as it is never calculated. Maybe it's the duplicates for the path 
confusing the far end. 

This ticket is "Cannot reproduce". Please reopen if you have an example that 
does not depend on  the characteristics of another system.



was (Author: andy.seaborne):
[~LorenzB]'s analysis looks right.

I can't reproduce this using Fuseki as the SERVICE target, and looking at the 
execution, I don't see a way for the grand total count to leak back to the 
outer query as it is never calculated. Maybe it's the duplicates for the path 
confusing the far end. 

This ticket is "Cannot reproduce". Please reopen if you have a example that 
does not depend on  the characteristics of another system.







[jira] [Comment Edited] (JENA-2288) Counting aggregation inside SERVICE provides wrong result

2022-02-24 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497039#comment-17497039
 ] 

Andy Seaborne edited comment on JENA-2288 at 2/24/22, 1:45 PM:
---

[~LorenzB]'s analysis looks right.

I can't reproduce this using Fuseki as the SERVICE target, and looking at the 
execution, I don't see a way for the grand total count to leak back to the 
outer query as it is never calculated. Maybe it's the duplicates for the path 
confusing the far end. 

This ticket is "Cannot reproduce". Please reopen if you have a example that 
does not depend on  the characteristics of another system.



was (Author: andy.seaborne):
[~LorenzB]'s analysis looks right.

I can't reproduce this using Fuseki as the SERVICE target, and looking at the 
execution, I don't see a way for the grand total count to leak back to the 
outer query as it is never calculated. Maybe it's the duplicates for the path 
confuse the far end. 

This ticket is "Cannot reproduce". Please reopen if you have a example that 
does not depend on  the characteristics of another system.







[jira] [Resolved] (JENA-2288) Counting aggregation inside SERVICE provides wrong result

2022-02-24 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne resolved JENA-2288.
-
  Assignee: Andy Seaborne
Resolution: Cannot Reproduce






[jira] [Comment Edited] (JENA-2288) Counting aggregation inside SERVICE provides wrong result

2022-02-24 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497039#comment-17497039
 ] 

Andy Seaborne edited comment on JENA-2288 at 2/24/22, 1:44 PM:
---

[~LorenzB]'s analysis looks right.

I can't reproduce this using Fuseki as the SERVICE target, and looking at the 
execution, I don't see a way for the grand total count to leak back to the 
outer query as it is never calculated. Maybe it's the duplicates for the path 
confuse the far end. 

This ticket is "Cannot reproduce". Please reopen if you have a example that 
does not depend on  the characteristics of another system.



was (Author: andy.seaborne):
[~LorenzB]'s analysis looks right.

I can't reproduce this using Fuseki as the SERVICE target, and looking at the 
execution, I don't see a way for the grand total count to leak back to the 
outer query as it is never calculated. May be it's the duplicates for the path 
confuse the at the far end. 

This ticket is "Cannot reproduce".







[jira] [Commented] (JENA-2279) tdb2.xloader fails with "Can't find gzip program" error

2022-02-24 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497387#comment-17497387
 ] 

Andy Seaborne commented on JENA-2279:
-

Fix in-progress. 

I only have Ubuntu 21.04; I wanted to make sure there weren't more cases I 
hadn't come across.

FWIW: It will be in /usr/bin/gzip when the reshuffling works its way through.
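One robust way to avoid hard-coding /bin versus /usr/bin is to scan the directories on PATH. A stdlib sketch of that idea (illustrative only; the class and method names are made up and this is not the actual BulkLoaderX code):
{code:java}
import java.io.File;

// Locate an external program by scanning the directories on PATH,
// instead of hard-coding /bin or /usr/bin (which differs across distros
// and changes with the usrmerge reshuffling).
public class ProgramLocator {
    static File findProgram(String name) {
        String path = System.getenv("PATH");
        if (path == null)
            return null;
        for (String dir : path.split(File.pathSeparator)) {
            File candidate = new File(dir, name);
            if (candidate.isFile() && candidate.canExecute())
                return candidate;    // first executable match on PATH wins
        }
        return null;                 // not found anywhere on PATH
    }

    public static void main(String[] args) {
        System.out.println(findProgram("gzip"));
        System.out.println(findProgram("no-such-program-xyz")); // null
    }
}
{code}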







[jira] [Commented] (JENA-2289) Geospatial index not being loaded with assembler

2022-02-24 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497289#comment-17497289
 ] 

Andy Seaborne commented on JENA-2289:
-

[~LorenzB], do you think you could put in a PR?

> Geospatial index not being loaded with assembler
> 
>
> Key: JENA-2289
> URL: https://issues.apache.org/jira/browse/JENA-2289
> Project: Apache Jena
>  Issue Type: Bug
>  Components: GeoSPARQL
>Affects Versions: Jena 4.4.0
>Reporter: Lorenz Bühmann
>Priority: Major
>
> Using the GeoSPARQL component via assembler doesn't load the geospatial index 
> from disk and therefore doesn't register it in the context of the dataset, 
> which makes querying fail on a second startup.
> In the {{prepareSpatialExtension}} method of the {{GeoAssembler}} class, the 
> following cases are currently handled:
> case 1: no file given
> case 2: file given and it either doesn't exist yet or is empty
> Thus, case 3 is missing:
> case 3: file given and it isn't empty, i.e. the index has been precomputed
> First I thought to just ignore the file existence/emptiness check and go with
>  
> {code:java}
> GeoSPARQLConfig.setupSpatialIndex(dataset, spatialIndexPath.toFile());{code}
>  
> But this would lead to an unnecessary retrieval of the SRS occurrences.
> What I did for testing is to add 
> {code:java}
> SpatialIndex si = SpatialIndex.load(spatialIndexPath.toFile());
> SpatialIndex.setSpatialIndex(dataset, si); {code}
> but maybe we want to have this case also encapsulated in the 
> {{GeoSPARQLConfig}} class? Something like
> {code:java}
> GeoSPARQLConfig.setupFromPrecomputedIndex(dataset, 
> spatialIndexPath.toFile());{code}
> maybe?





[jira] [Commented] (JENA-2279) tdb2.xloader fails with "Can't find gzip program" error

2022-02-24 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497270#comment-17497270
 ] 

Andy Seaborne commented on JENA-2279:
-

What distro are you running, which version and was it upgraded from a previous 
version or a fresh install?

There are changes happening in Ubuntu, and maybe other distros, regarding 
"/bin" and "/usr/bin". 








[jira] [Commented] (JENA-2288) Counting aggregation inside SERVICE provides wrong result

2022-02-23 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497039#comment-17497039
 ] 

Andy Seaborne commented on JENA-2288:
-

[~LorenzB]'s analysis looks right.

I can't reproduce this using Fuseki as the SERVICE target, and looking at the 
execution, I don't see a way for the grand total count to leak back to the 
outer query as it is never calculated. Maybe it's the duplicates for the path 
confusing the far end. 

This ticket is "Cannot reproduce".


> Counting aggregation inside SERVICE provides wrong result
> -
>
> Key: JENA-2288
> URL: https://issues.apache.org/jira/browse/JENA-2288
> Project: Apache Jena
>  Issue Type: Bug
>Affects Versions: Jena 4.4.0
>Reporter: Dmitry Zhelobanov
>Priority: Major
>
> Here is a query which retrieves museums in the specific city:
> {code:java}
> PREFIX wd: 
> PREFIX wdt: 
> SELECT ?wikidata_iri ?museum
> WHERE {
>   VALUES (?wikidata_iri) { () } .
>     
>   SERVICE  {
>     {
>       select ?wikidata_iri ?museum
>       where {
>         OPTIONAL {
>           ?museum (wdt:P131)+ ?wikidata_iri ;
>                    wdt:P31/(wdt:P279)* wd:Q33506 .
>         }
>       }
>     }
>   }
> } {code}
> This query returns 3 results:
> |||
> |||
> |||
> And here is a query which is supposed to count the number of the same museums 
> in the same city:
> {code:java}
> PREFIX wd:  <http://www.wikidata.org/entity/>
> PREFIX wdt: <http://www.wikidata.org/prop/direct/>
> SELECT ?wikidata_iri ?museum_count_in_city
> WHERE {
>   VALUES (?wikidata_iri) { (<http://www.wikidata.org/entity/Q612>) } .
>   
>   SERVICE <https://query.wikidata.org/sparql> {
>     {
>       select ?wikidata_iri (COUNT(?museum) as ?museum_count_in_city)
>       where {
>         OPTIONAL {
>           ?museum (wdt:P131)+ ?wikidata_iri ;
>                    wdt:P31/(wdt:P279)* wd:Q33506 .
>         }
>       } group by ?wikidata_iri
>     }
>   }
> }{code}
> But the count value produced by the query is wrong:
> |<[http://www.wikidata.org/entity/Q612]>|"201"^^<[http://www.w3.org/2001/XMLSchema#integer]>|
> It outputs *201* instead of the expected *3*.
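A possible workaround sketch (untested; it moves the VALUES inside the SERVICE sub-select so the grouping at the remote end only sees the intended binding; the endpoint URL and the Q612 IRI are restored from context):

{code}
PREFIX wd:  <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?wikidata_iri ?museum_count_in_city
WHERE {
  SERVICE <https://query.wikidata.org/sparql> {
    SELECT ?wikidata_iri (COUNT(?museum) AS ?museum_count_in_city)
    WHERE {
      VALUES (?wikidata_iri) { (wd:Q612) }
      OPTIONAL {
        ?museum (wdt:P131)+ ?wikidata_iri ;
                wdt:P31/(wdt:P279)* wd:Q33506 .
      }
    } GROUP BY ?wikidata_iri
  }
}
{code}

With the VALUES inside the sub-select, the remote COUNT is restricted to the intended city rather than grouping over every binding the OPTIONAL produces.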



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (JENA-2290) GraphRDFS doesn't implement contains

2022-02-22 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496000#comment-17496000
 ] 

Andy Seaborne edited comment on JENA-2290 at 2/22/22, 1:46 PM:
---

Short of writing a parallel "contains" implementation for all 
{{MatchRDFS.find_*}}, {{find().hasNext}} is about as good as it gets.

 Most, but not all, {{find_*}} are on-demand streams using {{flatMap}} so the 
saving is in stream setup, rather than computation.

The code has some notes of possible places for improvements to use more streams.


was (Author: andy.seaborne):
Short of writing a parallel "contains" implementation for all 
{{MatchRDFS.find_*}}, {{find().hasNext}} is about as good as it gets.

 Mostly, but not all, {{find_*}} are on-demand streams using {{flatMap}} so the 
saving is in stream setup, rather than computation.

The code has some notes of possible places for improvements to use more streams.

> GraphRDFS doesn't implement contains
> 
>
> Key: JENA-2290
> URL: https://issues.apache.org/jira/browse/JENA-2290
> Project: Apache Jena
>  Issue Type: Bug
>  Components: Core
>Affects Versions: Jena 4.4.0
>Reporter: Lorenz Bühmann
>Assignee: Andy Seaborne
>Priority: Major
>
> While trying to use the RDFS dataset for light-weight reasoning, I 
> noticed that the contains method doesn't seem to be implemented properly.
> I can't say whether this holds for all contains calls, as there is a contains 
> method directly on the dataset.
> But the following path is where I ran into trouble:
> Given {{D}} being a {{DatasetGraphRDFS}}, and then getting the named model 
> {{M}} for a particular graph (which is in fact backed by a {{GraphRDFS}} 
> instance {{G}}): this {{G}} doesn't seem to make use of the inferred triples 
> when we call {{{}contains{}}}. The method is still only implemented in the 
> {{GraphWrapper}} superclass and doesn't make use of the overridden {{find}} 
> method.
> I don't know what the best place to implement it would be. Sure, we could make 
> use of {{find}} directly in the {{GraphRDFS}} class, e.g.
> {code:java}
> @Override
> public boolean contains(Node s, Node p, Node o) {
> return find(s, p, o).hasNext();
> }
> @Override
> public boolean contains(Triple t) {
> return contains(t.getSubject(), t.getPredicate(), t.getObject());
> } {code}
> But I'm wondering about efficiency, as I don't know how efficiently the 
> inference streams are built. We could maybe terminate much earlier in 
> {{MatchRDFS}}, but that would of course mean some more lines of code.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (JENA-2290) GraphRDFS doesn't implement contains

2022-02-22 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496000#comment-17496000
 ] 

Andy Seaborne edited comment on JENA-2290 at 2/22/22, 1:45 PM:
---

Short of writing a parallel "contains" implementation for all 
{{MatchRDFS.find_*}}, {{find().hasNext}} is about as good as it gets.

 Mostly, but not all, {{find_*}} are on-demand streams using {{flatMap}} so the 
saving is in stream setup, rather than computation.

The code has some notes of possible places for improvements to use more streams.


was (Author: andy.seaborne):
Short of writing a parallel "contains" implementation for all 
{{MatchRDFS.find_*}}, {{find().hasNext}} is about as good as it gets.

 Mostly {{find_*}} are on-demand streams using {{flatMap}} so the saving is in 
stream setup, rather than computation.

The code has some notes of possible places for improvements to use more streams.

> GraphRDFS doesn't implement contains
> 
>
> Key: JENA-2290
> URL: https://issues.apache.org/jira/browse/JENA-2290
> Project: Apache Jena
>  Issue Type: Bug
>  Components: Core
>Affects Versions: Jena 4.4.0
>Reporter: Lorenz Bühmann
>Assignee: Andy Seaborne
>Priority: Major
>
> While trying to use the RDFS dataset for light-weight reasoning, I 
> noticed that the contains method doesn't seem to be implemented properly.
> I can't say whether this holds for all contains calls, as there is a contains 
> method directly on the dataset.
> But the following path is where I ran into trouble:
> Given {{D}} being a {{DatasetGraphRDFS}}, and then getting the named model 
> {{M}} for a particular graph (which is in fact backed by a {{GraphRDFS}} 
> instance {{G}}): this {{G}} doesn't seem to make use of the inferred triples 
> when we call {{{}contains{}}}. The method is still only implemented in the 
> {{GraphWrapper}} superclass and doesn't make use of the overridden {{find}} 
> method.
> I don't know what the best place to implement it would be. Sure, we could make 
> use of {{find}} directly in the {{GraphRDFS}} class, e.g.
> {code:java}
> @Override
> public boolean contains(Node s, Node p, Node o) {
> return find(s, p, o).hasNext();
> }
> @Override
> public boolean contains(Triple t) {
> return contains(t.getSubject(), t.getPredicate(), t.getObject());
> } {code}
> But I'm wondering about efficiency, as I don't know how efficiently the 
> inference streams are built. We could maybe terminate much earlier in 
> {{MatchRDFS}}, but that would of course mean some more lines of code.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (JENA-2290) GraphRDFS doesn't implement contains

2022-02-22 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne updated JENA-2290:

Priority: Major  (was: Minor)

> GraphRDFS doesn't implement contains
> 
>
> Key: JENA-2290
> URL: https://issues.apache.org/jira/browse/JENA-2290
> Project: Apache Jena
>  Issue Type: Bug
>  Components: Core
>Affects Versions: Jena 4.4.0
>Reporter: Lorenz Bühmann
>Assignee: Andy Seaborne
>Priority: Major
>
> While trying to use the RDFS dataset for light-weight reasoning, I 
> noticed that the contains method doesn't seem to be implemented properly.
> I can't say whether this holds for all contains calls, as there is a contains 
> method directly on the dataset.
> But the following path is where I ran into trouble:
> Given {{D}} being a {{DatasetGraphRDFS}}, and then getting the named model 
> {{M}} for a particular graph (which is in fact backed by a {{GraphRDFS}} 
> instance {{G}}): this {{G}} doesn't seem to make use of the inferred triples 
> when we call {{{}contains{}}}. The method is still only implemented in the 
> {{GraphWrapper}} superclass and doesn't make use of the overridden {{find}} 
> method.
> I don't know what the best place to implement it would be. Sure, we could make 
> use of {{find}} directly in the {{GraphRDFS}} class, e.g.
> {code:java}
> @Override
> public boolean contains(Node s, Node p, Node o) {
> return find(s, p, o).hasNext();
> }
> @Override
> public boolean contains(Triple t) {
> return contains(t.getSubject(), t.getPredicate(), t.getObject());
> } {code}
> But I'm wondering about efficiency, as I don't know how efficiently the 
> inference streams are built. We could maybe terminate much earlier in 
> {{MatchRDFS}}, but that would of course mean some more lines of code.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (JENA-2291) Avoid core FileManager/LocationMapper initialization if replaced

2022-02-22 Thread Andy Seaborne (Jira)
Andy Seaborne created JENA-2291:
---

 Summary: Avoid core FileManager/LocationMapper initialization if 
replaced
 Key: JENA-2291
 URL: https://issues.apache.org/jira/browse/JENA-2291
 Project: Apache Jena
  Issue Type: Improvement
  Components: Ontology API, RIOT
Affects Versions: Jena 4.4.0
Reporter: Andy Seaborne
Assignee: Andy Seaborne
 Fix For: Jena 4.5.0


Records:

https://stackoverflow.com/questions/71157639/block-implicit-location-mapping-loading-in-jena



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (JENA-2290) GraphRDFS doesn't implement contains

2022-02-22 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496000#comment-17496000
 ] 

Andy Seaborne commented on JENA-2290:
-

Short of writing a parallel "contains" implementation for all 
{{MatchRDFS.find_*}}, {{find().hasNext}} is about as good as it gets.

 Mostly {{find_*}} are on-demand streams using {{flatMap}} so the saving is in 
stream setup, rather than computation.

The code has some notes of possible places for improvements to use more streams.

> GraphRDFS doesn't implement contains
> 
>
> Key: JENA-2290
> URL: https://issues.apache.org/jira/browse/JENA-2290
> Project: Apache Jena
>  Issue Type: Bug
>  Components: Core
>Affects Versions: Jena 4.4.0
>Reporter: Lorenz Bühmann
>Assignee: Andy Seaborne
>Priority: Minor
>
> While trying to use the RDFS dataset for light-weight reasoning, I 
> noticed that the contains method doesn't seem to be implemented properly.
> I can't say whether this holds for all contains calls, as there is a contains 
> method directly on the dataset.
> But the following path is where I ran into trouble:
> Given {{D}} being a {{DatasetGraphRDFS}}, and then getting the named model 
> {{M}} for a particular graph (which is in fact backed by a {{GraphRDFS}} 
> instance {{G}}): this {{G}} doesn't seem to make use of the inferred triples 
> when we call {{{}contains{}}}. The method is still only implemented in the 
> {{GraphWrapper}} superclass and doesn't make use of the overridden {{find}} 
> method.
> I don't know what the best place to implement it would be. Sure, we could make 
> use of {{find}} directly in the {{GraphRDFS}} class, e.g.
> {code:java}
> @Override
> public boolean contains(Node s, Node p, Node o) {
> return find(s, p, o).hasNext();
> }
> @Override
> public boolean contains(Triple t) {
> return contains(t.getSubject(), t.getPredicate(), t.getObject());
> } {code}
> But I'm wondering about efficiency, as I don't know how efficiently the 
> inference streams are built. We could maybe terminate much earlier in 
> {{MatchRDFS}}, but that would of course mean some more lines of code.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (JENA-2290) GraphRDFS doesn't implement contains

2022-02-22 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne reassigned JENA-2290:
---

Assignee: Andy Seaborne

> GraphRDFS doesn't implement contains
> 
>
> Key: JENA-2290
> URL: https://issues.apache.org/jira/browse/JENA-2290
> Project: Apache Jena
>  Issue Type: Bug
>  Components: Core
>Affects Versions: Jena 4.4.0
>Reporter: Lorenz Bühmann
>Assignee: Andy Seaborne
>Priority: Minor
>
> While trying to use the RDFS dataset for light-weight reasoning, I 
> noticed that the contains method doesn't seem to be implemented properly.
> I can't say whether this holds for all contains calls, as there is a contains 
> method directly on the dataset.
> But the following path is where I ran into trouble:
> Given {{D}} being a {{DatasetGraphRDFS}}, and then getting the named model 
> {{M}} for a particular graph (which is in fact backed by a {{GraphRDFS}} 
> instance {{G}}): this {{G}} doesn't seem to make use of the inferred triples 
> when we call {{{}contains{}}}. The method is still only implemented in the 
> {{GraphWrapper}} superclass and doesn't make use of the overridden {{find}} 
> method.
> I don't know what the best place to implement it would be. Sure, we could make 
> use of {{find}} directly in the {{GraphRDFS}} class, e.g.
> {code:java}
> @Override
> public boolean contains(Node s, Node p, Node o) {
> return find(s, p, o).hasNext();
> }
> @Override
> public boolean contains(Triple t) {
> return contains(t.getSubject(), t.getPredicate(), t.getObject());
> } {code}
> But I'm wondering about efficiency, as I don't know how efficiently the 
> inference streams are built. We could maybe terminate much earlier in 
> {{MatchRDFS}}, but that would of course mean some more lines of code.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (JENA-2285) Java Heap error when there is an optional in service block

2022-02-20 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495261#comment-17495261
 ] 

Andy Seaborne commented on JENA-2285:
-

{quote}it worked.
{quote}
meaning no heap error?

Have you experimented with a larger heap?

Only if the two forms both execute without a heap error and come up with 
different answers is there an issue (as explained earlier about the optimizer).

You cannot read SPARQL queries top-to-bottom. Evaluation does not work like 
that; it works "inside out", evaluating blocks and executing joins on them 
(like a functional programming language or indeed SQL). The optimizer tries to 
find a faster way to the same results but it's not perfect.

In the "as well" variant, more is passed to the far end. Everything inside 
SERVICE goes to {{{}[https://query.wikidata.org/sparql]{}}}.

> Java Heap error when there is an optional in service block
> --
>
> Key: JENA-2285
> URL: https://issues.apache.org/jira/browse/JENA-2285
> Project: Apache Jena
>  Issue Type: Bug
>Affects Versions: Jena 4.4.0
>Reporter: Dmitry Zhelobanov
>Priority: Major
>
> Here is the query:
> {code:java}
> PREFIX owl: <http://www.w3.org/2002/07/owl#>
> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> PREFIX wdt: <http://www.wikidata.org/prop/direct/>
> PREFIX p: <http://www.wikidata.org/prop/>
> PREFIX pq: <http://www.wikidata.org/prop/qualifier/>
> PREFIX ps: <http://www.wikidata.org/prop/statement/>
> PREFIX psv: <http://www.wikidata.org/prop/statement/value/>
> PREFIX wikibase: <http://wikiba.se/ontology#>
> SELECT ?wikidata_city_iri ?website
> WHERE {
>   BIND(IRI("http://www.wikidata.org/entity/Q15757") as ?id)
>   BIND(IRI(?id) AS ?wikidata_city_iri) .  
>   SERVICE <https://query.wikidata.org/sparql> {
> #Querying website   
>     OPTIONAL {
>       ?wikidata_city_iri wdt:P856 ?website.
>     } . 
>   }.
> } {code}
> When the query is executed I get a "Java Heap Error" when, I guess, Java 
> runs out of memory in the pool. The trace of the error is below.
> When OPTIONAL is commented out, then I get "no data" as expected.
> {code:java}
>  09:16:08 INFO  Fuseki          :: [5] POST 
> http://127.0.0.1:3030/WattTour/sparql
> 09:16:08 INFO  Fuseki          :: [5] Query = PREFIX owl: 
>  PREFIX rdf: 
>  PREFIX rdfs: 
>  PREFIX wdt: 
>  PREFIX p: 
>  PREFIX pq: 
>  PREFIX ps: 
>  PREFIX psv: 
>  PREFIX wikibase: 
>  SELECT ?wikidata_city_iri ?website WHERE {   
> BIND(IRI("http://www.wikidata.org/entity/Q15757";) as ?id)   BIND(IRI(?id) AS 
> ?wikidata_city_iri) .    SERVICE  { 
> #Querying website    OPTIONAL {       ?wikidata_city_iri wdt:P856 ?website.   
>   } .    }. }
> 09:18:43 WARN  HttpChannel     :: /$/ping
> javax.servlet.ServletException: Filtered request failed.
>         at 
> org.apache.shiro.web.servlet.AbstractShiroFilter.doFilterInternal(AbstractShiroFilter.java:384)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:202) 
> ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1594)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.apache.jena.fuseki.servlets.CrossOriginFilter.handle(CrossOriginFilter.java:284)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.apache.jena.fuseki.servlets.CrossOriginFilter.doFilter(CrossOriginFilter.java:247)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:210) 
> ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1594)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:506) 
> ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:131) 
> ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:578) 
> ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHa

[jira] [Resolved] (JENA-2287) Response in text/plain to SPARQL Update (HTML form) when accept text/plain

2022-02-20 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-2287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne resolved JENA-2287.
-
Fix Version/s: Jena 4.5.0
   Resolution: Done

> Response in text/plain to SPARQL Update (HTML form) when accept text/plain
> --
>
> Key: JENA-2287
> URL: https://issues.apache.org/jira/browse/JENA-2287
> Project: Apache Jena
>  Issue Type: Improvement
>  Components: Fuseki
>Affects Versions: Jena 4.4.0
>Reporter: Andy Seaborne
>Assignee: Andy Seaborne
>Priority: Minor
> Fix For: Jena 4.5.0
>
>
> yasgui sends "Accept: text/plain;,*/*;q=0.9" so we can legitimately respond 
> with plain text.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (JENA-2285) Java Heap error when there is an optional in service block

2022-02-19 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495045#comment-17495045
 ] 

Andy Seaborne commented on JENA-2285:
-

Have you tried:
{code}
OPTIONAL {
   SERVICE <https://query.wikidata.org/sparql> {
      ?wikidata_city_iri wdt:P856 ?website .
   }
}
{code}

(Also, from Java code, the new "substitution" operations on the QueryExecution 
builder pattern are useful, but those are API calls.)
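A minimal sketch of that builder approach (assuming Jena 4.x on the classpath; the endpoint URL, query string, and variable name are illustrative, and the builder method names should be checked against the current API):

{code:java}
import org.apache.jena.graph.NodeFactory;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.sparql.exec.http.QueryExecutionHTTP;

public class SubstitutionSketch {
    public static void main(String[] args) {
        String query =
            "SELECT ?website WHERE { ?city <http://www.wikidata.org/prop/direct/P856> ?website }";
        // Substitute a concrete IRI for ?city before the query is sent,
        // so the remote endpoint never sees the unbound variable.
        try (QueryExecution qExec = QueryExecutionHTTP.service("https://query.wikidata.org/sparql")
                .query(query)
                .substitution("city", NodeFactory.createURI("http://www.wikidata.org/entity/Q15757"))
                .build()) {
            qExec.execSelect().forEachRemaining(row -> System.out.println(row.get("website")));
        }
    }
}
{code}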

> Java Heap error when there is an optional in service block
> --
>
> Key: JENA-2285
> URL: https://issues.apache.org/jira/browse/JENA-2285
> Project: Apache Jena
>  Issue Type: Bug
>Affects Versions: Jena 4.4.0
>Reporter: Dmitry Zhelobanov
>Priority: Major
>
> Here is the query:
> {code:java}
> PREFIX owl: <http://www.w3.org/2002/07/owl#>
> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> PREFIX wdt: <http://www.wikidata.org/prop/direct/>
> PREFIX p: <http://www.wikidata.org/prop/>
> PREFIX pq: <http://www.wikidata.org/prop/qualifier/>
> PREFIX ps: <http://www.wikidata.org/prop/statement/>
> PREFIX psv: <http://www.wikidata.org/prop/statement/value/>
> PREFIX wikibase: <http://wikiba.se/ontology#>
> SELECT ?wikidata_city_iri ?website
> WHERE {
>   BIND(IRI("http://www.wikidata.org/entity/Q15757") as ?id)
>   BIND(IRI(?id) AS ?wikidata_city_iri) .  
>   SERVICE <https://query.wikidata.org/sparql> {
> #Querying website   
>     OPTIONAL {
>       ?wikidata_city_iri wdt:P856 ?website.
>     } . 
>   }.
> } {code}
> When the query is executed I get a "Java Heap Error" when, I guess, Java 
> runs out of memory in the pool. The trace of the error is below.
> When OPTIONAL is commented out, then I get "no data" as expected.
> {code:java}
>  09:16:08 INFO  Fuseki          :: [5] POST 
> http://127.0.0.1:3030/WattTour/sparql
> 09:16:08 INFO  Fuseki          :: [5] Query = PREFIX owl: 
>  PREFIX rdf: 
>  PREFIX rdfs: 
>  PREFIX wdt: 
>  PREFIX p: 
>  PREFIX pq: 
>  PREFIX ps: 
>  PREFIX psv: 
>  PREFIX wikibase: 
>  SELECT ?wikidata_city_iri ?website WHERE {   
> BIND(IRI("http://www.wikidata.org/entity/Q15757";) as ?id)   BIND(IRI(?id) AS 
> ?wikidata_city_iri) .    SERVICE  { 
> #Querying website    OPTIONAL {       ?wikidata_city_iri wdt:P856 ?website.   
>   } .    }. }
> 09:18:43 WARN  HttpChannel     :: /$/ping
> javax.servlet.ServletException: Filtered request failed.
>         at 
> org.apache.shiro.web.servlet.AbstractShiroFilter.doFilterInternal(AbstractShiroFilter.java:384)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:202) 
> ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1594)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.apache.jena.fuseki.servlets.CrossOriginFilter.handle(CrossOriginFilter.java:284)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.apache.jena.fuseki.servlets.CrossOriginFilter.doFilter(CrossOriginFilter.java:247)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:210) 
> ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1594)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:506) 
> ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:131) 
> ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:578) 
> ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1571)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:221)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(Context

[jira] [Commented] (JENA-2285) Java Heap error when there is an optional in service block

2022-02-19 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495036#comment-17495036
 ] 

Andy Seaborne commented on JENA-2285:
-

{{VALUES}}, {{BIND}}, any pattern that sets variables used in the SERVICE: it 
makes no difference. The OPTIONAL is the problem.

Either move the VALUES inside the SERVICE or find a way not to use the OPTIONAL. 
A plain triple pattern is handled with the constants passed down.

As you have it, ARQ is executing
{code}SELECT * { OPTIONAL { ?wikidata_city_iri wdt:P856 ?website . } }{code}

i.e. all {{wdt:P856}}.

You can try increasing the heap size (on Linux, set the environment variable 
JVM_ARGS). When Java runs out of space it gets very slow, so with more heap it 
might even execute faster than 154,663 (which is, I presume, 2 mins 34 seconds).

{{qparse --print=opt --file=}} can be used to show what the execution 
is.

{{(sequence)}} is the operator to pass prebound variables. {{(join)}} does not.

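As a worked example of that check (hypothetical query file name; the command form mirrors the one mentioned above, and the printed algebra is indicative of the shape to look for, not verified output):

{code}
qparse --print=opt --file=q.rq
{code}

If the printed algebra wraps the SERVICE in {{(sequence ...)}}, prebound variables flow into it; if it shows {{(join ...)}}, they do not.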

> Java Heap error when there is an optional in service block
> --
>
> Key: JENA-2285
> URL: https://issues.apache.org/jira/browse/JENA-2285
> Project: Apache Jena
>  Issue Type: Bug
>Affects Versions: Jena 4.4.0
>Reporter: Dmitry Zhelobanov
>Priority: Major
>
> Here is the query:
> {code:java}
> PREFIX owl: <http://www.w3.org/2002/07/owl#>
> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> PREFIX wdt: <http://www.wikidata.org/prop/direct/>
> PREFIX p: <http://www.wikidata.org/prop/>
> PREFIX pq: <http://www.wikidata.org/prop/qualifier/>
> PREFIX ps: <http://www.wikidata.org/prop/statement/>
> PREFIX psv: <http://www.wikidata.org/prop/statement/value/>
> PREFIX wikibase: <http://wikiba.se/ontology#>
> SELECT ?wikidata_city_iri ?website
> WHERE {
>   BIND(IRI("http://www.wikidata.org/entity/Q15757") as ?id)
>   BIND(IRI(?id) AS ?wikidata_city_iri) .  
>   SERVICE <https://query.wikidata.org/sparql> {
> #Querying website   
>     OPTIONAL {
>       ?wikidata_city_iri wdt:P856 ?website.
>     } . 
>   }.
> } {code}
> When the query is executed I get a "Java Heap Error" when, I guess, Java 
> runs out of memory in the pool. The trace of the error is below.
> When OPTIONAL is commented out, then I get "no data" as expected.
> {code:java}
>  09:16:08 INFO  Fuseki          :: [5] POST 
> http://127.0.0.1:3030/WattTour/sparql
> 09:16:08 INFO  Fuseki          :: [5] Query = PREFIX owl: 
>  PREFIX rdf: 
>  PREFIX rdfs: 
>  PREFIX wdt: 
>  PREFIX p: 
>  PREFIX pq: 
>  PREFIX ps: 
>  PREFIX psv: 
>  PREFIX wikibase: 
>  SELECT ?wikidata_city_iri ?website WHERE {   
> BIND(IRI("http://www.wikidata.org/entity/Q15757";) as ?id)   BIND(IRI(?id) AS 
> ?wikidata_city_iri) .    SERVICE  { 
> #Querying website    OPTIONAL {       ?wikidata_city_iri wdt:P856 ?website.   
>   } .    }. }
> 09:18:43 WARN  HttpChannel     :: /$/ping
> javax.servlet.ServletException: Filtered request failed.
>         at 
> org.apache.shiro.web.servlet.AbstractShiroFilter.doFilterInternal(AbstractShiroFilter.java:384)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:202) 
> ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1594)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.apache.jena.fuseki.servlets.CrossOriginFilter.handle(CrossOriginFilter.java:284)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.apache.jena.fuseki.servlets.CrossOriginFilter.doFilter(CrossOriginFilter.java:247)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:210) 
> ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1594)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:506) 
> ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:131) 
> ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:578) 
> ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
>  ~[fuseki-server.

[jira] [Created] (JENA-2287) Response in text/plain to SPARQL Update (HTML form) when accept text/plain

2022-02-19 Thread Andy Seaborne (Jira)
Andy Seaborne created JENA-2287:
---

 Summary: Response in text/plain to SPARQL Update (HTML form) when 
accept text/plain
 Key: JENA-2287
 URL: https://issues.apache.org/jira/browse/JENA-2287
 Project: Apache Jena
  Issue Type: Improvement
  Components: Fuseki
Affects Versions: Jena 4.4.0
Reporter: Andy Seaborne
Assignee: Andy Seaborne


yasgui sends "Accept: text/plain;,*/*;q=0.9" so we can legitimately respond 
with plain text.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (JENA-2282) Fuseki2 Query Store

2022-02-19 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494966#comment-17494966
 ] 

Andy Seaborne commented on JENA-2282:
-

[~Car]

bq. Is current config - the dataset & endpoints specification in RDF - already 
stored in a separate dataset or is it just read from RDF files as needed?

The state is in files. Having a system dataset didn't work out too well - it 
gets out of step with the configuration files that the user edits. So the 
config files are primary and read into the server.

Fuseki has admin operations that return the current state as JSON. Endpoint 
`/$/datasets`.

In a design, let's be clear what is Fuseki-specific and what is going to work 
for any endpoint, i.e. try to introspect the server, and also allow the user to 
type in an endpoint as well.
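For reference, that admin endpoint can be inspected directly (assuming a local Fuseki on the default port; the exact JSON shape is not shown here):

{code}
curl http://localhost:3030/$/datasets
{code}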

> Fuseki2 Query Store
> ---
>
> Key: JENA-2282
> URL: https://issues.apache.org/jira/browse/JENA-2282
> Project: Apache Jena
>  Issue Type: Wish
>  Components: Fuseki UI
>Affects Versions: Jena 4.4.0
>Reporter: Nicholas
>Priority: Minor
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Many triplestore applications have a way to store SPARQL queries. These sorts 
> of application parts are really useful: you can keep coming back to useful 
> queries. If the queries can be named, then you can build up a query library.
> Not hard to make and super useful. Unless there is already an extension for 
> this or plans for such, I'm happy to (have my staff) give this a go!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Deleted] (JENA-2286) Buy Ativan Online Overnight

2022-02-18 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne deleted JENA-2286:



> Buy Ativan Online Overnight 
> 
>
> Key: JENA-2286
> URL: https://issues.apache.org/jira/browse/JENA-2286
> Project: Apache Jena
>  Issue Type: Bug
>Reporter: jaipurjnu
>Priority: Major
>  Labels: buyativanforsale
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (JENA-2285) Java Heap error when there is an optional in service block

2022-02-18 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494514#comment-17494514
 ] 

Andy Seaborne commented on JENA-2285:
-

Hi [~dz2002], thank you for a minimal example.

Try putting the BIND statements inside the SERVICE call. They are not being 
passed to the Wikidata SPARQL endpoint. 

Optimization around SERVICE is very cautious (i.e. very little is done) because 
ARQ does not know what the other end is capable of. That does mean that the 
query writer has to be involved.
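Following that suggestion, the query from the description could be rewritten along these lines (a sketch; the SERVICE target is restored from context and the rewrite is untested):

{code}
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?wikidata_city_iri ?website
WHERE {
  SERVICE <https://query.wikidata.org/sparql> {
    BIND(IRI("http://www.wikidata.org/entity/Q15757") AS ?wikidata_city_iri)
    OPTIONAL { ?wikidata_city_iri wdt:P856 ?website . }
  }
}
{code}

With the BIND inside the SERVICE block, the remote endpoint evaluates the OPTIONAL against a single bound subject instead of all {{wdt:P856}} triples.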




> Java Heap error when there is an optional in service block
> --
>
> Key: JENA-2285
> URL: https://issues.apache.org/jira/browse/JENA-2285
> Project: Apache Jena
>  Issue Type: Bug
>Affects Versions: Jena 4.4.0
>Reporter: Dmitry Zhelobanov
>Priority: Major
>
> Here is the query:
> {code:java}
> PREFIX owl: <http://www.w3.org/2002/07/owl#>
> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> PREFIX wdt: <http://www.wikidata.org/prop/direct/>
> PREFIX p: <http://www.wikidata.org/prop/>
> PREFIX pq: <http://www.wikidata.org/prop/qualifier/>
> PREFIX ps: <http://www.wikidata.org/prop/statement/>
> PREFIX psv: <http://www.wikidata.org/prop/statement/value/>
> PREFIX wikibase: <http://wikiba.se/ontology#>
> SELECT ?wikidata_city_iri ?website
> WHERE {
>   BIND(IRI("http://www.wikidata.org/entity/Q15757") as ?id)
>   BIND(IRI(?id) AS ?wikidata_city_iri) .  
>   SERVICE <https://query.wikidata.org/sparql> {
> #Querying website   
>     OPTIONAL {
>       ?wikidata_city_iri wdt:P856 ?website.
>     } . 
>   }.
> } {code}
> When the query is executed I get a "Java Heap Error" after, as I guess, Java 
> runs out of memory in the pool. The error trace is below.
> When OPTIONAL is commented out, I get "no data" as expected.
> {code:java}
>  09:16:08 INFO  Fuseki          :: [5] POST 
> http://127.0.0.1:3030/WattTour/sparql
> 09:16:08 INFO  Fuseki          :: [5] Query = PREFIX owl: 
> <http://www.w3.org/2002/07/owl#> PREFIX rdf: 
> <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: 
> <http://www.w3.org/2000/01/rdf-schema#> PREFIX wdt: 
> <http://www.wikidata.org/prop/direct/> PREFIX p: 
> <http://www.wikidata.org/prop/> PREFIX pq: 
> <http://www.wikidata.org/prop/qualifier/> PREFIX ps: 
> <http://www.wikidata.org/prop/statement/> PREFIX psv: 
> <http://www.wikidata.org/prop/statement/value/> PREFIX wikibase: 
> <http://wikiba.se/ontology#> SELECT ?wikidata_city_iri ?website WHERE {   
> BIND(IRI("http://www.wikidata.org/entity/Q15757") as ?id)   BIND(IRI(?id) AS 
> ?wikidata_city_iri) .    SERVICE <https://query.wikidata.org/sparql> { 
> #Querying website    OPTIONAL {       ?wikidata_city_iri wdt:P856 ?website.   
>   } .    }. }
> 09:18:43 WARN  HttpChannel     :: /$/ping
> javax.servlet.ServletException: Filtered request failed.
>         at 
> org.apache.shiro.web.servlet.AbstractShiroFilter.doFilterInternal(AbstractShiroFilter.java:384)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:202) 
> ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1594)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.apache.jena.fuseki.servlets.CrossOriginFilter.handle(CrossOriginFilter.java:284)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.apache.jena.fuseki.servlets.CrossOriginFilter.doFilter(CrossOriginFilter.java:247)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:210) 
> ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1594)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:506) 
> ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:131) 
> ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:578) 
> ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1571)
>  ~[fuseki-server.jar:4.4.0]
>         at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:221)
>  ~[fuseki-server.jar:4.4.0]
>
