[jira] [Commented] (JENA-2179) TDB throws Unicode Replacement Character exception while fetching data

2021-10-20 Thread Holger Knublauch (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432177#comment-17432177
 ] 

Holger Knublauch commented on JENA-2179:


BTW the same seems to happen using RDF Delta:

{code:java}
[line: 1276, col: 437] Unicode replacement character U+FFFD.

org.apache.jena.riot.RiotParseException: [line: 1276, col: 428] Unicode 
replacement character U+FFFD in string
at 
org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerRiotParseException.warning(ErrorHandlerFactory.java:367)
at org.apache.jena.riot.tokens.TokenizerText.warning(TokenizerText.java:1332)
at org.apache.jena.riot.tokens.TokenizerText.readString(TokenizerText.java:768)
at org.apache.jena.riot.tokens.TokenizerText.parseToken(TokenizerText.java:238)
at org.apache.jena.riot.tokens.TokenizerText.hasNext(TokenizerText.java:89)
at 
org.seaborne.patch.text.RDFPatchReaderText.nextToken(RDFPatchReaderText.java:243)
at 
org.seaborne.patch.text.RDFPatchReaderText.nextNode(RDFPatchReaderText.java:254)
at 
org.seaborne.patch.text.RDFPatchReaderText.doOneLine(RDFPatchReaderText.java:104)
at org.seaborne.patch.text.RDFPatchReaderText.apply1(RDFPatchReaderText.java:72)
at org.seaborne.patch.text.RDFPatchReaderText.read(RDFPatchReaderText.java:49)
at org.seaborne.patch.text.RDFPatchReaderText.apply(RDFPatchReaderText.java:59)
at 
org.seaborne.delta.client.DeltaLinkHTTP.lambda$fetchCommon$8(DeltaLinkHTTP.java:211)
at org.seaborne.delta.client.DeltaLinkHTTP.retry(DeltaLinkHTTP.java:125)
at org.seaborne.delta.client.DeltaLinkHTTP.fetchCommon(DeltaLinkHTTP.java:204)
at org.seaborne.delta.client.DeltaLinkHTTP.fetch(DeltaLinkHTTP.java:184)
at org.topbraidlive.edg.backup.BackupUtils.getPatch(BackupUtils.java:368)
{code}


> TDB throws Unicode Replacement Character exception while fetching data
> --
>
> Key: JENA-2179
> URL: https://issues.apache.org/jira/browse/JENA-2179
> Project: Apache Jena
>  Issue Type: Bug
>  Components: TDB
>Affects Versions: Jena 4.2.0
>Reporter: Holger Knublauch
>Assignee: Andy Seaborne
>Priority: Major
> Fix For: Jena 4.3.0
>
> Attachments: TBS4190_Test.java
>
>
> This seems to have been introduced with 
> https://issues.apache.org/jira/browse/JENA-2120
> With TDB databases that contain the replacement character in a literal, the 
> warnings are reported as Exceptions. We have seen this:
> {code:java}
> WARN  [http-nio-8083-exec-10] g.e.SimpleDataFetcherExceptionHandler - 
> Exception while fetching data (/resources[0]/turtleSourceCode) : [line: 1, 
> col: 318] Unicode replacement character U+FFFD in string
> org.apache.jena.riot.RiotParseException: [line: 1, col: 318] Unicode 
> replacement character U+FFFD in string
>   at 
> org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerRiotParseException.warning(ErrorHandlerFactory.java:367)
>  ~[jena-arq-4.2.0.jar:4.2.0]
>   at 
> org.apache.jena.riot.tokens.TokenizerText.warning(TokenizerText.java:1332) 
> ~[jena-arq-4.2.0.jar:4.2.0]
>   at 
> org.apache.jena.riot.tokens.TokenizerText.readString(TokenizerText.java:768) 
> ~[jena-arq-4.2.0.jar:4.2.0]
>   at 
> org.apache.jena.riot.tokens.TokenizerText.parseToken(TokenizerText.java:238) 
> ~[jena-arq-4.2.0.jar:4.2.0]
>   at 
> org.apache.jena.riot.tokens.TokenizerText.hasNext(TokenizerText.java:89) 
> ~[jena-arq-4.2.0.jar:4.2.0]
>   at 
> org.apache.jena.tdb.store.nodetable.NodecSSE.decode(NodecSSE.java:119) 
> ~[jena-tdb-4.2.0.jar:4.2.0]
>   at org.apache.jena.tdb.lib.NodeLib.decode(NodeLib.java:118) 
> ~[jena-tdb-4.2.0.jar:4.2.0]
> {code}
> TDB seems to use the fallback error handler causing an exception to be thrown 
> instead of just printing the warning (to the log).
> Richard says he believes a fix would be to change NodecSEE.createTokenizer():
> {code:java}
> return TokenizerText.create()
> .fromString(string)
> .errorHandler(ErrorHandlerFactory.errorHandlerDetailed())
> .build();
> {code}
> Is there any known work-around in 4.2.0? We cannot even query those triples 
> from the offending TDBs at the moment.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (JENA-2179) TDB throws Unicode Replacement Character exception while fetching data

2021-10-06 Thread Holger Knublauch (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425267#comment-17425267
 ] 

Holger Knublauch commented on JENA-2179:


The data was populated with a previous version of Jena, from TTL files that had 
this character (the ITIL glossary, which had some incorrect literals in it). Anyone 
who has the (old) TopBraid samples installed will now have this corrupted data, 
and it will crash when they upgrade.

Before upgrading to Jena 4.2.0 this was OK because no such checks were 
happening.

The fix mentioned above seems to work. I have just tested it on our product, 
where we now use a reflection hack to overwrite the private field NodeLib.nodec 
with a fixed version of that class that only differs in 
NodecSSE.createTokenizer() as above.
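
The mechanism of that reflection hack can be illustrated in isolation. This is a minimal sketch of the technique only; Holder and its codec field are stand-ins for this illustration, not Jena's actual NodeLib.nodec:

```java
import java.lang.reflect.Field;

public class ReflectionSwapDemo {
    // Stand-in for a class holding a private static field (like NodeLib.nodec).
    static class Holder {
        private static String codec = "original";
        static String describe() { return codec; }
    }

    // Overwrite the private static field via reflection and report the result.
    static String swapAndDescribe() throws Exception {
        Field f = Holder.class.getDeclaredField("codec");
        f.setAccessible(true);   // bypass the private modifier
        f.set(null, "patched");  // null target because the field is static
        return Holder.describe();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(swapAndDescribe()); // prints "patched"
    }
}
```

Note this only works cleanly for non-final fields; on recent JDKs, module access rules may also require extra flags when the target class lives in another module.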

The change for JENA-2120 seems well-motivated and well-intended, as explained in 
the comment above TokenizerText.warning: /** Warning - can continue. */ 
However, in the context of NodeLib it produces not just a warning but an 
exception. It would cause similar issues for any other warnings that are 
reported.

Sorry, we are in the middle of a release crunch, so I don't have time to look 
into a formal Jena test case.


> TDB throws Unicode Replacement Character exception while fetching data
> --
>
> Key: JENA-2179
> URL: https://issues.apache.org/jira/browse/JENA-2179
> Project: Apache Jena
>  Issue Type: Bug
>  Components: TDB
>Affects Versions: Jena 4.2.0
>Reporter: Holger Knublauch
>Priority: Major
>
> This seems to have been introduced with 
> https://issues.apache.org/jira/browse/JENA-2120
> With TDB databases that contain the replacement character in a literal, the 
> warnings are reported as Exceptions. We have seen this:
> {code:java}
> WARN  [http-nio-8083-exec-10] g.e.SimpleDataFetcherExceptionHandler - 
> Exception while fetching data (/resources[0]/turtleSourceCode) : [line: 1, 
> col: 318] Unicode replacement character U+FFFD in string
> org.apache.jena.riot.RiotParseException: [line: 1, col: 318] Unicode 
> replacement character U+FFFD in string
>   at 
> org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerRiotParseException.warning(ErrorHandlerFactory.java:367)
>  ~[jena-arq-4.2.0.jar:4.2.0]
>   at 
> org.apache.jena.riot.tokens.TokenizerText.warning(TokenizerText.java:1332) 
> ~[jena-arq-4.2.0.jar:4.2.0]
>   at 
> org.apache.jena.riot.tokens.TokenizerText.readString(TokenizerText.java:768) 
> ~[jena-arq-4.2.0.jar:4.2.0]
>   at 
> org.apache.jena.riot.tokens.TokenizerText.parseToken(TokenizerText.java:238) 
> ~[jena-arq-4.2.0.jar:4.2.0]
>   at 
> org.apache.jena.riot.tokens.TokenizerText.hasNext(TokenizerText.java:89) 
> ~[jena-arq-4.2.0.jar:4.2.0]
>   at 
> org.apache.jena.tdb.store.nodetable.NodecSSE.decode(NodecSSE.java:119) 
> ~[jena-tdb-4.2.0.jar:4.2.0]
>   at org.apache.jena.tdb.lib.NodeLib.decode(NodeLib.java:118) 
> ~[jena-tdb-4.2.0.jar:4.2.0]
> {code}
> TDB seems to use the fallback error handler causing an exception to be thrown 
> instead of just printing the warning (to the log).
> Richard believes a fix would be to change NodecSSE.createTokenizer():
> {code:java}
> return TokenizerText.create()
> .fromString(string)
> .errorHandler(ErrorHandlerFactory.errorHandlerDetailed())
> .build();
> {code}
> Is there any known work-around in 4.2.0? We cannot even query those triples 
> from the offending TDBs at the moment.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (JENA-2179) TDB throws Unicode Replacement Character exception while fetching data

2021-10-05 Thread Holger Knublauch (Jira)
Holger Knublauch created JENA-2179:
--

 Summary: TDB throws Unicode Replacement Character exception while 
fetching data
 Key: JENA-2179
 URL: https://issues.apache.org/jira/browse/JENA-2179
 Project: Apache Jena
  Issue Type: Bug
  Components: TDB
Affects Versions: Jena 4.2.0
Reporter: Holger Knublauch


This seems to have been introduced with 
https://issues.apache.org/jira/browse/JENA-2120

With TDB databases that contain the replacement character in a literal, the 
warnings are reported as Exceptions. We have seen this:

{code:java}
WARN  [http-nio-8083-exec-10] g.e.SimpleDataFetcherExceptionHandler - Exception 
while fetching data (/resources[0]/turtleSourceCode) : [line: 1, col: 318] 
Unicode replacement character U+FFFD in string
org.apache.jena.riot.RiotParseException: [line: 1, col: 318] Unicode 
replacement character U+FFFD in string
at 
org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerRiotParseException.warning(ErrorHandlerFactory.java:367)
 ~[jena-arq-4.2.0.jar:4.2.0]
at 
org.apache.jena.riot.tokens.TokenizerText.warning(TokenizerText.java:1332) 
~[jena-arq-4.2.0.jar:4.2.0]
at 
org.apache.jena.riot.tokens.TokenizerText.readString(TokenizerText.java:768) 
~[jena-arq-4.2.0.jar:4.2.0]
at 
org.apache.jena.riot.tokens.TokenizerText.parseToken(TokenizerText.java:238) 
~[jena-arq-4.2.0.jar:4.2.0]
at 
org.apache.jena.riot.tokens.TokenizerText.hasNext(TokenizerText.java:89) 
~[jena-arq-4.2.0.jar:4.2.0]
at 
org.apache.jena.tdb.store.nodetable.NodecSSE.decode(NodecSSE.java:119) 
~[jena-tdb-4.2.0.jar:4.2.0]
at org.apache.jena.tdb.lib.NodeLib.decode(NodeLib.java:118) 
~[jena-tdb-4.2.0.jar:4.2.0]
{code}

TDB seems to use the fallback error handler causing an exception to be thrown 
instead of just printing the warning (to the log).

Richard believes a fix would be to change NodecSSE.createTokenizer():

{code:java}
return TokenizerText.create()
.fromString(string)
.errorHandler(ErrorHandlerFactory.errorHandlerDetailed())
.build();
{code}

Is there any known work-around in 4.2.0? We cannot even query those triples 
from the offending TDBs at the moment.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (JENA-2157) Unclosed iterator in PrefixMappingUtils.calcInUsePrefixMapping

2021-09-07 Thread Holger Knublauch (Jira)
Holger Knublauch created JENA-2157:
--

 Summary: Unclosed iterator in 
PrefixMappingUtils.calcInUsePrefixMapping
 Key: JENA-2157
 URL: https://issues.apache.org/jira/browse/JENA-2157
 Project: Apache Jena
  Issue Type: Bug
Affects Versions: Jena 4.1.0
Reporter: Holger Knublauch


The iterator returned by graph.find(null, null, null) is not exhausted when the 
break in line 121 is reached. That case requires an iter.close().
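
The fix described above (closing the iterator on the early break) can be sketched generically. CloseableIter below is a stand-in for an iterator with a close() method, like Jena's ExtendedIterator; the try/finally guarantees the close even on the early-return path:

```java
import java.util.Iterator;
import java.util.function.Predicate;

public class EarlyBreakClose {
    // Stand-in for an iterator that owns resources and must be closed.
    public interface CloseableIter<T> extends Iterator<T>, AutoCloseable {
        @Override void close(); // no checked exception, for convenience
    }

    // Scan until a match is found; the finally block closes the iterator
    // even when we return early without exhausting it.
    static <T> boolean containsMatch(CloseableIter<T> iter, Predicate<T> test) {
        try {
            while (iter.hasNext()) {
                if (test.test(iter.next()))
                    return true; // early exit: finally still runs
            }
            return false;
        } finally {
            iter.close();
        }
    }
}
```

An explicit iter.close() right before the break achieves the same thing; try/finally is simply harder to miss when new exit paths are added later.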



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (JENA-2151) Iter.filter does not close nested iterator

2021-08-23 Thread Holger Knublauch (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403447#comment-17403447
 ] 

Holger Knublauch commented on JENA-2151:


OK, when I looked at that diff it didn't compile because it also needed 
IteratorCloseable. Then I thought there might be other changes. But I have added 
that file too and replaced the .jars, and the product seems to pass the tests 
without unclosed iterator warnings. So green light from my end.

> Iter.filter does not close nested iterator
> --
>
> Key: JENA-2151
> URL: https://issues.apache.org/jira/browse/JENA-2151
> Project: Apache Jena
>  Issue Type: Bug
>Affects Versions: Jena 4.1.0
>    Reporter: Holger Knublauch
>Priority: Major
> Attachments: IteratorFilter.java
>
>
> We recently attempted to upgrade our product to Jena 4.1.0 but noticed 
> unclosed iterator warnings. I believe I have tracked it down to the fact that 
> Iter.filter does not return a Closeable iterator and therefore does not close 
> its nested (stream) iterator. I am attaching an implementation class that 
> seems to fix it. With this, org.apache.jena.atlas.iterator.Iter.filter simply 
> needs to become
> {code:java}
> public static <T> Iterator<T> filter(final Iterator<T> stream, 
> final Predicate<T> filter) {
> return new IteratorFilter<T>(stream, filter);
> }
> {code}
> (Although Iter.filter hasn't changed for a while, I suspect some other 
> changes to Jena caused the SPARQL engine to use it, and this has broken some 
> scenarios for us - in particular calling SPIN/SHACL-SPARQL functions with 
> BGPs in the WHERE clause).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (JENA-2151) Iter.filter does not close nested iterator

2021-08-22 Thread Holger Knublauch (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17402922#comment-17402922
 ] 

Holger Knublauch commented on JENA-2151:


Hi Andy, it's difficult for me to try this out because there are many other 
changes in that branch since 4.1 and we only updated to 4.1.0 last week, not 
4.2.0-SNAPSHOT. I do not really want to open up more cans of worms here. Do you 
have a timeline for the 4.2.0 release? We still have a month or so before our 
own code freeze and could then switch to 4.2.0 to confirm this. Meanwhile I 
have a patch that works in case we must release with 4.1.0.

> Iter.filter does not close nested iterator
> --
>
> Key: JENA-2151
> URL: https://issues.apache.org/jira/browse/JENA-2151
> Project: Apache Jena
>  Issue Type: Bug
>Affects Versions: Jena 4.1.0
>    Reporter: Holger Knublauch
>Priority: Major
> Attachments: IteratorFilter.java
>
>
> We recently attempted to upgrade our product to Jena 4.1.0 but noticed 
> unclosed iterator warnings. I believe I have tracked it down to the fact that 
> Iter.filter does not return a Closeable iterator and therefore does not close 
> its nested (stream) iterator. I am attaching an implementation class that 
> seems to fix it. With this, org.apache.jena.atlas.iterator.Iter.filter simply 
> needs to become
> {code:java}
> public static <T> Iterator<T> filter(final Iterator<T> stream, 
> final Predicate<T> filter) {
> return new IteratorFilter<T>(stream, filter);
> }
> {code}
> (Although Iter.filter hasn't changed for a while, I suspect some other 
> changes to Jena caused the SPARQL engine to use it, and this has broken some 
> scenarios for us - in particular calling SPIN/SHACL-SPARQL functions with 
> BGPs in the WHERE clause).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (JENA-2151) Iter.filter does not close nested iterator

2021-08-21 Thread Holger Knublauch (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17402734#comment-17402734
 ] 

Holger Knublauch commented on JENA-2151:


Yes the warnings are in TQ tracking code. The iterators may not be exhausted 
because SHACL/SPIN functions only walk to the first result binding.

You know better than I do what the contracts of Iter should be, so whatever 
works best is of course fine for me. If you say the change should go into 
SolverRX3, then that is perfectly fine, as long as we know that this will be fixed 
from the next releases onwards. Meanwhile we can live with the patched-up jar file. 
Returning an ExtendedIterator sounds good.

> Iter.filter does not close nested iterator
> --
>
> Key: JENA-2151
> URL: https://issues.apache.org/jira/browse/JENA-2151
> Project: Apache Jena
>  Issue Type: Bug
>Affects Versions: Jena 4.1.0
>    Reporter: Holger Knublauch
>Priority: Major
> Attachments: IteratorFilter.java
>
>
> We recently attempted to upgrade our product to Jena 4.1.0 but noticed 
> unclosed iterator warnings. I believe I have tracked it down to the fact that 
> Iter.filter does not return a Closeable iterator and therefore does not close 
> its nested (stream) iterator. I am attaching an implementation class that 
> seems to fix it. With this, org.apache.jena.atlas.iterator.Iter.filter simply 
> needs to become
> {code:java}
> public static <T> Iterator<T> filter(final Iterator<T> stream, 
> final Predicate<T> filter) {
> return new IteratorFilter<T>(stream, filter);
> }
> {code}
> (Although Iter.filter hasn't changed for a while, I suspect some other 
> changes to Jena caused the SPARQL engine to use it, and this has broken some 
> scenarios for us - in particular calling SPIN/SHACL-SPARQL functions with 
> BGPs in the WHERE clause).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (JENA-2151) Iter.filter does not close nested iterator

2021-08-20 Thread Holger Knublauch (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17402462#comment-17402462
 ] 

Holger Knublauch commented on JENA-2151:


Here is a stack trace from where it reaches the Iter.filter function

{noformat}
Thread [qtp947882-131] (Suspended (breakpoint at line 238 in Iter))
  Iter.filter(Iterator, Predicate) line: 238 
  Iter.removeNulls(Iterator) line: 404
  StageMatchTriple.accessTriple(Binding, Graph, Triple, Predicate, 
ExecutionContext) line: 63
  StageMatchTriple.lambda$accessTriple$0(Graph, Triple, Predicate, 
ExecutionContext, Binding) line: 48
  1513195333.apply(Object) line: not available
  IteratorFlatMap.hasNext() line: 57
  IterAbortable.hasNext() line: 49
  QueryIterAbortable(QueryIterPlainWrapper).hasNextBinding() line: 60
  QueryIterAbortable(QueryIteratorBase).hasNext() line: 114
  QueryIterAssign(QueryIterProcessBinding).hasNextBinding() line: 66
  QueryIterAssign(QueryIteratorBase).hasNext() line: 114
  QueryIterConcat.hasNextBinding() line: 82
  QueryIterConcat(QueryIteratorBase).hasNext() line: 114
  QueryIterUnion(QueryIterRepeatApply).hasNextBinding() line: 69
  QueryIterUnion(QueryIteratorBase).hasNext() line: 114
  QueryIterProject(QueryIterConvert).hasNextBinding() line: 58
  QueryIterProject(QueryIteratorBase).hasNext() line: 114
  QueryIteratorCheck(QueryIteratorWrapper).hasNextBinding() line: 38
  QueryIteratorCheck(QueryIteratorBase).hasNext() line: 114
  QueryIteratorCloseable(QueryIteratorWrapper).hasNextBinding() line: 38
  QueryIteratorCloseable(QueryIteratorBase).hasNext() line: 114
  ResultSetStream.hasNext() line: 64
  ResultSetCheckCondition.hasNext() line: 55
  SHACLSPARQLARQFunction.executeBody(Dataset, Model, QuerySolution) line: 142
  SHACLSPARQLARQFunction(SHACLARQFunction).exec(Binding, ExprList, String, 
FunctionEnv) line: 211
  E_Function.evalSpecial(Binding, FunctionEnv) line: 69
  ...
{noformat}

So it seems to come in from StageMatchTriple. Maybe line 54 of SolverRX3 has 
changed due to RDF-star work?

When I replaced my local copy of the Iter class with the "fixed" version, the 
problem went away (without other changes to our code base). So I am optimistic 
that this will resolve the problem.

I also don't see downsides of having Iter.filter implement Closeable. Do you?

I discovered this issue when I was walking through a long chain of close() 
calls. It stopped at the filter iterator because that didn't implement 
Closeable.

> Iter.filter does not close nested iterator
> --
>
> Key: JENA-2151
> URL: https://issues.apache.org/jira/browse/JENA-2151
> Project: Apache Jena
>  Issue Type: Bug
>Affects Versions: Jena 4.1.0
>Reporter: Holger Knublauch
>Priority: Major
> Attachments: IteratorFilter.java
>
>
> We recently attempted to upgrade our product to Jena 4.1.0 but noticed 
> unclosed iterator warnings. I believe I have tracked it down to the fact that 
> Iter.filter does not return a Closeable iterator and therefore does not close 
> its nested (stream) iterator. I am attaching an implementation class that 
> seems to fix it. With this, org.apache.jena.atlas.iterator.Iter.filter simply 
> needs to become
> {code:java}
> public static <T> Iterator<T> filter(final Iterator<T> stream, 
> final Predicate<T> filter) {
> return new IteratorFilter<T>(stream, filter);
> }
> {code}
> (Although Iter.filter hasn't changed for a while, I suspect some other 
> changes to Jena caused the SPARQL engine to use it, and this has broken some 
> scenarios for us - in particular calling SPIN/SHACL-SPARQL functions with 
> BGPs in the WHERE clause).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (JENA-2151) Iter.filter does not close nested iterator

2021-08-19 Thread Holger Knublauch (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Holger Knublauch updated JENA-2151:
---
Attachment: IteratorFilter.java

> Iter.filter does not close nested iterator
> --
>
> Key: JENA-2151
> URL: https://issues.apache.org/jira/browse/JENA-2151
> Project: Apache Jena
>  Issue Type: Bug
>Affects Versions: Jena 4.1.0
>    Reporter: Holger Knublauch
>Priority: Major
> Attachments: IteratorFilter.java
>
>
> We recently attempted to upgrade our product to Jena 4.1.0 but noticed 
> unclosed iterator warnings. I believe I have tracked it down to the fact that 
> Iter.filter does not return a Closeable iterator and therefore does not close 
> its nested (stream) iterator. I am attaching an implementation class that 
> seems to fix it. With this, org.apache.jena.atlas.iterator.Iter.filter simply 
> need to become
> {code:java}
> public static  Iterator filter(final Iterator stream, 
> final Predicate filter) {
> return new IteratorFilter(stream, filter);
> }
> {code}
> (Although Iter.filter hasn't changed for a while, I suspect some other 
> changes to Jena caused the SPARQL engine to use it, and this has broken some 
> scenarios for us - in particular calling SPIN/SHACL-SPARQL functions with 
> BGPs in the WHERE clause).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (JENA-2151) Iter.filter does not close nested iterator

2021-08-19 Thread Holger Knublauch (Jira)
Holger Knublauch created JENA-2151:
--

 Summary: Iter.filter does not close nested iterator
 Key: JENA-2151
 URL: https://issues.apache.org/jira/browse/JENA-2151
 Project: Apache Jena
  Issue Type: Bug
Affects Versions: Jena 4.1.0
Reporter: Holger Knublauch
 Attachments: IteratorFilter.java

We recently attempted to upgrade our product to Jena 4.1.0 but noticed unclosed 
iterator warnings. I believe I have tracked it down to the fact that 
Iter.filter does not return a Closeable iterator and therefore does not close 
its nested (stream) iterator. I am attaching an implementation class that seems 
to fix it. With this, org.apache.jena.atlas.iterator.Iter.filter simply needs to 
become
{code:java}
public static <T> Iterator<T> filter(final Iterator<T> stream, 
final Predicate<T> filter) {
return new IteratorFilter<T>(stream, filter);
}

{code}
(Although Iter.filter hasn't changed for a while, I suspect some other changes 
to Jena caused the SPARQL engine to use it, and this has broken some scenarios 
for us - in particular calling SPIN/SHACL-SPARQL functions with BGPs in the 
WHERE clause).
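
The attached IteratorFilter.java is not reproduced in this archive. A hypothetical sketch of such a filtering iterator, using only java.util types, might look like the following; the real attachment presumably implements Jena's own Closeable interface rather than checking for AutoCloseable as done here:

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Filtering iterator that also propagates close() to the wrapped iterator --
// the behaviour the issue reports as missing from Iter.filter.
public class IteratorFilter<T> implements Iterator<T> {
    private final Iterator<T> base;
    private final Predicate<T> filter;
    private T next;
    private boolean hasNext;

    public IteratorFilter(Iterator<T> base, Predicate<T> filter) {
        this.base = base;
        this.filter = filter;
        advance(); // pre-fetch the first matching element
    }

    private void advance() {
        while (base.hasNext()) {
            T candidate = base.next();
            if (filter.test(candidate)) { next = candidate; hasNext = true; return; }
        }
        next = null;
        hasNext = false;
    }

    @Override public boolean hasNext() { return hasNext; }

    @Override public T next() {
        if (!hasNext) throw new NoSuchElementException();
        T result = next;
        advance();
        return result;
    }

    // Close the wrapped iterator if it supports closing.
    public void close() {
        if (base instanceof AutoCloseable) {
            try { ((AutoCloseable) base).close(); }
            catch (Exception e) { throw new RuntimeException(e); }
        }
    }
}
```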



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (JENA-1194) Syntaxtransform does not handle HAVING expressions

2016-06-12 Thread Holger Knublauch (JIRA)
Holger Knublauch created JENA-1194:
--

 Summary: Syntaxtransform does not handle HAVING expressions
 Key: JENA-1194
 URL: https://issues.apache.org/jira/browse/JENA-1194
 Project: Apache Jena
  Issue Type: Bug
  Components: Jena
Affects Versions: Jena 3.1.0
Reporter: Holger Knublauch


QueryTransformOps.transform(query, substitutions) does not handle variables in 
HAVING clauses. For example, in

SELECT * { } HAVING (?count > $minCount)

the variable $minCount would not be substituted.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (JENA-1129) NullPointerException in Query Transformation

2016-02-03 Thread Holger Knublauch (JIRA)
Holger Knublauch created JENA-1129:
--

 Summary: NullPointerException in Query Transformation
 Key: JENA-1129
 URL: https://issues.apache.org/jira/browse/JENA-1129
 Project: Apache Jena
  Issue Type: Bug
  Components: ARQ
Affects Versions: Jena 3.1.0
Reporter: Holger Knublauch


Query query = QueryFactory.create(
  " ASK { "
+ " FILTER NOT EXISTS {"
+ " BIND (?width * ?height AS ?value)"
+ " }"
+ " }");
QueryTransformOps.transform(query, new HashMap());

->

java.lang.NullPointerException
at org.apache.jena.sparql.expr.ExprVar.apply(ExprVar.java:92)
at 
org.apache.jena.sparql.expr.ExprTransformer$ApplyExprTransformVisitor.visit(ExprTransformer.java:176)
at org.apache.jena.sparql.expr.ExprVar.visit(ExprVar.java:90)
at 
org.apache.jena.sparql.expr.ExprWalker$Walker.visit(ExprWalker.java:97)
at 
org.apache.jena.sparql.expr.ExprWalker$WalkerBottomUp.visit(ExprWalker.java:113)
at org.apache.jena.sparql.expr.ExprVar.visit(ExprVar.java:90)
at 
org.apache.jena.sparql.expr.ExprWalker$Walker.visitExprFunction(ExprWalker.java:68)
at 
org.apache.jena.sparql.expr.ExprVisitorFunction.visit(ExprVisitorFunction.java:29)
at 
org.apache.jena.sparql.expr.ExprFunction2.visit(ExprFunction2.java:109)
at org.apache.jena.sparql.expr.ExprWalker.walk(ExprWalker.java:36)
at 
org.apache.jena.sparql.expr.ExprTransformer.transformation(ExprTransformer.java:62)
at 
org.apache.jena.sparql.expr.ExprTransformer.transformation(ExprTransformer.java:45)
at 
org.apache.jena.sparql.expr.ExprTransformer.transform(ExprTransformer.java:36)
at 
org.apache.jena.sparql.syntax.syntaxtransform.ElementTransformer$ApplyTransformVisitor.visit(ElementTransformer.java:159)
at 
org.apache.jena.sparql.syntax.ElementWalker$Walker.visit(ElementWalker.java:100)
at org.apache.jena.sparql.syntax.ElementBind.visit(ElementBind.java:68)
at 
org.apache.jena.sparql.syntax.ElementWalker$Walker.visit(ElementWalker.java:127)
at 
org.apache.jena.sparql.syntax.ElementGroup.visit(ElementGroup.java:120)
at 
org.apache.jena.sparql.syntax.ElementWalker.walk(ElementWalker.java:39)
at 
org.apache.jena.sparql.syntax.ElementWalker.walk(ElementWalker.java:33)
at 
org.apache.jena.sparql.syntax.syntaxtransform.ElementTransformer.applyTransformation(ElementTransformer.java:89)
at 
org.apache.jena.sparql.syntax.syntaxtransform.ElementTransformer.transformation(ElementTransformer.java:83)
at 
org.apache.jena.sparql.syntax.syntaxtransform.ElementTransformer.transformation(ElementTransformer.java:74)
at 
org.apache.jena.sparql.syntax.syntaxtransform.ElementTransformer.transform(ElementTransformer.java:66)
at 
org.apache.jena.sparql.syntax.syntaxtransform.ElementTransformer.transform(ElementTransformer.java:56)
at 
org.apache.jena.sparql.syntax.syntaxtransform.ExprTransformNodeElement.transform(ExprTransformNodeElement.java:67)
at 
org.apache.jena.sparql.expr.ExprFunctionOp.apply(ExprFunctionOp.java:98)
at 
org.apache.jena.sparql.expr.ExprTransformer$ApplyExprTransformVisitor.visit(ExprTransformer.java:161)
at 
org.apache.jena.sparql.expr.ExprFunctionOp.visit(ExprFunctionOp.java:97)
at 
org.apache.jena.sparql.expr.ExprWalker$Walker.visit(ExprWalker.java:92)
at 
org.apache.jena.sparql.expr.ExprWalker$WalkerBottomUp.visit(ExprWalker.java:113)
at 
org.apache.jena.sparql.expr.ExprFunctionOp.visit(ExprFunctionOp.java:97)
at org.apache.jena.sparql.expr.ExprWalker.walk(ExprWalker.java:36)
at 
org.apache.jena.sparql.expr.ExprTransformer.transformation(ExprTransformer.java:62)
at 
org.apache.jena.sparql.expr.ExprTransformer.transformation(ExprTransformer.java:45)
at 
org.apache.jena.sparql.expr.ExprTransformer.transform(ExprTransformer.java:36)
at 
org.apache.jena.sparql.syntax.syntaxtransform.ElementTransformer$ApplyTransformVisitor.transformExpr(ElementTransformer.java:286)
at 
org.apache.jena.sparql.syntax.syntaxtransform.ElementTransformer$ApplyTransformVisitor.visit(ElementTransformer.java:139)
at 
org.apache.jena.sparql.syntax.ElementWalker$Walker.visit(ElementWalker.java:84)
at 
org.apache.jena.sparql.syntax.ElementFilter.visit(ElementFilter.java:35)
at 
org.apache.jena.sparql.syntax.ElementWalker$Walker.visit(ElementWalker.java:127)
at 
org.apache.jena.sparql.syntax.ElementGroup.visit(ElementGroup.java:120)
at 
org.apache.jena.sparql.syntax.ElementWalker.walk(ElementWalker.java:39)
at 
org.apache.jena.sparql.syntax

[jira] [Created] (JENA-1124) Please add RDF.HTML constant

2016-01-27 Thread Holger Knublauch (JIRA)
Holger Knublauch created JENA-1124:
--

 Summary: Please add RDF.HTML constant
 Key: JENA-1124
 URL: https://issues.apache.org/jira/browse/JENA-1124
 Project: Apache Jena
  Issue Type: Improvement
  Components: Core
Affects Versions: Jena 3.0.1
Reporter: Holger Knublauch
Priority: Trivial


The vocabulary class for RDF seems to lack a constant for rdf:HTML.

There is also rdf:PlainLiteral, but that seems to be outdated with RDF 1.1?
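
For illustration, the requested constant would follow the usual vocabulary-class pattern. This stand-in uses plain strings, whereas Jena's actual RDF vocabulary class exposes Resource and Property constants:

```java
// Self-contained stand-in mirroring the shape of a Jena vocabulary class.
public class RDFVocabDemo {
    // The RDF namespace IRI.
    public static final String NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    // The rdf:HTML datatype IRI requested in this issue.
    public static final String HTML = NS + "HTML";

    public static void main(String[] args) {
        System.out.println(HTML);
    }
}
```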



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Query parameterization.

2015-06-30 Thread Holger Knublauch

Hi Andy,

this looks great, and is just in time for the ongoing discussions in the 
SHACL group. I apologize in advance for not having the bandwidth yet to 
try this out from your branch, but this topic will definitely bubble up 
in the priorities soon...


I have not fully understood how the semantics of this are different from 
the setInitialBinding feature that we currently use in SPIN, and which 
seems to do a pretty good job. However, having a facility to do further 
pre-processing in advance may improve performance and provide a more 
formal definition of what setInitialBinding is doing. I am personally 
not enthusiastic about approaches based on text-substitution, so working 
on the parsed syntax tree looks good to me. There are some (rare) cases 
where text-substitution would be more powerful, e.g. dynamic path 
properties and some solution modifiers, but as you say no approach is 
perfect.


Questions:

- would this also pre-bind variables inside of nested SELECTs?
- I assume this can handle blank nodes (e.g. rdf:Lists) as bindings?
- What about bound(?var) and ?var is pre-bound?

Thanks
Holger


On 6/28/15 8:08 PM, Andy Seaborne wrote:

(info / discussion / ...)

In working on JENA-963 (OpAsQuery; reworked handling of SPARQL 
modifiers for GROUP BY), it was easier/better to add the code I had 
for rewriting syntax by transformation, much like the algebra is 
rewritten by the optimizer.  The use case is rewriting the output of 
OpAsQuery to remove unnecessary nesting of levels of "{}" which arise 
during translation for the safety of the translation.


Hence putting in package oaj.sparql.syntax.syntaxtransform, a general 
framework for rewriting syntax, like we have for the SPARQL+ algebra.


It is also capable of being a parameterized query system (PQ).  We already have 
ParameterizedSparqlString (PSS), so how do they compare?


Work-in-progress:

https://github.com/afs/jena-workspace/blob/master/src/main/java/syntaxtransform/ParameterizedQuery.java 



PQ is a rewrite of a Query object (the template) with a map of 
variables to constants. That is, it works on the syntax tree after 
parsing and produces a syntax tree.


PSS is a builder with substitution. It builds a string, carefully 
(injection attacks) and is neutral as to what it is working with - 
query or update or something weird.
http://jena.apache.org/documentation/query/parameterized-sparql-strings.html 



Summary:

PQ is only for replacement of a variable in a template.
PSS is a builder that can do that as part of building.

PQ covers cases PSS doesn't - neither is perfect.

PSS works with INSERT DATA.
PQ would use the INSERT { ... } WHERE {} form.

Details:

PSS:
  Can build query, update strings and fragments
  Supports JDBC style positional parameters (a '?')
These must be bound to get a valid query.
Can generate illegal syntax.
  Tests the type of the injected value (string, iri, double etc).
  Has corner cases
 Looks for ?x as a string so ...
   "This is not a ?x as a variable"
   
   "SELECT ?x"
   ns:local\?x (a legal local part)
  Protects against injection by checking.
  Works on INSERT DATA.
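The injection hazard that PSS checks for can be shown with a toy sketch (plain Java string handling, not the Jena API; the class and query text are made up for illustration):

```java
public class InjectionDemo {
    // Naive textual substitution: replaces every "?x" occurrence and
    // performs no validation of the injected value.
    static String naiveSubstitute(String template, String var, String value) {
        return template.replace(var, value);
    }

    public static void main(String[] args) {
        String template = "SELECT * WHERE { ?s ?p ?x }";
        // A malicious "value" that closes the pattern and smuggles in syntax:
        String evil = "\"x\" } ; DROP ALL #";
        String result = naiveSubstitute(template, "?x", evil);
        System.out.println(result);
        // The injected DROP slips straight into the query text, which is
        // why PSS validates injected values instead of splicing raw text.
        System.out.println(result.contains("DROP ALL"));
    }
}
```

PQ avoids this class of problem entirely, since it can only place RDF terms at variable positions in the parsed tree.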

PQ:
  Replaces SPARQL variables where identified as variables.
(no extra-syntax positional '?')
  Legal query to legal syntax query.
The query may violate scope rules (example below).
Not a query builder.
  Post parser, so no reparsing to use the query
(for large updates and queries)
  Injection is meaningless - can only inject values, not syntax.
  Can rewrite structurally: "SELECT ?x" => "SELECT  (:value AS ?x)"
which is useful to record the injection variables.
  Works with "INSERT {?s ?p ?o } WHERE { }"

PQ example:

  Query template = QueryFactory.create(.. valid query ..) ;
  Map<String, RDFNode> map = new HashMap<>() ;
  map.put("y", ResourceFactory.createPlainLiteral("Bristol")) ;
  Query query = ParameterizedQuery.setVariables(template, map) ;


A perfect system probably needs a "template language", which is SPARQL 
extended with a new "template variable" that is only allowed in 
certain places in the query and must be bound before use.


Some examples of hard templates:

(1) Not variables:

"This is not a ?x as a variable"
ns:local\?x

(2) In some places ?x cannot be replaced with a value directly:
   SELECT ?x { ?s ?p ?x }



A possible output is:
  SELECT  (:X AS ?x) { ?s ?p :X }
which is nice as it records the substitution, but it fails when nested 
again.


SELECT ?x { {SELECT ?x { ?s ?p ?x } } ?s ?p ?o }

This is a bad query:
SELECT (:X AS ?x) { {SELECT (:X AS ?x) { ...

(3) Other places:
SELECT ?x { BIND(1 AS ?x) }
SELECT ?x { VALUES ?x { 123 } }

Andy




Re: Definition of SPARQL variable pre-binding

2015-06-16 Thread Holger Knublauch

On 6/16/2015 22:03, Osma Suominen wrote:
Here's a slightly relevant discussion about how to support something 
like pre-bound variables / parametrized queries in YASQE, a graphical 
SPARQL editor component in the YASGUI suite (and used by Fuseki among 
others): https://github.com/YASGUI/YASQE/issues/24


Thanks for the pointer.



I'm not sure I understand all the issues here very deeply, but it 
would seem useful to have a standard way of expressing and executing 
parametrized SPARQL queries, which could then be applied by YASQE and 
SHACL among others.


Indeed. Maybe the SHACL templates [1] could be one solution to that, 
assuming SHACL becomes a W3C standard. In the current draft you would 
specify a template as


ex:MyTemplate
a sh:Template ;
rdfs:label "My template" ;
rdfs:comment "Gets a list of all people born in a given country" ;
sh:argument [
sh:predicate ex:country ;
sh:valueType schema:Country ;
rdfs:comment "The country to get all people for" ;
] ;
sh:sparql """
SELECT ?person
WHERE {
?person ex:bornIn ?country .
} """ ;
.

This structure provides enough metadata to drive user interfaces, e.g. 
input forms where users select a country from a list. The semantics in 
the current draft are that variables become pre-bound (ex:country -> 
?country). This approach has the advantage that each query can be 
instantiated as a naturally valid RDF instance, e.g.


ex:ExampleQuery
a ex:MyTemplate ;
ex:country ex:Germany .

This can then be used as a high level language for all kinds of query 
calls as constraints, rules or whatever - experts can prepare the SPARQL 
while end users just fill in the blanks.


The semantics are intended to be like inserting a VALUES clause into the 
"beginning" of the query, i.e. they wouldn't be visible in sub-selects 
etc. In contrast to text-substitution algorithms, this also makes sure 
that queries are always syntactically valid and can be pre-compiled.
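A sketch of that intended semantics, using the template above (illustrative only):

```sparql
# Template body, with ?country to be pre-bound:
SELECT ?person WHERE { ?person ex:bornIn ?country }

# Conceptually instantiated with ex:country ex:Germany, as if a
# one-row VALUES block were inserted at the top of the query:
SELECT ?person WHERE {
  VALUES ?country { ex:Germany }
  ?person ex:bornIn ?country
}
```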


Holger

[1] http://w3c.github.io/data-shapes/shacl/#templates



-Osma




On 16/06/15 12:51, Andy Seaborne wrote:

On 16/06/15 09:33, Holger Knublauch wrote:

Thanks, Andy.

On 6/16/15 6:03 PM, Andy Seaborne wrote:

On 16/06/15 04:20, Holger Knublauch wrote:

Hi,

(this question is motivated by the ongoing Data Shapes WG, but I 
don't

speak on their behalf).


Ptr?

http://w3c.github.io/data-shapes/shacl/

esp http://w3c.github.io/data-shapes/shacl/#sparql-constraints-prebound

http://www.w3.org/2014/data-shapes/track/issues/68


Thanks.








Jena and other APIs such as Sesame support the concept of pre-binding
variables prior to SPARQL execution, using
QueryExecution.setInitialBinding(). This is convenient to reuse
parameterized queries, especially with blank nodes.

Question: is there any formal basis of this functionality,
formulated so
that it can be implemented by other platforms too? I can see that it
populates the original bindings that are passed through the algebra
objects, but what would be the best way to explain this by means of
concepts from the SPARQL 1.1 spec?

Thanks
Holger



There are two possible explanations - they are not quite the same.

1/ It's a substitution of a variable for a value before execution. This is
very like parameterized queries. It's a pre-execution step.


Do you mean syntactic insertion like the ParameterizedQuery class? This
would not support bnodes, and the shapes and focus nodes of a SHACL
constraint will frequently be bnodes. It should also avoid repeated
query parsing, for performance reasons it would be better to operate on
Query objects and their general equivalents (Algebra objects).


Substitution does not have to be in syntax - it's rewriting the AST with
the real, actual bnode.


2/ VALUES

There is a binding as a one row VALUES table and it's join'ed into the
query as usual.


I guess inserting a VALUES clause into the beginning would work, but
then again what about bnodes? I guess instead of the VALUES keyword (as
a string), it would need to rely on the equivalent algebra object?

Just to be clear, this only needs to work in local datasets, not
necessarily with SPARQL endpoints where all we have is a http string
interface. I am looking for a couple of sentences that would provide a
generic implementation strategy that most SPARQL engines either already
have, or could easily add to support SHACL.

Thanks
Holger



Firstly - I'm talking about principles and execution, not syntax. VALUES
is the way to get a data table into a SPARQL execution.
setInitialBinding happens after parsing - injecting the preset row into
execution.

The real (first) issue with blank nodes isn't putting them back in a
query; it's getting them in the first place.

As soon as a blank node is serialized in all W3C formats (RDF, any
SPARQL results), it isn't the same blank node any more.

Re: Definition of SPARQL variable pre-binding

2015-06-16 Thread Holger Knublauch

Thanks, Andy.

On 6/16/15 6:03 PM, Andy Seaborne wrote:

On 16/06/15 04:20, Holger Knublauch wrote:

Hi,

(this question is motivated by the ongoing Data Shapes WG, but I don't
speak on their behalf).


Ptr?

http://w3c.github.io/data-shapes/shacl/

esp http://w3c.github.io/data-shapes/shacl/#sparql-constraints-prebound

http://www.w3.org/2014/data-shapes/track/issues/68






Jena and other APIs such as Sesame support the concept of pre-binding
variables prior to SPARQL execution, using
QueryExecution.setInitialBinding(). This is convenient to reuse
parameterized queries, especially with blank nodes.

Question: is there any formal basis of this functionality, formulated so
that it can be implemented by other platforms too? I can see that it
populates the original bindings that are passed through the algebra
objects, but what would be the best way to explain this by means of
concepts from the SPARQL 1.1 spec?

Thanks
Holger



There are two possible explanations - they are not quite the same.

1/ It's a substitution of a variable for a value before execution.  This is 
very like parameterized queries. It's a pre-execution step.


Do you mean syntactic insertion like the ParameterizedQuery class? This 
would not support bnodes, and the shapes and focus nodes of a SHACL 
constraint will frequently be bnodes. It should also avoid repeated 
query parsing, for performance reasons it would be better to operate on 
Query objects and their general equivalents (Algebra objects).





2/ VALUES

There is a binding as a one row VALUES table and it's join'ed into the 
query as usual.


I guess inserting a VALUES clause into the beginning would work, but 
then again what about bnodes? I guess instead of the VALUES keyword (as 
a string), it would need to rely on the equivalent algebra object?


Just to be clear, this only needs to work in local datasets, not 
necessarily with SPARQL endpoints where all we have is a http string 
interface. I am looking for a couple of sentences that would provide a 
generic implementation strategy that most SPARQL engines either already 
have, or could easily add to support SHACL.


Thanks
Holger



Differences in these viewpoints can occur in nested patterns - 
sub-queries (you can have different variables with the same name - a 
textual substitution viewpoint breaks that) and OPTIONALs inside 
OPTIONALs (bottom-up execution is not the same as top-down execution).
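A sketch of where the two viewpoints diverge on sub-queries (illustrative; :v stands for the pre-bound value of ?x):

```sparql
# Template:
SELECT * WHERE { { SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?x } } }

# Substitution replaces ?x everywhere, including inside the
# sub-select, changing what the inner query counts:
SELECT * WHERE { { SELECT (COUNT(*) AS ?n) WHERE { ?s ?p :v } } }

# The VALUES/join view binds only the outer ?x; the inner ?x is a
# different variable because the sub-select does not project it:
SELECT * WHERE {
  VALUES ?x { :v }
  { SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?x } }
}
```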


This has existed in ARQ for a very long time.  ARQ actually takes the 
initial binding and seeds the execution from there, so it's like (2) 
but not exactly; it does respect non-projected variables inside nested 
SELECTs; it does not completely respect certain cases of 
OPTIONAL-inside-OPTIONAL.


Andy





Definition of SPARQL variable pre-binding

2015-06-15 Thread Holger Knublauch

Hi,

(this question is motivated by the ongoing Data Shapes WG, but I don't 
speak on their behalf).


Jena and other APIs such as Sesame support the concept of pre-binding 
variables prior to SPARQL execution, using 
QueryExecution.setInitialBinding(). This is convenient to reuse 
parameterized queries, especially with blank nodes.


Question: is there any formal basis of this functionality, formulated so 
that it can be implemented by other platforms too? I can see that it 
populates the original bindings that are passed through the algebra 
objects, but what would be the best way to explain this by means of 
concepts from the SPARQL 1.1 spec?


Thanks
Holger



ASKWHERE etc

2015-04-22 Thread Holger Knublauch
I just noticed that Jena successfully parses query strings such as 
"ASKWHERE {}" and "SELECTDISTINCT* WHERE{}". Looking at the Grammar in 
the SPARQL 1.1 spec, shouldn't there be a whitespace character somewhere?


Thanks
Holger



Re: JSON-LD reader not reading namespace prefixes

2014-05-22 Thread Holger Knublauch

Thanks everyone, fix seems to work fine!

Holger


On 5/23/2014 0:16, Rob Vesse wrote:

I figured out the problem, it was two-fold:

1 - We weren't passing the options correctly to the JsonLDReader so
namespaces weren't extracted
2 - JsonLDWriter passed the default namespaces in such a way that it
wouldn't round trip

Should now be fixed in trunk and the tests are updated to include checking
namespace round tripping

Rob

On 22/05/2014 14:06, "Rob Vesse"  wrote:


Huh, that is strange; certainly when I applied the equivalent fix to their
code on GitHub I did get prefixes back out

I will take a look at what's going on


Rob

On 22/05/2014 13:29, "Andy Seaborne"  wrote:


Rob,

The fix looks exactly right but when I tried it I didn't get any prefixes.

Putting a breakpoint in JsonLDReader, I don't see the dataset (from
jsonld-java) having any namespaces set (the entrySet exists and is
empty).

The @context is making the URIs come out right, but nothing in the given
com.github.jsonldjava.core.RDFDataset object. (using v0.4 of
jsonld-java).

Did you have a working example?

Andy

 public static void main(String... argv) throws Exception {
// From TR/json-ld spec, example 19.
 String x = StrUtils.strjoinNL
 ("{ \"@context\":"
  ,"{ \"foaf\": \"http://xmlns.com/foaf/0.1/\" } ,"
  ,"  \"@type\": \"foaf:Person\" ,"
  ,"  \"foaf:name\": \"Rob\""
  ,"}"
  ) ;

 System.out.println(x) ;

 Model m = ModelFactory.createDefaultModel() ;
 RDFDataMgr.read(m, new StringReader(x), null, Lang.JSONLD) ;
 System.out.println("Prefix map:"+ m.getNsPrefixMap()) ;
 System.out.println() ;
 RDFDataMgr.write(System.out, m, Lang.TTL) ;
}













Re: [VOTE] Release jena 2.11.0

2013-09-12 Thread Holger Knublauch
My vote is [0] - I wouldn't want to hold up the release but there is an 
unresolved ticket with regards to the bulk update handler from our 
perspective. We will need to patch our copy of Jena so that it still 
uses the BulkUpdateHandler instead of the GraphUtil methods, because 
otherwise SDB performance will suffer which would badly impact our 
customer's experience.


Holger


On 9/12/2013 23:36, Andy Seaborne wrote:

Hi,

Here is a vote on a release build for Jena 2.11.0

Everyone, not just committers and PMC members, is invited to test and 
vote. (We do need at least 3 PMC +1's.)


Versions:

apache-jena  2.11.0 (the combined  distribution)
apache-jena-libs 2.11.0 (the maven artifact for the core libraries)
jena-fuseki 1.0.0  (separate binary)

including the first releases of:

jena-text1.0.0
jena-spatial 1.0.1
jena-security2.11.0
jena-jdbc1.0.0

Staging repository:
https://repository.apache.org/content/repositories/orgapachejena-035/

Proposed dist/ area:
http://people.apache.org/~andy/jena-2.11.0-RC/

Keys:
https://svn.apache.org/repos/asf/jena/dist/KEYS

SVN tag:
https://svn.apache.org/repos/asf/jena/tags/jena-2.11.0/

Please vote to approve this release:

  [ ] +1 Approve the release
  [ ]  0 Don't care
  [ ] -1 Don't release, because ...

This vote will be open to the end of

Monday 16/September at 23:59 UTC
(96 hours from the same hour tonight UTC).

Andy


Checking needed:

is the GPG signature fine?
is there a source archive?
can the source archive really be built?
is there a correct LICENSE and NOTICE file in each artifact
  (both source and binary artifacts)?
does the NOTICE file contain all necessary attributions?
check the dependencies.
do all the tests work?
if there is a tag in the SCM, does it contain reproducible sources?




Re: Initial Bindings in Update

2013-08-21 Thread Holger Knublauch

Many thanks, Stephen. I have tested this and it appears to work nicely!

Holger


On 8/21/2013 7:32, Stephen Allen wrote:

I've created a patch and attached to JENA-516, which adds back initial
binding support for update queries.  It will only work for
INSERT/DELETE/WHERE and DELETE/WHERE queries.

Please take a look at it, and test it.  If it looks good, I can check it in.

-Stephen


On Wed, Aug 14, 2013 at 7:52 AM, Andy Seaborne  wrote:

List readers : It would be good to hear from organisations that plug in
and provide their own query engines).


Hi Stephen,


On 14/08/13 01:19, Stephen Allen wrote:

I don't believe there is anything necessarily holding back the addition
of an initial binding to the existing UpdateEngine implementations.  It
could be carried out similarly to QueryEngineBase and then activate only
for DELETE/INSERT/WHERE and DELETE WHERE query operations.

However, I think the whole design we currently have is not a very good
one.  It is the wrong level for it.  It meant (until Andy changed it)
that the substitution occurred after query optimization, which is not
really expected.  It also adds a burden on all QueryEngine/UpdateEngine
implementations to provide the substitution logic.

I think the better way to do it would be to apply the substitution on
Query and Update algebra objects before they are passed into the
QueryEngine/UpdateEngine, and remove all initial binding handling from
the engines.  Users can create Query or Update objects and call
Substitute.substitute() themselves (alternatively we can add a helper
method to Query, UpdateModify, and UpdateDeleteWhere that does the same
thing).  This has the added benefit of removing the ambiguity
surrounding which Update types the binding applies to.


I agree that things could be cleaner - it's the passage of time.

Are you proposing a change to the public API?  If so, what exactly?
Because I don't see how it would work.

Substitute.substitute works on the algebra (we could add operations to
work on queries but that's not how it is currently; operations on Update by
syntax can't be done because of the possibility of streaming).

Query objects do not have an algebra object - this does not happen until
inside QueryEngineBase.  There is no presumption that the algebra as
provided by ARQ will ever be involved.  Algebra creation happens in the
constructor of QueryEngineBase.

QueryExecutionBase holds the initial binding and passes it to the query
engine factory chosen (to produce a Plan - QueryEngineFactory is not
required to use QueryEngineBase).

The binding does still need to be passed to the execution because it must
come out the other side of pattern matching for use in template
substitution.  It's easier to send it via the query pattern solutions than
add an additional route for the binding to get to the templating in CONSTRUCT
or an update.

If there is no initial binding set by the app, a root one is passed in
(one row, no solutions), and that is necessary to kick off the execution.  It
forms the common root of all results.

At the moment, in trunk, the creation of the plan does the
Substitute.substitute before calling the optimizer at modifyOp.

queryOp is set in the constructor by QueryEngineBase.createOp (an
extension point).

protected Plan createPlan()
{
 // Decide the algebra to actually execute.
 Op op = queryOp ;


 if ( ! startBinding.isEmpty() ) {
 op = Substitute.substitute(op, startBinding) ;
 context.put(ARQConstants.sysCurrentAlgebra, op) ;
 // Don't reset the startBinding because it also is
 // needed in the output.
 }

 // Optimization happens here.
 op = modifyOp(op) ;

 ...
 evaluate(op, dataset, startBinding, context) ;
 ...




It also moves the feature out of the core API that 3rd party engine
developers need to implement, and that is not relevant in  many
situations (Fuseki).


We may wish to add initial bindings to the protocol.  They work well with
query-by-reference.

So I think it is a feature that Fuseki might wish to use. The cost in
execution is currently one call to "startBinding.isEmpty()".  The common
root binding is always there.



The work involved in this would be to deprecate initial binding support
from QueryEngineBase, QueryExecutionFactory, etc., and then also add two
new static methods to Substitute to do the substitution of UpdateModify and
UpdateDeleteWhere objects.


I can only see how to do this if it's query->query rewrite, not
algebra->algebra because queries don't have an algebra that early.
Update->update rewrite and streaming don't fit together.

     Andy


-Stephen


On Mon, Aug 12, 2013 at 9:41 PM, Holger Knublauch
wrote:


Hi Andy,

this is in response to the parallel thread. I honestly wasn't aware that
there remain open issues with UPDATE and thought that the best solution was
to bring back initial bindings. Your email seems to state that INSERT DATA
with variables is problematic because it's not valid syntax.

Re: [Discuss] Migrate to git?

2013-08-13 Thread Holger Knublauch
FWIW our company has migrated to Git(Hub) a couple of years ago and after the 
initial transition phase it has certainly improved things a lot compared to 
SVN. If possible, I would recommend looking into preserving the commit history 
from SVN - the commit comments alone contain valuable explanations.

Holger


On Aug 13, 2013, at 6:35 PM, Andy Seaborne wrote:

> I would like to migrate to git.
> 
> 1/ For people contributing to the project, I think it makes it easier for 
> them to work using clone-branch-push/patch [1]
> 
> 2/ For the project, I think it makes it easier for us to work on and record 
> what has been changed by using the light-weight branching, rather than just 
> commits to trunk.
> 
> Many changes are small, for example, JIRA fixes.  We don't seem to use svn 
> branches; my experience of them is that they are cumbersome and not worth the 
> cost for small changes.
> 
> 3/ We're already receiving git style patches.
> 
>   Andy
> 
> [1] "git request-pull"
> 
> On 11/08/13 17:36, Claude Warren wrote:
>> I would like to stay on SVN.
>> 
>> 
>> On Sun, Aug 11, 2013 at 5:17 PM, Andy Seaborne  wrote:
>> 
>>> What are everyone's thoughts on migrating to Apache git as the primary SCM
>>> for Jena?
>>> 
>>> There is already a mirror on github: https://github.com/apache/jena
>>> 
>>> And there is
>>> 
>>> git://git.apache.org/jena.git
>>> 
>>> - I don't know what that is; some sort of mapping from svn (live?
>>> mirrored?)
>>> 
>>> The migration process isn't zero-work, judging by reading around other
>>> projects.
>>> 
>>> 1/ It takes time and the svn repo is read-only for a period.
>>>We'll need to make sure a release isn't likely.
>>> 
>>> 2/ The build process doc needs updating (and checking!)
>>> 
>>> 3/ The web site needs updating
>>> 
>>> 4/ We all have to change our personal workflows.
>>> 
>>> 5/ It's not github.
>>> 
>>> The website stays in SVN (pubsub and CMS only work with SVN currently).
>>> 
>>> Andy
>>> 
>>> PS
>>> http://stackoverflow.com/**questions/6235379/how-to-send-**
>>> pull-request-on-git
>>> 
>>> 
>> 
>> 
> 



Re: Initial Bindings in Update

2013-08-12 Thread Holger Knublauch

Hi Andy,

this is in response to the parallel thread. I honestly wasn't aware that 
there remain open issues with UPDATE and thought that the best solution 
was to bring back initial bindings. Your email seems to state that


INSERT DATA { ?param1 foaf:name ?param2 }

is problematic because it's not valid syntax. But why is this a show 
stopper to restoring initial bindings for other updates? Isn't this just 
a matter of expectation management? Of course any query that takes 
initial bindings must also be valid syntactically - if the template 
query is invalid then the template with bindings is also invalid.


This may be one of the advantages of the parameterized SPARQL strings - 
it allows cases such as above because it sits before the string 
compiler. Another advantage of that is that magic properties (property 
functions) can be substituted while initial bindings don't recognize 
those. But this should be in the hand of the developer to decide which 
mechanism to use - the parameterized strings have their own 
disadvantages and limitations too.


In a nutshell, I remain of the opinion that corner cases such as INSERT 
DATA should not lead to dropping an otherwise useful feature for the 
other 99% of use cases (we never use INSERT DATA for example, but have 
hundreds of other UPDATEs). It is inconsistent to support initial 
bindings for Query but not for Update.


Thanks,
Holger


On 8/10/2013 7:19, Andy Seaborne wrote:

On 08/08/13 00:02, Rob Vesse wrote:


On a related topic I looked at Holger's question around injecting
BNodes into SPARQL updates via ParameterizedSparqlString and it
doesn't work in the scenario he describes (an INSERT WHERE) if the
variable is used both in the INSERT template and WHERE since the
template mention is treated as minting a fresh BNode.  Either he
needs to use the BIND workaround discussed by yourself in another
thread (http://markmail.org/message/3lsnjq7yca4es2wb) which I suspect
is not workable for TQ OR we need to look at restoring initial
bindings for updates.


BIND will work - this is the way it works for CONSTRUCT.

All Holger's examples are WHERE based.


I think restoring the feature is going to be the best option, the
documentation just needs to be really clear that initial bindings
only apply to the WHERE portion of updates and not more generally
since that is the only way they were used prior to the feature being
removed (I went back and looked at the ARQ 2.9.4 code).  We can
always look at expanding their scope later as we've discussed in the
 past.


Just to be clear - this is restoring the WHERE based functionality for
update.

I think that is the way forward.  I'd like to hear from Stephen about
the implications for streaming but, superficially at least, it does not
look too bad. The initial binding is applied to each DELETE/INSERT/WHERE
(and DELETE WHERE?), when it is evaluated as a query.  The streaming is
most critical in the DATA parts.

There is a separate issue about <_:id> and anything reserialized and 
sent over the network.


Andy




Re: Initial Bindings in Query Evaluation

2013-08-07 Thread Holger Knublauch

On 8/8/2013 9:27, Rob Vesse wrote:

The way that past versions of ARQ implemented initial bindings for updates
is that the initial bindings were only fed into the WHERE portion of
queries.  Also this was not done via substitution; any restoration of
initial bindings for updates would likely need to not use substitution.
Although this may be moot because I don't know whether ARQ applies query
optimization to the WHERE portions of updates.

This meant that the variable having a value injected still existed in both
the WHERE and template portions so if a blank node value is bound it is as
if the value is simply copied from the WHERE portion to the template
portion.  Whereas with the ParameterizedSparqlString the behavior is
substitution for a blank node so the template portion has a blank node in
place of a variable hence generation of a new blank node rather than
copying an existing one.


Good - thanks for clarifying.

Holger



Re: Initial Bindings in Query Evaluation

2013-08-07 Thread Holger Knublauch

On 8/8/2013 9:02, Rob Vesse wrote:

Sorry I am snowed under with work atm.


Sorry me too - working towards a release.


   I have not tested but I would be
tempted to enable and change the tests so that they reflect Holger's
original regression test case.  This would realign the behavior with
previous versions of ARQ and fix the regression


On a related topic I looked at Holger's question around injecting BNodes
into SPARQL updates via ParameterizedSparqlString and it doesn't work in
the scenario he describes (an INSERT WHERE) if the variable is used both
in the INSERT template and WHERE since the template mention is treated as
minting a fresh BNode.  Either he needs to use the BIND workaround
discussed by yourself in another thread
(http://markmail.org/message/3lsnjq7yca4es2wb) which I suspect is not
workable for TQ OR we need to look at restoring initial bindings for
updates.

I think restoring the feature is going to be the best option, the
documentation just needs to be really clear that initial bindings only
apply to the WHERE portion of updates and not more generally since that is
the only way they were used prior to the feature being removed (I went
back and looked at the ARQ 2.9.4 code).  We can always look at expanding
their scope later as we've discussed in the past.


I would appreciate restoring this. I don't understand the last section 
though - the scenario with pre-bound variables in the INSERT *did* work 
in 2.9.2. It will be hard to explain to users that INSERT behaves 
differently from CONSTRUCT. Furthermore, most examples of updates in our 
code base either take URIs or bnodes as parameters, and if I understand 
the limitation correctly then I would need to rewrite them all to copy 
any variable with another BIND.
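The BIND workaround being discussed would look roughly like this (a sketch; the predicate is hypothetical, and ?s is the pre-bound input):

```sparql
# Copy the pre-bound ?s into a fresh variable in the WHERE clause,
# then use the copy in the INSERT template:
INSERT { ?s2 rdfs:label "example" }
WHERE  { BIND (?s AS ?s2) }
```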


Thanks,
Holger



Re: Initial Bindings in Query Evaluation

2013-08-05 Thread Holger Knublauch

On 8/5/2013 18:47, Andy Seaborne wrote:


How does TopQuadrant use initial bindings with update?  Please give some
concrete examples.


A reasonably representative random choice from our SPARQL Web Pages library:

---
Input: ?this
INSERT {
    ?facet a search:PropertyFacet .
    ?facet search:property ?property .
    ?facet search:facetIndex ?facetIndex .

    ?this search:facet ?facet .
}
WHERE {
    BIND ((COALESCE(spl:objectCount(?this, search:facet), 0) + 1) AS ?facetIndex) .

    BIND (BNODE() AS ?facet) .
}
---
Input: ?resource
INSERT {
GRAPH ui:tempGraph {
swa:DestructorMetadata swa:delete ?child .
} .
}
WHERE {
?child (rdfs:subClassOf)* ?resource .
}
---
Input: ?resource
DELETE {
GRAPH ui:tempGraph {
swa:DestructorMetadata swa:delete ?child .
} .
}
WHERE {
?child (rdfs:subClassOf)+ ?resource .
?child (rdfs:subClassOf)+ ?other .
FILTER NOT EXISTS {
GRAPH ui:tempGraph {
swa:DestructorMetadata ?either ?other .
} .
} .
}
---
Input: ?label, ?labelLang, ?uri
INSERT {
    ?class a ?resourceType .
    ?class rdfs:label ?prefLabel .
    ?class rdfs:subClassOf ?contextResource .
}
WHERE {
    BIND (IRI(str(?uri)) AS ?class) .
    BIND (IF(bound(?labelLang), STRLANG(?label, ?labelLang), ?label) AS ?prefLabel) .
}
---
Input: ?search, ?searchGraph
INSERT {
GRAPH ?searchGraph {
?search a search:Search .
?search search:rootType ?rootType .
?search search:queryGraph ?queryGraph .
?search swa:loadId ?id .
} .
}
WHERE {
?search search:rootType ?rootType .
}
---
Input: ?searchGraph
DELETE WHERE {
GRAPH ?searchGraph {
?s ?p ?o .
} .
}
---
Input: ?searchGraph, ?first
INSERT {
GRAPH ?sessionGraph {
?resource ?p ?o .
} .
}
WHERE {
?first ?p ?o .
FILTER NOT EXISTS {
( ?rs 0 ) spr:colCells ?other .
FILTER ((?other != ?first) && 
NOT EXISTS {

?other ?p ?o .
}) .
} .
}
---
Input: ?this, ?targetGraph    This is the one that I am 
currently stuck with

INSERT {
GRAPH ?targetGraph {
?this ?p ?o .
} .
}
WHERE {
?this ?p ?o .
}

Not sure if this is what you are looking for.

Holger



Re: Initial Bindings in Query Evaluation

2013-08-04 Thread Holger Knublauch

On 8/3/2013 9:50, Rob Vesse wrote:
Side Note - Initial bindings for updates was removed because it was a 
barrier to streaming updates 
(http://markmail.org/message/bazwh2exmcc5vmoh). Also as others noted 
in the discussion there initial bindings is a little murkier for 
updates since does it apply only to WHERE clauses, to all portions of 
requests, etc? Keeping the API as-is is always an option, if this ends 
up being the preference of the community then we definitely need to 
improve the documentation to note that there can be unintended 
interactions with other parts of the query engine such as the 
optimizer when initial bindings are used. 


Yet I believe the handling of this removal for UPDATEs was a bit rushed. 
I don't have the whole background, so apologies if I miss something 
obvious, but it sounds like there was no real deprecation cycle because 
it would have complicated some *modes* of using UPDATEs (streaming, 
remote). However, these are just some modes among others, and it might 
have been possible to document it away for those modes, and throw an 
exception if someone uses a streaming update with initial bindings. 
Right now the API is inconsistent because bindings are still present for 
Queries, and all our usages of initial bindings with UPDATEs would have 
continued to work. We were unfortunately unable to upgrade to newer Jena 
versions earlier, because there have been some other show stopper bugs 
in the code. I am using the SNAPSHOT now to try to detect those before a 
release, and may have more reports in the next couple of days as I run 
through manual and automated test scenarios.


What you are talking about sounds much more like cached execution 
plans in SQL. I understand the analogy of a SPARQL query to a function 
but SPARQL variables were not intended to be function arguments, that 
you choose to treat them as such and that initial bindings lets you 
treat them as such is a perhaps unintended consequence of ARQ's API. 


Yep, unintended things are often the best, because they open new doors. 
You may not be aware of how central SPARQL (and this feature) has become 
for our software stack and commercial product portfolio. There are 
literally thousands of SPARQL queries with varying complexity in our 
products, and they all execute within contexts of pre-bound variables. 
Even things like user interfaces are constructed with the help of 
running hundreds of queries per page request, so speed is crucial and 
has always been good enough.


So, whatever change is made, I would strongly favor a solution that 
preserves the semantics and doesn't negatively affect performance (of 
running many small queries).


Thanks
Holger



[jira] [Updated] (JENA-500) SPARQL optimizer does not consider pre-bound variables, creates wrong query

2013-08-04 Thread Holger Knublauch (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Holger Knublauch updated JENA-500:
--

Attachment: TestPreboundFilter.java

Test case

> SPARQL optimizer does not consider pre-bound variables, creates wrong query
> ---
>
> Key: JENA-500
> URL: https://issues.apache.org/jira/browse/JENA-500
> Project: Apache Jena
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: Jena 2.10.2
>Reporter: Holger Knublauch
> Attachments: TestPreboundFilter.java
>
>
> There is a regression bug somewhere between 2.7.2 and 2.10.2.
> See test case which works green in the old version but fails on the
> latest snapshot. I believe the cause is that
> TransformFilterImplicitJoin.testSpecialCaseUnused() incorrectly assumes
> it can ignore the whole query if it encounters unbound variables.
> However, in my case the variables do have values, coming from the
> initial binding of the QueryExecution. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (JENA-500) SPARQL optimizer does not consider pre-bound variables, creates wrong query

2013-08-04 Thread Holger Knublauch (JIRA)
Holger Knublauch created JENA-500:
-

 Summary: SPARQL optimizer does not consider pre-bound variables, 
creates wrong query
 Key: JENA-500
 URL: https://issues.apache.org/jira/browse/JENA-500
 Project: Apache Jena
  Issue Type: Bug
  Components: Optimizer
Affects Versions: Jena 2.10.2
Reporter: Holger Knublauch


There is a regression bug somewhere between 2.7.2 and 2.10.2.
See test case which works green in the old version but fails on the
latest snapshot. I believe the cause is that
TransformFilterImplicitJoin.testSpecialCaseUnused() incorrectly assumes
it can ignore the whole query if it encounters unbound variables.
However, in my case the variables do have values, coming from the
initial binding of the QueryExecution. 



Re: Initial Bindings in Query Evaluation

2013-08-02 Thread Holger Knublauch

Folks,

all I did was report a bug, and your response is to delete the whole 
feature! This would completely throw out the baby with the bathwater and 
may mean that we at TopQuadrant have to branch off to our own Jena 
version when this happens.


I would be in favor of this option

5) Keep the API as it is and restore initial bindings for UPDATE as well.

It was IMHO a mistake to switch to parameterized queries. We make very 
heavy use of initial bindings throughout our software stack, SPIN, SWP, 
SPARQLMotion. They are a great feature, allowing users to treat SPARQL 
like a programming language (think about a function call, where the 
parameters are different each time, yet the developer only needs to 
write the logic once).


From what I understand so far, parameterized SPARQL has performance 
issues - we would need to re-parse the string while with initial 
bindings we can reuse the same (compiled) Query object each time. 
Parameterized queries also don't support the bound(?var) operator, and 
probably others.
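The reuse argument above can be made concrete with a minimal sketch. This assumes Apache Jena on the classpath (modern `org.apache.jena.*` package names; the 2.x era used `com.hp.hpl.jena.*`), and the class name and sample data are invented for illustration: the query string is parsed once, and only the initial binding changes per call.

```java
import org.apache.jena.query.*;
import org.apache.jena.rdf.model.*;
import org.apache.jena.vocabulary.RDFS;

public class InitialBindingReuse {
    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();
        Resource s = model.createResource("http://example.org/s");
        model.add(s, RDFS.label, "example");

        // Parse the query string once into a compiled Query object...
        Query query = QueryFactory.create(
                "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n" +
                "SELECT ?label WHERE { ?x rdfs:label ?label }");

        // ...then reuse it with different initial bindings, instead of
        // rebuilding and re-parsing a query string on every call.
        QuerySolutionMap initial = new QuerySolutionMap();
        initial.add("x", s);
        try (QueryExecution qe = QueryExecutionFactory.create(query, model, initial)) {
            ResultSet rs = qe.execSelect();
            while (rs.hasNext()) {
                System.out.println(rs.next().getLiteral("label").getString());
            }
        }
    }
}
```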


We have used initial bindings successfully for many years, and although 
there have been occasional bugs (I probably reported one such bug per 
year) it is a very essential feature from our point of view. If the 
overhead of fixing the optimizer is too big, I would be OK with 
switching off this optimizer if initial bindings are present. It worked 
fine without this optimizer for many years.


HTH
Holger


On 8/3/2013 2:52, Rob Vesse wrote:

Hi All

Holger's question 
(http://mail-archives.apache.org/mod_mbox/jena-users/201308.mbox/%3c51fb8b52.7070...@knublauch.com%3e)
 about a regression in ARQs treatment of initial bindings raises an interesting 
disconnect between the interpretation of SPARQL and the Initial Bindings API.

Initial bindings in their current form allow users to essentially change 
the semantics of a query in a non-intuitive way.  Take his example query:

ASK { FILTER(?a = ?b) }

Intuitively that query MUST always return false yet with initial bindings in 
the mix the query can be made to return true, at least prior to 2.10.2 which 
introduces a new optimizer which includes special case recognition for this.

The problem is that using initial bindings can fundamentally change the 
semantics of queries in non-intuitive ways when I believe the intention of the 
API was merely to allow for improved performance by guiding the engine.
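The semantics shift described above can be shown directly. This is a sketch only, assuming Apache Jena on the classpath; the class name is invented. With no bindings the filter sees unbound variables and the ASK is necessarily false, but pre-binding ?a and ?b to the same value makes the same query answer true.

```java
import org.apache.jena.query.*;
import org.apache.jena.rdf.model.*;

public class InitialBindingSemantics {
    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();
        Query ask = QueryFactory.create("ASK { FILTER(?a = ?b) }");

        // No bindings: the comparison raises an evaluation error -> false.
        boolean plain = QueryExecutionFactory.create(ask, model).execAsk();

        // Pre-bind ?a and ?b to the same value: the filter evaluates true.
        QuerySolutionMap initial = new QuerySolutionMap();
        initial.add("a", model.createTypedLiteral(true));
        initial.add("b", model.createTypedLiteral(true));
        boolean preBound = QueryExecutionFactory.create(ask, model, initial).execAsk();

        System.out.println(plain + " vs " + preBound);
    }
}
```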

To me this suggests that initial bindings as currently implemented is 
fundamentally flawed and I would suggest that we think about re-architecting 
this feature in a future release (not the next release).  I believe there are 
probably several ways of doing this:

1 – Remove support for initial bindings on queries entirely (as we already did 
for updates) in favor of using ParameterizedSparqlString

2 – Change initial bindings to be a pre-optimization algebra transformation of 
the query

As we've discussed previously in the context of ParameterizedSparqlString there 
is potential to do the substitution at the algebra tree level rather than at 
the textual level.  This allows for stronger syntax checking and actually 
changes the query appropriately.  The problem with this is that it doesn't work 
if we want to inject multiple values for a variable, hence Option 3

3 – Change initial bindings to be done by injection of VALUES clauses

This approach is again by algebra transform and would involve inserting VALUES 
clauses at each leaf of the algebra tree.  So Holger's query with initial 
bindings applied would be rewritten like so:

ASK
{
   VALUES ( ?a ?b ) { ( true true ) }
   FILTER (?a = ?b)
}

However this approach might get rather complex for larger queries and also runs 
into issues of scope: what if we insert the VALUES clause inside of a sub-query 
which doesn't propagate those initial bindings outside of it, etc.

4 – Skip optimization when initial bindings are involved

This is the easiest approach but we can't enforce this on other query engine 
implementations and it could seriously harm performance for those that use 
initial bindings extensively.

There may also be other approaches I haven't thought of, so please suggest anything 
that makes sense.  Bottom line is that initial bindings in its current form 
seems fundamentally broken to me and we should be thinking of how to fix this 
in the future.

Rob





Re: [DISCUSS] SDB future

2013-06-06 Thread Holger Knublauch

On 6/5/2013 22:13, Andy Seaborne wrote:

On 29/05/13 03:27, Holger Knublauch wrote:

At TopQuadrant we do use and recommend SDB for enterprise solutions,
mainly due to the fact that customers can rely on their existing SQL
infrastructure. Performance is not a direct issue for us because we
apply a caching layer on top of it (currently a home-grown in-memory
cache, but in the future possibly TDB). We do have a growing
installation base of successful deployments (TopBraid EVN) and customers
seem to be happy. Having to rely on commercial alternatives would affect
the overall price tag of our solutions, and having an open source
solution that seamlessly works with Jena is a great asset.


What is cached?


We put a GraphMem over each SDB, and this GraphMem remembers (i.e. 
copies) all triples that have already been queried. So if someone 
queried (ex:A ?p ?o) then we download all those triples. Then, more 
specific queries against (ex:A rdfs:label ?o) can already be answered by 
the cache alone. Writes are passed through into the SDB. In TopBraid 
EVN, where models are typically at most tens of millions of triples, 
we just cache everything at server start, and use SDB as the 
master backup only. This works well for our use cases, and feels safer 
than TDB (where it used to be easy to corrupt data, at least in the 
older versions; we haven't upgraded for a while, and it's easier to 
convince IT departments to rely on their SQL infrastructure than on some 
file drives).
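The fetch-all-once-per-subject strategy described above can be sketched with plain collections. This is a stdlib-only illustration: the real implementation wraps a Jena GraphMem over an SDB-backed graph, and the class and method names here are invented. Once the triples for a subject have been pulled from the backing store, any more specific query about that subject is answered from memory alone.

```java
import java.util.*;

public class SubjectCache {
    record Triple(String s, String p, String o) {}

    private final List<Triple> backingStore;              // stands in for SDB
    private final Map<String, List<Triple>> cache = new HashMap<>();
    int storeHits = 0;                                    // exposed for the demo

    SubjectCache(List<Triple> store) { this.backingStore = store; }

    // find(s, ?p, ?o): fetch all triples for s once, then serve from cache.
    List<Triple> find(String s) {
        return cache.computeIfAbsent(s, subj -> {
            storeHits++;                                  // one trip to the store
            List<Triple> out = new ArrayList<>();
            for (Triple t : backingStore)
                if (t.s().equals(subj)) out.add(t);
            return out;
        });
    }

    public static void main(String[] args) {
        SubjectCache g = new SubjectCache(List.of(
                new Triple("ex:A", "rdfs:label", "A"),
                new Triple("ex:A", "rdf:type", "ex:Thing"),
                new Triple("ex:B", "rdfs:label", "B")));
        g.find("ex:A");   // first query pulls all ex:A triples from the store
        g.find("ex:A");   // answered by the cache alone
        System.out.println(g.find("ex:A").size() + " triples, " + g.storeHits + " store fetch(es)");
    }
}
```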


Holger



Re: [DISCUSS] SDB future

2013-05-28 Thread Holger Knublauch
At TopQuadrant we do use and recommend SDB for enterprise solutions, 
mainly due to the fact that customers can rely on their existing SQL 
infrastructure. Performance is not a direct issue for us because we 
apply a caching layer on top of it (currently a home-grown in-memory 
cache, but in the future possibly TDB). We do have a growing 
installation base of successful deployments (TopBraid EVN) and customers 
seem to be happy. Having to rely on commercial alternatives would affect 
the overall price tag of our solutions, and having an open source 
solution that seamlessly works with Jena is a great asset.


I have brought up the topic of this thread in our management to see 
whether we can allocate any resources to its future, but I cannot report 
any decision at this stage.


Thanks,
Holger



On 5/28/2013 0:43, Simon Helsen wrote:

Andy, others,

we do not use SDB because it is way too slow for us. Although I'm sure it
can be improved as you suggested below, we do not believe it will ever
come close to TDBs performance because of how SDB is designed. The fact it
is used at all keeps surprising me, but it probably doesn't matter for
simple use cases especially if the dataset remains small. Btw, a little
while back we were reconsidering it because SDB supports multiple vendors
and our use-case was not very performance sensitive, but it turned out
that it was still too slow for our needs

As for motivations why some people may prefer SDB over TDB, I don't think
it is just "SQL" and corporate acceptance. There are some very good
reasons why file-based systems like TDB are difficult to use in commercial
deployments. Corporate Java-based server deployments are almost always
based on one or more app servers (JEE, but sometimes just a regular web
server) where *all* persisted state goes into a relational database.
Storing state on the file system is generally taboo for many reasons
including: The inability to cluster the app server - a critical step to
scale up beyond a few hundred users; the fact that organizations generally
have a hard time understanding and managing file system state as opposed
to standard relational database management. For instance, how do you
perform online backups, and when and how does the state corrupt? For
instance, on a DB server, admins are watchful for running out of disk
space. This sort of monitoring is usually less critical on the app server,
but now, suddenly, there is this growing "thing" they have to backup
because it will corrupt if you run out of disk space. On top of that, DB
servers usually use very fast and expensive disk systems (RAIDs, SSDs,
etc.) This is usually not the case for the app server. On top of that,
when customers realize there is this large set of data on the file system,
they have a tendency to put it on larger disks connected via NFS, which is
unfortunately very dangerous because even short network glitches can
corrupt TDB. All of this is manageable if you carefully encapsulate TDB
and provide good administration tools on top of your system, but it is not
trivial, and it doesn't come out of the box. Especially cluster management
is quite tricky as you can imagine.

Just wanted to set that out there as to why SQL-based systems remain
attractive. I would advise though to more clearly state on the SDB
download page that SDB is deprecated and no longer actively supported
(unless that changes of course)

Simon



From:
Andy Seaborne 
To:
dev@jena.apache.org,
Date:
05/25/2013 06:19 AM
Subject:
[DISCUSS] SDB future



Yes - I'm conflicted as well, flip-flopping between opt1 and opt3.

There is enough user@ traffic to suggest it's used.  I'm guessing it's
the "SQL" part that makes it easier in corp IT. TDB is faster, scales better
and is better supported (and I'm not corp IT bound).

There are ways to improve its performance - pushing some filters into
SQL for example - so there's scope for development.

Option 1 - add to the main distribution, remove if it becomes a block -
means that there is no additional work on a release vote.

Testing SDB using Derby only (that's what the junit does by default) is
easy to set up because it's pulled in by Maven.  It only runs embedded,
not as a server, but it does check the code generation.  Unlike other
Java SQL DBs, Derby implements tree join plans (the others only do
linear join plans, which makes some optional cases impossible - the code
falls back to brute-force-and-ignorance in these cases). Derby is quite
picky about its SQL 92.  SDB tests without additional setup.

We state on users@ that this is the position, as an indication that we still
have option 3 (retirement) available.  Unless we shake the tree a bit,

Proposal: (option 1)

add to apache-jena, remove at the first sign of trouble.
make a clear statement of situation on users@ including
  encouraging people to come forward
option 3 still on the cards.

I've added DISCUSS to the subject line for now to leave open the
possibility of a vote because it affects th

Re: Templated queries and updates

2013-03-26 Thread Holger Knublauch

On 3/25/2013 4:37, Andy Seaborne wrote:
Looking back at the thread with Joshua Taylor on initial bindings and 
update, I wonder if we could do with "proper" templates.



Manipulation of the algebra for query building does not work so well 
remotely as the query is sent in syntax form.  Having query templates 
in SPARQL syntax with template parameters seems more natural.


(read "query or update" for "query" throughout).

ParameterizedSparqlString seems to do two things - correct me if I'm 
wrong Rob - it's a sort of builder of queries (the .append(..) 
methods) and also a bit like JDBC prepared statements (.setXYZ(...)).  
But it does not know the syntax of the query or update.  They are open 
to injection [1] although that's fixable.


My suggestion is to have template queries, which are like, but not 
identical, to JDBC prepared statements.  A template query is a 
superset of SPARQL. Template parameterization is via a new parse item, 
not a legal SPARQL variable (e.g. ?{abc}).  They must be replaced by a 
node (which could be a real variable) to use them. There would be 
template.asQuery(...substitutions...).


An alternative was using SPARQL variables, requiring a query/update 
template to declare the template variables and checking when 
converting to a query or update.  But, as below, there are a couple of 
points where it is desirable to parameterize a template that in SPARQL 
do not allow variables.


I have always wondered why the SPARQL 1.1 WG did not decide to allow 
variables in OFFSET and LIMIT - both are common patterns in interactive 
UIs where users page through a result set. I believe it would be great 
if this was supported in ARQ Syntax, and maybe SPARQL 1.2 will align 
with ARQ Syntax just like it did in previous iterations.
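Although SPARQL syntax forbids variables in LIMIT and OFFSET, the paging use case described above can be handled programmatically today by mutating the compiled Query per request. A sketch, assuming Apache Jena/ARQ on the classpath, with invented example values:

```java
import org.apache.jena.query.Query;
import org.apache.jena.query.QueryFactory;

public class PagedQuery {
    public static void main(String[] args) {
        Query q = QueryFactory.create("SELECT ?s WHERE { ?s ?p ?o }");
        int page = 3, pageSize = 20;           // invented example values
        q.setOffset((long) page * pageSize);   // becomes OFFSET 60
        q.setLimit(pageSize);                  // becomes LIMIT 20
        System.out.println(q.getOffset() + "/" + q.getLimit());
    }
}
```

This keeps one parsed Query per page size while varying only the slice, which fits the "run many small queries fast" constraint mentioned earlier in the thread.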


I would personally prefer to go with established techniques such as 
pre-binding before inventing another query preprocessor syntax that 
cannot be parsed by SPARQL parsers and that opens the door to other 
problems.


SPIN already has a syntax for templates and a mechanism to declare 
arguments. It has been used for quite some time now and seems to work well.


Holger


  If we're tweaking the syntax anyway, we might as well have template 
variable syntax.


The reason for some checking is so you can't do "SELECT * { ?s ?p ?o 
}" forgetting to replace ?s with a URI, for example.


== Query

LIMIT and OFFSET take fixed integers by syntax, and ideally they would be 
parameters.


== Update

INSERT DATA / DELETE DATA restrict

It would be good to have template data updates.  But the data part of 
INSERT DATA explicitly forbids variables.



To this end, I have got the machinery for transforms of Element 
objects (c.f. transforms on Ops) working.  By working on the AST, 
injection is harder because the template parameter must be a Node to 
go in the right place in the AST.


Comments and thoughts?

Rob - does this relate to jena-jdbc in some way?

Andy

[1]

public static void injection() {
String str =
 "PREFIX : \nINSERT DATA {   ?var2 . }" ;
ParameterizedSparqlString pss = new ParameterizedSparqlString(str) ;
pss.setIri("var2",
   "hello> } ; DROP ALL ; INSERT DATA {