Re: CMS diff: Jena Full Text Search

2017-12-09 Thread Chris Tomlinson
Hello,

This commit, against the staging version of the jena-text doc, corrects the 
documentation to reflect the fix for JENA-1439 (graph queries fail 'lang:xx').

Thank you,
Chris


> On Dec 9, 2017, at 5:45 PM, Chris Tomlinson  wrote:
> 
> Clone URL (Committers only):
> https://cms.apache.org/redirect?new=anonymous;action=diff;uri=http://jena.apache.org/documentation%2Fquery%2Ftext-query.mdtext
> 
> Chris Tomlinson
> 
> Index: trunk/content/documentation/query/text-query.mdtext
> ===
> --- trunk/content/documentation/query/text-query.mdtext   (revision 1817587)
> +++ trunk/content/documentation/query/text-query.mdtext   (working copy)
> @@ -391,11 +391,6 @@
> will iterate over the graphs in the dataset, searching each in turn for
> matches.
> 
> -Note that there is a known issue when a `lang:xx` argument is included in
> -the above pattern, so that the restriction to given language is not obeyed. 
> -This will be corrected in a future release. However, use of a language tag
> -on the `query string` is not subject to this issue.
> -
> If there is suitable structure to the graphs, e.g., a known `rdf:type` and
> depending on the selectivity of the text query and number of graphs, 
> it may be more performant to express the query as follows:
> @@ -406,9 +401,6 @@
>   graph ?g { ?s a ex:Item } .
> }
> 
> -Note that this form does not have any issue with `lang:xx` as described
> -above, since the graph is extracted after the text search.
> -
>  Queries across multiple `Field`s
> 
> As mentioned earlier, the text index uses the
> 



CMS diff: Jena Full Text Search

2017-12-09 Thread Chris Tomlinson
Clone URL (Committers only):
https://cms.apache.org/redirect?new=anonymous;action=diff;uri=http://jena.apache.org/documentation%2Fquery%2Ftext-query.mdtext

Chris Tomlinson

Index: trunk/content/documentation/query/text-query.mdtext
===
--- trunk/content/documentation/query/text-query.mdtext (revision 1817587)
+++ trunk/content/documentation/query/text-query.mdtext (working copy)
@@ -391,11 +391,6 @@
 will iterate over the graphs in the dataset, searching each in turn for
 matches.
 
-Note that there is a known issue when a `lang:xx` argument is included in
-the above pattern, so that the restriction to given language is not obeyed. 
-This will be corrected in a future release. However, use of a language tag
-on the `query string` is not subject to this issue.
-
 If there is suitable structure to the graphs, e.g., a known `rdf:type` and
 depending on the selectivity of the text query and number of graphs, 
 it may be more performant to express the query as follows:
@@ -406,9 +401,6 @@
   graph ?g { ?s a ex:Item } .
 }
 
-Note that this form does not have any issue with `lang:xx` as described
-above, since the graph is extracted after the text search.
-
  Queries across multiple `Field`s
 
 As mentioned earlier, the text index uses the



[GitHub] jena pull request #325: fix JENA-1439 graph queries fail 'lang:xx'

2017-12-09 Thread xristy
GitHub user xristy opened a pull request:

https://github.com/apache/jena/pull/325

fix JENA-1439  graph queries fail 'lang:xx'

Fixes JENA-1439: graph queries fail to preserve the text:query 'lang:xx' arg. 
TextQueryPF.extractArg(...) unnecessarily removed the arg from the input list. 
That list is saved in an OpPropFunc for repeated use across graph iterations, 
so the side effect meant the argument was only present for the first graph in 
the iteration.
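
The failure mode described above can be illustrated with a minimal, standalone sketch. It is not the Jena code: the class and method names below are hypothetical, and the argument list is modelled as plain strings rather than Jena nodes. It only shows why removing an argument from a list that is reused on every graph iteration loses the argument after the first pass, and how a read-only lookup avoids that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of the bug pattern; not TextQueryPF itself.
class ExtractArgSketch {

    // Destructive variant: removes the matching argument from the caller's list.
    static String extractDestructive(String prefix, List<String> args) {
        for (Iterator<String> it = args.iterator(); it.hasNext();) {
            String arg = it.next();
            if (arg.startsWith(prefix + ":")) {
                it.remove(); // side effect on a list the caller reuses
                return arg.substring(prefix.length() + 1);
            }
        }
        return null;
    }

    // Read-only variant: returns the value and leaves the list untouched.
    static String extractReadOnly(String prefix, List<String> args) {
        for (String arg : args) {
            if (arg.startsWith(prefix + ":")) {
                return arg.substring(prefix.length() + 1);
            }
        }
        return null;
    }

    public static void main(String[] unused) {
        // One argument list shared across two simulated graph iterations.
        List<String> shared = new ArrayList<>(Arrays.asList("one", "lang:xx"));
        for (String graph : Arrays.asList("g1", "g2")) {
            System.out.println(graph + " destructive: " + extractDestructive("lang", shared));
        }
        shared = new ArrayList<>(Arrays.asList("one", "lang:xx"));
        for (String graph : Arrays.asList("g1", "g2")) {
            System.out.println(graph + " read-only: " + extractReadOnly("lang", shared));
        }
    }
}
```

With the destructive variant the second iteration no longer sees the language argument; with the read-only variant every iteration sees it.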

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/BuddhistDigitalResourceCenter/jena 
JENA-1439-graphs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/325.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #325


commit 3d63b3d2d62a30690b7359703fc39aa05001be31
Author: Chris Tomlinson 
Date:   2017-12-09T17:37:51Z

fix JENA-1439  graph queries fail 'lang:xx'

Fixes JENA-1439 graph queries fail to preserve text:query 'lang:xx' arg. 
TextQueryPF.extractArg(...) removed the arg unnecessarily from the input list, 
side-effecting the list which was saved in an OpPropFunc for repeated use on 
graph iterations and thus the argument was only present on the first graph in 
the iteration




---


[jira] [Commented] (JENA-1439) graph queries fail to preserve text:query 'lang:xx' arg

2017-12-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16284851#comment-16284851
 ] 

ASF GitHub Bot commented on JENA-1439:
--

GitHub user xristy opened a pull request:

https://github.com/apache/jena/pull/325

fix JENA-1439  graph queries fail 'lang:xx'

Fixes JENA-1439: graph queries fail to preserve the text:query 'lang:xx' arg. 
TextQueryPF.extractArg(...) unnecessarily removed the arg from the input list. 
That list is saved in an OpPropFunc for repeated use across graph iterations, 
so the side effect meant the argument was only present for the first graph in 
the iteration.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/BuddhistDigitalResourceCenter/jena 
JENA-1439-graphs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/325.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #325


commit 3d63b3d2d62a30690b7359703fc39aa05001be31
Author: Chris Tomlinson 
Date:   2017-12-09T17:37:51Z

fix JENA-1439  graph queries fail 'lang:xx'

Fixes JENA-1439 graph queries fail to preserve text:query 'lang:xx' arg. 
TextQueryPF.extractArg(...) removed the arg unnecessarily from the input list, 
side-effecting the list which was saved in an OpPropFunc for repeated use on 
graph iterations and thus the argument was only present on the first graph in 
the iteration




> graph queries fail to preserve text:query 'lang:xx' arg
> ---
>
> Key: JENA-1439
> URL: https://issues.apache.org/jira/browse/JENA-1439
> Project: Apache Jena
> Issue Type: Bug
> Components: Jena
> Affects Versions: Jena 3.5.0
> Environment: Jena 3.6.0-SNAPSHOT
> Reporter: Code Ferret
> Assignee: Code Ferret
> Attachments: TEST_TEXT_001.trig, TEST_TEXT_002.trig, 
> TRACING-jena-text-W-cache-W-graph-lang-FUSEKI-3.6.0-SNAPSHOT.txt
>
>
> Jena-text queries that iterate over graphs, such as:
> {code}
> select ?g ?s ?lit
> where {
>    graph ?g { (?s ?sc ?lit) text:query (skos:prefLabel "one" "lang:xx") } .
> }
> {code}
> fail to pass the {{lang:xx}} after the first graph, leading to erroneous 
> results.
> A file with configuration, query, trace logging and results is attached, along 
> with example trig files.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: Pooling Http Client lockup

2017-12-09 Thread Dave Reynolds
Thanks Andy. Yes, I remember mention of things like that on the list 
before but hadn't done a systematic search through.


In this instance the server was localhost which won't have helped but at 
least part of it is/was a straight code defect on my part which allowed 
one class of connections to potentially escape the close().
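
A code path that escapes close() is typically one where an exception or early return skips the call. The sketch below is a generic illustration, not Dave's code: `Exec` is a hypothetical stand-in for a query execution that holds a pooled connection, and the counter stands in for connections still leased. It contrasts a manual close with try-with-resources, which closes on both normal and exceptional exit.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model; Exec is not a Jena class.
class CloseOnEveryPath {
    // Number of executions currently open (never closed).
    static final AtomicInteger OPEN = new AtomicInteger();

    static final class Exec implements AutoCloseable {
        Exec() { OPEN.incrementAndGet(); }

        String run(boolean fail) {
            if (fail) throw new IllegalStateException("remote call failed");
            return "ok";
        }

        @Override
        public void close() { OPEN.decrementAndGet(); }
    }

    // Leaky: close() is skipped whenever run() throws.
    static void leaky(boolean fail) {
        Exec exec = new Exec();
        exec.run(fail);
        exec.close();
    }

    // Safe: try-with-resources closes the exec on every exit path.
    static void safe(boolean fail) {
        try (Exec exec = new Exec()) {
            exec.run(fail);
        }
    }

    // Runs the action from a clean counter and reports how many execs stay open.
    static int openAfter(Runnable action) {
        OPEN.set(0);
        try {
            action.run();
        } catch (IllegalStateException expected) {
            // the failure itself is not the point; the leftover count is
        }
        return OPEN.get();
    }

    public static void main(String[] args) {
        System.out.println("leaky, failing call leaves open: " + openAfter(() -> leaky(true)));
        System.out.println("safe, failing call leaves open:  " + openAfter(() -> safe(true)));
    }
}
```

Jena's QueryExecution can be used in a try-with-resources statement in the same way, which is one way to rule out this class of leak.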


Dave


On 09/12/17 10:06, Andy Seaborne wrote:
In testing, connection cycling and running out of connections has been a 
recurring and non-deterministic issue.


Under load, the kernel (linux at least) does not clear up connections 
and sockets fast enough.  It's worse when the server is localhost and if 
servers are starting and stopping, but it can happen in less extreme, 
more realistic cases as well.


The effect can be seen with "netstat -t": lots of connections in one of 
the wait states.  And this is with all the close() done.


An unclosed connection hangs around for IIRC at least 2 minutes.

The improvements in HttpClient usage don't change this although pooling 
might make it worse.


Makes me appreciate the GC even more.

 Andy

On 08/12/17 23:52, Dave Reynolds wrote:

Hi,

Several test cases later I'm now sure this is my fault, not Jena's.

Thanks for the help, and apologies for the noise.

[There's some code path that can fail to close an exec under certain 
conditions. I can't yet explain why the switch from Jena 3.1.1 to Jena 
3.5.0 would have provoked it when it was so rock solid under Jena 
3.1.1 but I'm sure the problem is on my side.]


Dave

On 08/12/17 22:06, Dave Reynolds wrote:

Hi,

On 08/12/17 19:39, ajs6f wrote:

Dave--

Jena switched HTTP Commons versions a good bit from 3.1.1 to now, so 
that may be part of it. I will look into whether HTTP Commons Client 
changed its behavior under us.


Thanks.

I agree that just upping the maxes and hoping for the best isn't the 
best outcome at all.


Otherwise, this sounds either buggy (if all the execs really are 
getting closed) or at least not very ergonomic (since you aren't 
seeing any warnings).


Do I understand correctly that the behavior you would want would be 
that


1) after the max number of connections have been drawn out of the 
pool and used, the next request should block only until a connection 
is released,
2) and that closing a query execution should definitely return the 
connection underneath it to the pool, more or less immediately?


Yes.

It's almost like something above the client is blocking and not 
letting the connections get released...


Indeed.

I'll attempt to recreate the behaviour with a minimal test case with 
none of our other code stack in the way. I should be able to isolate 
whether it is some bad practice at our (well, my) end which just happens 
to be safe in earlier versions of http client/jena or whether it 
really is Jena.


If I can get it pinned down to the latter then I'll open a JIRA.

Dave

On Dec 8, 2017, at 12:55 PM, Dave Reynolds 
 wrote:


Hi Rob,

Thanks, that's useful to know but just raising the limit will just 
make it less frequent but not cure it.


The issue is that if you ever get more in-flight than the limit 
then all future requests are blocked. The in-flight requests return 
fine, the execs are closed but the httpclient never recovers.
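
The symptom described here is what a bounded pool looks like when leases are never handed back. The following is a simplified model only, assuming a pool limit of 5 as in this thread. It uses a `java.util.concurrent.Semaphore` and is not how HttpClient implements its pool; the class name is hypothetical.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Simplified model of a bounded connection pool; not HttpClient's implementation.
class BoundedPoolSketch {
    private final Semaphore permits;

    BoundedPoolSketch(int maxConnections) {
        this.permits = new Semaphore(maxConnections);
    }

    // Try to lease a connection, waiting at most the given number of milliseconds.
    boolean lease(long waitMillis) {
        try {
            return permits.tryAcquire(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Hand a connection back to the pool.
    void release() {
        permits.release();
    }

    public static void main(String[] args) {
        BoundedPoolSketch pool = new BoundedPoolSketch(5);
        for (int i = 0; i < 5; i++) {
            pool.lease(10); // five leases that are never released
        }
        System.out.println("sixth lease granted: " + pool.lease(100));
        pool.release(); // one connection is finally returned
        System.out.println("lease after a release: " + pool.lease(100));
    }
}
```

In this model the sixth request can only proceed once some earlier lease is released, so a pool that never recovers points at leases that are not being returned.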


I will, of course, check again that I'm successfully closing all 
the execs. However, with Jena 3.1.1 this code has successfully run 
for months between reboots with request rates in the millions per 
week (with occasional high bursts). No lock ups at all. As it 
stands I can't switch it to a newer Jena unless I can absolutely 
guarantee an upper limit on the number of concurrent requests. That 
is often possible (through an apache reverse proxy front end with 
request throttling) but feels like an unsatisfying resolution.


Dave

On 08/12/17 16:56, Rob Vesse wrote:
I think this relates to a HttpClient behaviour that limits the 
maximum connections to a given service.
At least in how Jena sets it up, the default is 5 connections per 
service, which is more generous than the HTTP client defaults.  
Jena appears to read this from a JVM property http.maxConnections, 
OR you can construct your own client by calling 
setMaxConnPerRoute() and setMaxConnTotal() on the builder to set 
your desired settings.

Rob
On 08/12/2017, 16:44, "Dave Reynolds" wrote:

 Hi,

 I've been updating some rather old libraries that use Jena to issue
 sparql requests to a remote endpoint and pull back results.

 These work under Jena 3.1.1 but there's a fatal problem under Jena 3.2.0
 and later ...

 If I issue 6 concurrent execSelect calls to a remote sparql endpoint
 (happens to be fuseki) then 5 will get issued and return correctly but
 #6 will not and from then onwards no further remote calls will go
 through. This only happens if at least 5 requests are in flight with no
 response yet from the remote endpoint before the final one is 

Re: Build failed in Jenkins: Jena_Development_Deploy #1465

2017-12-09 Thread ajs6f
Looks like Elasticsearch startup went over a minute and that timed out the 
build.

We set that to 60 seconds:

https://github.com/apache/jena/blob/master/jena-text-es/pom.xml#L136

Is it worth bumping that up to 90? I don't want to let CI jobs sit in process 
too long on a shared server, but thirty seconds more seems reasonable.

ajs6f

> On Dec 9, 2017, at 6:38 AM, Apache Jenkins Server  
> wrote:
> 
> See 
> 
> 
> Changes:
> 
> [andy] JENA-1430: Read quads for ja:data by filename
> 
> [ajs6f] JENA-1430: Quad loading for in-memory assemblers
> 
> [ajs6f] Correcting dataset type in Fuseki example
> 
> [ajs6f] Better, clearer handling of mixed assembling
> 
> [andy] Put in compatibility for ja:DatasetTxnMem.
> 
> [andy] Improve the message.
> 
> [andy] Use stream.
> 
> [andy] Fix URI
> 
> --
> [...truncated 316.26 KB...]
> [INFO] --- maven-install-plugin:2.5.2:install (default-install) @ jena-text 
> ---
> [INFO] Installing 
> 
>  to 
> /home/jenkins/jenkins-slave/maven-repositories/1/org/apache/jena/jena-text/3.6.0-SNAPSHOT/jena-text-3.6.0-SNAPSHOT.jar
> [INFO] Installing 
>  
> to 
> /home/jenkins/jenkins-slave/maven-repositories/1/org/apache/jena/jena-text/3.6.0-SNAPSHOT/jena-text-3.6.0-SNAPSHOT.pom
> [INFO] Installing 
> 
>  to 
> /home/jenkins/jenkins-slave/maven-repositories/1/org/apache/jena/jena-text/3.6.0-SNAPSHOT/jena-text-3.6.0-SNAPSHOT-sources.jar
> [INFO] Installing 
> 
>  to 
> /home/jenkins/jenkins-slave/maven-repositories/1/org/apache/jena/jena-text/3.6.0-SNAPSHOT/jena-text-3.6.0-SNAPSHOT-javadoc.jar
> [INFO] 
> [INFO] --- maven-deploy-plugin:2.8.2:deploy (default-deploy) @ jena-text ---
> [INFO] Downloading from apache.snapshots.https: 
> https://repository.apache.org/content/repositories/snapshots/org/apache/jena/jena-text/3.6.0-SNAPSHOT/maven-metadata.xml
> [INFO] Downloaded from apache.snapshots.https: 
> https://repository.apache.org/content/repositories/snapshots/org/apache/jena/jena-text/3.6.0-SNAPSHOT/maven-metadata.xml
>  (1.2 kB at 2.0 kB/s)
> [INFO] Uploading to apache.snapshots.https: 
> https://repository.apache.org/content/repositories/snapshots/org/apache/jena/jena-text/3.6.0-SNAPSHOT/jena-text-3.6.0-20171209.113552-28.jar
> [INFO] Uploaded to apache.snapshots.https: 
> https://repository.apache.org/content/repositories/snapshots/org/apache/jena/jena-text/3.6.0-SNAPSHOT/jena-text-3.6.0-20171209.113552-28.jar
>  (96 kB at 75 kB/s)
> [INFO] Uploading to apache.snapshots.https: 
> https://repository.apache.org/content/repositories/snapshots/org/apache/jena/jena-text/3.6.0-SNAPSHOT/jena-text-3.6.0-20171209.113552-28.pom
> [INFO] Uploaded to apache.snapshots.https: 
> https://repository.apache.org/content/repositories/snapshots/org/apache/jena/jena-text/3.6.0-SNAPSHOT/jena-text-3.6.0-20171209.113552-28.pom
>  (5.5 kB at 5.1 kB/s)
> [INFO] Downloading from apache.snapshots.https: 
> https://repository.apache.org/content/repositories/snapshots/org/apache/jena/jena-text/maven-metadata.xml
> [INFO] Downloaded from apache.snapshots.https: 
> https://repository.apache.org/content/repositories/snapshots/org/apache/jena/jena-text/maven-metadata.xml
>  (424 B at 751 B/s)
> [INFO] Uploading to apache.snapshots.https: 
> https://repository.apache.org/content/repositories/snapshots/org/apache/jena/jena-text/3.6.0-SNAPSHOT/maven-metadata.xml
> [INFO] Uploaded to apache.snapshots.https: 
> https://repository.apache.org/content/repositories/snapshots/org/apache/jena/jena-text/3.6.0-SNAPSHOT/maven-metadata.xml
>  (1.2 kB at 1.1 kB/s)
> [INFO] Uploading to apache.snapshots.https: 
> https://repository.apache.org/content/repositories/snapshots/org/apache/jena/jena-text/maven-metadata.xml
> [INFO] Uploaded to apache.snapshots.https: 
> https://repository.apache.org/content/repositories/snapshots/org/apache/jena/jena-text/maven-metadata.xml
>  (424 B at 379 B/s)
> [INFO] Uploading to apache.snapshots.https: 
> https://repository.apache.org/content/repositories/snapshots/org/apache/jena/jena-text/3.6.0-SNAPSHOT/jena-text-3.6.0-20171209.113552-28-sources.jar
> [INFO] Uploaded to apache.snapshots.https: 
> https://repository.apache.org/content/repositories/snapshots/org/apache/jena/jena-text/3.6.0-SNAPSHOT/jena-text-3.6.0-20171209.113552-28-sources.jar
>  (75 kB at 59 kB/s)
> [INFO] Uploading to apache.snapshots.https: 
> https://repository.apache.org/content/repositories/snapshots/org/apache/jena/jena-text/3.6.0-SNAPSHOT/maven-metadata.xml
> [INFO] 

Re: Pooling Http Client lockup

2017-12-09 Thread Andy Seaborne
In testing, connection cycling and running out of connections has been a 
recurring and non-deterministic issue.


Under load, the kernel (linux at least) does not clear up connections 
and sockets fast enough.  It's worse when the server is localhost and if 
servers are starting and stopping, but it can happen in less extreme, 
more realistic cases as well.


The effect can be seen with "netstat -t": lots of connections in one of 
the wait states.  And this is with all the close() done.


An unclosed connection hangs around for IIRC at least 2 minutes.

The improvements in HttpClient usage don't change this although pooling 
might make it worse.


Makes me appreciate the GC even more.

Andy

On 08/12/17 23:52, Dave Reynolds wrote:

Hi,

Several test cases later I'm now sure this is my fault, not Jena's.

Thanks for the help, and apologies for the noise.

[There's some code path that can fail to close an exec under certain 
conditions. I can't yet explain why the switch from Jena 3.1.1 to Jena 
3.5.0 would have provoked it when it was so rock solid under Jena 3.1.1 
but I'm sure the problem is on my side.]


Dave

On 08/12/17 22:06, Dave Reynolds wrote:

Hi,

On 08/12/17 19:39, ajs6f wrote:

Dave--

Jena switched HTTP Commons versions a good bit from 3.1.1 to now, so 
that may be part of it. I will look into whether HTTP Commons Client 
changed its behavior under us.


Thanks.

I agree that just upping the maxes and hoping for the best isn't the 
best outcome at all.


Otherwise, this sounds either buggy (if all the execs really are 
getting closed) or at least not very ergonomic (since you aren't 
seeing any warnings).


Do I understand correctly that the behavior you would want would be that

1) after the max number of connections have been drawn out of the 
pool and used, the next request should block only until a connection 
is released,
2) and that closing a query execution should definitely return the 
connection underneath it to the pool, more or less immediately?


Yes.

It's almost like something above the client is blocking and not 
letting the connections get released...


Indeed.

I'll attempt to recreate the behaviour with a minimal test case with 
none of our other code stack in the way. I should be able to isolate 
whether it is some bad practice at our (well, my) end which just happens 
to be safe in earlier versions of http client/jena or whether it 
really is Jena.


If I can get it pinned down to the latter then I'll open a JIRA.

Dave

On Dec 8, 2017, at 12:55 PM, Dave Reynolds 
 wrote:


Hi Rob,

Thanks, that's useful to know but just raising the limit will just 
make it less frequent but not cure it.


The issue is that if you ever get more in-flight than the limit then 
all future requests are blocked. The in-flight requests return fine, 
the execs are closed but the httpclient never recovers.


I will, of course, check again that I'm successfully closing all the 
execs. However, with Jena 3.1.1 this code has successfully run for 
months between reboots with request rates in the millions per week 
(with occasional high bursts). No lock ups at all. As it stands I 
can't switch it to a newer Jena unless I can absolutely guarantee an 
upper limit on the number of concurrent requests. That is often 
possible (through an apache reverse proxy front end with request 
throttling) but feels like an unsatisfying resolution.


Dave

On 08/12/17 16:56, Rob Vesse wrote:
I think this relates to a HttpClient behaviour that limits the 
maximum connections to a given service.
At least in how Jena sets it up, the default is 5 connections per 
service, which is more generous than the HTTP client defaults.  Jena 
appears to read this from a JVM property http.maxConnections, OR you 
can construct your own client by calling setMaxConnPerRoute() and 
setMaxConnTotal() on the builder to set your desired settings.

Rob
On 08/12/2017, 16:44, "Dave Reynolds" wrote:

 Hi,

 I've been updating some rather old libraries that use Jena to issue
 sparql requests to a remote endpoint and pull back results.

 These work under Jena 3.1.1 but there's a fatal problem under Jena 3.2.0
 and later ...

 If I issue 6 concurrent execSelect calls to a remote sparql endpoint
 (happens to be fuseki) then 5 will get issued and return correctly but
 #6 will not and from then onwards no further remote calls will go
 through. This only happens if at least 5 requests are in flight with no
 response yet from the remote endpoint before the final one is issued.

 It's hard to produce a deterministic, standalone test case because by
 definition it depends on an external sparql endpoint being reliably
 there and reliably not too fast :)

 Looking at the stack trace I see:

 Unsafe.park(boolean, long) line: not available [native method]
 LockSupport.park(Object) line: 175