The code now in SVN is showing stability under heavy repeated use and
for the authentication test cases. However, the stability test has
always failed non-deterministically, so a pass is not proof that it's
all working. I have gone through and tracked down response handling and
I hope I have ensured streams are closed. If you use HttpOp to get a
(Typed)InputStream, then the caller must close that stream, otherwise it
does run out of OS resources after a while (thousands of calls).
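As a rough illustration of that obligation on the caller, here is a minimal sketch using only the JDK. openResponseStream() is a hypothetical stand-in for whatever HttpOp call hands back the stream (the real HttpOp signatures are not shown here), and TrackedStream exists only so the sketch can show that close() really ran. The point is just that try-with-resources closes the stream on every exit path, including exceptions.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class CloseStreamSketch {
    /** Number of times a TrackedStream has been closed. */
    static int closedCount = 0;

    /** In-memory stream that records close() calls (illustration only). */
    static class TrackedStream extends ByteArrayInputStream {
        TrackedStream(byte[] data) { super(data); }
        @Override public void close() throws IOException {
            closedCount++;
            super.close();
        }
    }

    // Hypothetical stand-in for an HttpOp call returning a response stream.
    static InputStream openResponseStream() {
        return new TrackedStream("response body".getBytes(StandardCharsets.UTF_8));
    }

    /** Reads the whole stream and returns the number of bytes seen. */
    static int readAll() {
        int count = 0;
        // try-with-resources closes 'in' whether the loop finishes or throws.
        try (InputStream in = openResponseStream()) {
            while (in.read() != -1) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }
}
```

With a real HTTP response, closing the stream is what lets the underlying connection and its OS resources be released.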
Andy
On 09/08/13 09:24, Andy Seaborne wrote:
The default SystemDefaultHttpClient has a per-route pool of 5 and a
system maximum of 10. We do have to be careful of this lock-up
possibility. Using DefaultHttpClient directly and setting how we want is
probably a better style.
I must look more closely at jena-jdbc - how far through its tests does
it get? How many connections have been and gone?
The HttpOp in the codebase, when it isn't passed an HttpClient, creates
a new one each time and they don't share a pool. The
SystemDefaultHttpClient is used once, so there is no chance of a lock-up.
It's not what's happening in JENA-498 - there, a single-threaded tight
loop runs for a non-deterministic number of iterations and then causes
an exception (seems to be different on different OSes).
There is a chance that Fuseki is not closing its end properly, or
rather early enough, but when I checked the code, it's all down to Jetty
and that should be pretty well tested. We run Fuseki for many months at
a time.
Andy
On 09/08/13 00:39, Rob Vesse wrote:
The following may be the culprit in JDBC's case:
The PoolingClientConnectionManager will allocate connections based on its
configuration. If all connections for a given route have already been
leased, a request for a connection will block until a connection is
released back to the pool. One can ensure the connection manager does not
block indefinitely in the connection request operation by setting
'http.conn-manager.timeout' to a positive value. If the connection
request cannot be serviced within the given time period
ConnectionPoolTimeoutException will be thrown.
So HttpClient will block indefinitely until a connection is available.
We likely want to turn off that behaviour so that when we hit this state
things get a useful error rather than an infinite hang.
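To make the difference between "block forever" and "fail with a useful error" concrete, here is a small JDK-only model of a bounded pool. It is not HttpClient's API: the class LeaseTimeoutSketch, its methods, and the 50 ms timeout are all made up for illustration, and a Semaphore stands in for the connection pool. Leasing with a timeout turns an exhausted pool into an exception instead of a hang.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class LeaseTimeoutSketch {
    private final Semaphore permits;    // one permit per pooled connection
    private final long timeoutMillis;   // how long a lease request may wait

    LeaseTimeoutSketch(int maxConnections, long timeoutMillis) {
        this.permits = new Semaphore(maxConnections);
        this.timeoutMillis = timeoutMillis;
    }

    /** Lease a slot; throws TimeoutException instead of blocking forever. */
    void lease() throws InterruptedException, TimeoutException {
        if (!permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException(
                "no connection free after " + timeoutMillis + " ms");
        }
    }

    /** Return a slot to the pool (what closing the response stream achieves). */
    void release() {
        permits.release();
    }

    /**
     * Leases 'count' slots from a pool of size 'max' without ever releasing
     * them, mimicking leaked connections. Returns true if a lease timed out.
     */
    static boolean leakUntilExhausted(int max, int count) {
        LeaseTimeoutSketch pool = new LeaseTimeoutSketch(max, 50);
        for (int i = 0; i < count; i++) {
            try {
                pool.lease();
            } catch (TimeoutException e) {
                return true;   // pool exhausted: we get an error, not a hang
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return false;
    }
}
```

In this model, leaking five leases from a pool of five succeeds, and the sixth request times out rather than waiting indefinitely, which is the behaviour we would want from the real connection manager.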
Rob
On 8/8/13 4:11 PM, "Andy Seaborne" <[email protected]> wrote:
Maybe related to JENA-498 (many HttpOps overwhelming the system).
But when HttpOp uses a shared HttpClient, I was getting lockups. The
cause does appear to be HTTP error handling (failing to close the input
stream of the response when it's 4xx or 5xx - there may be a body still).
The other part of a shared HttpClient is the authenticator. I haven't
checked that yet. I wonder if we need to make it so that only the
HttpClient is passed in, with an HttpAuthenticator already set. The
DatasetAccessorHttp could do that. I haven't checked the other uses yet;
I doubt it's as clear cut for SPARQL Query etc.
With the old code, creating a new SystemDefaultHttpClient each time was
not giving connection pooling and reuse; only a fast loop caused a
problem (20k-40k iterations).
But I don't know why it works on your internal system and not
AFS/Jenkins. Different versions of ARQ/HttpOp?
Andy
On 08/08/13 23:44, Rob Vesse wrote:
Yes, the module that hangs is the driver for remote endpoints. It stands
up a Fuseki server and communicates with it using HTTP, which of course
now all goes through HttpOp.
Problem is that I never seem to get an actual exception, it just hangs
on the build server.
This might also explain why DEBUG level logging makes the build succeed
because HttpClient is very noisy at DEBUG level and all that logging
likely introduces the delays in the right parts of the code to allow
resources to be freed up.
Rob
On 8/8/13 3:40 PM, "Andy Seaborne" <[email protected]> wrote:
On 08/08/13 19:42, Rob Vesse wrote:
So I am officially stumped.
Adding the delay still causes the builds to hang so I really don't
understand why the builds fail on Apache Jenkins. Note that I've been
building the JDBC module on our internal Jenkins server for some time
and never had an issue there. Plus the builds run fine on a local
machine.
If anyone else can take a look or has any suggestions please jump in.
<straw-grasping mode>
Are you using HttpOp? Apache HttpClient?
I'm fairly certain HttpOp can cause resource starvation by improper use
of HttpClient. However, I haven't managed to find out where for certain
[HTTP Exceptions are my current best guess]. (I can perturb the
situation by tweaking pooling numbers.)
Andy
Rob
On 8/8/13 11:12 AM, "Rob Vesse" <[email protected]> wrote:
Ok, so turning the log level back down causes the build to go back to
failing.
This starts to look like some kind of timing issue manifesting on the
build server causing the tests to get into a hung state. Apparently
having the high log level adds sufficient delay into the process to
avoid this.
My next idea is to simply insert a delay between the tests in question
and see if that solves things.
Rob
On 8/8/13 10:55 AM, "Rob Vesse" <[email protected]> wrote:
Ok that is very very weird, after turning up the logging for that
module the build ran through to success (and generated a ridiculously
large log file at the same time).
Next step is to try turning down the log level and see if the build
still succeeds.
Rob
On 8/8/13 10:35 AM, "Rob Vesse" <[email protected]> wrote:
The problem is that nothing is blowing up, the build just gets stuck and
hangs until the build timeout plugin steps in and aborts the build.
The hang is in the tests for the remote endpoint driver which are
standing up Fuseki instances. However if there was some contention for
ports in the tests I would expect the tests to just plain fail.
I suspect there may be some deadlock of some sort happening when running
the tests on the server but it's hard to tell where/what the deadlock
is.
I am turning the log level for the tests in question to DEBUG and will
re-run a build to see if that yields anything more useful.
Rob
On 8/8/13 6:53 AM, "Andy Seaborne" <[email protected]> wrote:
On 01/08/13 20:56, Rob Vesse wrote:
I've removed it from the main build for now. For some reason it is
getting stuck (but not crashing) on the Apache build server. This is
despite it building fine locally and on our internal build servers.
Not sure how to proceed on this - is it worth setting up a separate
build for JDBC on the Apache build servers to help try and isolate the
problem?
Rob
What exactly is blowing up?
The Apache build servers have all sorts of things on them and a wide
range of plugins, which itself can be a problem.
Andy
On 8/1/13 11:45 AM, "Rob Vesse" <[email protected]> wrote:
I've moved Jena JDBC from Experimental into Trunk and added it to the
main build. The builds are a little noisier than some of the other
modules so may want some tweaking to avoid spurious build output.
I haven't attempted to figure out how to add it to the distro because I
know nothing about the Maven Assembly plugin.
Rob