I just had an ah-ha moment after reading this and looking over your latest
commits.

The issue is that the code makes a lot of queries, and each query creates
its own HttpClient instance because otherwise it can't apply timeouts to
remote requests: timeouts are a global parameter setting on an HTTP
client.  I am going to try having the query route (HttpQuery) pass the
HttpClient it creates internally up to the QueryEngineHTTP and have that
explicitly shut down the client when the user closes the query execution.

QueryEngineHTTP already ensures that it closes the TypedInputStream when
the query execution is closed.
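
A minimal sketch of that ownership idea, written against plain JDK types
rather than the real classes. OwnedClientExecution and isClosed() are
hypothetical names used only for illustration; the Closeable stands in for
the per-query HttpClient and the InputStream for the TypedInputStream. This
is an assumption about the shape of the fix, not the actual QueryEngineHTTP
code.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for a query execution that owns the HTTP client
// created for it. None of these names come from the Jena codebase.
final class OwnedClientExecution implements AutoCloseable {
    private final Closeable client;      // stand-in for the per-query HttpClient
    private final InputStream response;  // stand-in for the TypedInputStream
    private boolean closed = false;

    OwnedClientExecution(Closeable client, InputStream response) {
        this.client = client;
        this.response = response;
    }

    boolean isClosed() {
        return closed;
    }

    // Closing the execution releases the response stream first and then the
    // client, even if closing the stream throws. Repeated calls do nothing.
    @Override
    public void close() throws IOException {
        if (closed) {
            return;
        }
        closed = true;
        try {
            response.close();
        } finally {
            client.close();
        }
    }
}
```

The try/finally ordering is the point of the sketch: the client is released
even when closing the response stream fails.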

I will test this out and see if it resolves the issue on Jenkins.

Rob


On 8/9/13 6:29 AM, "Andy Seaborne" <[email protected]> wrote:

>The code now in SVN is showing stability for heavy repeated use and for
>the authentication test cases.  However, the stability test has always
>failed non-deterministically so it's not proof it's all working.  I have
>gone through and tracked down response handling and I hope I have
>ensured streams are closed.  If you use HttpOp to get an
>(Typed)InputStream, then the caller must close that stream, otherwise it
>does run out of OS resources after a while (1000's of calls).
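
The caller-closes-the-stream rule can be illustrated with a small,
self-contained sketch. StreamLeakDemo, open() and readAll() are hypothetical
names; open() only imitates a call that hands back a stream the caller
owns, and the counter stands in for the OS resources that would otherwise
leak. It does not use HttpOp itself.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.atomic.AtomicInteger;

final class StreamLeakDemo {
    // Number of streams handed out and not yet closed.
    static final AtomicInteger OPEN = new AtomicInteger();

    // Hypothetical stand-in for a call that returns a caller-owned stream.
    static InputStream open() {
        OPEN.incrementAndGet();
        return new FilterInputStream(new ByteArrayInputStream(new byte[] {1, 2, 3})) {
            private boolean closed = false;

            @Override
            public void close() throws IOException {
                if (!closed) {
                    closed = true;
                    OPEN.decrementAndGet();
                }
                super.close();
            }
        };
    }

    // try-with-resources closes the stream on every path, including when
    // reading throws, so repeated calls do not accumulate open streams.
    static int readAll() throws IOException {
        try (InputStream in = open()) {
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        }
    }
}
```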
>
>       Andy
>
>On 09/08/13 09:24, Andy Seaborne wrote:
>> The default SystemDefaultHttpClient has a per-route pool of 5 and a
>> system maximum of 10.  We do have to be careful of this lock-up
>> possibility. Using DefaultHttpClient directly and setting how we want is
>> probably a better style.
>>
>> I must look more closely at jena-jdbc - how far through its tests does
>> it get?  How many connections have been and gone?
>>
>> The HttpOp in the codebase, when it isn't passed a HttpClient, creates a
>> new one each time and they don't share a pool.  The
>> SystemDefaultHttpClient is used once so no chances of a lock-up.
>>
>> It's not what's happening in JENA-498 - there, a single-threaded tight
>> loop is running for a non-deterministic number of times then causing an
>> exception (seems to be different on different OS's).
>>
>> There is a chance that Fuseki is not closing its end properly, or
>> rather early enough, but when I checked the code, it's all down to Jetty
>> and that should be pretty well tested.  We run Fuseki for many months at
>> a time.
>>
>>      Andy
>>
>> On 09/08/13 00:39, Rob Vesse wrote:
>>> The following may be the culprit in JDBC's case:
>>>
>>> The PoolingClientConnectionManager will allocate connections based on
>>> its configuration. If all connections for a given route have already
>>> been leased, a request for a connection will block until a connection
>>> is released back to the pool. One can ensure the connection manager
>>> does not block indefinitely in the connection request operation by
>>> setting 'http.conn-manager.timeout' to a positive value. If the
>>> connection request cannot be serviced within the given time period
>>> ConnectionPoolTimeoutException will be thrown.
>>>
>>>
>>> So HttpClient will block indefinitely until a connection is
>>> available.  We
>>> likely want to turn off that behaviour so that when we hit this state
>>> things get a useful error rather than an infinite hang.
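
The difference between blocking forever and failing with a useful error can
be modelled with a bounded pool and a lease timeout. This is only a sketch
of the behaviour described above using a JDK Semaphore; BoundedPool, lease()
and release() are hypothetical names, and this is not how HttpClient
implements its pool.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical model of a per-route connection pool with a lease timeout.
final class BoundedPool {
    private final Semaphore permits;
    private final long timeoutMillis;

    BoundedPool(int maxPerRoute, long timeoutMillis) {
        this.permits = new Semaphore(maxPerRoute);
        this.timeoutMillis = timeoutMillis;
    }

    // Lease a slot. If every slot is already leased, wait up to the
    // configured timeout and then throw instead of hanging indefinitely.
    void lease() throws InterruptedException, TimeoutException {
        if (!permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException(
                "no connection available within " + timeoutMillis + " ms");
        }
    }

    // Return a slot to the pool, e.g. once the response stream is closed.
    void release() {
        permits.release();
    }
}
```

With a pool of 2 and leaked leases, the third lease() throws after the
timeout, which is the kind of visible failure a hung build never gives.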
>>>
>>> Rob
>>>
>>>
>>> On 8/8/13 4:11 PM, "Andy Seaborne" <[email protected]> wrote:
>>>
>>>> Maybe related to JENA-498 (many HttpOps overwhelming the system).
>>>>
>>>> But if HttpOp uses a shared HttpClient, I was getting lockups.  It does
>>>> appear to be HTTP error handling (failing to close the input stream of
>>>> the response when it's 4xx or 5xx - there may be a body still).
>>>>
>>>> The other part of a shared HttpClient is the authenticator.  I haven't
>>>> checked that yet. I wonder if we need to make it so that the HttpClient
>>>> is only passed in with a HttpAuthenticator already set.  The
>>>> DatasetAccessorHttp could do that.  I haven't checked the other uses
>>>> yet; I doubt it's as clear cut for SPARQL Query etc.
>>>>
>>>> With the old code, creating new SystemDefaultHttpClient was not giving
>>>> connection pooling and reuse; only a fast loop caused a problem
>>>> (20k-40k iterations).
>>>>
>>>> But I don't know why it works on your internal system and not
>>>> AFS/Jenkins.  Different versions of ARQ/HttpOp?
>>>>
>>>>     Andy
>>>>
>>>> On 08/08/13 23:44, Rob Vesse wrote:
>>>>> Yes the module that hangs is the driver for remote endpoints and
>>>>>stands
>>>>> up
>>>>> a Fuseki server and communicates with it using HTTP which of course
>>>>>now
>>>>> all goes through HttpOp
>>>>>
>>>>> Problem is that I never seem to get an actual exception just hangs on
>>>>> the
>>>>> build server.
>>>>>
>>>>> This might also explain why DEBUG level logging makes the build
>>>>>succeed
>>>>> because HttpClient is very noisy at DEBUG level and all that logging
>>>>> likely introduces the delays in the right parts of the code to allow
>>>>> resources to be freed up.
>>>>>
>>>>> Rob
>>>>>
>>>>>
>>>>> On 8/8/13 3:40 PM, "Andy Seaborne" <[email protected]> wrote:
>>>>>
>>>>>> On 08/08/13 19:42, Rob Vesse wrote:
>>>>>>> So I am officially stumped
>>>>>>>
>>>>>>> Adding the delay still causes the builds to hang so I really don't
>>>>>>> understand why the builds fail on Apache Jenkins.  Note that I've
>>>>>>> been
>>>>>>> building the JDBC module on our internal Jenkins server for some
>>>>>>>time
>>>>>>> and
>>>>>>> never had an issue there.  Plus the builds run fine on a local
>>>>>>> machine.
>>>>>>>
>>>>>>> If anyone else can take a look or has any suggestions please jump
>>>>>>>in
>>>>>>
>>>>>> <straw-grasping mode>
>>>>>>
>>>>>> Are you using HttpOp?  Apache HttpClient?
>>>>>>
>>>>>> I'm fairly certain HttpOp can cause resource starvation by improper
>>>>>> use
>>>>>> of HttpClient.  However, I haven't managed to find out where for
>>>>>> certain
>>>>>> [HTTP Exceptions are my current best guess]. (I can perturb the
>>>>>> situation by tweaking pooling numbers.)
>>>>>>
>>>>>>     Andy
>>>>>>
>>>>>>>
>>>>>>> Rob
>>>>>>>
>>>>>>>
>>>>>>> On 8/8/13 11:12 AM, "Rob Vesse" <[email protected]> wrote:
>>>>>>>
>>>>>>>> Ok, so turning the log level back down causes the build to go
>>>>>>>> back to
>>>>>>>> failing
>>>>>>>>
>>>>>>>> This starts to look like some kind of timing issue manifesting on
>>>>>>>> the
>>>>>>>> build server causing the tests to get into a hung state.
>>>>>>>>Apparently
>>>>>>>> having the high log level adds sufficient delay into the process
>>>>>>>>to
>>>>>>>> avoid
>>>>>>>> this.
>>>>>>>>
>>>>>>>> My next idea is to simply insert a delay between the tests in
>>>>>>>> question
>>>>>>>> and
>>>>>>>> see if that solves things.
>>>>>>>>
>>>>>>>> Rob
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 8/8/13 10:55 AM, "Rob Vesse" <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Ok that is very very weird, after turning up the logging for that
>>>>>>>>> module
>>>>>>>>> the build ran through to success (and generated a ridiculously
>>>>>>>>> large
>>>>>>>>> log
>>>>>>>>> file at the same time).
>>>>>>>>>
>>>>>>>>> Next step is to try turning down the log level and see if the
>>>>>>>>>build
>>>>>>>>> still
>>>>>>>>> succeeds.
>>>>>>>>>
>>>>>>>>> Rob
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 8/8/13 10:35 AM, "Rob Vesse" <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> The problem is that nothing is blowing up, the build just gets
>>>>>>>>>> stuck
>>>>>>>>>> and
>>>>>>>>>> hangs until the build timeout plugin steps in and aborts the
>>>>>>>>>>build
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The hang is in the tests for the remote endpoint driver which
>>>>>>>>>>are
>>>>>>>>>> standing
>>>>>>>>>> up Fuseki instances.  However if there was some contention for
>>>>>>>>>> ports
>>>>>>>>>> in
>>>>>>>>>> the tests I would expect the tests to just plain fail.
>>>>>>>>>>
>>>>>>>>>> I suspect there may be some deadlock of some sort happening when
>>>>>>>>>> running
>>>>>>>>>> the tests on the server but it's hard to tell where/what the
>>>>>>>>>> deadlock
>>>>>>>>>> is.
>>>>>>>>>> I am turning the log level for the tests in question to DEBUG
>>>>>>>>>>and
>>>>>>>>>> will
>>>>>>>>>> re-run a build to see if that yields anything more useful.
>>>>>>>>>>
>>>>>>>>>> Rob
>>>>>>>>>>
>>>>>>>>>> On 8/8/13 6:53 AM, "Andy Seaborne" <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> On 01/08/13 20:56, Rob Vesse wrote:
>>>>>>>>>>>> I've removed it from the main build for now.  For some reason
>>>>>>>>>>>>it
>>>>>>>>>>>> is
>>>>>>>>>>>> getting stuck (but not crashing) on the Apache build server.
>>>>>>>>>>>> This
>>>>>>>>>>>> is
>>>>>>>>>>>> despite it building fine locally and on our internal build
>>>>>>>>>>>> servers.
>>>>>>>>>>>>
>>>>>>>>>>>> Not sure how to proceed on this - is it worth setting up a
>>>>>>>>>>>> separate
>>>>>>>>>>>> build
>>>>>>>>>>>> for JDBC on the Apache build servers to help try and isolate
>>>>>>>>>>>>the
>>>>>>>>>>>> problem?
>>>>>>>>>>>>
>>>>>>>>>>>> Rob
>>>>>>>>>>>
>>>>>>>>>>> What exactly is blowing up?
>>>>>>>>>>>
>>>>>>>>>>> The Apache build servers have all sorts of things on them and a
>>>>>>>>>>> wide
>>>>>>>>>>> range of plugins, which itself can be a problem.
>>>>>>>>>>>
>>>>>>>>>>>     Andy
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 8/1/13 11:45 AM, "Rob Vesse" <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I've moved Jena JDBC from Experimental into Trunk and added
>>>>>>>>>>>>>it
>>>>>>>>>>>>> to
>>>>>>>>>>>>> the
>>>>>>>>>>>>> main build.  The builds are a little noisier than some of the
>>>>>>>>>>>>> other
>>>>>>>>>>>>> modules so may want some tweaking to avoid spurious build
>>>>>>>>>>>>> output.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I haven't attempted to figure out how to add it to the distro
>>>>>>>>>>>>> because
>>>>>>>>>>>>> I
>>>>>>>>>>>>> know nothing about Maven Assembly plugin
>>>>>>>>>>>>>
>>>>>>>>>>>>> Rob
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
