Bruno - thank you for the reviews
Andy
On 23/09/2021 21:38, Andy Seaborne wrote:
To add:
I've checked by compiling RDF Delta against a build of this branch.
Part of the changes is switching from Apache HttpClient v4 to use
java.net.http.
Apache HttpClient is still a dependency at the moment.
The biggest change is to HttpOp, which is used in test classes. The class
has intentionally been renamed as HttpOp1 (think: an extreme @deprecated)
and a new, smaller, more focused HttpOp exists in a different package.
Andy
On 23/09/2021 21:15, Andy Seaborne wrote:
It is time to consider putting this in.
> Suggestion: so as not to make this a hard-to-reverse change,
> + Create a new branch in git with this PR
> + Change the overnight development build to build from this branch.
>
> so the main branch is left in the current state.
That is all very well but it makes it tough for external contributions
(e.g. hopefully JENA-2169) that go anywhere near the changed files.
I've been resolving conflicts but it is fiddly and error prone.
Is it Jena5?
There is no intention for it to be on that scale, and I hope the majority
of users are not affected; maybe some deprecations appear. Fuseki users
aren't impacted.
Downstream systems digging into the internals of Jena will be more
impacted where very old code has been updated, but whether they would
test before a release is, as we know, quite unlikely.
Andy
On 29/07/2021 14:36, Andy Seaborne wrote:
This is about ready.
It's big.
== tl;dr
+ Put on new branch
+ Switch the development Jenkins build to the new branch so people can
test it prior to release (and smooth out any unexpected bumps)
+ Applications using HTTP authentication need to change.
==
It will affect applications that use HTTP auth because, up until
now, they have had to configure an AHC HttpClient ("AHC" = "Apache
HttpComponents") externally to Jena.
There are builders for all local and HTTP forms of RDFConnection,
QueryExecution and UpdateExecution. In fact, using builders is
preferred to the old factories. The factories remain where they still
make sense; they use builder methods.
== Initial bindings => substitution
"Substitution" rewrites a query/update, replacing variables with
values. It is a before-execution rewrite.
QueryExecution still allows initial bindings and timeouts to be set,
although that is better done by the companion builder.
Query and update rewrite by substitution is uniformly provided. While
initialBinding is still there for local operation (never supported
for remote operations), substitution is now available for local and
remote.
Substitution does not always give the same answers, though it does
in most cases, so this is a long term migration.
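To make "before-execution rewrite" concrete, here is a toy sketch that replaces variables in a query string with terms. It is illustration only: the class and method names are hypothetical, it works on text with a regular expression, and it ignores string literals, projections and scoping. It is not how Jena does it; a real implementation would work on the parsed query.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy illustration only: hypothetical names, not Jena code. */
public class SubstituteSketch {

    // Simplified variable syntax: '?' or '$' followed by word characters.
    private static final Pattern VAR = Pattern.compile("[?$](\\w+)");

    /** Replace each variable that has an entry in 'values' with its term; leave the others alone. */
    public static String substitute(String query, Map<String, String> values) {
        Matcher m = VAR.matcher(query);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String term = values.get(m.group(1));
            String replacement = (term != null) ? term : m.group();
            // quoteReplacement: '$' and '\' in the replacement must be taken literally.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("s", "<http://example/s>");
        // The rewritten string is what would be sent for execution, local or remote.
        System.out.println(substitute("SELECT * { ?s ?p ?o }", values));
    }
}
```

Because the rewrite happens before execution, the same rewritten text can go to a local engine or a remote endpoint, which is why this route can work remotely where initial bindings cannot.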
== java.net.http.HttpClient
java.net.http.HttpClient is different to AHC HttpClient.
java.net.http.HttpClient:
* It is more like a combination of AHC HttpClient and HttpContext.
* There are no pool or cache options - connection reuse is inside the
JDK code; there is no support for caching (application
responsibility), which is fine for Jena's use.
* It supports one form of basic auth usage (a pattern useful with
microservices).
* has HTTP/2 support.
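For anyone who has not used it yet, this is roughly what setting up a java.net.http.HttpClient looks like using only the JDK API. A minimal sketch: the class name, the timeout value and the redirect policy are my choices for illustration, not anything the PR mandates.

```java
import java.net.Authenticator;
import java.net.PasswordAuthentication;
import java.net.http.HttpClient;
import java.time.Duration;

/** Illustration only: the class name and settings are assumptions. */
public class HttpClientSketch {

    /** Build a client that prefers HTTP/2 and answers authentication challenges with one fixed credential. */
    public static HttpClient newClient(String user, char[] password) {
        // This Authenticator returns the same credentials whichever host asks,
        // so it only suits a client that talks to a single trusted server.
        Authenticator auth = new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                return new PasswordAuthentication(user, password);
            }
        };
        return HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_2)          // preferred version; HTTP/1.1 is used if HTTP/2 is not available
                .connectTimeout(Duration.ofSeconds(10))      // arbitrary value for the example
                .followRedirects(HttpClient.Redirect.NORMAL)
                .authenticator(auth)
                .build();
    }
}
```

Note there is nothing here to configure for pooling or caching, which matches the points above: the configuration and what AHC kept in HttpContext end up in the one immutable client object.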
AHC HttpClient has not been removed as a dependency. Some old code
remains (HttpQuery, QueryEngineHTTP) to lessen the immediate changes.
jena-jdbc-remote is not changed and still uses AHC HttpClient.
== Authentication
The PR code adds authentication options:
1/ The java.net.http.HttpClient with auth model.
2/ Challenge-based basic authentication.
3/ Challenge-based digest authentication.
4/ user@ form of userinfo in URLs (specifically for SERVICE)
5/ user:password@ form of userinfo in URLs (specifically for SERVICE)
All these are best done over https but that depends on the server end.
(5) is not a good idea but we have to live with it.
There is now a thin abstraction to manage username/password and the
applications no longer have to deal with HttpClient directly.
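To show what options (4) and (5) involve at the lowest level, here is a small JDK-only sketch that pulls userinfo out of a URL and builds a basic authentication header from it. The class and method names are hypothetical; this is not the abstraction in the PR, just the mechanics underneath it.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Illustration only: hypothetical helper, not the PR's abstraction. */
public class UserInfoSketch {

    /** Value for an HTTP "Authorization" header using basic authentication. */
    public static String basicAuthHeader(String user, String password) {
        String pair = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    /**
     * Return {user, password} from the URL's userinfo.
     * The password is null for the "user@" form; the result is null if there is no userinfo.
     */
    public static String[] userInfo(String url) {
        String info = URI.create(url).getUserInfo();
        if (info == null)
            return null;
        int idx = info.indexOf(':');
        return (idx < 0)
                ? new String[] { info, null }
                : new String[] { info.substring(0, idx), info.substring(idx + 1) };
    }
}
```

With the "user@" form only the user name is in the URL, so the password has to come from somewhere else; with "user:password@" the secret is in the query text itself, which is why (5) is not a good idea. Basic authentication is only an encoding, not encryption, hence the advice to use https.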
== HttpOp, HttpRDF
HttpOp - a library that packages up HTTP requests for the ways that
Jena uses HTTP - is moved to HttpOpAHC.
There is a new HttpOp, in a new package, together with a companion
HttpRDF, for java.net.http.HttpClient and the new ways that Jena uses
HTTP.
There is a separate GSP client so all the code for GSP is in one place.
HttpOp is the thing that has impacted RDF Delta, and then mostly in
tests where test code does direct HTTP actions to validate the server
behaviour.
== From here
It would be a good idea to have this exposed before a Jena release.
We can at least give people the chance to see what impact, if any, it
has on them. Local usage is not supposed to be impacted.
But unusual/unexpected usage patterns may not work with zero change.
Some of this code is very old.
Suggestion: so as not to make this a hard-to-reverse change,
+ Create a new branch in git with this PR
+ Change the overnight development build to build from this branch.
so the main branch is left in the current state.
== Other
This is also a chance to improve naming of API
functions/methods/classes through deprecation migration. Suggestions
welcome.
On 09/07/2021 17:37, Andy Seaborne wrote:
Epic JENA-2125 to track this with tickets for each part.
> ResultSet(resources) - RowSet (Nodes)
> RDFConnection - RDFLink
> QueryExecution - QueryExec
Andy
On 28/06/2021 18:00, Andy Seaborne wrote:
Jena currently uses Apache HttpClient v4 for HTTP.
This supports HTTP 1.1.
Apache HttpClient v5 supports HTTP/2 and there is a migration path
from v4 to new style v5 but the path is not seamless. It is at
least package renaming followed by API changes.
https://hc.apache.org/httpcomponents-client-5.1.x/migration-guide/index.html
and
https://hc.apache.org/httpcomponents-client-5.1.x/migration-guide/migration-to-classic.html
For most Jena users, there are no application changes needed
because SPARQL operations are packed up into the Jena APIs. But if
an application is doing detailed HTTP setup - most importantly, that
includes authentication - there is going to be a migration impact.
Java 11 now has an API, java.net.http, an all-new way to work with HTTP
including HTTP/2. (And there are other HTTP clients - I haven't
used any of those others).
Should we update to java.net.http or Apache HttpClient v5 or other?
Given the JDK has a decent HTTP client, my preference is to switch
to use java.net.http unless there is a positive reason to use a
specific external one.
The JDK-provided one means fewer dependencies, is always present, and
fixes/improvements (if any) come by updating the JVM used.
----
And also if HTTP support in Jena is being upgraded ... the code
could do with some work. Some of it is really old and is showing
its age.
Areas:
RDFConnection,
SPARQL/HTTP QueryExecution and UpdateProcessor,
Graph Store Protocol
SERVICE.
== Improvements
+ Builder style for constructing the more complicated
(e.g. anything HTTP!)
+ Both Model and Graph / Statement and Triple level APIs
(Model-level being adapters of Graph level engines)
ResultSet(resources) - RowSet (Nodes)
RDFConnection - RDFLink
QueryExecution - QueryExec
(not an issue with UpdateProcessor)
+ Deprecation of QueryExecution.setTimeout and setInitialBinding
(use a builder)
+ Switch to rewrite for initial bindings
This will work for remote usage, which currently is unsupported.
+ Explicit GSP engine - include support for quads operations.
+ SERVICE rewrite to use the new classes.
- HttpOp : Direct use of java.net.http covers the complex cases so
this class can be smaller and focused on the common cases.
(I doubt it's used much directly)
+ Utilities: HttpRDF, AsyncHttpRDF, HttpOp
AsyncHttpRDF should at least cover async GET so apps can
gather data from several places in parallel.
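As a sketch of the "async GET from several places in parallel" idea, using only java.net.http: start every request with sendAsync, then wait for all of them. The class and method names are hypothetical and this is not the AsyncHttpRDF API; it returns response bodies as strings, where the real thing would parse RDF.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Illustration only: hypothetical helper, not AsyncHttpRDF. */
public class AsyncGetSketch {

    /** Start a GET for every URI, then wait for all of them; results are in input order. */
    public static List<String> fetchAll(HttpClient client, List<URI> uris) {
        // Phase 1: issue all the requests. sendAsync does not block, so they run concurrently.
        List<CompletableFuture<String>> futures = new ArrayList<>();
        for (URI uri : uris) {
            HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
            CompletableFuture<String> body =
                    client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                          .thenApply(HttpResponse::body);
            futures.add(body);
        }
        // Phase 2: collect the results. join() throws CompletionException if a request failed.
        List<String> results = new ArrayList<>();
        for (CompletableFuture<String> f : futures)
            results.add(f.join());
        return results;
    }
}
```

The total wait is then about the slowest single request rather than the sum of all of them. This sketch does not check status codes; an application would want to.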
== Migration
If we leave the old code for SPARQL execution (QueryEngineHTTP and
HttpQuery) in place, with Apache HttpClient4, and apply copious
deprecations, then, mostly, we have less sudden change. We then remove
it in a couple of releases' time.
Deprecate all QueryExecutionFactory.sparqlService,
createServiceRequest and refer to (new) QueryExecutionHTTPBuilder
Deprecate QueryExecution.setTimeout and setInitialBinding - they
should not be where they are.
Update documentation
== Improvements
Code:
https://github.com/afs/jena-http
which at the moment needs a custom Jena build because of misc
cleanup and things found while writing jena-http and not PR'ed to
Jena.
Using a different HttpClient should not be too difficult as it
internally encapsulates HttpClient usage. But a switchable
HttpClient isn't so easy and also not invisible to users because
authentication setup is implementation-specific. We can't abstract
authentication without significant costs in support and maintenance
to the project.
Andy