Bruno - thank you for the reviews

    Andy

On 23/09/2021 21:38, Andy Seaborne wrote:
To add:

I've checked by compiling RDF Delta against a build of this branch.

Part of the change is switching from Apache HttpClient v4 to java.net.http.

Apache HttpClient is still a dependency at the moment.

The biggest change is to HttpOp, which is used in test classes. The class has intentionally been renamed HttpOp1 (think: an extreme @deprecated), and a new, smaller, more focused HttpOp exists in a different package.

     Andy

On 23/09/2021 21:15, Andy Seaborne wrote:
It is time to consider putting this in.

 > Suggestion: so as not to make this a hard-to-reverse change,
 > + Create a new branch in git with this PR
 > + Change the overnight development build to build from this branch.
 >
 > so the main branch is left in the current state.

That is all very well, but it makes it tough for external contributions (e.g. hopefully JENA-2169) that go anywhere near the changed files. I've been resolving conflicts, but it is fiddly and error-prone.

Is it Jena5?
It isn't intended to be a change of that scale, and I hope the majority of users are not affected; some deprecations may appear. Fuseki users aren't impacted.

Downstream systems digging into the internals of Jena will be more impacted where very old code has been updated but, as we know, it is quite unlikely that they would test before a release.

     Andy

On 29/07/2021 14:36, Andy Seaborne wrote:
This is about ready.

It's big.

== tl;dr

+ Put on new branch
+ Switch the development Jenkins build to the new branch so people can test it prior to release (and smooth out any unexpected bumps)
+ Applications using HTTP authentication need to change.

==

It will affect applications that use HTTP authentication because, up until now, they have had to configure an AHC HttpClient ("AHC" = "Apache HttpComponents") externally to Jena.

There are builders for all local and HTTP forms of RDFConnection, QueryExecution and UpdateExecution. In fact, using builders is preferred to the old factories. The factories remain where they still make sense, and they are implemented using the builder methods.
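A minimal plain-Java sketch of the builder-first arrangement. `ExecRequest`, `ExecRequestBuilder` and `ExecFactory` are hypothetical stand-ins, not Jena classes, and the endpoint URL in the usage note is an assumption: the builder collects the settings, `build()` checks them once, and the old-style factory method remains only as a thin wrapper over the builder.

```java
import java.time.Duration;
import java.util.Objects;

// Hypothetical immutable description of a query execution (not a Jena class).
final class ExecRequest {
    final String endpoint;
    final String queryString;
    final Duration timeout; // null means "no timeout set"

    ExecRequest(String endpoint, String queryString, Duration timeout) {
        this.endpoint = endpoint;
        this.queryString = queryString;
        this.timeout = timeout;
    }
}

// Hypothetical builder: settings are gathered first, then build() validates them once.
final class ExecRequestBuilder {
    private String endpoint;
    private String queryString;
    private Duration timeout;

    static ExecRequestBuilder service(String endpoint) {
        ExecRequestBuilder b = new ExecRequestBuilder();
        b.endpoint = endpoint;
        return b;
    }

    ExecRequestBuilder query(String queryString) {
        this.queryString = queryString;
        return this;
    }

    ExecRequestBuilder timeout(Duration timeout) {
        this.timeout = timeout;
        return this;
    }

    ExecRequest build() {
        Objects.requireNonNull(endpoint, "endpoint");
        Objects.requireNonNull(queryString, "query");
        return new ExecRequest(endpoint, queryString, timeout);
    }
}

// Hypothetical old-style factory, kept for compatibility: it only delegates to the builder.
final class ExecFactory {
    static ExecRequest sparqlService(String endpoint, String queryString) {
        return ExecRequestBuilder.service(endpoint).query(queryString).build();
    }
}
```

Usage: `ExecRequestBuilder.service("http://localhost:3030/ds/query").query("ASK {}").timeout(Duration.ofSeconds(5)).build()`. Keeping the factory as a one-line delegation means there is only one construction path to maintain.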

== Initial bindings => substitution

"Substitution" is rewriting a query/update, replacing variables with values. It is a rewrite done before execution.

QueryExecution still allows initial bindings and timeouts to be set, although that is better done with the companion builder.

Query and update rewrite by substitution is uniformly provided. While initial bindings are still there for local operation (they were never supported for remote operations), substitution is now available for both local and remote.

Substitution does not always give the same answers as initial bindings, though it does in most cases, so this is a long-term migration.
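To make "rewrite before execution" concrete, here is a naive, text-level sketch. The class `SubstitutionSketch` is hypothetical and only recognises the `?var` form of variables. A real implementation would rewrite the parsed query: this one would also wrongly touch a `?` inside a literal or IRI, and substituting a projected variable (as in `SELECT ?s`) would produce invalid SPARQL.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, naive text-level substitution: replace ?var tokens with RDF terms
// before the query is executed or sent anywhere. Only the ?var form is handled.
final class SubstitutionSketch {
    private static final Pattern VAR = Pattern.compile("\\?([A-Za-z_][A-Za-z0-9_]*)");

    static String substitute(String queryText, Map<String, String> bindings) {
        Matcher m = VAR.matcher(queryText);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = bindings.get(m.group(1));
            // Variables without a binding are left unchanged.
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, substituting `s` with `<http://example/s>` in `SELECT * { ?s ?p ?o }` gives `SELECT * { <http://example/s> ?p ?o }`. Because the output is ordinary query text, it can be sent to a remote endpoint as-is, which is why substitution, unlike initial bindings, works for remote operations.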

== java.net.http.HttpClient

java.net.http.HttpClient is different to AHC HttpClient.

java.net.http.HttpClient:
* It is more like a combination of AHC HttpClient and HttpContext.
* There are no pool or cache options: connection reuse is inside the JDK code, and there is no support for caching (an application responsibility), which is fine for Jena's use.
* It supports one form of basic auth usage (a pattern useful with microservices).
* It has HTTP/2 support.
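A sketch of preparing a SPARQL query request with java.net.http. The endpoint URL (a Fuseki-style `http://localhost:3030/ds/query`) is an assumption, and `JdkHttpSketch` is a hypothetical helper. Note that there is no pool to configure: one shared client is built, and connection reuse happens inside the JDK.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

// Hypothetical helper: build a SPARQL query request with the JDK HTTP client.
final class JdkHttpSketch {
    // One shared client; the JDK manages connection reuse internally.
    static final HttpClient CLIENT = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_2)     // try HTTP/2, fall back to HTTP/1.1
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    // endpoint is an assumed SPARQL query URL, e.g. http://localhost:3030/ds/query
    static HttpRequest queryRequest(String endpoint, String sparql) {
        String url = endpoint + "?query=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/sparql-results+json")
                .GET()
                .build();
    }
}
```

Sending it is `JdkHttpSketch.CLIENT.send(request, HttpResponse.BodyHandlers.ofString())`, or `sendAsync` to get a `CompletableFuture`.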

AHC HttpClient has not been removed as a dependency. Some old code remains (HttpQuery, QueryEngineHTTP) to lessen the immediate changes. jena-jdbc-remote is not changed and still uses AHC HttpClient.

== Authentication

The PR adds these authentication options:

1/ The java.net.http.HttpClient with auth model.
2/ Challenge-based basic authentication.
3/ Challenge-based digest authentication.
4/ user@ form of userinfo in URLs (specifically for SERVICE)
5/ user:password@ form of userinfo in URLs (specifically for SERVICE)

All of these are best done over HTTPS, but that depends on the server end.
(5) is not a good idea, but we have to live with it.
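For the userinfo forms (4) and (5), one possible handling, sketched: take the credentials out of the URL, send the request to the clean URL, and supply the credentials separately, for example as a pre-emptive Basic `Authorization` header. `UserInfoSketch` is hypothetical, not Jena's code, and the Basic header is only sensible over HTTPS.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: split "user@" or "user:password@" userinfo out of a SERVICE URL.
final class UserInfoSketch {
    static final class Split {
        final URI target;       // URL with the userinfo removed
        final String user;      // null if there was no userinfo
        final String password;  // null for the "user@" form
        Split(URI target, String user, String password) {
            this.target = target;
            this.user = user;
            this.password = password;
        }
    }

    static Split split(URI uri) {
        String userInfo = uri.getUserInfo();   // decoded, e.g. "user:password"
        if (userInfo == null) {
            return new Split(uri, null, null);
        }
        int idx = userInfo.indexOf(':');
        String user = (idx < 0) ? userInfo : userInfo.substring(0, idx);
        String password = (idx < 0) ? null : userInfo.substring(idx + 1);
        try {
            // Rebuild the URI with every component except the userinfo.
            URI clean = new URI(uri.getScheme(), null, uri.getHost(), uri.getPort(),
                                uri.getPath(), uri.getQuery(), uri.getFragment());
            return new Split(clean, user, password);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot rebuild URI: " + uri, e);
        }
    }

    // Value for a pre-emptive "Authorization: Basic ..." header.
    static String basicAuthHeader(String user, String password) {
        String token = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(token.getBytes(StandardCharsets.UTF_8));
    }
}
```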

There is now a thin abstraction to manage username/password, and applications no longer have to deal with HttpClient directly.
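A sketch of what a thin username/password abstraction over java.net.http could look like. The registry (`CredentialsSketch`, keyed by host) is hypothetical and is not Jena's actual API; `Authenticator`, `PasswordAuthentication` and `HttpClient.Builder.authenticator` are standard JDK classes and methods. The application only calls `register`; the `HttpClient` wiring stays inside the library.

```java
import java.net.Authenticator;
import java.net.PasswordAuthentication;
import java.net.http.HttpClient;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry: applications register credentials per host and
// never touch HttpClient configuration themselves.
final class CredentialsSketch {
    private static final Map<String, PasswordAuthentication> BY_HOST = new ConcurrentHashMap<>();

    static void register(String host, String user, String password) {
        BY_HOST.put(host, new PasswordAuthentication(user, password.toCharArray()));
    }

    static PasswordAuthentication lookup(String host) {
        return BY_HOST.get(host);   // null means "no credentials for this host"
    }

    // Called by the JDK client when a server responds 401 with a Basic challenge.
    static final Authenticator AUTHENTICATOR = new Authenticator() {
        @Override
        protected PasswordAuthentication getPasswordAuthentication() {
            return lookup(getRequestingHost());
        }
    };

    static HttpClient client() {
        return HttpClient.newBuilder().authenticator(AUTHENTICATOR).build();
    }
}
```

The JDK client consults the `Authenticator` for Basic challenges; digest is not handled by the JDK client, which is presumably why the PR has its own challenge-based digest support.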

== HttpOp, HttpRDF

HttpOp, a library of packaged HTTP requests for the ways that Jena uses HTTP, has moved to HttpOpAHC.

There is a new HttpOp, in a new package, together with a companion HttpRDF, for java.net.http.HttpClient and the new ways that Jena uses HTTP.
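A sketch of the kind of work an HttpRDF-style helper wraps around a plain GET: ask for RDF in several syntaxes with an `Accept` header, then choose a parser from the response `Content-Type`. `RdfConnegSketch`, the q-values and the syntax names it returns are illustrative assumptions, not Jena's HttpRDF API.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Locale;
import java.util.Map;

// Hypothetical content negotiation helper for fetching RDF over HTTP.
final class RdfConnegSketch {
    // Preference order expressed with q-values (an illustrative choice).
    static final String ACCEPT_RDF =
        "text/turtle, application/n-triples;q=0.9, application/ld+json;q=0.8, application/rdf+xml;q=0.7";

    private static final Map<String, String> SYNTAX = Map.of(
        "text/turtle", "Turtle",
        "application/n-triples", "N-Triples",
        "application/ld+json", "JSON-LD",
        "application/rdf+xml", "RDF/XML");

    static HttpRequest rdfGet(String url) {
        return HttpRequest.newBuilder(URI.create(url)).header("Accept", ACCEPT_RDF).GET().build();
    }

    // "text/turtle; charset=utf-8" -> "Turtle"; returns null for an unknown media type.
    static String syntaxFor(String contentType) {
        String mediaType = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return SYNTAX.get(mediaType);
    }
}
```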

There is a separate GSP client so all the code for GSP is in one place.
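For reference, a sketch of how a GSP client addresses a graph, using the SPARQL 1.1 Graph Store Protocol's indirect identification: `?graph=` with an encoded IRI, or `?default` for the default graph. The helper `GspSketch` is hypothetical, and a Fuseki-style service URL such as `http://localhost:3030/ds/data` is an assumption; the HTTP method (GET, PUT, POST, DELETE) then selects the operation on that URL.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: build the request URL for a GSP operation.
final class GspSketch {
    // graphName == null means the default graph.
    static String target(String serviceUrl, String graphName) {
        String sep = serviceUrl.contains("?") ? "&" : "?";
        if (graphName == null) {
            return serviceUrl + sep + "default";
        }
        return serviceUrl + sep + "graph=" + URLEncoder.encode(graphName, StandardCharsets.UTF_8);
    }
}
```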

HttpOp is the thing that has impacted RDF Delta, and then mostly in tests, where test code performs direct HTTP actions to validate the server behaviour.

== From here

It would be a good idea to have this exposed before a Jena release. We can at least give people the chance to see what impact, if any, it has on them. Local usage is not supposed to be impacted.

But unusual/unexpected usage patterns may not work with zero change. Some of this code is very old.

Suggestion: so as not to make this a hard-to-reverse change,
+ Create a new branch in git with this PR
+ Change the overnight development build to build from this branch.

so the main branch is left in the current state.


== Other

This is also a chance to improve naming of API functions/methods/classes through deprecation migration. Suggestions welcome.


On 09/07/2021 17:37, Andy Seaborne wrote:
Epic JENA-2125 to track this with tickets for each part.

 >       ResultSet(resources) - RowSet (Nodes)
 >       RDFConnection - RDFLink
 >       QueryExecution - QueryExec
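These pairs follow the idea that the Model-level API (resources) is an adapter over a Graph-level engine (Nodes), so only one engine needs implementing. A simplified sketch of that adapter shape; `RowSet`, `ResultSet` and `Resource` here are hypothetical stand-ins with terms as plain strings, not Jena's classes.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins (not Jena's classes) for the Graph-level / Model-level split.
final class AdapterSketch {
    // Graph-level: each row maps variable name -> term (a plain String here).
    interface RowSet extends Iterator<Map<String, String>> {}

    // Model-level term: a richer wrapper around a Graph-level term.
    static final class Resource {
        final String term;
        Resource(String term) { this.term = term; }
    }

    // Model-level result set: no engine of its own, it just adapts a RowSet.
    static final class ResultSet implements Iterator<Map<String, Resource>> {
        private final RowSet rows;
        ResultSet(RowSet rows) { this.rows = rows; }

        @Override public boolean hasNext() { return rows.hasNext(); }

        @Override public Map<String, Resource> next() {
            Map<String, Resource> out = new LinkedHashMap<>();
            rows.next().forEach((var, term) -> out.put(var, new Resource(term)));
            return out;
        }
    }

    static RowSet rowSetOf(List<Map<String, String>> rows) {
        Iterator<Map<String, String>> it = rows.iterator();
        return new RowSet() {
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public Map<String, String> next() { return it.next(); }
        };
    }
}
```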

     Andy

On 28/06/2021 18:00, Andy Seaborne wrote:
Jena currently uses Apache HttpClient v4 for HTTP.
This supports HTTP 1.1.

Apache HttpClient v5 supports HTTP/2, and there is a migration path from v4 to the new-style v5, but it is not seamless: at a minimum, it involves package renaming followed by API changes.

https://hc.apache.org/httpcomponents-client-5.1.x/migration-guide/index.html
   and
https://hc.apache.org/httpcomponents-client-5.1.x/migration-guide/migration-to-classic.html


For most Jena users, no application changes are needed because SPARQL operations are packaged up in the Jena APIs. But if an application does detailed HTTP setup (most importantly, this includes authentication), there is going to be a migration impact.

Java 11 now has an API, java.net.http, which is an all-new way to work with HTTP, including HTTP/2. (And there are other HTTP clients; I haven't used any of them.)


Should we update to java.net.http or Apache HttpClient v5 or other?


Given the JDK has a decent HTTP client, my preference is to switch to use java.net.http unless there is a positive reason to use a specific external one.

The JDK-provided one means no extra dependencies, is always present, and fixes/improvements (if any) come by updating the JVM used.

----

And also if HTTP support in Jena is being upgraded ... the code could do with some work. Some of it is really old and is showing its age.

Areas:
    RDFConnection,
    SPARQL/HTTP QueryExecution and UpdateProcessor,
    Graph Store Protocol,
    SERVICE.

== Improvements

+ Builder style for constructing the more complicated objects
   (e.g. anything HTTP!)
+ Both Model and Graph / Statement and Triple level APIs
   (Model-level being adapters of Graph level engines)

      ResultSet(resources) - RowSet (Nodes)
      RDFConnection - RDFLink
      QueryExecution - QueryExec
      (not an issue with UpdateProcessor)

+ Deprecation of QueryExecution.setTimeout and setInitialBinding
   (use a builder)
+ Switch to rewrite for initial bindings.
   This will work for remote usage, which is currently unsupported.
+ Explicit GSP engine, including support for quad operations.

. SERVICE rewrite to use the new classes.

- HttpOp : Direct use of java.net.http covers the complex cases so
   this class can be smaller and focused on the common cases.
   (I doubt it's used much directly)

+ Utilities: HttpRDF, AsyncHttpRDF, HttpOp
   AsyncHttpRDF should at least cover async GET so apps can
   gather data from several places in parallel.
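A sketch of the async GET pattern for AsyncHttpRDF: start every request before waiting on any of them, then join. `AsyncGatherSketch` is hypothetical; `fetch` uses the JDK's `HttpClient.sendAsync`, and `gather` takes the fetch function as a parameter so it can be exercised without a network.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch: gather data from several places in parallel.
final class AsyncGatherSketch {
    // Real fetch with the JDK client; sendAsync returns without waiting for the response.
    static CompletableFuture<String> fetch(HttpClient client, URI uri) {
        HttpRequest req = HttpRequest.newBuilder(uri).header("Accept", "text/turtle").GET().build();
        return client.sendAsync(req, HttpResponse.BodyHandlers.ofString())
                     .thenApply(HttpResponse::body);
    }

    // Start all fetches first (collect forces every call), then wait for all of them.
    static List<String> gather(List<URI> uris, Function<URI, CompletableFuture<String>> fetcher) {
        List<CompletableFuture<String>> futures =
            uris.stream().map(fetcher).collect(Collectors.toList());
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        // Results come back in the same order as the input URIs.
        return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
    }
}
```

Usage: `gather(uris, u -> fetch(client, u))`. Passing the fetch function in keeps the parallel-gather logic separate from the HTTP details.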

== Migration

If we leave the old code for SPARQL execution (QueryEngineHTTP and HttpQuery) in place, with Apache HttpClient 4, and apply copious deprecations, then, mostly, we have less sudden change. We then remove it in a couple of releases' time.

Deprecate all QueryExecutionFactory.sparqlService and createServiceRequest methods, and refer to the (new) QueryExecutionHTTPBuilder.

Deprecate QueryExecution.setTimeout and setInitialBinding - they should not be where they are.

Update documentation

== Improvements

Code:

   https://github.com/afs/jena-http

which, at the moment, needs a custom Jena build because of miscellaneous cleanup and other things found while writing jena-http that have not been PR'ed to Jena.

Using a different HttpClient should not be too difficult, as the code encapsulates HttpClient usage internally. But a switchable HttpClient is not so easy, nor is it invisible to users, because authentication setup is implementation-specific. We can't abstract authentication without significant costs in support and maintenance to the project.

     Andy
