Aklakan commented on PR #3184:
URL: https://github.com/apache/jena/pull/3184#issuecomment-3291792984
The goal of `DatasetGraphOverSparql` is to act as a bridge between ARQ and
external SPARQL-capable stores.
The design of the SPARQL dispatcher system proposed by this PR allows both
`QueryExec.dataset(dsgOverSparql)` and `RDFLink.connect(dsgOverSparql)` to
efficiently proxy queries and updates to the backend.
However, for graph-store-protocol (GSP) operations, ARQ relies on the
DatasetGraph API, which is indeed far from ideal for use with a SPARQL backend.
This is consistent, however: the class is named `DatasetGraphOverSparql`,
not `DatasetGraphOverSparqlAndGSP`, so the protocol is fixed to SPARQL.
For SPARQL, the current design already allows configuration of protocol
matters on the RDFLink level.
The snippet below is a variation of this PR's
`ExampleDBpediaViaRemoteDataset.java`:
```java
Creator<RDFLink> linkCreator = () -> RDFLinkHTTP.newBuilder()
    .destination("http://dbpedia.org/sparql")
    // Request Thrift results instead of the default application/sparql-results+json.
    .acceptHeaderSelectQuery(WebContent.contentTypeResultsThrift)
    .build();

DatasetGraph dsg = new DatasetGraphOverRDFLink(linkCreator);
QueryExec.dataset(dsg)...; // Queries will be dispatched to the link;
                           // execution won't use the DatasetGraph API.
```
The fundamental issue is that DatasetGraph is central to most parts of Jena
(up to Fuseki).
At some point in the future - in the appropriate places - it might be worth
superseding DatasetGraph with a more general `RDFLinkSource` (a factory of
RDFLinks - similar to JDBC's DataSource).
This way, Fuseki could forward GSP requests to vendor-specific driver
implementations - but I feel that these changes are outside the scope of this
PR.
A JDBC-like DataSource was briefly mentioned in
https://github.com/apache/jena/pull/1390#issuecomment-1165821801
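For illustration only - none of the names below exist in Jena, and the `Link` interface is a stand-in for the real `RDFLink` - this is a rough sketch of what such a DataSource-style factory could look like:

```java
// Sketch only: hypothetical names, not Jena API.
public class RDFLinkSourceSketch {

    // Stand-in for Jena's RDFLink: one connection to a SPARQL endpoint.
    interface Link extends AutoCloseable {
        String query(String sparql);
        @Override
        default void close() {}
    }

    // DataSource-style factory: each call hands out a fresh link,
    // analogous to javax.sql.DataSource#getConnection.
    interface RDFLinkSource {
        Link newLink();
    }

    public static void main(String[] args) {
        // A toy source that "answers" queries by echoing them.
        RDFLinkSource source = () -> sparql -> "echo: " + sparql;
        try (Link link = source.newLink()) {
            System.out.println(link.query("ASK {}"));  // prints "echo: ASK {}"
        }
    }
}
```

A GSP-aware Fuseki could then ask the source for a fresh link per request, the same way a servlet asks a DataSource for a connection.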
> Transforming update requests via
`StreamRDFToUpdateRequest.sendGraphTriplesToStream`
I think in principle the abstraction with a configurable update sink is ok,
but I agree that a custom `graph-to-update-requests` mapper should not require
subclassing!
Also, the default strategy should be to put the inserts into a single
request instead of performing some magic splitting that would sever blank
nodes shared across triples.
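To make the single-request default concrete, here is a minimal, Jena-free sketch (the helper name `toSingleInsertData` is made up) that folds a batch of already-serialized triples into one `INSERT DATA` request, so blank node labels shared between triples end up in the same request:

```java
import java.util.List;
import java.util.stream.Collectors;

public class SingleInsertData {
    // Hypothetical helper: collect ALL triples into one INSERT DATA update
    // rather than splitting them across several requests. Splitting would
    // break shared blank nodes, because each request scopes its own labels.
    static String toSingleInsertData(List<String> triples) {
        return triples.stream()
            .collect(Collectors.joining("\n  ", "INSERT DATA {\n  ", "\n}"));
    }

    public static void main(String[] args) {
        // _:b0 appears in both triples; one request preserves that sharing.
        String update = toSingleInsertData(List.of(
            "_:b0 <http://example.org/p> \"v1\" .",
            "_:b0 <http://example.org/q> \"v2\" ."));
        System.out.println(update);
    }
}
```

If the two triples were sent as two separate update requests, the backend would mint two unrelated blank nodes for `_:b0`.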
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]