[jira] [Created] (JENA-2006) Dataset prefixes

Andy Seaborne (Jira) Mon, 30 Nov 2020 04:02:35 -0800

Andy Seaborne created JENA-2006:
-----------------------------------

             Summary: Dataset prefixes
                 Key: JENA-2006
                 URL: https://issues.apache.org/jira/browse/JENA-2006
             Project: Apache Jena
          Issue Type: Improvement
            Reporter: Andy Seaborne



Summary:

Add API calls:

{{DatasetGraph.prefixes()}} -> {{PrefixMap}}
{{Dataset.getPrefixMapping()}} -> {{PrefixMapping}}

Rework internal implementation code to reflect this. 

Clean up the different handling of prefixes; switch to consistently providing 
a single dataset prefix map. Remove {{DatasetPrefixStorage}} (multiple prefix 
maps per dataset). 

My first attempt at this work was to use {{DatasetPrefixStorage}} consistently, 
but it ended up as a lot of classes mirroring {{PrefixMap}} implementations. 
Because input formats only have prefixes per dataset, not per individual graph, 
the extra feature of multiple prefix maps can only be used through the API, and 
it just doesn't seem worth the effort and extra code. It was quicker to do the 
final form, "one prefix map per dataset", than the more complicated form.

More details:

"TDB" means both TDB1 and TDB2.

The main use case for prefixes is that they are set as part of data parsing and 
used on output to abbreviate URIs.

For output, we know that URI->prefixed name is a performance critical 
operation. It is optimized in {{PrefixMapStd}}. This does not change. The 
writers copy prefixes into a {{PrefixMapStd}}, which has a fast-path for the 
common case of split at last "/" or "#" and a reverse map from URI to prefix.
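
As an illustration of the fast path described above, here is a minimal sketch in plain Java. It is not the {{PrefixMapStd}} code: the class {{AbbrevSketch}} and its method names are hypothetical, and it assumes a simple pair of hash maps. It only shows the idea of splitting at the last "/" or "#" and looking the namespace up in a reverse map; a real writer would also have to check that the local part is legal in the output syntax and fall back to a slower search when the fast path misses.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Jena code: forward map plus reverse map.
class AbbrevSketch {
    private final Map<String, String> prefixToUri = new HashMap<>();
    private final Map<String, String> uriToPrefix = new HashMap<>();

    void add(String prefix, String uri) {
        String old = prefixToUri.put(prefix, uri);
        // Drop the stale reverse entry if this prefix was rebound.
        if (old != null)
            uriToPrefix.remove(old, prefix);
        uriToPrefix.put(uri, prefix);
    }

    // Fast path only: returns "prefix:local", or null when the namespace
    // obtained by splitting at the last '/' or '#' is not registered.
    String abbreviate(String uri) {
        int idx = Math.max(uri.lastIndexOf('/'), uri.lastIndexOf('#'));
        if (idx < 0)
            return null;
        String namespace = uri.substring(0, idx + 1);
        String prefix = uriToPrefix.get(namespace);
        if (prefix == null)
            return null;
        return prefix + ":" + uri.substring(idx + 1);
    }
}
```

With a prefix "ex" bound to "http://example/ns#", the sketch abbreviates "http://example/ns#name" with a single hash lookup rather than a scan over all registered prefixes.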

Mostly, up to now, the implementation has been "store the prefixes in the 
default graph". While TDB stores multiple sets of prefixes for each dataset, so 
that there is the possibility of graphs in the same dataset having different 
prefixes, it used the default graph as well. Output has never made use of 
multiple prefix maps per dataset.

The {{PrefixMapping}} API presumes a reverse mapping and the API contract is 
part of the Model API (Model extends PrefixMapping). The other odd feature of 
{{PrefixMapping}} is that there is no direct access to the prefixes as a map, 
only to a copy.

{{PrefixMap}} is simpler, designed with the needs of parsers and storage 
implementations in mind.

The idea is that {{PrefixMapping}} is to be considered part of the 
Dataset/Model/Statement/Resource APIs. There is a legacy quirk that Graph has 
"getPrefixMapping", but otherwise {{PrefixMap}} is the internal abstraction for 
the Model API.

There will be adapters between the two viewpoints. Aside from the implicit 
contract that {{PrefixMapping}} follows XML qname rules, whereas Turtle is less 
restrictive, the functionality can be mapped both ways.

Mostly the XML-rules contract has been moved into the writers themselves in 
previous iterations of implementation improvement. The adapters are lightweight 
objects, with no state other than the object they adapt, and "double adapting" 
actually removes the wrappers and returns the underlying prefixes object.
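
The "double adapting" behaviour can be sketched as follows. This is a self-contained illustration, not the Jena adapter classes: the interfaces {{SimpleMap}} and {{ApiMapping}} are hypothetical stand-ins for the {{PrefixMap}} and {{PrefixMapping}} viewpoints, cut down to two methods each, and the class and factory names are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the two viewpoints; NOT Jena's interfaces.
interface SimpleMap {
    void add(String prefix, String uri);
    String get(String prefix);
}

interface ApiMapping {
    void setNsPrefix(String prefix, String uri);
    String getNsPrefixURI(String prefix);
}

// A plain storage-side implementation.
class BasicSimpleMap implements SimpleMap {
    private final Map<String, String> map = new HashMap<>();
    public void add(String prefix, String uri) { map.put(prefix, uri); }
    public String get(String prefix) { return map.get(prefix); }
}

// Adapter: API view over a storage-side map. Its only state is 'base'.
class MappingOverMap implements ApiMapping {
    final SimpleMap base;
    MappingOverMap(SimpleMap base) { this.base = base; }
    public void setNsPrefix(String prefix, String uri) { base.add(prefix, uri); }
    public String getNsPrefixURI(String prefix) { return base.get(prefix); }
}

// Adapter: storage-side view over an API mapping. Its only state is 'base'.
class MapOverMapping implements SimpleMap {
    final ApiMapping base;
    MapOverMapping(ApiMapping base) { this.base = base; }
    public void add(String prefix, String uri) { base.setNsPrefix(prefix, uri); }
    public String get(String prefix) { return base.getNsPrefixURI(prefix); }
}

class Adapters {
    // Adapting an adapter unwraps it instead of stacking a second wrapper.
    static ApiMapping asMapping(SimpleMap pm) {
        if (pm instanceof MapOverMapping)
            return ((MapOverMapping) pm).base;
        return new MappingOverMap(pm);
    }

    static SimpleMap asMap(ApiMapping mapping) {
        if (mapping instanceof MappingOverMap)
            return ((MappingOverMap) mapping).base;
        return new MapOverMapping(mapping);
    }
}
```

Under these assumptions, {{Adapters.asMap(Adapters.asMapping(store))}} returns the original {{store}} object itself, so converting back and forth never builds up a chain of wrappers.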

The improved way:

Basic datasets (DatasetGraphMap and DatasetGraphMapLink) - dataset prefixes are 
the default graph prefixes.

TIM: All graphs in the dataset have the same prefix map. The PrefixMap is 
thread-safe but isn't transactional (possible future work if needed).

TDB1, TDB2: These have their own, more general prefix storage but the 
additional feature is not exposed. All graphs in the dataset have the same 
prefix map. There is no change to on-disk format.

SDB: As before. There is no change to on-disk format.

The nulls (DatasetGraphZero and DatasetGraphSink): Sink is "forget updates", 
Zero is "empty, no updates". Each gets a suitably misbehaved implementation of 
the {{PrefixMap}} API.
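
A rough sketch of what two such "null" prefix maps might look like is given below. Everything in it is hypothetical: {{TinyPrefixMap}} is a two-method stand-in rather than the {{PrefixMap}} interface, and the choice to have the Zero variant reject updates with {{UnsupportedOperationException}} is an assumption about what "no updates" means, not something stated in this issue.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical two-method stand-in; NOT Jena's PrefixMap interface.
interface TinyPrefixMap {
    void add(String prefix, String uri);
    Map<String, String> getMapping();
}

// "Forget updates": accepts every add and silently discards it.
class SinkPrefixes implements TinyPrefixMap {
    public void add(String prefix, String uri) {
        // Intentionally empty: the update is dropped.
    }
    public Map<String, String> getMapping() {
        return Collections.emptyMap();
    }
}

// "Empty, no updates": always empty; rejecting updates is an assumption here.
class ZeroPrefixes implements TinyPrefixMap {
    public void add(String prefix, String uri) {
        throw new UnsupportedOperationException("no updates");
    }
    public Map<String, String> getMapping() {
        return Collections.emptyMap();
    }
}
```

Both variants always report an empty mapping; they differ only in how they respond to an attempted update.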



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
