Andy Seaborne created JENA-2006:
-----------------------------------
Summary: Dataset prefixes
Key: JENA-2006
URL: https://issues.apache.org/jira/browse/JENA-2006
Project: Apache Jena
Issue Type: Improvement
Reporter: Andy Seaborne
Description:
Add API calls:
{{DatasetGraph.prefixes()}} -> {{PrefixMap}}
{{Dataset.getPrefixMapping()}} -> {{PrefixMapping}}
Rework internal implementation code to reflect this.
Clear up the different handling of prefixes; switch to a consistent provision of
a dataset prefix map. Remove {{DatasetPrefixStorage}} (multiple prefix maps per
dataset).
My first attempt at this work was to use {{DatasetPrefixStorage}} consistently,
but it ended up as a lot of classes mirroring {{PrefixMap}} implementations.
Because input formats only have prefixes per dataset, not per individual graph,
the extra feature of multiple prefix maps can only be used via the API, and it
just doesn't seem worth the effort and extra code. It was quicker to do the
final form, "one prefix map per dataset", than the more complicated form.
More details:
"TDB" means both TDB1 and TDB2.
The main use case for prefixes is that they are set as part of data parsing and
used on output to abbreviate URIs.
For output, we know that URI->prefixed name is a performance-critical
operation. It is optimized in {{PrefixMapStd}}. This does not change. The
writers copy prefixes into a {{PrefixMapStd}}, which has a fast path for the
common case of a split at the last "/" or "#", and a reverse map from URI to
prefix.
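The fast path described above can be sketched roughly as follows. This is only an illustration, not the {{PrefixMapStd}} code: the class name SimplePrefixMap and its methods are hypothetical, and the sketch skips the check that the local part is legal in the output syntax.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the idea; this is NOT Jena's PrefixMapStd.
public class SimplePrefixMap {
    private final Map<String, String> prefixToUri = new HashMap<>();
    // Reverse map: namespace URI -> prefix, used by the fast path.
    private final Map<String, String> uriToPrefix = new HashMap<>();

    public void add(String prefix, String uri) {
        String oldUri = prefixToUri.put(prefix, uri);
        if (oldUri != null) {
            // Drop the stale reverse entry if it still points at this prefix.
            uriToPrefix.remove(oldUri, prefix);
        }
        uriToPrefix.put(uri, prefix);
    }

    /** Returns "prefix:local", or null if no prefix applies. */
    public String abbreviate(String uri) {
        // Fast path: split at the last '/' or '#' and do one reverse lookup.
        int idx = Math.max(uri.lastIndexOf('/'), uri.lastIndexOf('#'));
        if (idx >= 0) {
            String namespace = uri.substring(0, idx + 1);
            String prefix = uriToPrefix.get(namespace);
            if (prefix != null) {
                return prefix + ":" + uri.substring(idx + 1);
            }
        }
        // Slow path: scan every prefix for a namespace that starts the URI.
        for (Map.Entry<String, String> e : prefixToUri.entrySet()) {
            String namespace = e.getValue();
            if (uri.startsWith(namespace)) {
                return e.getKey() + ":" + uri.substring(namespace.length());
            }
        }
        return null;
    }
}
```

The point of the reverse map is that the common case costs one substring and one hash lookup, instead of a scan over all prefixes for every URI written.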
Mostly, up to now, the implementation has been "store the prefixes in the
default graph". TDB stores multiple sets of prefixes for each dataset, so
there is the possibility of graphs in the same dataset having different
prefixes, but it used the default graph as well. Output has never made use of
multiple prefixes per dataset.
The {{PrefixMapping}} API presumes a reverse mapping and the API contract is
part of the Model API (Model extends PrefixMapping). The other odd feature of
{{PrefixMapping}} is that there is no direct access to the prefixes as a map,
only a copy form.
{{PrefixMap}} is simpler, designed with the needs of parsers and storage
implementations in mind.
The idea is that {{PrefixMapping}} is to be considered part of the
Dataset/Model/Statement/Resource APIs. There is a legacy quirk that Graph has
"getPrefixMapping", but otherwise {{PrefixMap}} is the internal abstraction for
the Model API.
There will be adapters between the two viewpoints. Apart from the implicit
contract that {{PrefixMapping}} follows XML qname rules, whereas Turtle is less
restrictive, the functionality can be mapped both ways.
Mostly, the XML-rules contract has been moved into the writers themselves in
previous iterations of implementation improvement. The adapters are lightweight
objects with no state other than the object they adapt, and "double adapting"
actually removes the wrapper and returns the underlying prefixes object.
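As a rough sketch of the adapter idea, assuming two cut-down stand-in interfaces (MiniPrefixMap and MiniPrefixMapping are hypothetical and far smaller than the real {{PrefixMap}} and {{PrefixMapping}}), the unwrapping on "double adapting" might look like this:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins; these are not the Jena interfaces or adapter classes.
public class PrefixAdapters {
    /** Stand-in for the parser/storage-oriented view. */
    public interface MiniPrefixMap {
        void add(String prefix, String uri);
        String get(String prefix);
    }

    /** Stand-in for the Model API view. */
    public interface MiniPrefixMapping {
        MiniPrefixMapping setNsPrefix(String prefix, String uri);
        String getNsPrefixURI(String prefix);
    }

    /** A plain map-backed implementation to adapt. */
    public static class BasicPrefixMap implements MiniPrefixMap {
        private final Map<String, String> map = new HashMap<>();
        @Override public void add(String prefix, String uri) { map.put(prefix, uri); }
        @Override public String get(String prefix) { return map.get(prefix); }
    }

    // Adapter: the only state is the object being adapted.
    private static class MappingOverMap implements MiniPrefixMapping {
        private final MiniPrefixMap base;
        MappingOverMap(MiniPrefixMap base) { this.base = base; }
        @Override public MiniPrefixMapping setNsPrefix(String prefix, String uri) {
            base.add(prefix, uri);
            return this;
        }
        @Override public String getNsPrefixURI(String prefix) { return base.get(prefix); }
    }

    // Adapter in the other direction, again with no state of its own.
    private static class MapOverMapping implements MiniPrefixMap {
        private final MiniPrefixMapping base;
        MapOverMapping(MiniPrefixMapping base) { this.base = base; }
        @Override public void add(String prefix, String uri) { base.setNsPrefix(prefix, uri); }
        @Override public String get(String prefix) { return base.getNsPrefixURI(prefix); }
    }

    public static MiniPrefixMapping asMapping(MiniPrefixMap prefixMap) {
        // "Double adapting": unwrap instead of stacking a second wrapper.
        if (prefixMap instanceof MapOverMapping) {
            return ((MapOverMapping) prefixMap).base;
        }
        return new MappingOverMap(prefixMap);
    }

    public static MiniPrefixMap asMap(MiniPrefixMapping mapping) {
        if (mapping instanceof MappingOverMap) {
            return ((MappingOverMap) mapping).base;
        }
        return new MapOverMapping(mapping);
    }
}
```

Because the adapters hold nothing but the wrapped object, updates made through either view are visible through the other, and converting back and forth does not build up layers of wrappers.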
The improved way:
Basic datasets (DatasetGraphMap and DatasetGraphMapLink) - dataset prefixes are
the default graph prefixes.
TIM: All graphs in the dataset have the same prefix map. The PrefixMap is
thread-safe but isn't transactional (possible future work if needed).
TDB1, TDB2: These have their own, more general prefix storage, but the
additional feature is not exposed. All graphs in the dataset have the same
prefix map. There is no change to the on-disk format.
SDB: As before. There is no change to on-disk format.
The nulls (DatasetGraphZero and DatasetGraphSink): Sink is "forget updates",
Zero is "empty, no updates": suitably misbehaved implementations of the
{{PrefixMap}} API.
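To make the Sink/Zero distinction concrete, here is a minimal sketch over a hypothetical cut-down interface (MiniPrefixMap is not the real {{PrefixMap}}). It assumes "no updates" means an update attempt is rejected with an exception; the actual behaviour may differ.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical sketch; class and method names are illustrative only.
public class NullPrefixMaps {
    public interface MiniPrefixMap {
        void add(String prefix, String uri);
        String get(String prefix);
        Map<String, String> getMapping();
    }

    /** Sink: accepts updates and forgets them; always looks empty. */
    public static class SinkPrefixMap implements MiniPrefixMap {
        @Override public void add(String prefix, String uri) { /* discarded */ }
        @Override public String get(String prefix) { return null; }
        @Override public Map<String, String> getMapping() { return Collections.emptyMap(); }
    }

    /** Zero: always empty; updates are rejected (assumed: by throwing). */
    public static class ZeroPrefixMap implements MiniPrefixMap {
        @Override public void add(String prefix, String uri) {
            throw new UnsupportedOperationException("prefix map is read-only and empty");
        }
        @Override public String get(String prefix) { return null; }
        @Override public Map<String, String> getMapping() { return Collections.emptyMap(); }
    }
}
```

Both read as empty; the difference is only in what happens when something tries to add a prefix.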