[jira] [Commented] (JENA-2006) Dataset prefixes

ASF subversion and git services (Jira) Fri, 04 Dec 2020 06:53:35 -0800


    [ 
https://issues.apache.org/jira/browse/JENA-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17244038#comment-17244038
 ]


ASF subversion and git services commented on JENA-2006:
-------------------------------------------------------

Commit 3ca8d0346049cae0bd764e66388b1991c7cd110b in jena's branch 
refs/heads/master from Andy Seaborne
[ https://gitbox.apache.org/repos/asf?p=jena.git;h=3ca8d03 ]

Merge pull request #880 from afs/dataset-prefixes

JENA-2006: DatasetGraph prefixes

> Dataset prefixes
> ----------------
>
>                 Key: JENA-2006
>                 URL: https://issues.apache.org/jira/browse/JENA-2006
>             Project: Apache Jena
>          Issue Type: Improvement
>            Reporter: Andy Seaborne
>            Assignee: Andy Seaborne
>            Priority: Major
>             Fix For: Jena 3.18.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Summary:
> Add API calls:
> {{DatasetGraph.prefixes()}} -> {{PrefixMap}}
>  {{Dataset.getPrefixMapping()}} -> {{PrefixMapping}}
> Rework internal implementation code to reflect this.
> Clean up the different handling of prefixes; switch to a consistent provision 
> of a dataset prefix map. Remove {{DatasetPrefixStorage}} (multiple prefix 
> maps per dataset).
> My first attempt at this work was to use {{DatasetPrefixStorage}} 
> consistently, but it ended up as a lot of classes mirroring PrefixMap 
> implementations. Because input formats only have prefixes per dataset, not 
> per individual graph, the extra feature of multiple prefix maps can only be 
> used through the API, and it just doesn't seem worth the effort and extra 
> code. It was quicker to do the final form, "one prefix map per dataset", than 
> the more complicated form.
> More details:
> "TDB" means both TDB1 and TDB2.
> The main use case for prefixes is that they are set as part of data parsing 
> and used on output to abbreviate URIs.
> For output, we know that URI->prefixed name is a performance critical 
> operation. It is optimized in {{PrefixMapStd}}. This does not change. The 
> writers copy prefixes into a {{PrefixMapStd}} which has a fast-path for the 
> common case of split at last "/" or "#" and a reverse map from URI to prefix.
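
To make the fast path above concrete, here is a minimal sketch of the idea in plain Java. It is not the PrefixMapStd code: the class name PrefixAbbrevSketch and its methods are hypothetical, and it does not check that the local part is a legal prefixed-name local part, which a real writer would have to do.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not Jena's PrefixMapStd.
class PrefixAbbrevSketch {
    private final Map<String, String> prefixToUri = new HashMap<>();
    // Reverse map, so the fast path is a single hash lookup.
    private final Map<String, String> uriToPrefix = new HashMap<>();

    void add(String prefix, String uri) {
        String old = prefixToUri.put(prefix, uri);
        // If the prefix was rebound, drop the stale reverse entry.
        if (old != null && prefix.equals(uriToPrefix.get(old)))
            uriToPrefix.remove(old);
        uriToPrefix.put(uri, prefix);
    }

    /** Returns "prefix:local", or null if no prefix applies. */
    String abbreviate(String uri) {
        // Fast path: split after the last '#' or '/', whichever is later,
        // and look the namespace part up in the reverse map.
        int idx = Math.max(uri.lastIndexOf('#'), uri.lastIndexOf('/'));
        if (idx >= 0) {
            String prefix = uriToPrefix.get(uri.substring(0, idx + 1));
            if (prefix != null)
                return prefix + ":" + uri.substring(idx + 1);
        }
        // Slow path: scan every prefix (first match wins in this sketch).
        for (Map.Entry<String, String> e : prefixToUri.entrySet()) {
            String ns = e.getValue();
            if (uri.startsWith(ns))
                return e.getKey() + ":" + uri.substring(ns.length());
        }
        return null;
    }
}
```

With "ex" bound to "http://example.org/", abbreviate("http://example.org/thing") takes the fast path; a namespace such as "urn:example:" has no "/" or "#" to split on and falls through to the scan.
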
> Mostly, up to now, the implementation has been "store the prefixes in the 
> default graph". While TDB stores multiple sets of prefixes for each dataset, 
> so that there is the possibility of graphs in the same dataset having 
> different prefixes, it used the default graph as well. Output has never made 
> use of multiple prefixes per dataset.
> The {{PrefixMapping}} API presumes a reverse mapping and the API contract is 
> part of the Model API (Model extends PrefixMapping). The other odd feature of 
> {{PrefixMapping}} is that there is no direct access to the prefixes as a map, 
> only a copy form.
> {{PrefixMap}} is simpler with the needs of parsers and storage implementation 
> in mind.
> The idea is that {{PrefixMapping}} is to be considered to be part of the 
> Dataset/Model/Statement/Resource APIs. There is a legacy quirk that Graph has 
> "getPrefixMapping".
> There will be adapters between the two viewpoints. Aside from the implicit 
> contract of {{PrefixMapping}} following XML qname rules, while Turtle is less 
> restrictive, the functionality can be mapped both ways.
> Mostly the XML-rules contract has been moved into the writers themselves in 
> previous iterations of implementation improvement. The adapters are 
> lightweight objects, with no state other than the object they adapt, and 
> "double adapting" actually removes wrappers and returns the underlying 
> prefixes object.
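
A sketch of what "double adapting removes wrappers" could look like, in plain Java. Everything here is hypothetical: MiniPrefixMap and MiniPrefixMapping are tiny stand-ins for {{PrefixMap}} and {{PrefixMapping}}, and PrefixAdapters is an invented name, not the Jena adapter code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins; far smaller than the real interfaces.
interface MiniPrefixMap {
    void add(String prefix, String uri);
    String get(String prefix);
}

interface MiniPrefixMapping {
    MiniPrefixMapping setNsPrefix(String prefix, String uri);
    String getNsPrefixURI(String prefix);
}

// A plain storage-backed prefix map.
class MapBackedPrefixMap implements MiniPrefixMap {
    private final Map<String, String> map = new HashMap<>();
    public void add(String prefix, String uri) { map.put(prefix, uri); }
    public String get(String prefix) { return map.get(prefix); }
}

// Adapter: presents a MiniPrefixMap as a MiniPrefixMapping.
// Its only state is the object it adapts.
class MappingOverMap implements MiniPrefixMapping {
    final MiniPrefixMap base;
    MappingOverMap(MiniPrefixMap base) { this.base = base; }
    public MiniPrefixMapping setNsPrefix(String prefix, String uri) {
        base.add(prefix, uri);
        return this;
    }
    public String getNsPrefixURI(String prefix) { return base.get(prefix); }
}

// Adapter in the other direction.
class MapOverMapping implements MiniPrefixMap {
    final MiniPrefixMapping base;
    MapOverMapping(MiniPrefixMapping base) { this.base = base; }
    public void add(String prefix, String uri) { base.setNsPrefix(prefix, uri); }
    public String get(String prefix) { return base.getNsPrefixURI(prefix); }
}

class PrefixAdapters {
    static MiniPrefixMapping toMapping(MiniPrefixMap pm) {
        // Adapting an adapter: unwrap instead of stacking wrappers.
        if (pm instanceof MapOverMapping)
            return ((MapOverMapping) pm).base;
        return new MappingOverMap(pm);
    }

    static MiniPrefixMap toMap(MiniPrefixMapping pmap) {
        if (pmap instanceof MappingOverMap)
            return ((MappingOverMap) pmap).base;
        return new MapOverMapping(pmap);
    }
}
```

Because the wrappers hold no state of their own, a write through either view is visible through the other, and toMap(toMapping(x)) hands back x itself.
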
> The improved way:
> Basic datasets (DatasetGraphMap and DatasetGraphMapLink) - dataset prefixes 
> are the default graph prefixes.
> TIM: All graphs in the dataset have the same prefix map. The PrefixMap is 
> thread-safe but isn't transactional (possible future work if needed).
> TDB1, TDB2: These have their own, more general prefix storage but the 
> additional feature is not exposed. All graphs in the dataset have the same 
> prefix map. There is no change to on-disk format.
> SDB: As before. There is no change to on-disk format.
> The nulls (DatasetGraphZero and DatasetGraphSink): Sink is "forget updates", 
> Zero is "empty, no updates". Both have suitably misbehaved implementations of 
> the {{PrefixMap}} API.
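
For the two "null" behaviours, a rough plain-Java sketch under stated assumptions: TinyPrefixes, SinkPrefixes and ZeroPrefixes are hypothetical names, and reading "no updates" as "reject the update with an exception" is my assumption, not something the description states.

```java
// Hypothetical minimal interface; not the real PrefixMap API.
interface TinyPrefixes {
    void add(String prefix, String uri);
    String get(String prefix);
    boolean isEmpty();
}

// "Forget updates": accepts the call and discards it.
class SinkPrefixes implements TinyPrefixes {
    public void add(String prefix, String uri) { /* silently dropped */ }
    public String get(String prefix) { return null; }
    public boolean isEmpty() { return true; }
}

// "Empty, no updates": always empty, and (by assumption) rejects updates.
class ZeroPrefixes implements TinyPrefixes {
    public void add(String prefix, String uri) {
        throw new UnsupportedOperationException("read-only empty prefix map");
    }
    public String get(String prefix) { return null; }
    public boolean isEmpty() { return true; }
}
```

Both look empty to a reader; they differ only in whether a writer is told that its update went nowhere.
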



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
