[
https://issues.apache.org/jira/browse/JENA-352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509057#comment-13509057
]
Andy Seaborne commented on JENA-352:
------------------------------------
Alternative approach:
RIOT now handles pseudo-URIs of the form <_:XYZ>, using XYZ as the internal
identifier for the bnode.
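A minimal Python sketch of the idea, not RIOT's actual code. A per-parse scope maps ordinary `_:label` tokens to fresh blank nodes, which needs state. A `<_:XYZ>` token is turned directly into a blank node whose internal identifier is `XYZ`, which needs none. The `BlankNode` and `LabelScope` names are hypothetical, and fresh ids come from `uuid4` purely for illustration.

```python
import uuid


class BlankNode:
    """Stand-in for a blank node; two nodes are equal iff their ids match."""

    def __init__(self, node_id: str):
        self.node_id = node_id

    def __eq__(self, other):
        return isinstance(other, BlankNode) and other.node_id == self.node_id

    def __hash__(self):
        return hash(self.node_id)

    def __repr__(self):
        return f"BlankNode({self.node_id!r})"


class LabelScope:
    """Per-parse state for resolving blank node tokens (hypothetical)."""

    def __init__(self):
        self.label_map = {}  # grows with every distinct _:label seen

    def resolve(self, token: str):
        # <_:XYZ> pseudo-URI: XYZ is used directly as the internal id,
        # so nothing is stored and identity is the same in every parse.
        if token.startswith("<_:") and token.endswith(">"):
            return BlankNode(token[3:-1])
        # _:label is scoped to this document: allocate a fresh node the
        # first time the label is seen and remember it for later uses.
        if token.startswith("_:"):
            if token not in self.label_map:
                self.label_map[token] = BlankNode(uuid.uuid4().hex)
            return self.label_map[token]
        raise ValueError(f"not a blank node token: {token}")
```

Two independent parses agree on `<_:XYZ>` but not on `_:b0`, and only the `_:b0` form leaves an entry in the map.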
This has two uses:
1/ Use with dumps to restore exactly the old data (NB: RIOT writes bnodes as
_:BXYZ, i.e. with a leading "B" and an encoded label).
2/ Processing large loads, either so the data can be split or simply to load a
very large file with bNodes.
Does not apply to RDF/XML.
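To illustrate both uses, here is a hypothetical round trip over a simplified N-Triples subset (IRIs and blank nodes only, no literals). `dump` writes each blank node's internal id in `<_:id>` form, and `load` parses any chunk of lines independently with its own label scope. None of these functions are Jena APIs; the term representation is an assumption of this sketch.

```python
import uuid


def term_to_nt(term):
    # In this sketch a blank node is ("bnode", id) and an IRI is a str.
    if isinstance(term, tuple) and term[0] == "bnode":
        # Write the internal id as a <_:id> pseudo-URI rather than _:label,
        # so that reloading reproduces exactly the same blank node.
        return f"<_:{term[1]}>"
    return f"<{term}>"


def dump(triples):
    """Serialize triples as N-Triples-style lines (IRIs and bnodes only)."""
    return [" ".join(term_to_nt(t) for t in triple) + " ." for triple in triples]


def parse_term(token, scope):
    if token.startswith("<_:"):
        return ("bnode", token[3:-1])  # identity comes from the token itself
    if token.startswith("_:"):
        # Document-scoped label: needs an entry in the per-parse map.
        return scope.setdefault(token, ("bnode", uuid.uuid4().hex))
    return token[1:-1]  # strip the angle brackets from an IRI


def load(lines):
    """Parse one chunk of lines with its own, fresh label scope."""
    scope = {}
    triples = set()
    for line in lines:
        tokens = line.rstrip(" .").split()
        triples.add(tuple(parse_term(t, scope) for t in tokens))
    return triples
```

Splitting the dump of a two-triple graph that shares a blank node and loading each half separately gives back the original set of triples. Doing the same with plain `_:b1`/`_:b2` labels yields three distinct blank nodes instead of two, because each chunk gets its own label scope.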
As this is only a partial solution, I've left the JIRA open.
Seeding and XOR-ing the label (i.e. option 1) is better.
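Option 1 could look roughly like the following Python sketch (hypothetical class, not Jena code). Each file gets one large random seed, and a blank node id is then a pure function of the seed and the label, so the parser keeps no label map at all. The issue does not say how a variable-length label is XORed with a fixed-width seed; hashing the label to 128 bits first is an assumption made here. That makes the XOR variant unique only with high probability, whereas concatenation is unique by construction.

```python
import hashlib
import secrets
from typing import Optional


class StatelessAllocator:
    """Blank node ids as a pure function of (per-file seed, label)."""

    def __init__(self, seed: Optional[int] = None):
        # One large random number per file; a fixed seed is allowed for testing.
        self.seed = secrets.randbits(128) if seed is None else seed

    def concat_id(self, label: str) -> str:
        # Concatenate seed and label: unique by construction, no map needed.
        return f"{self.seed:032x}/{label}"

    def xor_id(self, label: str) -> str:
        # Assumption: reduce the label to 128 bits with SHA-256 first,
        # then XOR with the seed to get a fixed-width id.
        digest = hashlib.sha256(label.encode("utf-8")).digest()[:16]
        return f"{self.seed ^ int.from_bytes(digest, 'big'):032x}"
```

Repeating a label within one file yields the same id, the same label under a different seed (another file) yields a different id, and memory use no longer grows with the number of distinct labels.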
> Vast numbers of bNodes can overwhelm the parser
> -----------------------------------------------
>
> Key: JENA-352
> URL: https://issues.apache.org/jira/browse/JENA-352
> Project: Apache Jena
> Issue Type: Bug
> Components: RIOT, TDB
> Reporter: Andy Seaborne
> Priority: Minor
>
> The parsers need to keep a bNode-label-to-bNode map, which (with unusual data)
> can grow too large. As it takes unusual data, this is rated as "minor".
> Outline of solution:
> 1/ Switch to a bNode allocation scheme that uses a large random number per
> file, concatenated or XORed with the claimed bNode label, to generate a unique
> label without state build-up.
> 2/ (Turtle) Don't remember [] bnodes past their usage scope.
> 3/ Partial - keep a sliding window of bNode label mappings
> e.g.
> http://mail-archives.apache.org/mod_mbox/jena-users/201112.mbox/%[email protected]%3E
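Option 3 amounts to a bounded least-recently-used map. The Python sketch below (hypothetical `WindowedLabelMap`, not Jena code) shows both why it caps memory and why it is only partial: once a label falls out of the window, a later reuse of that label silently gets a new blank node.

```python
import itertools
from collections import OrderedDict


class WindowedLabelMap:
    """Bounded label -> blank node map; evicts the least recently used label."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._map = OrderedDict()
        self._ids = itertools.count()

    def get(self, label: str) -> int:
        if label in self._map:
            self._map.move_to_end(label)  # mark as recently used
            return self._map[label]
        node = next(self._ids)  # allocate a fresh blank node id
        self._map[label] = node
        if len(self._map) > self.capacity:
            self._map.popitem(last=False)  # forget the oldest mapping
        return node
```

With a window of two, a label reused after two other labels have been seen maps to a different node than before, which is exactly the error the sliding window trades for bounded memory.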