afs commented on issue #3226:
URL: https://github.com/apache/jena/issues/3226#issuecomment-2922838832

   > extend the RoaringTripleStore to support the following indexing strategies:
   
   There is also a "MINIMAL" mode - it has only the TripleSet, and every find 
pattern is answered by a scan.
   
   This would be for MANUAL-like uses where the purpose is to collect triples, 
but it still allows testing at small scale via `find(s,p,o)`.
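   
   A minimal sketch of what scan-only lookup could look like. `SimpleTriple` 
and `ScanOnlyStore` are hypothetical names for illustration, not Jena classes; 
terms are plain strings and `null` stands for a wildcard.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in for an RDF triple; terms are plain strings here.
record SimpleTriple(String s, String p, String o) {}

// Hypothetical "MINIMAL"-style store: a set of triples and no indexes.
class ScanOnlyStore {
    private final Set<SimpleTriple> triples = new LinkedHashSet<>();

    void add(String s, String p, String o) {
        triples.add(new SimpleTriple(s, p, o));
    }

    // null is a wildcard. Every call scans the whole set.
    List<SimpleTriple> find(String s, String p, String o) {
        return triples.stream()
                .filter(t -> matches(s, t.s()) && matches(p, t.p()) && matches(o, t.o()))
                .collect(Collectors.toList());
    }

    private static boolean matches(String pattern, String term) {
        return pattern == null || pattern.equals(term);
    }
}
```

   Every `find` costs O(n) in the number of triples, which is fine for 
collecting data and for small tests but not for query workloads.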
   
   
   Indexing:
   
   The TDB indexing code is general, even though indexing is normally complete. 
It always produces an answer, though that may involve a scan (partial or 
total). It can choose a partially matching index (e.g. pattern SP?, index 
SOP -> scan over S).
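   
   As a rough illustration of choosing a partial index (this is not the TDB 
code; `IndexChooser` and its methods are made up for the example): pick the 
index whose key order has the longest prefix of bound pattern positions, and 
scan for the rest.

```java
import java.util.List;

// Hypothetical helper: choose the index with the longest bound prefix.
class IndexChooser {
    // Count the leading columns of the index (e.g. S, O, P for "SOP")
    // that are bound in the pattern. Stops at the first unbound column.
    static int boundPrefix(String index, boolean sBound, boolean pBound, boolean oBound) {
        int n = 0;
        for (char c : index.toCharArray()) {
            boolean bound = switch (c) {
                case 'S' -> sBound;
                case 'P' -> pBound;
                case 'O' -> oBound;
                default -> throw new IllegalArgumentException("Bad index name: " + index);
            };
            if (!bound) {
                break;
            }
            n++;
        }
        return n;
    }

    // Returns the first index with the longest bound prefix,
    // or null if the list of indexes is empty.
    static String choose(List<String> indexes, boolean sBound, boolean pBound, boolean oBound) {
        String best = null;
        int bestLen = -1;
        for (String index : indexes) {
            int len = boundPrefix(index, sBound, pBound, oBound);
            if (len > bestLen) {
                best = index;
                bestLen = len;
            }
        }
        return best;
    }
}
```

   For pattern SP? against index SOP the bound prefix is just S, so the lookup 
becomes a scan over the entries for that subject, filtering on P. A prefix 
length of zero means a total scan.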
   
   One interesting indexing structure is `POS`+`PSO` (for merge joins), based on 
patterns that always have a fixed predicate; another is `POS`+`SPO`, for 
slightly better coverage but no merge join.  Both miss certain cases (OSP, SOP 
in the first case; OSP and a partial SOP in the second).
   
   > The following methods should be added to the graph / triple store:
   
   I'm not sure changing the Graph API for all graphs is a good idea. An 
interface that implementations can opt into would be better, cf. `BufferingCtl`.
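   
   Roughly what an opt-in interface might look like. All names here 
(`IndexingCtl`, `SimpleGraph`, `GraphSupport`, the method names) are 
hypothetical and only illustrate the shape; they are not existing Jena API.

```java
// Hypothetical opt-in capability; name and methods are illustrative only.
interface IndexingCtl {
    void setIndexingStrategy(String strategy);
    String getIndexingStrategy();
}

// Stand-in for the general graph contract, which stays unchanged.
interface SimpleGraph {
    int size();
}

// A graph that does not support the extra control.
class PlainGraph implements SimpleGraph {
    @Override public int size() { return 0; }
}

// A graph that opts in by also implementing IndexingCtl.
class TunableGraph implements SimpleGraph, IndexingCtl {
    private String strategy = "MINIMAL";
    @Override public int size() { return 0; }
    @Override public void setIndexingStrategy(String strategy) { this.strategy = strategy; }
    @Override public String getIndexingStrategy() { return strategy; }
}

class GraphSupport {
    // Applies the strategy if the graph opted in; returns false otherwise.
    static boolean trySetStrategy(SimpleGraph graph, String strategy) {
        if (graph instanceof IndexingCtl ctl) {
            ctl.setIndexingStrategy(strategy);
            return true;
        }
        return false;
    }
}
```

   Callers test for the capability, so graphs that do not implement it are 
unaffected and the common interface does not grow.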
   
   > the memory footprint of the graph is only a fraction, compared to any 
other GraphMem2 implementation
   
   How big are the indexes compared to the triple set?
   
   A parser run does a good job of interning terms (AKA using a dictionary) - 
there is a single-threaded slot-replacement cache of 5000 slots, 
`FactoryRDFCaching`. It is a slot-replacement cache because it has to be fast, 
and it means the same Java object is used in triples for cache hits. It is in 
effect a form of sliding window.
   
   Because adjacent triples often share a subject, and properties come from 
vocabularies, the space saving for what is quite a small cache is good - 30+%.
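   
   A sketch of the slot-replacement idea, assuming a fixed array indexed by 
hash code. `SlotCache` is a hypothetical class written for this example, not 
the code in `FactoryRDFCaching`.

```java
import java.util.function.Function;

// Hypothetical slot-replacement cache: a fixed array indexed by hash.
// A miss overwrites whatever was in the slot. Not thread-safe.
class SlotCache<K, V> {
    private final Object[] keys;
    private final Object[] values;

    SlotCache(int slots) {
        keys = new Object[slots];
        values = new Object[slots];
    }

    @SuppressWarnings("unchecked")
    V get(K key, Function<K, V> maker) {
        int idx = Math.floorMod(key.hashCode(), keys.length);
        if (key.equals(keys[idx])) {
            // Hit: hand back the same object as last time.
            return (V) values[idx];
        }
        // Miss: create the value and replace the slot contents.
        V value = maker.apply(key);
        keys[idx] = key;
        values[idx] = value;
        return value;
    }
}
```

   A lookup is one hash, one array read and one `equals`, with no eviction 
bookkeeping. Recently used terms tend to stay in their slots, which is what 
makes repeated subjects and common properties resolve to the same object.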
   
   An alternative is a StreamRDF that does the interning and is kept around; 
that works across graphs as well, and for data generated via the API.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

