afs commented on issue #3226: URL: https://github.com/apache/jena/issues/3226#issuecomment-2922838832
> extend the RoaringTripleStore to support the following indexing strategies:

There is also a mode of "MINIMAL": it just has the TripleSet, and any find pattern is done as a scan. This would be for MANUAL-like uses where the purpose is to collect triples, but it does allow testing at small scale by `find(s,p,o)`.

Indexing: the TDB indexing code is general, even if normally indexing is complete. It always runs; it may involve a scan (partial or total). It chooses a partial index where needed (e.g. pattern SP?, index SOP -> scan over S).

One interesting indexing structure is `POS`+`PSO` (for merge joins), based on patterns that always have a fixed predicate. Another is `POS` and `SPO`, for slightly better coverage but no merge join. Both miss certain cases (OSP, SOP in the first case; OSP and a partial SOP in the second).

> The following methods should be added to the graph / triple store:

Not sure changing the Graph API for all graphs is a good idea. An interface that implementations can opt into would be better, c.f. `BufferingCtl`.

> the memory footprint of the graph is only a fraction, compared to any other GraphMem2 implementation

How big are the indexes compared to the triple set?

A parser run does a good job of intern'ing terms (AKA using a dictionary): there is a single-threaded slot-replacement cache of 5000 slots. It is a slot-replacement cache because it has to be fast, and it means the same Java object is used in triples for cache hits. It is in effect a form of sliding window. See `FactoryRDFCaching`.

Because adjacent triples often share a subject and properties come from vocabularies, the space saving for what is quite a small cache is good: 30+%.

An alternative is a StreamRDF to do that and keep that hanging around; it works across graphs as well and for API data generation.

-- 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
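As an illustration of the partial-index choice described in the comment (pattern SP?, index SOP -> scan over S), here is a minimal sketch. It is not the TDB code: the class `PartialIndexSketch` and its methods are hypothetical names, and the rule "pick the index with the longest bound prefix" is an assumption about how such a chooser could work.

```java
import java.util.List;

/** Hypothetical sketch, not Jena/TDB code: pick an index by its bound prefix. */
final class PartialIndexSketch {

    /**
     * Count how many leading slots of an index name (e.g. "SOP") are bound
     * in the pattern. The flags s, p, o say whether each term is concrete.
     */
    static int boundPrefixLength(String index, boolean s, boolean p, boolean o) {
        int n = 0;
        for (char slot : index.toCharArray()) {
            boolean bound = (slot == 'S') ? s : (slot == 'P') ? p : o;
            if (!bound) {
                break;          // the remaining slots must be scanned and filtered
            }
            n++;
        }
        return n;
    }

    /**
     * Choose the index with the longest bound prefix; the first index wins ties.
     * A prefix length of 0 means a total scan of the chosen index.
     */
    static String choose(List<String> indexes, boolean s, boolean p, boolean o) {
        String best = null;
        int bestLen = -1;
        for (String index : indexes) {
            int len = boundPrefixLength(index, s, p, o);
            if (len > bestLen) {
                best = index;
                bestLen = len;
            }
        }
        return best;
    }
}
```

Under these assumptions, a pattern with S and O bound against `POS`+`SPO` selects `SPO` with a prefix of one slot (the "partial SOP" case), while the same pattern against `POS`+`PSO` finds no bound prefix and falls back to a total scan.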
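The slot-replacement cache mentioned in the comment can be sketched roughly as follows. This is an assumption-laden illustration, not the `FactoryRDFCaching` implementation: the class `SlotCache` and its `get` method are hypothetical, and the slot is simply the key's hash modulo the slot count. A miss overwrites whatever occupied the slot, which is what makes it cheap and gives the sliding-window effect.

```java
import java.util.function.Function;

/** Hypothetical sketch of a single-threaded slot-replacement cache. */
final class SlotCache<K, V> {
    private final Object[] keys;
    private final Object[] values;

    SlotCache(int slots) {
        keys = new Object[slots];
        values = new Object[slots];
    }

    /**
     * Return the cached value for key, or create it with maker and store it,
     * replacing any previous occupant of the slot. Not thread-safe.
     */
    @SuppressWarnings("unchecked")
    V get(K key, Function<K, V> maker) {
        int i = Math.floorMod(key.hashCode(), keys.length);
        if (key.equals(keys[i])) {
            return (V) values[i];       // hit: the same Java object is reused
        }
        V value = maker.apply(key);     // miss: build the value and overwrite the slot
        keys[i] = key;
        values[i] = value;
        return value;
    }
}
```

With such a cache in front of term creation, repeated subjects in adjacent triples and recurring vocabulary properties resolve to one shared object rather than a fresh allocation per occurrence.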
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
