[GitHub] jena pull request: Add memoizing of LuceneTextIndexes so that ther...

bwmcbride Sun, 24 Jan 2016 15:25:26 -0800

Github user bwmcbride commented on the pull request:

    https://github.com/apache/jena/pull/123#issuecomment-174351980
  
    Thanks for all the comments.  I've updated the pull request implementing 
all the proposed changes and made a few other minor amendments.
    
    However, looking at @afs comments again, I realise I am heading off down 
the wrong path.  What @afs had suggested was not memoizing the text index 
objects in TextDataset factory, which is what this PR currently does, but 
memoizing  dataset objects.
    
    My bad.  I thought the suggestion was to memoize the TextDataSetLucene 
objects in TextDatasetFactory.  That does I think solve JENA-1122 (as I 
understand it) but, as Andy says, it opens up other nasty problems.
    
    So I propose to switch to memoizing datasets in the assembler.  
    
    I'm sorry the rest of this is so long.  The short version is, what is the 
best way to go about managing state for memoizing datasets in assemblers?
    
    The simple option is to just move the code in this pull request for 
memoizing TextIndexLucene datasets from TextDatasetFactory to 
TextIndexLuceneAssembler.  That addresses my immediate problem, but a solution 
that worked for datasets in general would be better.
    
    A problem is if/when/how to clear out the memoized state.
    
    In use cases where an application or a service starts up, assembles its 
components and then does what it does until it terminates, there may not be 
much of an issue leaving state around in the assembler maps used to relate 
nodes to reusable objects.
    
    In use cases where an application is repeatedly building and tearing down 
assemblies during its lifetime then not clearing out the memoizing state can 
lead to failures when a component is reused between 'builds'.  The current 
TextIndexLucene test cases do this and they fail if the memoizing map is not 
cleared out between tests.  Maybe the tests are broken.
    
    The current code in this pull request clears a TextIndexLucene object out 
of the memoizingmap when the TextIndexLucene object is closed.  But that is 
basically a kludge.  I don't think it a good idea to go round changing all 
existing dataset implementations to support event handling callbacks on close.
    
    The way I would naturally expect things to work is for assemblers to have 
some notion of a build.  A build defines a context in which state like the 
memoizing map can be built up.  The build context can be thrown away at the end 
of the build and a new one created for the next build.  This would prevent 
reuse across builds.  So if you do:
    
      foo = assembler.build(R);
      bar = assembler.build(R);
    
    you will get two different assemblies, typically with no sharing between 
them.  (You would still get a lock failure if R had a TextIndexLucene component 
that was not closed between creating the two assemblies.)
    
    As far as I can see assemblers are not designed to work like this.  Nor can 
I see how to add this notion of a build context without affecting many existing 
assemblers.  I may be missing something.
    
    I have raised the question in the hope that someone will suggest an 
approach to a more general solution.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

[GitHub] jena pull request: Add memoizing of LuceneTextIndexes so that ther...

Reply via email to