[ 
https://issues.apache.org/jira/browse/JENA-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114591#comment-15114591
 ] 

ASF GitHub Bot commented on JENA-1122:
--------------------------------------

Github user bwmcbride commented on the pull request:

    https://github.com/apache/jena/pull/123#issuecomment-174351980
  
    Thanks for all the comments.  I've updated the pull request to implement 
all the proposed changes and made a few other minor amendments.
    
    However, looking at @afs's comments again, I realise I am heading off down 
the wrong path.  What @afs had suggested was not memoizing the text index 
objects in TextDatasetFactory, which is what this PR currently does, but 
memoizing dataset objects.
    
    My bad.  I thought the suggestion was to memoize the TextIndexLucene 
objects in TextDatasetFactory.  That does, I think, solve JENA-1122 (as I 
understand it) but, as Andy says, it opens up other nasty problems.
    
    So I propose to switch to memoizing datasets in the assembler.  
    
    I'm sorry the rest of this is so long.  The short version is: what is the 
best way to manage state for memoizing datasets in assemblers?
    
    The simple option is to just move the code in this pull request for 
memoizing TextIndexLucene datasets from TextDatasetFactory to 
TextIndexLuceneAssembler.  That addresses my immediate problem, but a solution 
that worked for datasets in general would be better.
    
    The open problem is whether, when and how to clear out the memoized state.
    
    In use cases where an application or a service starts up, assembles its 
components and then does what it does until it terminates, there may not be 
much of an issue leaving state around in the assembler maps used to relate 
nodes to reusable objects.
    
    In use cases where an application repeatedly builds and tears down 
assemblies during its lifetime, not clearing out the memoizing state can 
lead to failures when a component is reused between 'builds'.  The current 
TextIndexLucene test cases do this, and they fail if the memoizing map is not 
cleared out between tests.  Maybe the tests are broken.
    
    The current code in this pull request removes a TextIndexLucene object 
from the memoizing map when the TextIndexLucene object is closed.  But that is 
basically a kludge.  I don't think it is a good idea to go around changing all 
existing dataset implementations to support event handling callbacks on close.
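    
    To make the remove-on-close idea concrete, here is a minimal sketch.  It 
is not the code in the pull request and uses no Jena classes: SketchIndex and 
SketchIndexCache are hypothetical stand-ins, and keying the map by the on-disk 
directory is an assumption taken from the issue description.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an index object; it only knows how to tell
// its owner that it has been closed.
final class SketchIndex implements AutoCloseable {
    private final Runnable onClose;

    SketchIndex(Runnable onClose) {
        this.onClose = onClose;
    }

    @Override
    public void close() {
        // The close callback is what removes this object from the cache.
        onClose.run();
    }
}

// Hypothetical memoizing map: at most one SketchIndex per directory.
final class SketchIndexCache {
    private final Map<String, SketchIndex> byDirectory = new HashMap<>();

    synchronized SketchIndex open(String directory) {
        // Reuse the existing object for this directory, or create one
        // whose close() removes its own entry.
        return byDirectory.computeIfAbsent(
                directory, d -> new SketchIndex(() -> forget(d)));
    }

    private synchronized void forget(String directory) {
        byDirectory.remove(directory);
    }

    synchronized int size() {
        return byDirectory.size();
    }
}
```

    The sketch also shows why this feels like a kludge: every memoized type 
needs its own close hook for the map to stay consistent.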
    
    The way I would naturally expect things to work is for assemblers to have 
some notion of a build.  A build defines a context in which state like the 
memoizing map can be built up.  The build context can be thrown away at the end 
of the build and a new one created for the next build.  This would prevent 
reuse across builds.  So if you do:
    
      foo = assembler.build(R);
      bar = assembler.build(R);
    
    you will get two different assemblies, typically with no sharing between 
them.  (You would still get a lock failure if R had a TextIndexLucene component 
that was not closed between creating the two assemblies.)
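    
    A rough sketch of what I mean by a build context follows.  All the names 
(BuildContext, SketchAssembler, getOrBuild) are hypothetical, nodes are 
represented by plain strings, and this is not how the existing Jena assemblers 
are structured; it only illustrates sharing within a build and no sharing 
across builds.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-build state: maps a description node (here just a
// string) to the object already constructed for it in this build.
final class BuildContext {
    private final Map<String, Object> built = new HashMap<>();

    Object getOrBuild(String node, Function<String, Object> constructor) {
        return built.computeIfAbsent(node, constructor);
    }
}

// Hypothetical assembler that creates a fresh context for every build.
final class SketchAssembler {
    // Builds one object per requested node; repeated nodes within a
    // single call share an object, separate calls share nothing.
    Object[] build(String... nodes) {
        BuildContext context = new BuildContext();
        Object[] result = new Object[nodes.length];
        for (int i = 0; i < nodes.length; i++) {
            result[i] = context.getOrBuild(nodes[i], n -> new Object());
        }
        return result; // the context is discarded here
    }
}
```

    With this shape, two services that refer to the same node in one build 
get the same object, while a second build of the same node starts from 
scratch.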
    
    As far as I can see assemblers are not designed to work like this.  Nor can 
I see how to add this notion of a build context without affecting many existing 
assemblers.  I may be missing something.
    
    I have raised the question in the hope that someone will suggest an 
approach to a more general solution. 

> Fuseki fails to start if configured with two services that share the same 
> dataset with a lucene index.
> ------------------------------------------------------------------------------------------------------
>
>                 Key: JENA-1122
>                 URL: https://issues.apache.org/jira/browse/JENA-1122
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: Text
>    Affects Versions: Jena 3.0.0, Fuseki 2.3.0
>            Reporter: Brian McBride
>
> This problem arises when the assemblers for the two services run.  For each 
> service, a separate TextIndexLucene object is created.  Both of those objects 
> try to lock the same Lucene index directory and one fails.
> A proposed fix is to modify the TextDatasetFactory to create only one 
> TextIndexLucene object per on-disk directory.


