[
https://issues.apache.org/jira/browse/JENA-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114591#comment-15114591
]
ASF GitHub Bot commented on JENA-1122:
--------------------------------------
Github user bwmcbride commented on the pull request:
https://github.com/apache/jena/pull/123#issuecomment-174351980
Thanks for all the comments. I've updated the pull request implementing
all the proposed changes and made a few other minor amendments.
However, looking at @afs comments again, I realise I am heading off down
the wrong path. What @afs had suggested was not memoizing the text index
objects in TextDataset factory, which is what this PR currently does, but
memoizing dataset objects.
My bad. I thought the suggestion was to memoize the TextDataSetLucene
objects in TextDatasetFactory. That does I think solve JENA-1122 (as I
understand it) but, as Andy says, it opens up other nasty problems.
So I propose to switch to memoizing datasets in the assembler.
I'm sorry the rest of this is so long. The short version is, what is the
best way to go about managing state for memoizing datasets in assemblers?
The simple option is to just move the code in this pull request for
memoizing TextIndexLucene datasets from TextDatasetFactory to
TextIndexLuceneAssembler. That addresses my immediate problem, but a solution
that worked for datasets in general would be better.
A problem is if/when/how to clear out the memoized state.
In use cases where an application or a service starts up, assembles its
components and then does what it does until it terminates, there may not be
much of an issue leaving state around in the assembler maps used to relate
nodes to reusable objects.
In use cases where an application is repeatedly building and tearing down
assemblies during its lifetime then not clearing out the memoizing state can
lead to failures when a component is reused between 'builds'. The current
TextIndexLucene test cases do this and they fail if the memoizing map is not
cleared out between tests. Maybe the tests are broken.
The current code in this pull request clears a TextIndexLucene object out
of the memoizingmap when the TextIndexLucene object is closed. But that is
basically a kludge. I don't think it a good idea to go round changing all
existing dataset implementations to support event handling callbacks on close.
The way I would naturally expect things to work is for assemblers to have
some notion of a build. A build defines a context in which state like the
memoizing map can be built up. The build context can be thrown away at the end
of the build and a new one created for the next build. This would prevent
reuse across builds. So if you do:
foo = assembler.build(R);
bar = assembler.build(R);
you will get two different assemblies, typically with no sharing between
them. (You would still get a lock failure if R had a TextIndexLucene component
that was not closed between creating the two assemblies.)
As far as I can see assemblers are not designed to work like this. Nor can
I see how to add this notion of a build context without affecting many existing
assemblers. I may be missing something.
I have raised the question in the hope that someone will suggest an
approach to a more general solution.
> Fuseki fails to start if configured with two services that share the same
> dataset with a lucene index.
> ------------------------------------------------------------------------------------------------------
>
> Key: JENA-1122
> URL: https://issues.apache.org/jira/browse/JENA-1122
> Project: Apache Jena
> Issue Type: Bug
> Components: Text
> Affects Versions: Jena 3.0.0, Fuseki 2.3.0
> Reporter: Brian McBride
>
> This problem arises when the assemblers for the two services run. For each
> service, a separate TextIndexLucene object is created. Both of those objects
> try to lock the same Lucene index directory and one fails.
> A proposed fix is to modify the TextDatasetFactory to only create one
> TextIndexLucene object per on disk directory.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)