[jira] [Commented] (JENA-1122) Fuseki fails to start if configured with two services that share the same dataset with a lucene index.

ASF GitHub Bot (JIRA) Fri, 22 Jan 2016 14:00:00 -0800

    [ 
https://issues.apache.org/jira/browse/JENA-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113186#comment-15113186
 ]


ASF GitHub Bot commented on JENA-1122:
--------------------------------------

Github user afs commented on the pull request:

    https://github.com/apache/jena/pull/123#issuecomment-174066342
  
    **Design**
    
    Protecting the text index this way sort of works for TDB specifically 
because of an internal feature of TDB (it manages storage to stop duplication), 
which is not a guaranteed feature. Other dataset implementations will not work 
out so nicely: the setup will behave like two separate datasets sharing one 
index, which will probably lead to corruption or inconsistent reads (cf. email 
["transactions and docProducers"](http://mail-archives.apache.org/mod_mbox/jena-users/201601.mbox/%3C568FD70B.8060301%40epimorphics.com%3E)).
    
    On [JENA-1122](https://issues.apache.org/jira/browse/JENA-1122) I 
summarized the discussion so far as two suggested options:
    
    1. Internal static state in `TextDatasetFactory` so that the same dataset 
object is returned each time, cf. TDB's `StoreConnection`. This extends sharing 
of text datasets to Java/API use, but does not cover "any dataset" in Fuseki 
configurations.
    2. Fuseki (or maybe `DatasetAssembler`) handles sharing while assembling 
datasets, using the graph structure of the configuration. This copes with any 
dataset but not with API use.
    
    The first one looks hard because, in the general case, it is unclear how 
to choose a key that includes the dataset.
    
    The second one is easier to do because there is a natural key for the 
dataset: its resource (URI or bnode). A bonus would be a similar per-text-index 
assembler check on reuse 
([JENA-1104](https://issues.apache.org/jira/browse/JENA-1104)).
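    
    A minimal sketch of the idea behind (2), not code from this PR: build 
each described dataset at most once, keyed by the identifier of its resource 
in the assembler description. The class `SharedDatasetRegistry`, its method 
names, and the use of a plain string as the key (a URI or a bnode label) are 
all hypothetical; no Jena API is used.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical helper, not part of Jena: one built object per description resource. */
final class SharedDatasetRegistry<D> {
    // Key: identifier of the dataset resource (URI or bnode label), assumed unique.
    private final Map<String, D> built = new HashMap<>();

    /**
     * Returns the object already built for this resource, or builds and records it.
     * Synchronized rather than computeIfAbsent so that a builder may call back
     * into the registry on the same thread, e.g. a text dataset assembling its
     * underlying dataset.
     */
    synchronized D getOrBuild(String resourceKey, Supplier<D> builder) {
        D existing = built.get(resourceKey);
        if (existing != null) {
            return existing;
        }
        D created = builder.get();
        built.put(resourceKey, created);
        return created;
    }
}
```

    Two services that point at the same dataset resource would then receive 
the same object, so only one text index would be opened for it. The sketch 
does nothing for API use, which is the limitation noted for (2).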
    
    There is one minor point: Fuseki can have multiple assembler files with 
badly chosen, clashing dataset URIs. (Solution: keep a list of all URIs across 
assembler configs, which is a useful check anyway.)
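    
    The check could be sketched roughly as follows, assuming each dataset URI 
is recorded together with the assembler file that first declared it. The 
class `DatasetUriCheck` and its method are hypothetical names for 
illustration, not existing Fuseki code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical check, not part of Fuseki: detects a dataset URI reused across files. */
final class DatasetUriCheck {
    // Dataset URI -> name of the assembler file that first declared it.
    private final Map<String, String> declaredIn = new HashMap<>();

    /**
     * Records the URI for the given file.
     * Returns null if there is no clash, otherwise the name of the other
     * file that already declared the same URI.
     */
    String register(String datasetUri, String configFile) {
        String previous = declaredIn.putIfAbsent(datasetUri, configFile);
        if (previous == null || previous.equals(configFile)) {
            return null;
        }
        return previous;
    }
}
```

    A non-null result could be reported as a warning or a startup error, 
depending on how strict the check should be.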
    
    The ideal for [JENA-1122](https://issues.apache.org/jira/browse/JENA-1122) 
is this PR (simplified?) to protect text indexes, plus (2) above to allow 
complex configurations.


> Fuseki fails to start if configured with two services that share the same 
> dataset with a lucene index.
> ------------------------------------------------------------------------------------------------------
>
>                 Key: JENA-1122
>                 URL: https://issues.apache.org/jira/browse/JENA-1122
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: Text
>    Affects Versions: Jena 3.0.0, Fuseki 2.3.0
>            Reporter: Brian McBride
>
> This problem arises when the assemblers for the two services run.  For each 
> service, a separate TextIndexLucene object is created.  Both of those objects 
> try to lock the same Lucene index directory and one fails.
> A proposed fix is to modify the TextDatasetFactory to only create one 
> TextIndexLucene object per on-disk directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
