[GitHub] jena pull request: Add memoizing of LuceneTextIndexes so that ther...
Github user afs commented on the pull request: https://github.com/apache/jena/pull/123#issuecomment-177246334 Starting from the JENA-1122 description: > Two Fuseki services, linking to the same dataset description. Fuseki only calls assemblers once. No other system is (legitimately) calling Fuseki service building. The configuration file processing puts service access points into the server-wide state. There is no service assembler (it could be done but it isn't, it serves no purpose); it is done by custom processing during walking the datastructure which is happening anyway. In the Fuseki case, we want shared datasets descriptions, that is, same name, to yield the same dataset. Processing dataset descriptions is driven by assemblers and they have names for keys using the root resource. A general "dataset sharing" outside assemblers is hard because of the lack of key. In other cases, I can imagine that a shared description alone does not always imply a shared object - in-memory datasets for example. The more general area is not clearly defined. The solution I see is that Fuseki handles the process step for the link: ``` fuseki:dataset <#dataset> <#dataset> rdf:type ja:RDFDataset(OrSubType) ``` This happens in `Builder.buildDataService` as it calls `Assembler.general.open(datasetDesc)`. It looks to me that if sharing is provided here, the problem statement of JENA-1122 is addressed. One matter arising: Service descriptions can be in multiple files (it is the preferred pattern to use `configuration/`). The template system behind the UI uses relative URIs so names of descriptions are unique across the server. If a user manually writes two configuration files, but uses the same absolute URI and they meant it to be different, we have a problem and this could be made to cause an error (safe choice to force shared datasets to be in the server `config.ttl`). `FusekiConfig.initializeDataAccessPoints` is the driver, it calls `readConfigurationDirectory` and the others places service descriptions can be and so needs checking. For now, just solving this for two services in the server configuration file, with entries in the `fuseki:services` list links is a good start. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] jena pull request: Add memoizing of LuceneTextIndexes so that ther...
Github user afs commented on the pull request: https://github.com/apache/jena/pull/123#issuecomment-177243620 This PR protects against multiple creation of a text index (JENA-1104), not against two calls to create the same dataset for two services in Fuseki. By chance, TDB is less prone to problems if that happens but that is luck. General datasets e.g. with inference graphs, SDB or plain in-memory datasets are likely exposed to problems. Let's solve the immediate issue described in JENA-1122, then see if JENA-1104 needs addressing or whether the situations where it can still happen are uninteresting or have other problems in which case the application must be responsible for creating the index only once. For the record, there are some specific items with the current PR that I would like clarified or refuted before this code is used to address JENA-1107, if that is still needed. 1: `TextIndexLucene.close` is not reference counted. * Create text index -> T1 * Create text index -> T2 (which is T1, shared) * Close T2. * Any T1 code will now crash - the index is closed. 2: Using WeakReferences and managing `close()` seems to be duplicating lifecycle management. I am not clear that the WeakReference to the Lucene index helps because there are no finalizers, so GC fnalization does not tidy up lucene. A freed WeakReference would cause a new attempt to create the index but it will hit the state lock. 3: Creation by `createLuceneIndex` is not thread safe. It has a get-create-put timing hole. 4: Need to be clear on the contract for "same `Directory`, different Lucene configuration (`TextIndexConfig`)". --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] jena pull request: Add memoizing of LuceneTextIndexes so that ther...
Github user ajs6f commented on the pull request: https://github.com/apache/jena/pull/123#issuecomment-174658906 Is there any understanding about the sharing of components between the results of calls to the various forms of `Assembler::open`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] jena pull request: Add memoizing of LuceneTextIndexes so that ther...
Github user bwmcbride commented on the pull request: https://github.com/apache/jena/pull/123#issuecomment-174351980 Thanks for all the comments. I've updated the pull request implementing all the proposed changes and made a few other minor amendments. However, looking at @afs comments again, I realise I am heading off down the wrong path. What @afs had suggested was not memoizing the text index objects in TextDataset factory, which is what this PR currently does, but memoizing dataset objects. My bad. I thought the suggestion was to memoize the TextDataSetLucene objects in TextDatasetFactory. That does I think solve JENA-1122 (as I understand it) but, as Andy says, it opens up other nasty problems. So I propose to switch to memoizing datasets in the assembler. I'm sorry the rest of this is so long. The short version is, what is the best way to go about managing state for memoizing datasets in assemblers? The simple option is to just move the code in this pull request for memoizing TextIndexLucene datasets from TextDatasetFactory to TextIndexLuceneAssembler. That addresses my immediate problem, but a solution that worked for datasets in general would be better. A problem is if/when/how to clear out the memoized state. In use cases where an application or a service starts up, assembles its components and then does what it does until it terminates, there may not be much of an issue leaving state around in the assembler maps used to relate nodes to reusable objects. In use cases where an application is repeatedly building and tearing down assemblies during its lifetime then not clearing out the memoizing state can lead to failures when a component is reused between 'builds'. The current TextIndexLucene test cases do this and they fail if the memoizing map is not cleared out between tests. Maybe the tests are broken. The current code in this pull request clears a TextIndexLucene object out of the memoizingmap when the TextIndexLucene object is closed. But that is basically a kludge. I don't think it a good idea to go round changing all existing dataset implementations to support event handling callbacks on close. The way I would naturally expect things to work is for assemblers to have some notion of a build. A build defines a context in which state like the memoizing map can be built up. The build context can be thrown away at the end of the build and a new one created for the next build. This would prevent reuse across builds. So if you do: foo = assembler.build(R); bar = assembler.build(R); you will get two different assemblies, typically with no sharing between them. (You would still get a lock failure if R had a TextIndexLucene component that was not closed between creating the two assemblies.) As far as I can see assemblers are not designed to work like this. Nor can I see how to add this notion of a build context without affecting many existing assemblers. I may be missing something. I have raised the question in the hope that someone will suggest an approach to a more general solution. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] jena pull request: Add memoizing of LuceneTextIndexes so that ther...
Github user ajs6f commented on the pull request: https://github.com/apache/jena/pull/123#issuecomment-174311885 There has been some [discussion](http://markmail.org/message/6uab6glusw7lqpmz) about the commonalities between `jena-text` and `jena-spatial`. Is the same problem likely to come up with `jena-spatial`? Maybe it's worth doing the assembler-based fix because of that? (Sorry if I'm really off-base here, just trying to avoid any potential future problems.) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] jena pull request: Add memoizing of LuceneTextIndexes so that ther...
Github user ajs6f commented on the pull request: https://github.com/apache/jena/pull/123#issuecomment-174311642 Thanks for the clarification re: `Mode`. Should the Javadocs for `Mode` say that something like that it "advises implementations of user intention, but does not constrain implementation behavior"? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] jena pull request: Add memoizing of LuceneTextIndexes so that ther...
Github user afs commented on the pull request: https://github.com/apache/jena/pull/123#issuecomment-174309243 As far as I can see, only an assembler solution will address [JENA-1122](https://issues.apache.org/jira/browse/JENA-1122). @bwmcbride - do you believe this PR alone will address JENA-1122? Has this PR been run under load? By protecting the text index alone, I see a new class of concurrency issues arising. A shared text index between two different TDB datasets will go wrong on update. At the moment, you can't create that setup because of JENA-1104. Other setups will have concurrency problems. *If* this works in a "one dataset, must be TDB, only in Fuseki" setup, it only protects against [JENA-1104](https://issues.apache.org/jira/browse/JENA-1104). If then use of TDB happens to work at the moment, it is based on internal implementation handling in Fuseki. That is very fragile. This PR does no harm though it is a bit more complicated than needed because a solution to the root cause [JENA-1122](https://issues.apache.org/jira/browse/JENA-1122) may well remove the need for some of the machinery. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] jena pull request: Add memoizing of LuceneTextIndexes so that ther...
Github user afs commented on the pull request: https://github.com/apache/jena/pull/123#issuecomment-174306928 `org.apache.jena.assembler.Mode` does not help unfortunately. It assists with passing down a requirement to share for certain setups that work as tightly coordinated groups but that isn't good enough here (it's a hint of unclear meaning really, not an instruction). It should not be in the primary interface method but it got there a long time ago. 1. How do you decide when to set it in the first place? 1. When set, why would assembler know another should be shared? It assumes knowledge about implementation but the target may be a subtype or specialization. e.g. TDB internally manages sharing because only TDB knows what must be shared. But a TDB dataset assembler is a subclass of DatasetRDF assembler. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] jena pull request: Add memoizing of LuceneTextIndexes so that ther...
Github user bwmcbride commented on a diff in the pull request: https://github.com/apache/jena/pull/123#discussion_r50635465 --- Diff: jena-text/src/main/java/org/apache/jena/query/text/TextDatasetFactory.java --- @@ -191,5 +213,50 @@ public static DatasetGraph createSolrIndex(DatasetGraph base, SolrServer server, TextIndex index = createSolrIndex(server, entMap) ; return create(base, index, true) ; } + +protected static class LuceneIndexMap { + + private static HashMap> map = new HashMap>(); + + protected synchronized TextIndex get(Directory directory) { --- End diff -- Thanks for the sketch. I've tried something based on it and does look better. Will push shortly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] jena pull request: Add memoizing of LuceneTextIndexes so that ther...
Github user ajs6f commented on the pull request: https://github.com/apache/jena/pull/123#issuecomment-174299109 Just as a sidenote, isn't `org.apache.jena.assembler.Mode` intended to control whether new things are created or old things are reused during the assembly? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] jena pull request: Add memoizing of LuceneTextIndexes so that ther...
Github user bwmcbride commented on the pull request: https://github.com/apache/jena/pull/123#issuecomment-174298772 The main problem I had was figuring out how to manage the static state so it was consistent with the lifecyle implemented by TextIndexLucene and avoid memory leaks on long running systems. Hence the call backs. That is an issue that is common to both API and assembler construction. The assembler mechanism is not designed to handle graph (or even lattice) structured composition of components. According to Chris, it is by design that It does a recursive decent over the structure and assembles a new thing each time a node is encountered. Perhaps there is a need for an assembly mechanism designed to handle graph structures. Would also be useful to have some notion of an assember 'session', that is the process of building a complete structure, during which session state could be built up and thrown away at the end. Maybe there is one and I didn;t find it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] jena pull request: Add memoizing of LuceneTextIndexes so that ther...
Github user afs commented on the pull request: https://github.com/apache/jena/pull/123#issuecomment-174066342 **Design** Protecting the text index this way sort of works for TDB specifically because of an internal feature of TDB (it manages storage to stop duplication) which is not a guaranteed feature. Other dataset implementations will not work out so nicely. It will be like two separate datasets and one index will and probably lead to corruption or inconsistent reading (c.f. email ["transactions and docProducers"](http://mail-archives.apache.org/mod_mbox/jena-users/201601.mbox/%3C568FD70B.8060301%40epimorphics.com%3E)). On [JENA-1122](https://issues.apache.org/jira/browse/JENA-1122) I summarized discussions up to here as two options suggested: 1. Internal static state in `TextDatasetFactory` that the same datasets object is returned each time. c.f. TDB's StoreConnection. Extends sharing of text datasets to work with java/API uses but not "any dataset" in Fuseki configurations. 2. Fuseki (or in `DatasetAssembler` maybe) assembling datasets deals with sharing using the graph structure. This copes with any dataset but not API use. The first one looks hard because of choosing the key to include the dataset in the general case. The second one is easier to do because there is a natural key of the resource (URI, bnode) for the dataset. Bonus would a similar per-text index assembler check on reuse [JENA-1104](https://issues.apache.org/jira/browse/JENA-1104). There is one minor point - Fuseki can have multiple assembler files and badly chosen, clashing dataset URIs (solution - keep a list of all URIs acorss assembler configs - useful check anyway) The ideal for [JENA-1122](https://issues.apache.org/jira/browse/JENA-1122) is this PR (simplified?) to protect text indexes and (2) above to allow complex configurations. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] jena pull request: Add memoizing of LuceneTextIndexes so that ther...
Github user bwmcbride commented on a diff in the pull request: https://github.com/apache/jena/pull/123#discussion_r50553029 --- Diff: jena-text/src/test/java/org/apache/jena/query/text/assembler/TestTextIndexLuceneAssembler.java --- @@ -97,6 +97,14 @@ index.close(); } } + +@Test public void testOneIndexPerDirectoryMem() { --- End diff -- Yes. Will do. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] jena pull request: Add memoizing of LuceneTextIndexes so that ther...
Github user ajs6f commented on a diff in the pull request: https://github.com/apache/jena/pull/123#discussion_r50548291 --- Diff: jena-text/src/main/java/org/apache/jena/query/text/TextDatasetFactory.java --- @@ -191,5 +213,50 @@ public static DatasetGraph createSolrIndex(DatasetGraph base, SolrServer server, TextIndex index = createSolrIndex(server, entMap) ; return create(base, index, true) ; } + +protected static class LuceneIndexMap { + + private static HashMap> map = new HashMap>(); + + protected synchronized TextIndex get(Directory directory) { --- End diff -- Fair enough. I will try and sketch something quickly and if you want (if you think it is clearer), you can use it, or not. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] jena pull request: Add memoizing of LuceneTextIndexes so that ther...
Github user bwmcbride commented on a diff in the pull request: https://github.com/apache/jena/pull/123#discussion_r50546853 --- Diff: jena-text/src/main/java/org/apache/jena/query/text/TextDatasetFactory.java --- @@ -191,5 +213,50 @@ public static DatasetGraph createSolrIndex(DatasetGraph base, SolrServer server, TextIndex index = createSolrIndex(server, entMap) ; return create(base, index, true) ; } + +protected static class LuceneIndexMap { + + private static HashMap> map = new HashMap>(); + + protected synchronized TextIndex get(Directory directory) { + Object key = key(directory); + WeakReference ref = map.get(key); + if (ref == null) + return null; + TextIndex result = ref.get(); + if (result == null) + map.put(key, null); + return result; + } + + protected synchronized void put(Directory directory, TextIndex index) { + map.put(key(directory), new WeakReference(index)); + } + + protected synchronized void remove(Directory directory) { + map.put(key(directory), null); + } + + private Object key(Directory directory) { + // uglyness alert + if (directory instanceof FSDirectory) { + return getFilePath( ( (FSDirectory) directory).getDirectory() ); + } else if (directory instanceof CompoundFileDirectory) { + return getFilePath( ( (FSDirectory) directory).getDirectory() ); --- End diff -- Yes - thank you. Will fix. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] jena pull request: Add memoizing of LuceneTextIndexes so that ther...
Github user bwmcbride commented on a diff in the pull request: https://github.com/apache/jena/pull/123#discussion_r50546729 --- Diff: jena-text/src/main/java/org/apache/jena/query/text/TextDatasetFactory.java --- @@ -191,5 +213,50 @@ public static DatasetGraph createSolrIndex(DatasetGraph base, SolrServer server, TextIndex index = createSolrIndex(server, entMap) ; return create(base, index, true) ; } + +protected static class LuceneIndexMap { + + private static HashMap> map = new HashMap>(); + + protected synchronized TextIndex get(Directory directory) { --- End diff -- Yes, there is some commonality. It may be possible to factor something out from both. There are differences. One uses a WeakHashMap and the other a normal HashMap and the handling of keys is a little different. I reckoned the common functionality was not worth factoring out. Ymmv. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] jena pull request: Add memoizing of LuceneTextIndexes so that ther...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/jena/pull/123#discussion_r50546299 --- Diff: jena-text/src/test/java/org/apache/jena/query/text/assembler/TestTextIndexLuceneAssembler.java --- @@ -97,6 +97,14 @@ index.close(); } } + +@Test public void testOneIndexPerDirectoryMem() { --- End diff -- Should there be a corresponding test case that validates that when you have two different nodes specified as RAM indexes you get two different `TextIndexLucene` instances? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] jena pull request: Add memoizing of LuceneTextIndexes so that ther...
Github user ajs6f commented on a diff in the pull request: https://github.com/apache/jena/pull/123#discussion_r50546226 --- Diff: jena-text/src/main/java/org/apache/jena/query/text/assembler/TextIndexLuceneAssembler.java --- @@ -61,7 +74,12 @@ public TextIndex open(Assembler a, Resource root, Mode mode) { if ( n.isLiteral() ) { String literalValue = n.asLiteral().getLexicalForm() ; if (literalValue.equals("mem")) { -directory = new RAMDirectory() ; + // on RAMDirectory per node, not one per invocation + directory = ramDirectories.get(root); + if (directory == null) { +directory = new RAMDirectory() ; +ramDirectories.put(root, (RAMDirectory) directory); --- End diff -- Sorry-- didn't look hard enough! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] jena pull request: Add memoizing of LuceneTextIndexes so that ther...
Github user bwmcbride commented on a diff in the pull request: https://github.com/apache/jena/pull/123#discussion_r50546204 --- Diff: jena-text/src/main/java/org/apache/jena/query/text/TextIndexLucene.java --- @@ -399,4 +404,24 @@ private Node entryToNode(String v) { // TEMP return NodeFactoryExtra.createLiteralNode(v, null, null) ; } + + +// set up event handlers + +public enum Event { CLOSED }; + +public static interface EventHandler { + void handle(TextIndexLucene index); +} + +private Map> eventHandlers = new HashMap>(); + +public void addEventHandler(Event event, EventHandler handler) { + List handlers = eventHandlers.get(event); --- End diff -- Yes it is. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] jena pull request: Add memoizing of LuceneTextIndexes so that ther...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/jena/pull/123#discussion_r50546051 --- Diff: jena-text/src/main/java/org/apache/jena/query/text/TextDatasetFactory.java --- @@ -191,5 +213,50 @@ public static DatasetGraph createSolrIndex(DatasetGraph base, SolrServer server, TextIndex index = createSolrIndex(server, entMap) ; return create(base, index, true) ; } + +protected static class LuceneIndexMap { + + private static HashMap> map = new HashMap>(); + + protected synchronized TextIndex get(Directory directory) { + Object key = key(directory); + WeakReference ref = map.get(key); + if (ref == null) + return null; + TextIndex result = ref.get(); + if (result == null) + map.put(key, null); + return result; + } + + protected synchronized void put(Directory directory, TextIndex index) { + map.put(key(directory), new WeakReference(index)); + } + + protected synchronized void remove(Directory directory) { + map.put(key(directory), null); + } + + private Object key(Directory directory) { + // uglyness alert + if (directory instanceof FSDirectory) { + return getFilePath( ( (FSDirectory) directory).getDirectory() ); + } else if (directory instanceof CompoundFileDirectory) { + return getFilePath( ( (FSDirectory) directory).getDirectory() ); --- End diff -- Isn't this an invalid cast, you check for `instanceof CompoundFileDirectory` and then proceed to cast to `FSDirectory` which is going to fail Unless `CompoundFileDirectory` is actally a type derived from `FSDirectory` in which case the previous branch of the `if` would have caught it already and this branch can never be reached --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] jena pull request: Add memoizing of LuceneTextIndexes so that ther...
Github user bwmcbride commented on a diff in the pull request: https://github.com/apache/jena/pull/123#discussion_r50545658 --- Diff: jena-text/src/main/java/org/apache/jena/query/text/assembler/TextIndexLuceneAssembler.java --- @@ -61,7 +74,12 @@ public TextIndex open(Assembler a, Resource root, Mode mode) { if ( n.isLiteral() ) { String literalValue = n.asLiteral().getLexicalForm() ; if (literalValue.equals("mem")) { -directory = new RAMDirectory() ; + // on RAMDirectory per node, not one per invocation + directory = ramDirectories.get(root); + if (directory == null) { +directory = new RAMDirectory() ; +ramDirectories.put(root, (RAMDirectory) directory); --- End diff -- ... and would give a compilation error on line 85. Its easy to get rid of the cast if its a problem, but not like that. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] jena pull request: Add memoizing of LuceneTextIndexes so that ther...
Github user ajs6f commented on a diff in the pull request: https://github.com/apache/jena/pull/123#discussion_r50542029 --- Diff: jena-text/src/main/java/org/apache/jena/query/text/TextDatasetFactory.java --- @@ -191,5 +213,50 @@ public static DatasetGraph createSolrIndex(DatasetGraph base, SolrServer server, TextIndex index = createSolrIndex(server, entMap) ; return create(base, index, true) ; } + +protected static class LuceneIndexMap { + + private static HashMap> map = new HashMap>(); + + protected synchronized TextIndex get(Directory directory) { --- End diff -- It seems like there is common logic between this method and `RAMDirectoryMap::get`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] jena pull request: Add memoizing of LuceneTextIndexes so that ther...
Github user ajs6f commented on a diff in the pull request: https://github.com/apache/jena/pull/123#discussion_r50541229 --- Diff: jena-text/src/main/java/org/apache/jena/query/text/assembler/TextIndexLuceneAssembler.java --- @@ -61,7 +74,12 @@ public TextIndex open(Assembler a, Resource root, Mode mode) { if ( n.isLiteral() ) { String literalValue = n.asLiteral().getLexicalForm() ; if (literalValue.equals("mem")) { -directory = new RAMDirectory() ; + // on RAMDirectory per node, not one per invocation + directory = ramDirectories.get(root); + if (directory == null) { +directory = new RAMDirectory() ; +ramDirectories.put(root, (RAMDirectory) directory); --- End diff -- Changing `Directory directory;` above to `RAMDirectory directory;` would remove the cast. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] jena pull request: Add memoizing of LuceneTextIndexes so that ther...
Github user ajs6f commented on a diff in the pull request: https://github.com/apache/jena/pull/123#discussion_r50540841 --- Diff: jena-text/src/main/java/org/apache/jena/query/text/assembler/TextIndexLuceneAssembler.java --- @@ -123,10 +141,35 @@ public TextIndex open(Assembler a, Resource root, Mode mode) { config.setMultilingualSupport(isMultilingualSupport); config.setValueStored(storeValues); -return TextDatasetFactory.createLuceneIndex(directory, config) ; +TextIndexLucene index = TextDatasetFactory.createLuceneIndex(directory, config) ; +index.addEventHandler(TextIndexLucene.Event.CLOSED, + ( i ) -> ramDirectories.remove(root)); +return index; } catch (IOException e) { IO.exception(e) ; return null ; } } +protected static class RAMDirectoryMap { + + private static WeakHashMap> map = new WeakHashMap>(); + + protected synchronized RAMDirectory get(RDFNode node) { + WeakReference ref = map.get(node); + if (ref == null) + return null; + RAMDirectory result = ref.get(); + if (result == null) + map.put(node, null); + return result; + } + + protected synchronized void put(RDFNode node, RAMDirectory directory) { + map.put(node, new WeakReference(directory)); --- End diff -- Could be little shorter with `new WeakReference<>(directory)`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] jena pull request: Add memoizing of LuceneTextIndexes so that ther...
Github user ajs6f commented on a diff in the pull request: https://github.com/apache/jena/pull/123#discussion_r50538211 --- Diff: jena-text/src/main/java/org/apache/jena/query/text/TextIndexLucene.java --- @@ -399,4 +404,24 @@ private Node entryToNode(String v) { // TEMP return NodeFactoryExtra.createLiteralNode(v, null, null) ; } + + +// set up event handlers + +public enum Event { CLOSED }; + +public static interface EventHandler { + void handle(TextIndexLucene index); +} + +private Map> eventHandlers = new HashMap>(); --- End diff -- You can use just `new HashMap<>()` here. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] jena pull request: Add memoizing of LuceneTextIndexes so that ther...
Github user ajs6f commented on a diff in the pull request: https://github.com/apache/jena/pull/123#discussion_r50538146 --- Diff: jena-text/src/main/java/org/apache/jena/query/text/assembler/TextIndexLuceneAssembler.java --- @@ -47,8 +59,9 @@ text:entityMap <#endMap> ; . */ + + private static RAMDirectoryMap ramDirectories = new RAMDirectoryMap();; --- End diff -- Just a typo, but two semicolons here. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] jena pull request: Add memoizing of LuceneTextIndexes so that ther...
Github user ajs6f commented on a diff in the pull request: https://github.com/apache/jena/pull/123#discussion_r50537882 --- Diff: jena-text/src/main/java/org/apache/jena/query/text/TextIndexLucene.java --- @@ -399,4 +404,24 @@ private Node entryToNode(String v) { // TEMP return NodeFactoryExtra.createLiteralNode(v, null, null) ; } + + +// set up event handlers + +public enum Event { CLOSED }; + +public static interface EventHandler { + void handle(TextIndexLucene index); +} + +private Map> eventHandlers = new HashMap>(); + +public void addEventHandler(Event event, EventHandler handler) { + List handlers = eventHandlers.get(event); --- End diff -- I think this and the `if-then` below might just be `eventHandlers.computeIfAbsent(event, e -> new ArrayList<>())`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] jena pull request: Add memoizing of LuceneTextIndexes so that ther...
GitHub user bwmcbride opened a pull request: https://github.com/apache/jena/pull/123 Add memoizing of LuceneTextIndexes so that there is one TextIndexLucene JENA-1122 These changes memoize LuceneTextIndexes so that there is one per directory, and Lucene RAMDirectories created by the Lucene assembler so there is one RAMDirectory per node in the configuration graph. One issue is when to forget a memoized object. The policy implemented in this code is to forget the object when it is closed. You can merge this pull request into a Git repository by running: $ git pull https://github.com/epimorphics/jena JENA-1122 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/jena/pull/123.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #123 commit d0451f1fda636aa9fa0915e4bedd3507cea54e7a Author: Brian McBride Date: 2016-01-21T12:32:20Z Add memoizing of LuceneTextIndexes so that there is one TextIndexLucene object per directory. Similary so the Lucene assembler only creates one RAMDirectory per node. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---