[jira] [Commented] (NIFI-2681) Avoid caching Provenance Index Searchers
[ https://issues.apache.org/jira/browse/NIFI-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15467818#comment-15467818 ] ASF subversion and git services commented on NIFI-2681: --- Commit 088125451b7f15684752a914a3a83a834fac1a50 in nifi's branch refs/heads/master from [~markap14] [ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=0881254 ] NIFI-2681: Refactored IndexManager into an interface and renamed the existing implementation to CachingIndexManager. Implemented a new SimpleIndexManager that performs no caching of IndexSearchers. This closes #958. Signed-off-by: Bryan Bende > Avoid caching Provenance Index Searchers > > > Key: NIFI-2681 > URL: https://issues.apache.org/jira/browse/NIFI-2681 > Project: Apache NiFi > Issue Type: Bug > Components: Core Framework >Reporter: Mark Payne >Assignee: Mark Payne >Priority: Critical > Fix For: 1.1.0 > > > In NIFI-2600 and NIFI-2452, we addressed two bugs where the Provenance > Repository closes a cached IndexSearcher too soon. The IndexManager keeps the > searchers cached in an effort to offer better performance when performing a > Provenance Query. This was done because it was recommended in the Lucene > documentation. However, we occasionally still see nodes crashing with > segfaults due to the Lucene Searching. We should update the Persistent > Provenance Repository to stop caching Index Searchers in order to trade a > slight performance improvement for significantly better reliability. > Playing around with the idea in order to test it out shows very favorable > results. On a system where I could cause a seg fault almost every time that I > ran a large provenance query, I updated the code to no longer cache the > readers and saw perfect stability with no noticeable performance degradation. > I will cleanup the code and submit a PR for these changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (NIFI-2681) Avoid caching Provenance Index Searchers
[ https://issues.apache.org/jira/browse/NIFI-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15467820#comment-15467820 ] ASF GitHub Bot commented on NIFI-2681: -- Github user asfgit closed the pull request at: https://github.com/apache/nifi/pull/958 > Avoid caching Provenance Index Searchers > > > Key: NIFI-2681 > URL: https://issues.apache.org/jira/browse/NIFI-2681 > Project: Apache NiFi > Issue Type: Bug > Components: Core Framework >Reporter: Mark Payne >Assignee: Mark Payne >Priority: Critical > Fix For: 1.1.0 > > > In NIFI-2600 and NIFI-2452, we addressed two bugs where the Provenance > Repository closes a cached IndexSearcher too soon. The IndexManager keeps the > searchers cached in an effort to offer better performance when performing a > Provenance Query. This was done because it was recommended in the Lucene > documentation. However, we occasionally still see nodes crashing with > segfaults due to the Lucene Searching. We should update the Persistent > Provenance Repository to stop caching Index Searchers in order to trade a > slight performance improvement for significantly better reliability. > Playing around with the idea in order to test it out shows very favorable > results. On a system where I could cause a seg fault almost every time that I > ran a large provenance query, I updated the code to no longer cache the > readers and saw perfect stability with no noticeable performance degradation. > I will cleanup the code and submit a PR for these changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (NIFI-2681) Avoid caching Provenance Index Searchers
[ https://issues.apache.org/jira/browse/NIFI-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15467814#comment-15467814 ] ASF GitHub Bot commented on NIFI-2681: -- Github user bbende commented on the issue: https://github.com/apache/nifi/pull/958 +1 full build passes and provenance appears to work fine, will merge to master > Avoid caching Provenance Index Searchers > > > Key: NIFI-2681 > URL: https://issues.apache.org/jira/browse/NIFI-2681 > Project: Apache NiFi > Issue Type: Bug > Components: Core Framework >Reporter: Mark Payne >Assignee: Mark Payne >Priority: Critical > Fix For: 1.1.0 > > > In NIFI-2600 and NIFI-2452, we addressed two bugs where the Provenance > Repository closes a cached IndexSearcher too soon. The IndexManager keeps the > searchers cached in an effort to offer better performance when performing a > Provenance Query. This was done because it was recommended in the Lucene > documentation. However, we occasionally still see nodes crashing with > segfaults due to the Lucene Searching. We should update the Persistent > Provenance Repository to stop caching Index Searchers in order to trade a > slight performance improvement for significantly better reliability. > Playing around with the idea in order to test it out shows very favorable > results. On a system where I could cause a seg fault almost every time that I > ran a large provenance query, I updated the code to no longer cache the > readers and saw perfect stability with no noticeable performance degradation. > I will cleanup the code and submit a PR for these changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (NIFI-2681) Avoid caching Provenance Index Searchers
[ https://issues.apache.org/jira/browse/NIFI-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15467763#comment-15467763 ] ASF GitHub Bot commented on NIFI-2681: -- Github user markap14 commented on a diff in the pull request: https://github.com/apache/nifi/pull/958#discussion_r77664996 --- Diff: nifi-nar-bundles/nifi-provenance-repository-bundle/nifi-persistent-provenance-repository/src/main/java/org/apache/nifi/provenance/lucene/CachingIndexManager.java --- @@ -0,0 +1,535 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.nifi.provenance.lucene; + +import java.io.Closeable; +import java.io.File; +import java.io.IOException; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.Iterator; +import java.util.List; +import java.util.Map; +import java.util.concurrent.atomic.AtomicInteger; +import java.util.concurrent.locks.Lock; +import java.util.concurrent.locks.ReentrantLock; + +import org.apache.lucene.analysis.Analyzer; +import org.apache.lucene.analysis.standard.StandardAnalyzer; +import org.apache.lucene.index.DirectoryReader; +import org.apache.lucene.index.IndexWriter; +import org.apache.lucene.index.IndexWriterConfig; +import org.apache.lucene.search.IndexSearcher; +import org.apache.lucene.store.Directory; +import org.apache.lucene.store.FSDirectory; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class CachingIndexManager implements Closeable, IndexManager { +private static final Logger logger = LoggerFactory.getLogger(CachingIndexManager.class); + +private final Lock lock = new ReentrantLock(); +private final Map writerCounts = new HashMap<>(); +private final Map> activeSearchers = new HashMap<>(); + + +public void removeIndex(final File indexDirectory) { +final File absoluteFile = indexDirectory.getAbsoluteFile(); +logger.info("Removing index {}", indexDirectory); + +lock.lock(); +try { +final IndexWriterCount count = writerCounts.remove(absoluteFile); +if ( count != null ) { +try { +count.close(); +} catch (final IOException ioe) { +logger.warn("Failed to close Index Writer {} for {}", count.getWriter(), absoluteFile); +if ( logger.isDebugEnabled() ) { +logger.warn("", ioe); +} +} +} + +for ( final List searcherList : activeSearchers.values() ) { --- End diff -- @bbende wow - great catch. That definitely should have only closed the index searchers for the abosluteFile. Addressed and pushed. > Avoid caching Provenance Index Searchers > > > Key: NIFI-2681 > URL: https://issues.apache.org/jira/browse/NIFI-2681 > Project: Apache NiFi > Issue Type: Bug > Components: Core Framework >Reporter: Mark Payne >Assignee: Mark Payne >Priority: Critical > Fix For: 1.1.0 > > > In NIFI-2600 and NIFI-2452, we addressed two bugs where the Provenance > Repository closes a cached IndexSearcher too soon. The IndexManager keeps the > searchers cached in an effort to offer better performance when performing a > Provenance Query. This was done because it was recommended in the Lucene > documentation. However, we occasionally still see nodes crashing with > segfaults due to the Lucene Searching. We should update the Persistent > Provenance Repository to stop caching Index Searchers in order to trade a > slight performance improvement for significantly better reliability. > Playing around with the idea in order to test it out shows very favorable > results. On a system where I could cause a seg fault almost every
[jira] [Commented] (NIFI-2681) Avoid caching Provenance Index Searchers
[ https://issues.apache.org/jira/browse/NIFI-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15467730#comment-15467730 ] ASF GitHub Bot commented on NIFI-2681: -- Github user bbende commented on a diff in the pull request: https://github.com/apache/nifi/pull/958#discussion_r77661605 --- Diff: nifi-nar-bundles/nifi-provenance-repository-bundle/nifi-persistent-provenance-repository/src/main/java/org/apache/nifi/provenance/lucene/CachingIndexManager.java --- @@ -0,0 +1,535 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.nifi.provenance.lucene; + +import java.io.Closeable; +import java.io.File; +import java.io.IOException; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.Iterator; +import java.util.List; +import java.util.Map; +import java.util.concurrent.atomic.AtomicInteger; +import java.util.concurrent.locks.Lock; +import java.util.concurrent.locks.ReentrantLock; + +import org.apache.lucene.analysis.Analyzer; +import org.apache.lucene.analysis.standard.StandardAnalyzer; +import org.apache.lucene.index.DirectoryReader; +import org.apache.lucene.index.IndexWriter; +import org.apache.lucene.index.IndexWriterConfig; +import org.apache.lucene.search.IndexSearcher; +import org.apache.lucene.store.Directory; +import org.apache.lucene.store.FSDirectory; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class CachingIndexManager implements Closeable, IndexManager { +private static final Logger logger = LoggerFactory.getLogger(CachingIndexManager.class); + +private final Lock lock = new ReentrantLock(); +private final Map writerCounts = new HashMap<>(); +private final Map> activeSearchers = new HashMap<>(); + + +public void removeIndex(final File indexDirectory) { +final File absoluteFile = indexDirectory.getAbsoluteFile(); +logger.info("Removing index {}", indexDirectory); + +lock.lock(); +try { +final IndexWriterCount count = writerCounts.remove(absoluteFile); +if ( count != null ) { +try { +count.close(); +} catch (final IOException ioe) { +logger.warn("Failed to close Index Writer {} for {}", count.getWriter(), absoluteFile); +if ( logger.isDebugEnabled() ) { +logger.warn("", ioe); +} +} +} + +for ( final List searcherList : activeSearchers.values() ) { --- End diff -- Wouldn't we want to get the List for the absoluteFile, rather than every active searcher? > Avoid caching Provenance Index Searchers > > > Key: NIFI-2681 > URL: https://issues.apache.org/jira/browse/NIFI-2681 > Project: Apache NiFi > Issue Type: Bug > Components: Core Framework >Reporter: Mark Payne >Assignee: Mark Payne >Priority: Critical > Fix For: 1.1.0 > > > In NIFI-2600 and NIFI-2452, we addressed two bugs where the Provenance > Repository closes a cached IndexSearcher too soon. The IndexManager keeps the > searchers cached in an effort to offer better performance when performing a > Provenance Query. This was done because it was recommended in the Lucene > documentation. However, we occasionally still see nodes crashing with > segfaults due to the Lucene Searching. We should update the Persistent > Provenance Repository to stop caching Index Searchers in order to trade a > slight performance improvement for significantly better reliability. > Playing around with the idea in order to test it out shows very favorable > results. On a system where I could cause a seg fault almost every time that I > ran a large provenance que
[jira] [Commented] (NIFI-2681) Avoid caching Provenance Index Searchers
[ https://issues.apache.org/jira/browse/NIFI-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15467648#comment-15467648 ] ASF GitHub Bot commented on NIFI-2681: -- Github user bbende commented on the issue: https://github.com/apache/nifi/pull/958 Reviewing... > Avoid caching Provenance Index Searchers > > > Key: NIFI-2681 > URL: https://issues.apache.org/jira/browse/NIFI-2681 > Project: Apache NiFi > Issue Type: Bug > Components: Core Framework >Reporter: Mark Payne >Assignee: Mark Payne >Priority: Critical > Fix For: 1.1.0 > > > In NIFI-2600 and NIFI-2452, we addressed two bugs where the Provenance > Repository closes a cached IndexSearcher too soon. The IndexManager keeps the > searchers cached in an effort to offer better performance when performing a > Provenance Query. This was done because it was recommended in the Lucene > documentation. However, we occasionally still see nodes crashing with > segfaults due to the Lucene Searching. We should update the Persistent > Provenance Repository to stop caching Index Searchers in order to trade a > slight performance improvement for significantly better reliability. > Playing around with the idea in order to test it out shows very favorable > results. On a system where I could cause a seg fault almost every time that I > ran a large provenance query, I updated the code to no longer cache the > readers and saw perfect stability with no noticeable performance degradation. > I will cleanup the code and submit a PR for these changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (NIFI-2681) Avoid caching Provenance Index Searchers
[ https://issues.apache.org/jira/browse/NIFI-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15442158#comment-15442158 ] ASF GitHub Bot commented on NIFI-2681: -- GitHub user markap14 opened a pull request: https://github.com/apache/nifi/pull/958 NIFI-2681: Refactored IndexManager into an interface and renamed the … …existing implementation to CachingIndexManager. Implemented a new SimpleIndexManager that performs no caching of IndexSearchers. You can merge this pull request into a Git repository by running: $ git pull https://github.com/markap14/nifi NIFI-2681 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/nifi/pull/958.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #958 commit 58c715cefaeaa9ef2b7d89cd751456da73f42953 Author: Mark Payne Date: 2016-08-27T00:03:16Z NIFI-2681: Refactored IndexManager into an interface and renamed the existing implementation to CachingIndexManager. Implemented a new SimpleIndexManager that performs no caching of IndexSearchers. > Avoid caching Provenance Index Searchers > > > Key: NIFI-2681 > URL: https://issues.apache.org/jira/browse/NIFI-2681 > Project: Apache NiFi > Issue Type: Bug > Components: Core Framework >Reporter: Mark Payne >Assignee: Mark Payne >Priority: Critical > Fix For: 1.1.0 > > > In NIFI-2600 and NIFI-2452, we addressed two bugs where the Provenance > Repository closes a cached IndexSearcher too soon. The IndexManager keeps the > searchers cached in an effort to offer better performance when performing a > Provenance Query. This was done because it was recommended in the Lucene > documentation. However, we occasionally still see nodes crashing with > segfaults due to the Lucene Searching. We should update the Persistent > Provenance Repository to stop caching Index Searchers in order to trade a > slight performance improvement for significantly better reliability. > Playing around with the idea in order to test it out shows very favorable > results. On a system where I could cause a seg fault almost every time that I > ran a large provenance query, I updated the code to no longer cache the > readers and saw perfect stability with no noticeable performance degradation. > I will cleanup the code and submit a PR for these changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)