[
https://issues.apache.org/jira/browse/LUCENENET-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477706#comment-16477706
]
Shad Storhaug commented on LUCENENET-600:
-----------------------------------------
Thanks for the report, and sorry for the late reply.
In short, throwing exceptions for control flow is not considered bad practice
in Java the way it is in .NET. As a result, the Lucene codebase depends heavily
on this practice as part of its design, and since most of the port is
line-by-line, the practice made it into the C# code.
In certain cases where the code could be easily isolated, it was refactored to
use Try-Function logic rather than throwing exceptions. In other areas, such as
the IndexWriter, IndexReader, and QueryParser, changing from exceptions to
another form of control flow would generally require a major refactor of the
design, because the exceptions travel through several layers of the call stack
before they are finally caught.
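To illustrate the Try-Function idea in general terms, here is a minimal sketch. It is not taken from the Lucene.NET source: {{BoostParser}}, {{ParseBoost}} and {{TryParseBoost}} are hypothetical names made up for this example, and the "^2" boost syntax is only used as a familiar input.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only; not part of Lucene.NET.
public static class BoostParser
{
    // Exception-based style: the caller has to catch FormatException
    // to find out that the input was not a boost suffix such as "^2".
    public static int ParseBoost(string text)
    {
        if (!TryParseBoost(text, out int boost))
        {
            throw new FormatException("Invalid boost: '" + text + "'");
        }
        return boost;
    }

    // Try-Function style: failure is reported through the return value,
    // so the expected "not a boost" case raises no exception at all.
    public static bool TryParseBoost(string text, out int boost)
    {
        boost = 0;
        if (string.IsNullOrEmpty(text) || text[0] != '^')
        {
            return false;
        }
        // NumberStyles.None accepts plain decimal digits only.
        return int.TryParse(text.Substring(1), NumberStyles.None,
            CultureInfo.InvariantCulture, out boost);
    }
}
```

A caller can then branch on the result, for example {{if (BoostParser.TryParseBoost(token, out int boost)) ...}}, instead of wrapping the call in try/catch. This only works cleanly when the failure is detected and handled close together, which is why it does not carry over easily to the multi-layer cases mentioned above.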
That said, with QueryParser, there are a couple of possibilities to fix this:
# Create a refactored query parser that doesn't throw exceptions manually. See
the following for examples (probably out of date):
** [Exceptionless.LuceneQueryParser|https://github.com/Xamarui/Exceptionless.LuceneQueryParser]
** [Patch for QueryParser to avoid throwing lots of exceptions that slows down
the debugger|https://github.com/apache/lucenenet/pull/131]
# The QueryParser in Java is generated from a template using JavaCC. If a
similar generator exists for .NET, then the template could be used to generate
the QueryParser in C#. See [https://stackoverflow.com/q/2974630].
In the first case, we should probably make it a separate project/NuGet package
(possibly an unofficial one).
In the second case, we technically would be following suit with Lucene so we
could probably replace QueryParser with the generated one, but we should give
it some thorough testing before doing so and provide a way for people to use
the original one (renamed) if they need to. Of course, that assumes that the
tool used will generate the equivalent business logic and will not catch
exceptions as part of the control flow, both of which are unknowns.
If you are analyzing the Lucene.NET code and find any obvious ways to optimize
it without causing negative effects, please feel free to suggest or open a PR.
> Creating an IndexWriter with a RAMDirectory causes two exceptions to be thrown
> ------------------------------------------------------------------------------
>
> Key: LUCENENET-600
> URL: https://issues.apache.org/jira/browse/LUCENENET-600
> Project: Lucene.Net
> Issue Type: Bug
> Components: Lucene.Net Core
> Affects Versions: Lucene.Net 4.8.0
> Reporter: Howard van Rooijen
> Priority: Minor
>
> I have a document scoring algorithm built on top of Lucene. I've just
> upgraded it to the 4.8.0-beta00005 packages (great job by the way).
> We essentially create an in memory index for a single document in order to do
> some parsing / processing / scoring / classification.
> I noticed while running our test suite that the CPU was spiking and also
> noticed that a large number of first chance exceptions were being generated
> by these two lines of code:
> {{var directory = new RAMDirectory();}}
> {{var indexWriter = new IndexWriter(directory, new
> IndexWriterConfig(LuceneVersion.LUCENE_48, new
> ScorableDocumentAnalyzer(LuceneVersion.LUCENE_48)));}}
> The first exception is:
> {{'System.IO.FileNotFoundException' in Lucene.Net.dll ("segments.gen"). }}
> The second exception is:
> {{'Lucene.Net.Index.IndexNotFoundException' in Lucene.Net.dll ("no segments*
> file found in RAMDirectory@21af1a5
> lockFactory=Lucene.Net.Store.SingleInstanceLockFactory:}}
> Based on reading / research, I believe this is because the RAMDirectory is
> initialised to be null, and when the IndexWriter is created it tries to query
> the RAMDirectory and FileNotFoundException is thrown.
> Is it possible to either initialise it as empty rather than null - i.e. reading
> the directory would not throw an exception - this might involve trying to add
> a "segments.gen" entry and a matching "segments_n" segmentinfo entry,
> alternatively is it possible not to throw an exception in this use case?
> Or do you have a suggestion for how it would be possible to manually
> initialise the RAMDirectory before passing it to the IndexWriter?
> Because these two lines are being called per request - we're seeing 2
> exceptions per request - this seems like an expensive way of initialising an
> IndexWriter. We've already had to replace QueryParser with SimpleQueryParser
> because QueryParser was throwing 50+ exceptions internally when being
> instantiated.
> If anyone can point me in the right direction, I'd be more than happy to try
> and create a fix / PR. But I'm wondering as RAMDirectory is often used for
> unit testing scenarios - does anyone have any deep knowledge about why this
> current behaviour is the default behaviour?
> Many Thanks,
> Howard
>