GitHub user paulirwin created a discussion: NeoLuke: The next generation of 
Luke for Lucene.NET

I did a thing over the weekend; you can check it out here: 
https://github.com/paulirwin/neoluke

(Note: this is a personal side project on my personal GitHub, and not an 
official artifact of Apache Lucene.NET.)

I have been a user of the Luke toolkit app for Lucene for many years. I often 
use it to ensure compatibility between Lucene and Lucene.NET, and perhaps most 
often to kick the tires on different analyzers and see what tokens they emit to 
debug issues either I am having myself or other people are having on GitHub 
issues. Unfortunately, to open a Lucene 4.8 index, you have to use a pretty old 
version of Luke, and its dated Java Swing UI looks pretty rough nowadays on 
modern OSes with high-resolution displays.

I have tried to create my own version of "Luke.NET" no fewer than three times, 
but never got around to finishing the project any of those times. (You can see one 
failed start [here](https://github.com/paulirwin/Luke.Net) from 12 years ago.) 
Thankfully, this time I have Claude Code to help do a lot of the grunt work and 
save my wrists. So, this weekend I decided to just get it done!

Before I get into the details, I want to give a shout-out to two other .NET 
ports of Luke:
- https://github.com/shemanaev/Luke.Net
- https://github.com/tareq2/luke.net

Both of those use WinForms, which won't work on my macOS laptop, and both 
target older versions of Lucene.NET. So while they could possibly be 
updated to use a cross-platform GUI framework on modern .NET and target 
Lucene.NET 4.8, I decided that would likely be too much work and just started 
from scratch.

NeoLuke is the next generation of a Lucene.NET port of Luke. It is fully 
cross-platform, and tested to work on macOS, Linux (Ubuntu, at least), and 
Windows. It uses Avalonia for a WPF-like GUI. It even supports dark mode! I 
mostly tracked the latest Lucene trunk (v11.0) Luke UI, although I deviated in 
a few places where I felt the original was awkward and could be improved. It is built using 
the .NET 10 RC SDK, which will be GA in just a few short weeks.

It also has a decent bit of unit and integration test coverage. The integration 
tests use Avalonia's headless testing mode, so they run in the GitHub CI 
pipelines on all three OSes as well. The repo also includes a demo index 
generator that uses Bogus to create random data. This is handy if you don't 
already have an index to kick the tires on. I might add some additional 
generators to pull publicly-available data too.

Almost all of the Luke v11 functionality is implemented, where possible with 
Lucene.NET 4.8. Some things like adding documents and More Like This are a 
little rough around the edges still in the UI, but browsing the index 
term/document data, searching, check index, optimize index, export terms, 
viewing segment files, and using the analysis tab all work great. 

In particular, the feature I use most in Luke (and now NeoLuke) is Analysis, as 
I like seeing how terms get stemmed or ignored by the analyzer. NeoLuke allows 
you to easily select from _45_ (!) included analyzers in Lucene.NET for either 
searching or token analysis! (It's fun playing around with them and seeing that 
e.g. the SpanishAnalyzer emits just "dos" for "¿Por qué no los dos?" as input.) 
This is also handy on the Search tab to see how the Query Parser + Analyzer 
come together to parse your query text.
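
As a rough illustration of what "stemmed or ignored" means, here is a toy sketch 
in Python. It is not Lucene.NET or NeoLuke code: the function name, the regex 
tokenizer, and the four-word stop list are all made up for this example, and it 
does no stemming at all. It only shows the general idea of tokenizing, 
lowercasing, and dropping stop words:

```python
import re
import unicodedata

# Hypothetical four-word stop list, for illustration only. The real
# SpanishAnalyzer uses a much larger list and also applies stemming.
TOY_SPANISH_STOP_WORDS = {"por", "qué", "no", "los"}


def toy_analyze(text, stop_words=TOY_SPANISH_STOP_WORDS):
    """Tokenize on word characters, lowercase, and drop stop words."""
    # Normalize to NFC so accented letters compare equal regardless of
    # how they were typed or encoded.
    normalized = unicodedata.normalize("NFC", text).lower()
    stops = {unicodedata.normalize("NFC", word) for word in stop_words}
    tokens = re.findall(r"\w+", normalized)
    return [token for token in tokens if token not in stops]


print(toy_analyze("¿Por qué no los dos?"))  # prints ['dos']
```

A real analyzer chain also tracks token offsets and positions, which is what the 
Analysis tab shows alongside the terms themselves.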

The following analyzers are not yet supported because their constructors are 
parameterized and I need to add support for that: 

- SnowballAnalyzer
- QueryAutoStopWordAnalyzer
- LimitTokenCountAnalyzer
- PatternAnalyzer
- PerFieldAnalyzerWrapper

Additionally, some analyzers like StopAnalyzer load and show in the list but do 
not currently work; I'll be working on fixing those. I also hope to add support 
for building a custom analyzer from built-in components, like you can in Luke, 
and maybe even for dynamically loading an external assembly so you can bring in 
your own custom analyzer from a DLL.

Also, a natural warning: don't use this on production indexes without taking a 
backup first! This app hasn't been thoroughly tested in production scenarios, 
especially for index modification/optimization.

It is possible that NeoLuke could form the basis (or even the direct 
implementation) of a .NET port of Luke for a future version of Lucene.NET. 
Lucene added Luke to their repo in Lucene 8.1, so once we catch up to that 
version, it would be good for us to have a port, and I'm open to NeoLuke 
becoming that port if the community is interested in that. In the meantime, 
this can serve as an external, experimental tool to kick the tires on whether 
that has value.

I welcome feedback and any issues for bug reports or feature requests you might 
have. Thanks, and enjoy!

GitHub link: https://github.com/apache/lucenenet/discussions/1210
