Hi George, 

Thanks for your input on this and setting the context straight. Quick question 
before I dive in.

How much is the www.artinsoft.com software that you would need? Maybe I can 
help look for sponsorship. I looked at the website and it's a bit of a mess. I 
don't know exactly which product you would need and what it would cost.

As for #2/#3 what is required to get started? Do you know if the documentation 
must be on the Apache site? Could we create a mirror CodePlex.com project that 
just houses releases and docs, but the official site would remain on Apache? 
The reason I ask is that CodePlex.com provides support for documentation using 
the MetaWeblog API (and thus Windows Live Writer) which is a fantastic tool for 
documentation. It lowers the bar to documentation and I've been using it very 
productively for my projects.

If it must stay on Apache, a little nudge in the right direction and I'll see 
what I can do to help.

Phil

-----Original Message-----
From: George Aroush [mailto:geo...@aroush.net] 
Sent: Monday, November 01, 2010 1:56 PM
To: lucene-net-u...@lucene.apache.org; lucene-net-...@incubator.apache.org
Subject: RE: Lucene.NET Community Status

Let me jump in here and offer some perspective about Lucene.Net (btw, it's not 
Lucene.NET :-) ).  This is based on my past involvement with the project -- 
since 2003 when it was on SourceForge.net and called dotLucene.

1) Up until early this year, I have been porting and supporting Lucene.Net 
since ver 1.4 (back in 2004 on SourceForge.net) to the current release on trunk 
ver. 2.9.2.  This is in NO WAY to say that others have not helped or 
contributed.  I'm just saying that I know the history and have the experience 
(I wrote and worked on search engines from 1998 to 2002).

2) Doing an initial port of a new Java Lucene release to C# Lucene is very 
hard; it's the most complex part of the port even using automated tools such as 
JLCA and my own customized scripts which I use pre- and post-JLCA (you can
search the list on how I do the port).  What used to take me about 1 month with 
90% of tests passing took me well over 4 months (for 2.9.x) with only 10% of 
tests passing.  This was no easy effort and won't be easier now since Java 
Lucene is using new Java language features that JLCA is not aware of (MS is not 
maintaining JLCA).  Put another way, porting is hard especially when you are 
dealing with > 5.6 GB of source code consisting of > 610 source files.  You
will know this ONLY if you have tried it out and maintained it -- this is why
no one has stepped up to do an initial port; otherwise there would be a port by
now, not only of Java Lucene but of other projects too.

3) To simplify ports of new releases, keeping the delta between releases as 
small as possible is very important. This was a main pain point when I ported 
from 2.4 to 2.9.  The in-between ports were never done due to lack of time on 
my end.  See point #2.

4) Diverging from Java Lucene, in both API and algorithm, is risky and will 
just make point #2 more evident.  Not only will you now need a deep 
knowledge of search engines to catch bugs, but also a deep knowledge of 
Lucene's internals.  Also, you risk compatibility as well as books and existing 
resources on the web that cover Lucene -- heck, one can take any Java Lucene 
example and easily read it as Lucene.Net code or use Luke to debug an index.  
Keep in mind, the current port model that we have for Lucene.Net keeps the API  
one-to-one in sync with Java Lucene; just upper case method names.   
Yes, it's not fully .NET'es, but if you are looking for a search engine that is 
compatible with the open source search engine standard, and it is available in 
C#, Lucene.Net is it.

5) Besides making the port simpler, and per point #3 above, doing a 
line-by-line port, and maintaining API naming as well as the algorithm and 
file format of Java Lucene in C# Lucene means a Lucene index created by Java 
Lucene is usable, concurrently, by C# Lucene.  I have worked on one such 
project where Java and C# code accessed the same index.  I'm not too 
interested in making Lucene.Net .NET'es and end up adding more risk to the 
project.

6) If anyone wants a different flavor of Lucene.Net, the code is on Apache, 
just fork it and start a new project.  Make it more .NET'es, use the latest 
that .NET has to offer, and all.  However, until you have first-hand 
experience with the port, a good knowledge of Lucene and search engines, and 
the cycles to work on it, I really don't want to entertain this idea; it will 
die, as I know a few folks have tried.

7) I can't speak for the other committers or those who contributed, but for me, 
I do this totally during my own time.  Each hour I spend on Lucene.Net is an 
hour away from my family or anything else.  I don't get paid, and I hardly get 
much off my Lucene.Net work on the side.  As you may know, I was active with 
Lucene.Net till about early this year (I had a family emergency).  I want to 
step up again, but we need more participation than just an offer to help or a 
request to diverge from the goal of the project, per the points that I made 
above.

I can go on, but the above are to clarify some of the issues and background of 
Lucene.Net.  Please keep those in mind when thinking about this project and how 
you can contribute -- especially comments about making Lucene.Net more .NET'es 
-- that can't start until you first achieve a commit-per-commit port of Java 
Lucene to C# Lucene.

If you agree with the above, and it makes sense to you, my suggestion is as 
follows:

1) Lucene.Net goes back into incubation and starts all over again.
2) Start with cleaning up the webpage and make it more like other Apache 
project sites.
3) Put together an official Lucene.Net 2.9.2 and get it released.
4) Start working on the next port.

#2 and #3 can happen right away, and all it takes to do them is coming up to 
speed on how to do so using existing Apache documentation.  Who is up to this 
task?

#4 is a bit more complicated.  I don't want to go through the port pain that I 
had with 2.9.0 -- it was too much.  JLCA that comes with VS 2005 is out of 
date; I would love to try out a newer version from www.artinsoft.com, but it is 
$$.

I hope the above helps and I have not offended or discouraged anyone as it 
isn't my intention.  I just want to clarify a few things about Lucene.Net.

PS: One final point.  Look at CLucene, NLucene, and a few other variations of 
Java Lucene ports that were done at the Lucene internals level with the goal 
of maintaining the target language's features and look-and-feel, such as C++. 
Those projects are either way out of date in terms of release version support 
or offer only partial support (index read only).  I don't want to use this to 
bad-mouth another project, but to make a point that porting is hard if you 
diverge from the core.  As is, Lucene.Net is not dead; it's slow and needs 
contributors who will step up.

Thanks,

-- George

