Everyone -
I feel I have to chip in my 2 cents regarding the 'throw' issue. The
exception throwing inside Lucene, particularly during indexing
operations and, on a smaller scale, when using QueryParser, can be safely
altered without affecting either of the 2 goals you list: making the
index cross-compatible with Java and maintaining a consistent [external]
API.
The indexes we maintain are constantly being updated, as they contain
millions of small documents with relatively volatile data. Seeing
upwards of 8,000 exceptions per second while maintaining those indexes
prompted us to dig into the internals of Lucene.NET to alter the
throws. We also modified the internal data structures to use generic
collections rather than synchronized ArrayLists and Hashtables, to cut
down on the large amount of small object creation we were seeing in a
profiler. The end result cut the exceptions to 0 and significantly
increased performance at index time. All unit tests still pass with the
modifications we have made.
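A minimal sketch of the kind of change described above, in Java for illustration only. It is not the actual Lucene.NET modification, which this message does not show; the class and method names (ThrowCostSketch, parseOrThrow, tryParse) are hypothetical. It contrasts reporting an expected bad input by throwing with checking the input first and returning an empty result, so that no exception object is created on the hot path.

```java
import java.util.OptionalInt;

// Hypothetical illustration; none of these names come from Lucene or Lucene.NET.
class ThrowCostSketch {

    // Exception-driven style: malformed input is reported by throwing.
    static int parseOrThrow(String token) {
        return Integer.parseInt(token); // throws NumberFormatException on bad input
    }

    // Check-first style: validate up front and return an empty result
    // instead of throwing. Only plain digit strings of at most 9 characters
    // are accepted, so the value always fits in an int.
    static OptionalInt tryParse(String token) {
        if (token == null || token.isEmpty() || token.length() > 9) {
            return OptionalInt.empty();
        }
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (c < '0' || c > '9') {
                return OptionalInt.empty();
            }
        }
        return OptionalInt.of(Integer.parseInt(token));
    }

    // Counts parseable tokens; every bad token costs one thrown exception.
    static int countValidWithExceptions(String[] tokens) {
        int valid = 0;
        for (String t : tokens) {
            try {
                parseOrThrow(t);
                valid++;
            } catch (NumberFormatException e) {
                // Expected bad input handled through the exception path.
            }
        }
        return valid;
    }

    // Same count, but bad tokens are rejected without throwing.
    static int countValidWithoutExceptions(String[] tokens) {
        int valid = 0;
        for (String t : tokens) {
            if (tryParse(t).isPresent()) {
                valid++;
            }
        }
        return valid;
    }
}
```

Both counting methods agree on unsigned digit strings; the difference is only in how a bad token is reported. Note that tryParse here deliberately rejects signed input such as "-5", which Integer.parseInt would accept, so the two are not drop-in equivalents in general.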
I would venture to say that the vast majority of Lucene.NET users would
not greatly benefit from these performance improvements unless they are
working on a _very_ high-volume application such as ours. We
currently maintain our own branch of Lucene.NET, incorporating any
changes made to the Subversion trunk into our branch. Although it appears
these changes are not desired in the official Lucene.NET releases, they
are not difficult for anyone to make on their own should they
choose to do so. One of the advantages of open source.
Thanks,
Michael
PS: if you have experience with Lucene.NET, high volume server
applications, live in the Los Angeles area, and are looking for a new
job, please email me off the list at mgarski[at]mac[dot]com with a
recent resume... we are hiring.
George Aroush wrote:
Hi Michael, Ciaran and all,
Ciaran: welcome aboard to the mailing list and I am glad to see your email
generated some interest; I welcome any help you or anyone can offer working
on Lucene.Net.
My goals for Lucene.Net are the following:
1) The index is cross-compatible with Java's Lucene, such that you can read/write
to the same index concurrently using C# or Java Lucene.
2) The APIs are consistent between C# and Java Lucene. This is why I use
"GetXYZ()" instead of C# properties.
Up to release 2.0, I kept Lucene.Net on .NET 1.1 because I wanted to support
as many .NET installations as possible. With the Lucene.Net 2.1 release it's time
to move to .NET 2.0 -- I don't think anyone has any objection to this, but
Mono may have some issues.
As for the code clean-up, this may be difficult, and it depends on what clean-up
you mean. Take a look at the open JIRA issues against Lucene.Net and you
will see a few about overusing "throw". Those, unfortunately, we can't fix.
Why? Because those "throw" statements are also present in Java Lucene, and trying to
'fix' them in Lucene.Net may in effect alter the behavior of Lucene.Net.
That said, any extra code or "throw" introduced into Lucene.Net due to
conversion mistakes should be fixed.
As for the warnings, I don't have direct experience looking at them using
VS.NET 2005 (I still use VS.NET 2003). But in VS.NET 2003, most of those
warnings are from comments, i.e. the class and API XML documentation that
doesn't get converted correctly from Java to C#. If you can think of a tool
to clean them up, please let me know. If it's something else you are
talking about, please let me know.
Finally, making the Lucene.Net code more compliant with .NET / C# standards
would be, in my opinion, a nice thing to have. But before we can do so, we
must get the port working and keep in mind my goal #2 above.
Let's discuss this topic further. Next week, I expect to put out an early
release of Lucene.Net 2.1. If folks can help to finish off the conversion,
then we can get this out much sooner than the previous release.
Regards,
-- George Aroush
-----Original Message-----
From: Michael Mitiaguin [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 27, 2007 9:19 PM
To: [email protected]
Subject: Re: Lucene.Net project involvement
Ciaran,
What I can't understand is this: if the core of synchronising versions with
Java Lucene is the Java Language Conversion Assistant, how is all this
cleaning up/revising going to work?
Would it be possible to build an automated procedure which preserves all .NET
improvements after conversion from a major Java upgrade? I am not
sure.
Even if we somehow track only the changed/added Java classes, each
such class still requires merging the new/revised Java functionality with the
previous manual changes made to utilise .NET capabilities.
You used the term "component", but Lucene is rather an API with fine-grained
classes, and a simple change may propagate into several classes (files in
Java).
I don't know how George is coping with that, or what the plan would be if,
say, Lucene Java 3 were released tomorrow.
Michael
Ciaran Roarty wrote:
Michael
I've been in touch with George about getting involved and he said to
post to the mailing list.
I reckon there's a fair amount of work that could be done in changing the
codebase without affecting the published interface, and I reckon that's
where the bulk of the initial work would take place; as we know, the code is
not yet optimised for .NET.
Now, balanced against that, in my opinion, are the following factors:
- The code currently compiles against 1.1 and 2.0 (albeit with some
obsolescence warnings); any change to move Lucene.Net to 2.0 would leave the
1.1 codebase behind.
- There are different types of contribution to the codebase: cleaning up
code and revising methods and classes to benefit from .NET standards and
capabilities is a good thing. However, Lucene is a powerful IR
component, and if the core development of those capabilities happens in the
Java version then we will need to follow that.
Those are my thoughts for the moment. Maybe we could take a specific part of
the component and revise that. Having learned lessons about the process and the
codebase from that exercise, we can move into the guts of the
component.
Any thoughts?
Ciaran
On 27/03/07, Michael Mitiaguin <[EMAIL PROTECTED]> wrote:
Ciaran,
The only active contributor to the project is George Aroush, and perhaps
he is the only person who can give you a definite answer.
I am also interested only in the .NET 2/3 codebase. Currently version 2.0.4
still uses VS 2003 projects, and my main concern is the warning messages
about deprecated and obsolete methods when compiled under .NET 2.
Supposedly it'll be fixed in 2.1.
Also, Java Lucene is a more mature project with a lot of people involved,
and it would be safer to cross-translate new things from there, taking
.NET specifics into consideration.
On the other hand, in my case, if Lucene is to be part of a project where
all warning messages are considered errors that must be
eliminated, it is beyond my competency to say what can be done to achieve
that. (Cross-translating JavaCC-generated code creates a lot of them.)
Michael
Ciaran Roarty wrote:
Anthony
I too have used Lucene.Net with C# 2.0 to great effect. However, I am
discussing the use of .Net 2.0 in the codebase itself; and, if not,
the optimisation of the codebase for .Net in general.
Ciaran
On 26/03/07, tony njedeh <[EMAIL PROTECTED]> wrote:
I set up my Lucene on the .NET 2.0 framework, using VB, and it works
well in that environment.
Anthony
Ciaran Roarty <[EMAIL PROTECTED]> wrote:
George et al
I have been using Lucene.Net in a proof-of-concept environment for
the last couple of months - with my colleague Guy Steel - and we wanted to get
involved in its development.
I am a .NET developer for a large consultancy company and would
like to get involved in making Lucene.Net more aligned to .NET and .NET 2/3 in
particular. However, I am not sure if that is something which is
initially planned for Lucene.Net. As I understand it, the majority of the
conversion has been done, initially, using the Java Language Conversion
Assistant. Some of the Java codebase uses patterns that are not best practice for
.NET - such as using Exceptions for non-exceptional circumstances. This is
not to denigrate Lucene.Net; it is one of the best pieces of software I have
used.
So, this email should be considered an introduction and a request
to be allowed to get involved. I have never worked on an Open Source
project before so I'll need some guidance, but I am willing to learn. I do
have a couple of questions to start with:
- Is there a roadmap for the product? Is there a roadmap for Lucene
that we will try and follow?
- Is there a preferred version of the .NET Framework that it is
planned to support?
Enough for now, just wanted to introduce myself and get involved.
Cheers,
Ciaran