Hello,
I agree with George that maintaining a version of Lucene.NET that fully
takes advantage of the .NET platform ('purer', if you will) while
keeping external interfaces and a file format identical to Java Lucene's
will be a manual process that needs continual upkeep. I
have the luxury of being able to work on Lucene.NET during my day job,
and would love to assist in this. I've already been digging into
internal implementations to discover ways to improve search performance
and so far have only hit the tip of the iceberg.
George, I have a few questions:
If anyone else is interested in contributing to make this work, will you
be coordinating the work to avoid duplication of effort?
Will 2.1 be bumped up to VS 2005?
Thanks!
Michael
George Aroush wrote:
Hi everyone,
I will try to respond to this thread of emails as one
response, highlighting a few things and summarizing the subject.
Let's take my current effort of porting Java Lucene 2.1 to C#, which I am
about to start (over this coming weekend).
I use JLCA to convert Java Lucene 2.1 to C# as a starting point only; I never
use the generated code itself as a base. Of the files that JLCA generates, I
only take in those that actually changed in the Java version from 2.0
to 2.1. This way, there are fewer files I have to deal with, and I don't
have to redo clean-up caused by JLCA's poor job. In addition to those two
diffs, I also look at the diffs of the raw JLCA-generated code for Lucene.Net
2.0 and 2.1.
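The filtering step described above can be sketched roughly as follows. This is only an illustration in Java, not the actual tooling used for the port: the class name, method name, file names and fingerprint strings are all hypothetical, and it assumes each release has already been reduced to a map from file path to some content fingerprint.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of Lucene, Lucene.Net or JLCA.
final class ChangedFiles {

    // Returns the paths that are new in `next`, or whose fingerprint differs
    // from the one recorded in `prev`. Unchanged files are left out, so they
    // never need to be converted or cleaned up again.
    static Set<String> changedOrAdded(Map<String, String> prev, Map<String, String> next) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, String> entry : next.entrySet()) {
            String old = prev.get(entry.getKey());
            if (old == null || !old.equals(entry.getValue())) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Placeholder paths and fingerprints, standing in for two releases.
        Map<String, String> older = new HashMap<>();
        older.put("A.java", "fingerprint-1");
        older.put("B.java", "fingerprint-2");

        Map<String, String> newer = new HashMap<>();
        newer.put("A.java", "fingerprint-1"); // unchanged, skipped
        newer.put("B.java", "fingerprint-3"); // modified, kept
        newer.put("C.java", "fingerprint-4"); // added, kept

        System.out.println(changedOrAdded(older, newer));
    }
}
```

Only the files this returns would then be run through the converter and cleaned up by hand.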
As you can see, those diffs give me a baseline to start with; they allow me
to filter out any repeated clean-up that I would otherwise have to do. Why?
JLCA does a very poor job at conversion. Not only does it fail to convert a
good amount of Java code, in a few instances it generates buggy code, and it
creates a lot of code, and I mean a lot, in SupportClass.cs, such that you
will see 100's of lines using SupportClass methods -- this pollutes the code
badly.
If you haven't already, I urge you to give JLCA a quick try to get a feel for
what I mean. And no, JLCA doesn't target .NET 1.1 or 2.0; what comes with
VS.NET 2005 is really the same beta JLCA that Microsoft released for VS.NET
2000/3 years ago. Finally, Microsoft is dropping support for it with VS.NET
2008.
Because of this complexity of conversion, I don't like the idea of making
the code 'purer' -- at least not now. However, I am all for it **if and
only if** we achieve a port level where the SVN of Lucene.Net is on par with
the SVN of Java Lucene. When we achieve this milestone, we can port
Java changes to C# **by hand**, say on a weekly basis; it will be easy to do,
and we just make the change on the C# end.
With the recent activity on the Lucene.Net mailing list, I believe
there is enough interest to achieve this milestone. Don't you agree?
-- George Aroush
-----Original Message-----
From: Ayende Rahien [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 28, 2007 7:42 PM
To: [email protected]
Subject: Re: Lucene.Net project involvement
I am not familiar enough with the internals of Lucene to talk, I am afraid.
On 3/29/07, Ciaran Roarty <[EMAIL PROTECTED]> wrote:
Ayende
In your opinion, would you say that taking Lucene.Net 2.1 as a
baseline and making it 'pure' .NET would be a sensible thing to do?
Ciaran
On 28/03/07, Ayende Rahien <[EMAIL PROTECTED]> wrote:
I have some experience with porting projects from Java to C#. Most often,
the port is done once, similar to the way it is done on Lucene, and
new features are ported on a case-by-case basis, mostly by hand.
This makes it possible to take greater advantage of the capabilities of the
.NET platform, as well as to add behavior that may not
exist in the original platform.
On 3/28/07, Michael Garski <[EMAIL PROTECTED]> wrote:
Everyone -
I feel I have to chip my 2 cents in regarding the 'throw' issue.
The exception throwing inside Lucene, particularly during indexing
operations and on a smaller scale when using QueryParser can be
safely altered without affecting either of the 2 goals you list -
making the index cross compatible with Java and maintaining
consistent [external] API.
The indexes we maintain are constantly being updated as they
contain millions of small documents with relatively volatile data.
Seeing upwards of 8000 exceptions per second while maintaining
those indexes prompted us to dig into the internals of Lucene.NET
to alter the throws. We also modified the internal data
structures to use generic collections rather than synchronized
arraylists and hashtables to cut down on the large amount of small
object creation we were seeing in a profiler. The end result cut
the exceptions to 0 and significantly increased performance during
index time. All modifications we have made still result in passing
unit tests.
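A minimal sketch of the kind of change described above, written in Java because that is where the throws originate. It is not the actual patch: the class, its methods and the field names are hypothetical and are not part of the Lucene or Lucene.NET API. The assumption is that an internal hot path was using a throwing lookup as control flow; adding a non-throwing internal variant removes the exceptions while the externally visible method keeps its original contract. The typed `Map<String, Integer>` stands in for the move to generic collections.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class for illustration; not part of Lucene or Lucene.NET.
final class FieldTable {
    // Typed (generic) map instead of an untyped, synchronized table.
    private final Map<String, Integer> numbers = new HashMap<>();

    void add(String name, int number) {
        numbers.put(name, number);
    }

    // External contract, unchanged: an unknown name still raises an exception.
    int fieldNumber(String name) {
        Integer n = numbers.get(name);
        if (n == null) {
            throw new IllegalArgumentException("unknown field: " + name);
        }
        return n;
    }

    // Internal, non-throwing lookup for hot paths: -1 means "not present".
    int fieldNumberOrMinusOne(String name) {
        Integer n = numbers.get(name);
        return n == null ? -1 : n;
    }

    // Before: every miss costs one thrown and caught exception.
    int countKnownWithExceptions(String[] names) {
        int known = 0;
        for (String name : names) {
            try {
                fieldNumber(name);
                known++;
            } catch (IllegalArgumentException e) {
                // A miss is expected here; the exception is pure control flow.
            }
        }
        return known;
    }

    // After: same result, no exceptions raised on a miss.
    int countKnown(String[] names) {
        int known = 0;
        for (String name : names) {
            if (fieldNumberOrMinusOne(name) >= 0) {
                known++;
            }
        }
        return known;
    }

    public static void main(String[] args) {
        FieldTable table = new FieldTable();
        table.add("title", 0);
        table.add("body", 1);
        String[] names = {"title", "missing", "body"};
        System.out.println(table.countKnownWithExceptions(names) == table.countKnown(names));
    }
}
```

Because `fieldNumber` behaves exactly as before, callers outside the class see no difference, and nothing here touches the index format.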
I would venture to say that the vast majority of Lucene.NET users
would not greatly benefit from these performance improvements unless
they are working on a _very_ high-volume application such as ours. We
currently maintain our own branch of Lucene.NET, incorporating any
changes made to the Subversion trunk into our branch. As it
appears these changes are not desired in the official Lucene.NET
releases, the changes are not difficult for anyone to make on
their own should they choose to do so. That is one of the advantages of
open source.
Thanks,
Michael
PS: if you have experience with Lucene.NET, high volume server
applications, live in the Los Angeles area, and are looking for a
new job, please email me off the list at mgarski[at]mac[dot]com
with a recent resume... we are hiring.
George Aroush wrote:
Hi Michael, Ciaran and all,
Ciaran: welcome aboard to the mailing list. I am glad to see your email
generated some interest; I welcome any help you or anyone can offer
working on Lucene.Net.
My goals for Lucene.Net are the following:
1) The index is cross compatible with Java Lucene, such that you can
read/write to the same index concurrently using C# or Java Lucene.
2) The APIs are consistent between C# and Java Lucene. This is why I
use "GetXYZ()" instead of C# properties.
Up to release 2.0, I kept Lucene.Net on .NET 1.1 because I wanted to
support as many .NET installations as possible. With the Lucene.Net 2.1
release it's time to move to .NET 2.0 -- I don't think anyone has any
objection to this, but Mono may have some issues.
As for the code clean-up, this may be difficult, and it depends on what
clean-up you mean. Take a look at the open JIRA issues against Lucene.Net and
you will see a few about overusing "throw". Those, unfortunately, we can't
fix. Why? Because those "throw" statements are also present in Java Lucene,
and trying to 'fix' them in Lucene.Net may in effect alter the behavior of
Lucene.Net. That said, any extra code or "throw" introduced into Lucene.Net
due to conversion mistakes should be fixed.
As for the warnings, I don't have direct experience looking at them using
VS.NET 2005 (I still use VS.NET 2003). But in VS.NET 2003, most of those
warnings are from comments, i.e. the class and API XML documentation that
doesn't get converted correctly from Java to C#. If you can think of a
tool to clean them up, please let me know. If it's something else you are
talking about, please let me know.
Finally, making the Lucene.Net code more compliant with .NET / C# standards
would be, in my opinion, a nice thing to have. But before we can do so, we
must get the port working, and keep in mind my goal #2 above.
Let's discuss this topic further. Next week, I expect to release an early
release of Lucene.Net 2.1. If folks can help to finish off the conversion,
then we can get this out much sooner than the previous release.
Regards,
-- George Aroush
-----Original Message-----
From: Michael Mitiaguin [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 27, 2007 9:19 PM
To: [email protected]
Subject: Re: Lucene.Net project involvement
Ciaran,
What I can't understand is this: if the core of synchronising versions with
Java Lucene is the Java Language Conversion Assistant, how is all this
cleaning up/revising going to work?
Would it be possible to build an automated procedure which preserves all
.Net improvements after conversion from a major Java upgrade? I am not
sure.
Even if we somehow track only the changed/added Java classes, each such
class still requires merging the new/revised Java functionality with the
previous manual changes made to utilise .Net capabilities.
You used the term component, but Lucene is rather an API with fine-grained
classes, and a simple change may propagate into several classes (files in
Java).
I don't know how George is coping with that, or what the plan would be if,
say, Lucene Java 3 were released tomorrow.
Michael
Ciaran Roarty wrote:
Michael
I've been in touch with George about getting involved and he said to
post to the mailing list.
I reckon there's a fair amount of work that could be done in changing the
codebase without affecting the published interface, and I reckon that's where
the bulk of the initial work would take place; as we know, the code is not
yet optimised for .NET.
Now, balanced against that, in my opinion, are the following factors:
- The code currently compiles against 1.1 and 2.0 (albeit with
some obsolescence); any change to move Lucene.Net to 2.0 would
leave the 1.1 codebase behind.
- There are different types of contribution to the codebase: cleaning up
code, or revising methods and classes to benefit from .NET standards
and capabilities, is a good thing. However, Lucene is a powerful
IR component, and if the core development of those capabilities
happens in the Java version then we will need to follow that.
Those are my thoughts for the moment. Maybe we could take a specific part of
the component and revise that. Learning lessons about the process and the
codebase from that exercise, we can move into the guts of the
component...
Any thoughts?
Ciaran
On 27/03/07, Michael Mitiaguin <[EMAIL PROTECTED]> wrote:
Ciaran,
The only active contributor to the project is George Aroush, and perhaps
he is the only person who can give you a definite answer.
I am also interested only in the .NET 2/3 codebase. Currently version 2.0.4
still uses VS 2003 projects, and my main concern is the warning messages
about deprecated and obsolete methods when compiled under .NET 2.
Supposedly that will be fixed in 2.1. Also, Java Lucene is a more
mature project with a lot of people involved,
and it would be safer to crosstranslate new things from there, taking
.NET specifics into consideration.
On the other hand, in my case, if Lucene is to be part of a project where
all warning messages are considered errors which must be
eliminated, it is beyond my competence to say what can be done to achieve
that. (Crosstranslation of JavaCC-generated code creates a lot of them.)
Michael
Ciaran Roarty wrote:
Anthony
I too have used Lucene.Net with C# 2.0 to great effect. However, I am
discussing the use of .Net 2.0 in the codebase itself and, if not, the
optimisation of the codebase for .Net in general.
Ciaran
On 26/03/07, tony njedeh <[EMAIL PROTECTED]> wrote:
I set up my Lucene on the .NET 2.0 framework, using VB, and it works
well in that environment.
Anthony
Ciaran Roarty <[EMAIL PROTECTED]> wrote:
George et al
I have been using Lucene.Net in a proof-of-concept environment for the last
couple of months - with my colleague Guy Steel - and we wanted to get
involved in its development.
I am a .NET developer for a large consultancy company and would like to get
involved in making Lucene.Net more aligned to .NET, and .NET 2/3 in
particular. However, I am not sure if that is something
which is initially planned for Lucene.Net. As I understand
it, the majority of the conversion has been done, initially,
using the Java Language Conversion Assistant. Some
of the Java codebase uses patterns that are not best practice for .NET,
such as using Exceptions for non-exceptional circumstances. This is not to
denigrate Lucene.Net; it is one of the best pieces of software I have
used.
So, this email should be considered an introduction and a request to be
allowed to get involved. I have never worked on an Open Source project
before so I'll need some guidance, but I am willing to learn. I do have a
couple of questions to start with:
- Is there a roadmap for the product? Is there a roadmap for Lucene that we
will try and follow?
- Is there a preferred version of the .NET Framework that it
is planned to support?
Enough for now, just wanted to introduce myself and get involved.
Cheers,
Ciaran