Hello,

I agree with George that maintaining a version of Lucene.NET that fully takes advantage of the .NET platform ('purer', if you will), while keeping the same external interfaces and file format as Java Lucene, will be a manual process that needs continual upkeep. I have the luxury of being able to work on Lucene.NET during my day job, and would love to assist with this. I've already been digging into the internal implementation to find ways to improve search performance, and so far I have only seen the tip of the iceberg.

George, I have a few questions:
If anyone else is interested in contributing to make this work, will you be coordinating the work to avoid duplication of effort?
Will 2.1 be bumped up to VS 2005?

Thanks!

Michael

George Aroush wrote:
Hi everyone,

I will try to respond to this email thread in one reply, highlighting a few
things and summarizing the discussion.

Let's take my current effort of porting Java Lucene 2.1 to C#, which I am
about to start (this coming weekend).

I use JLCA to convert Java Lucene 2.1 to C# as a starting point; I never use
the generated code as a base.  Of the files JLCA generates for me, I only
bother to take in those whose Java source actually changed from 2.0 to 2.1.
This way, there are fewer files I have to deal with -- less code I have to
re-clean-up due to JLCA's poor job.  In addition to diffing those two Java
versions, I also look at the diff of the raw JLCA-generated code for
Lucene.Net 2.0 and 2.1.
As you can see, those diffs give me a baseline to start with; they allow me
to filter out any clean-up I would otherwise have to repeat.  Why?  JLCA does
a very poor job at conversion.  Not only does it fail to convert a good
amount of Java code, in a few instances it generates buggy code, and it
creates a lot of code, and I mean a lot, in SupportClass.cs, such that you
will see hundreds of lines calling SupportClass methods -- this pollutes the
code badly.
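
A minimal sketch of the filtering step described above, assuming the Java Lucene 2.0 and 2.1 source trees are checked out side by side: it lists the .java files that are new in 2.1 or whose contents differ, i.e. the only files whose JLCA output would need to be looked at again. The names ChangedSources and changedJavaFiles are hypothetical, and a plain byte-for-byte comparison is an assumption; in practice a diff tool or SVN itself would do the same job.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

public class ChangedSources {
    // Returns the relative paths (with '/' separators) of .java files under
    // newRoot that either do not exist under oldRoot or whose bytes differ
    // from the file at the same relative path under oldRoot.
    public static List<String> changedJavaFiles(Path oldRoot, Path newRoot) throws IOException {
        List<String> changed = new ArrayList<>();
        try (Stream<Path> files = Files.walk(newRoot)) {
            files.filter(Files::isRegularFile)
                 .filter(p -> p.toString().endsWith(".java"))
                 .forEach(newFile -> {
                     Path rel = newRoot.relativize(newFile);
                     Path oldFile = oldRoot.resolve(rel);
                     try {
                         boolean isNew = !Files.exists(oldFile);
                         if (isNew || !Arrays.equals(Files.readAllBytes(oldFile),
                                                     Files.readAllBytes(newFile))) {
                             changed.add(rel.toString().replace('\\', '/'));
                         }
                     } catch (IOException e) {
                         // Lambdas cannot throw checked exceptions, so wrap it.
                         throw new UncheckedIOException(e);
                     }
                 });
        }
        changed.sort(null); // natural (alphabetical) order for stable output
        return changed;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ChangedSources <old-src-dir> <new-src-dir>
        for (String rel : changedJavaFiles(Path.of(args[0]), Path.of(args[1]))) {
            System.out.println(rel);
        }
    }
}
```

Running it as java ChangedSources <old-src-dir> <new-src-dir> prints one relative path per line; files that were only deleted in the newer version are not reported by this sketch.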

If you haven't already, I urge you to give JLCA a quick try to get a feel
for what I mean.  And no, JLCA doesn't target .NET 1.1 or 2.0; what comes
with VS.NET 2005 is really the same beta JLCA that Microsoft released for
VS.NET 2000/3 years ago.  Finally, Microsoft is dropping support for it with
VS.NET 2008.

Because of this conversion complexity, I don't like the idea of making the
code 'purer' -- at least not now.  However, I am all for it **if and only
if** we reach a point where the SVN of Lucene.Net is on par with the SVN of
Java Lucene.  Once we achieve this milestone, we can port Java changes to C#,
say on a weekly basis, **by hand**; it will be easy to do, since we just make
the corresponding change on the C# end.

With the recent activity and interest on the Lucene.Net mailing list, I
believe there is enough interest to achieve this milestone.  Don't you agree?

-- George Aroush

-----Original Message-----
From: Ayende Rahien [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 28, 2007 7:42 PM
To: [email protected]
Subject: Re: Lucene.Net project involvement

I am not familiar enough with the internals of Lucene to comment, I am afraid.

On 3/29/07, Ciaran Roarty <[EMAIL PROTECTED]> wrote:
Ayende

In your opinion, would you say that taking Lucene.Net 2.1 as a baseline and making it 'pure' .NET would be a sensible thing to do?

Ciaran


On 28/03/07, Ayende Rahien <[EMAIL PROTECTED]> wrote:
I have some experience with porting projects from Java to C#, most
often,
the port is done once, similar to the way it is done on Lucene, and porting new features is done on a per case basis, mostly by hand. This allows to take greater advantage on the capabilities of the .Net platform, as well as add additional behavior that may not exists in the original platform

On 3/28/07, Michael Garski <[EMAIL PROTECTED]> wrote:
Everyone -

I feel I have to chip in my 2 cents regarding the 'throw' issue. The exception throwing inside Lucene, particularly during indexing operations and, on a smaller scale, when using QueryParser, can be safely altered without affecting either of the 2 goals you list: making the index cross-compatible with Java and maintaining a consistent [external] API.

The indexes we maintain are constantly being updated, as they contain millions of small documents with relatively volatile data. Seeing upwards of 8,000 exceptions per second while maintaining those indexes prompted us to dig into the internals of Lucene.NET to alter the throws. We also modified the internal data structures to use generic collections rather than synchronized ArrayLists and Hashtables, to cut down on the large amount of small-object creation we were seeing in a profiler. The end result cut the exceptions to 0 and significantly increased performance at index time. All of our modifications still pass the unit tests.
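
The difference between throwing for a routine "not found" case and returning an ordinary value can be sketched as below. This is a hypothetical illustration in Java (the language the throws originate from), not Lucene's actual code: MissingTermLookup, TermNotFoundException and the docFreq methods are invented names. The only point is that every miss in the first style constructs an exception and captures a stack trace, while the second style treats a miss as a normal result.

```java
import java.util.HashMap;
import java.util.Map;

public class MissingTermLookup {
    // Hypothetical checked exception used only to signal "term not present".
    static class TermNotFoundException extends Exception {
        TermNotFoundException(String term) {
            super("term not found: " + term);
        }
    }

    private final Map<String, Integer> docFreqs = new HashMap<>();

    public void put(String term, int docFreq) {
        docFreqs.put(term, docFreq);
    }

    // Style 1: a missing term is reported by throwing. Each miss creates an
    // exception object and fills in its stack trace, which adds up when
    // misses are routine rather than exceptional.
    public int docFreqOrThrow(String term) throws TermNotFoundException {
        Integer df = docFreqs.get(term);
        if (df == null) {
            throw new TermNotFoundException(term);
        }
        return df;
    }

    // Style 2: a missing term is an ordinary result (0); a miss costs one
    // hash lookup and creates no exception object.
    public int docFreqOrZero(String term) {
        return docFreqs.getOrDefault(term, 0);
    }

    // A caller written against style 1 uses try/catch as control flow.
    public int sumWithExceptions(String... terms) {
        int sum = 0;
        for (String t : terms) {
            try {
                sum += docFreqOrThrow(t);
            } catch (TermNotFoundException e) {
                // a missing term contributes nothing to the sum
            }
        }
        return sum;
    }

    // The equivalent caller against style 2 has no exceptional path at all.
    public int sumWithSentinel(String... terms) {
        int sum = 0;
        for (String t : terms) {
            sum += docFreqOrZero(t);
        }
        return sum;
    }
}
```

Both callers compute the same sums; only the cost of a miss differs. Swapping one style for the other preserves behavior only if every caller is updated, since any code that catches the exception depends on it being thrown.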

I would venture to say that the vast majority of Lucene.NET users would not greatly benefit from these performance improvements unless they are working on a _very_ high-volume application such as ours. We currently maintain our own branch of Lucene.NET, merging any changes made to the Subversion trunk into our branch. As it appears these changes are not wanted in the official Lucene.NET releases, note that they are not difficult for anyone to make on their own should they choose to do so. One of the advantages of open source

Thanks,

Michael

PS: if you have experience with Lucene.NET, high volume server applications, live in the Los Angeles area, and are looking for a new job, please email me off the list at mgarski[at]mac[dot]com with a recent resume... we are hiring.

George Aroush wrote:
Hi Michael, Ciaran and all,

Ciaran: welcome aboard the mailing list. I am glad to see your email generated some interest; I welcome any help you or anyone else can offer on Lucene.Net.

My goals for Lucene.Net are the following:
1) The index is cross-compatible with Java Lucene, such that you can read/write the same index concurrently using C# or Java Lucene.
2) The APIs are consistent between C# and Java Lucene. This is why I use "GetXYZ()" instead of C# properties.

Up to release 2.0, I kept Lucene.Net on .NET 1.1 because I wanted to support as many .NET installations as possible. With the Lucene.Net 2.1 release it's time to move to .NET 2.0; I don't think anyone has any objection to this, but Mono may have some issues.

As for the code clean-up, this may be difficult, and it depends on what clean-up you mean. Take a look at the open JIRA issues against Lucene.Net and you will see a few about overusing "throw". Those, unfortunately, we can't fix. Why? Because those "throw"s are also present in Java Lucene, and trying to 'fix' them in Lucene.Net may in effect alter the behavior of Lucene.Net. That said, any extra code or "throw" introduced into Lucene.Net by conversion mistakes should be fixed.

As for the warnings, I don't have direct experience with them in VS.NET 2005 (I still use VS.NET 2003). But in VS.NET 2003, most of those warnings come from comments, i.e. the class and API XML documentation that doesn't get converted correctly from Java to C#. If you can think of a tool to clean them up, please let me know. If it's something else you are talking about, please let me know as well.

Finally, making the Lucene.Net code more compliant with .NET / C# standards would be, in my opinion, a nice thing to have. But before we can do so, we must get the port working, and we must keep my goal #2 above in mind.

Let's discuss this topic further. Next week, I expect to put out an early release of Lucene.Net 2.1. If folks can help finish off the conversion, then we can get this out much sooner than the previous release.

Regards,

-- George Aroush


-----Original Message-----
From: Michael Mitiaguin [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 27, 2007 9:19 PM
To: [email protected]
Subject: Re: Lucene.Net project involvement

Ciaran,

What I can't understand is this: if the core of synchronising versions with Java Lucene is the Java Language Conversion Assistant, how is all this cleaning up/revising going to work? Would it be possible to build an automated procedure which preserves all .NET improvements after converting a major upgrade from Java? I am not sure. Even if we somehow track only the changed/added Java classes, for each such class we would still have to merge the new/revised Java functionality with the previous manual changes made to utilise .NET capabilities. You used the term "component", but Lucene is rather an API with fine-grained classes, and a simple change may propagate into several classes (files in Java). I don't know how George is coping with that, or what the plan would be if, say, Lucene Java 3 were released tomorrow.

Michael

Ciaran Roarty wrote:


Michael

I've been in touch with George about getting involved and he said to post to the mailing list.

I reckon there's a fair amount of work that could be done in changing the codebase without affecting the published interface, and I reckon that's where the bulk of the initial work would take place; as we know, the code is not yet optimised for .NET.

Now, balanced against that, in my opinion, are the following factors:
- The code currently compiles against 1.1 and 2.0 (albeit with some obsolescence warnings); any change to move Lucene.Net to 2.0 would leave the 1.1 codebase behind.
- There are different types of contribution to the codebase: cleaning up code, and revising methods and classes to benefit from .NET standards and capabilities, is a good thing. However, Lucene is a powerful IR component, and if the core development of those capabilities happens in the Java version, then we will need to follow that.

Those are my thoughts for the moment. Maybe we could take a specific part of the component and revise that. Having learned lessons about the process and the codebase from that exercise, we can then move into the guts of the component...

Any thoughts?

Ciaran

On 27/03/07, Michael Mitiaguin <[EMAIL PROTECTED]> wrote:


Ciaran,

The only active contributor to the project is George Aroush, and he is perhaps the only person who can give you a definite answer. I am also interested only in the .NET 2/3 codebase. Currently version 2.0.4 still uses VS 2003 projects, and my main concern is the warning messages about deprecated and obsolete methods when compiled under .NET 2. Supposedly that'll be fixed in 2.1. Also, Java Lucene is a more mature project with a lot of people involved, and it would be safer to cross-translate new things from there, taking .NET specifics into consideration. On the other hand, in my case, if Lucene becomes part of a project where all warning messages are considered errors which must be eliminated, it is beyond my competency to say what can be done to achieve that. (Cross-translating JavaCC-generated code creates a lot of them.)
Michael

Ciaran Roarty wrote:


Anthony

I too have used Lucene.Net with C# 2.0 to great effect. However, I am discussing the use of .NET 2.0 in the codebase itself and, failing that, the optimisation of the codebase for .NET in general.

Ciaran


On 26/03/07, tony njedeh <[EMAIL PROTECTED]> wrote:


I set up my Lucene on the .NET 2.0 framework, using VB, and it works well in that environment.

Anthony

Ciaran Roarty <[EMAIL PROTECTED]> wrote:
George et al

I have been using Lucene.Net in a proof-of-concept environment for the last couple of months (with my colleague Guy Steel), and we wanted to get involved in its development.

I am a .NET developer for a large consultancy company and would like to get involved in making Lucene.Net more aligned to .NET, and .NET 2/3 in particular. However, I am not sure if that is something which is initially planned for Lucene.Net. As I understand it, the majority of the conversion was initially done using the Java Language Conversion Assistant. Some of the Java codebase uses patterns that are not best practice for .NET, such as using Exceptions for non-exceptional circumstances. This is not to denigrate Lucene.Net; it is one of the best pieces of software I have used.

So, this email should be considered an introduction and a request to be allowed to get involved. I have never worked on an Open Source project before, so I'll need some guidance, but I am willing to learn. I do have a couple of questions to start with:

- Is there a roadmap for the product? Is there a roadmap for Lucene that we will try and follow?
- Is there a preferred version of the .NET Framework that it is planned to support?

Enough for now, just wanted to introduce myself and get
involved.
Cheers,
Ciaran



