Bug#256283: [pylucene-dev] Re: pylucene-dev Digest, Vol 4, Issue 2

2004-09-25 Thread Andi Vajda



The value of a naked gcj-compiled lucene package is limited [...]
because of the GC issues


Are you sure? Do you think a naked gcj-compiled lucene package would
be valuable for people creating swig bindings for languages such as
OCaml, Perl, PHP[1], etc.? Do you think C application programmers who
want to use lucene are better off using CLucene[2] instead of a gcj
library? Why did pylucene choose the latter?


Other language bindings would have to solve the GC issue too. What I did 
relies on python's ref counting and would be applicable to any other language 
with similar memory management.
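
A minimal sketch of the idea, not PyLucene's actual code: a Python-side proxy pins a Java object while the proxy is alive and unpins it when its reference count drops to zero. The names JavaRefTable and JavaProxy and the string handle are hypothetical and used only for illustration.

```python
import weakref


class JavaRefTable:
    """Hypothetical stand-in for a table that keeps Java objects reachable."""

    def __init__(self):
        self._pinned = {}  # handle -> number of Python proxies holding it

    def pin(self, handle):
        self._pinned[handle] = self._pinned.get(handle, 0) + 1

    def unpin(self, handle):
        count = self._pinned[handle] - 1
        if count:
            self._pinned[handle] = count
        else:
            del self._pinned[handle]  # the Java GC may now collect the object

    def is_pinned(self, handle):
        return handle in self._pinned


class JavaProxy:
    """Python-side wrapper whose lifetime decides how long the object is pinned."""

    def __init__(self, table, handle):
        self.handle = handle
        table.pin(handle)
        # On CPython this callback runs as soon as the proxy's refcount hits zero.
        self._finalizer = weakref.finalize(self, table.unpin, handle)


table = JavaRefTable()
a = JavaProxy(table, "IndexSearcher@1")  # hypothetical handle
b = JavaProxy(table, "IndexSearcher@1")
del a
still_pinned = table.is_pinned("IndexSearcher@1")  # b still holds the object
del b
released = not table.is_pinned("IndexSearcher@1")  # last proxy is gone
```

A language without deterministic reference counting would need some other hook to decide when to unpin, which is why each binding has to solve this separately.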

There has been talk of changing PyLucene into something more generic like
SWIGLucene, extending the idea to more languages supported by SWIG.
There also has been talk of bringing in all lucene ports under a future Apache 
Lucene project umbrella (currently Java Lucene is an Apache Jakarta project).

Both are good things and I expect them to happen in the long term.

Why did I choose to do PyLucene instead of using CLucene ? CLucene is a port 
of Java Lucene; it lags behind the Java release (as most ports do) and comes 
with its own set of bugs. PyLucene is not a port: it is built on the latest 
released Java Lucene library, but is affected by GCJ's bugs. I consider that 
a worthwhile tradeoff, since the GCJ project is very active and one of the 
most exciting developments in Java land at the moment.



gcj compiling a java package is pretty trivial


It's not trivial for everyone. I don't currently know how to use gcj
to create a shared library. Also, if a naked gcj-compiled lucene
library is useful, I can imagine other Debian packages in the future
will need it as a dependency.


In a way, it is even easier than compiling a bunch of C files since gcj can 
take a .jar file as input. But I need to patch the sources so I'm not using 
the .jar file but the source .java files. The following yields one lucene.o 
file from compiling all the lucene java files.


gcj --encoding=UTF-8 -O2 -c -o lucene.o \
    `find lucene-1.4.1/src/java -name '*.java' -print`
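
For anyone driving the build from a script, the same command line can be assembled without shell backticks. This is only a sketch: the flags are the ones in the gcj command above, while the helper names and the placeholder files in the demonstration are assumptions made for illustration.

```python
import os
import tempfile


def find_java_sources(root):
    """Return every *.java path under root, sorted for a reproducible command."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".java"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


def gcj_compile_command(sources, output="lucene.o"):
    """Build the argument list for the gcj invocation quoted above."""
    return ["gcj", "--encoding=UTF-8", "-O2", "-c", "-o", output] + list(sources)


# Demonstration on a throwaway tree with placeholder files (not real Lucene sources).
with tempfile.TemporaryDirectory() as root:
    pkg = os.path.join(root, "org", "apache", "lucene")
    os.makedirs(pkg)
    for name in ("Document.java", "Field.java", "package.html"):
        open(os.path.join(pkg, name), "w").close()
    command = gcj_compile_command(find_java_sources(root))
```

Passing the resulting list to subprocess.run(command, check=True) would run the compile. Using a list rather than a shell string avoids quoting problems with unusual file names.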


I can help you with non Debian-specific PyLucene or GCJ issues. What
do you need ?


I have almost no gcj experience, and have never worked with a gcj
shared library. I need your (Andi) help with the judgement call as to
whether this is worthwhile or not.


I think that it would be worthwhile to have a PyLucene debian package provided 
the stock Debian gcc compiler is used to build it and is at least at version 
3.4.1.
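
A build script could check the compiler version before going any further. The sketch below assumes the version banner contains a dotted x.y.z number, as in the gcj-3.4 banner quoted elsewhere in this thread. The function names and the second, older banner are hypothetical.

```python
import re

MINIMUM = (3, 4, 1)  # the minimum gcc/gcj version named above


def parse_gcc_version(banner):
    """Extract the first dotted x.y.z version from a compiler banner as a tuple."""
    match = re.search(r"\b(\d+)\.(\d+)\.(\d+)\b", banner)
    if match is None:
        raise ValueError("no x.y.z version found in %r" % banner)
    return tuple(int(part) for part in match.groups())


def meets_minimum(banner, minimum=MINIMUM):
    """True if the banner reports a version at or above the minimum."""
    return parse_gcc_version(banner) >= minimum


debian_banner = "gcj-3.4 (GCC) 3.4.1 (Debian 3.4.1-4sarge1)"
older_banner = "gcj (GCC) 3.3.5"  # hypothetical banner for an older compiler

debian_ok = meets_minimum(debian_banner)
older_ok = meets_minimum(older_banner)
```

Comparing tuples of integers avoids the usual mistake of comparing version strings lexically.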

Currently, this is problematic on all platforms PyLucene is supported on:
  - on Mac OS X, I have to build a custom gcc/gcj 3.4.1, Apple's gcc doesn't
even come with gcj at the moment
  - on Red Hat 8, 9 or Fedora Core 2, I also build gcc 3.4.1
  - on Windows, I use mingw 3.1 augmented with gcc/gcj 3.4.1

As for a naked gcj-compiled lucene package: given the unresolved gcj 
issues, I wouldn't even trust one unless I had built it myself with a compiler 
I had built too. There are just too many weird issues, some of them platform 
specific, such as integrating java threads and python threads. On Windows, 
this is trivial since python doesn't use real threads. On Unixes, python uses 
posix threads and I had to figure out how to coax python threads into gcj's 
boehm-gc package using some non-public functions and structures that may 
change at any time (see attachCurrentThread in PyLucene.i).
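
The general shape of the thread problem is that every thread must be registered with the Java runtime once before it touches Java objects. The sketch below shows that pattern only. It does not reproduce the code in PyLucene.i, and attach_current_thread here is a hypothetical placeholder for the real registration.

```python
import threading

_attached = threading.local()  # per-thread flag: has this thread been attached?
attach_log = []                # records attached thread names, for demonstration


def attach_current_thread():
    """Hypothetical placeholder for registering this thread with the Java runtime."""
    attach_log.append(threading.current_thread().name)


def ensure_attached():
    """Attach the calling thread exactly once before it touches Java objects."""
    if not getattr(_attached, "done", False):
        attach_current_thread()
        _attached.done = True


def search(query):
    ensure_attached()
    return "results for " + query  # stand-in for a call into compiled Lucene


def worker():
    search("first")
    search("second")  # same thread, so no second attach


threads = [threading.Thread(target=worker, name="worker-%d" % i) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each of the three workers appears once in attach_log even though each calls search twice.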


There are 5 parts to PyLucene:
  - a patched java lucene compiled by gcj
  - a SWIG part with python specific type translation code that could be
reused as a model for other similar SWIG supported languages
  - python specific java object reference management code
  - python specific python/java thread integration code
  - an optional Berkeley DB - based lucene Directory implementation (that code
is also part of the Java Lucene sandbox, the db package)

The patched java lucene compiled by gcj is the 'easy' part. Except for the 
Berkeley DB part, any developer who wants to extend the PyLucene idea to 
other languages is going to have to reimplement the other parts.


Upsteam wasn't too excited about a naked gcj lucene library, but maybe Doug 
just wasn't thinking about enabling C programmers and swig programmers. My 
current feeling is to wait for gcj-3.5 to reach Debian then revisit.


Who/What is Upsteam ? Doug Cutting ?
If all the gcj bugs currently worked around with patches are fixed in the 
gcc 3.5 release, waiting may help. Figuring out what needs to be patched 
every time Java Lucene releases a new library is some work.


I did notice that compiling the lucene demo executables using gcj works 
great, and I am tempted to ship those right away.[3]


Which version of Lucene, which version of gcj did you use ?
If you used Lucene 1.4.x and if 
bug http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15411 is not fixed then 
Search.java should crash quite rapidly. Also, you should have been getting a 
bunch of errors related to anonymous inner class constructors (the bulk of the 
patches in patches.lucene file are 

Bug#256283: [pylucene-dev] Re: pylucene-dev Digest, Vol 4, Issue 2

2004-09-21 Thread Jeff Bowden

Jeff Breidenbach wrote:

So Jeff what can I do?  I already build and install pylucene on several 
Debian machines with every new version but it's been a couple of years 
since I've made a Debian package.  Could you crank one out if I gave you 
build instructions and dependencies?
   



I will not crank out a pylucene package myself. I am willing to help
you (Jeff) produce the package if you are interested. That means I
will answer questions, review the package, and sponsor both it and you
into the Debian project. Please look at an existing Debian python
package (such as python-medusa) before tackling.



OK, I'll take a look.





Bug#256283: [pylucene-dev] Re: pylucene-dev Digest, Vol 4, Issue 2

2004-09-19 Thread Jeff Breidenbach

>Who/What is Upsteam ? Doug Cutting ?

Yes, Doug Cutting and other Java Lucene developers. I meant to say
upstream instead of upsteam.

>Which version of Lucene, which version of gcj did you use 
>[for the lucene demo programs]?

I didn't see any segfaults or compilation warnings.  Lucene is version
1.4.1 and gcj reports its version as "gcj-3.4 (GCC) 3.4.1 (Debian
3.4.1-4sarge1)"

gcj-3.4 -O3 \
/usr/share/java/lucene-1.4.jar \
/usr/share/java/lucene-demos-1.4.jar \
-o lucene-search \
--main=org.apache.lucene.demo.SearchFiles

>   - a naked gcj compiled Java Lucene ?
>   - worthless for integrating with other non-java languages

Very well, I will drop the idea of a naked gcj compiled Java
Lucene. If you feel the situation changes in the future (due to a
better gcj), please let me know. In the meantime, we poor C application
programmers will be stuck with CLucene.

Cheers,
Jeff