Hello,
Comments and replies inline...
On Wed, 1 Feb 2012, Chris Wilson wrote:
I have been working on integrating Apache Tika (in Java) with our open source
intranet application (in Python/Django) using JCC, as described here:
http://blog.aptivate.org/2012/02/01/content-indexing-in-django-using-apache-tika/
Very cool. I had done a Tika build with JCC some time ago and found that it
required a very long list of parameters because it integrates with a large
number of Java libraries. Using Maven there helped considerably with getting
all the pieces together on the Java side.
Your remark about not needing JCC's shared library mode is probably correct
right now but as soon as anyone brings in another JCC-built library into the
same process as yours, shared mode is going to be required since the Java VM
can only be initialized once per process.
In order to make it easy to install Tika (which normally requires mystic
incantations of JCC) I have packaged it up with jar files and a setup.py
script. This required some changes to JCC. I hope you will consider these for
inclusion in your project. I don't believe that they break backwards
compatibility.
Changes implemented by the attached patch and visible online (formatted) at
<https://github.com/aptivate/jcc/commits/master>:
* Allow calling cpp.jcc with a --maxheap argument to reduce the heap size, as
the default doesn't fit in memory on a reasonably small virtual machine.
* Allow calling cpp.jcc with --egg-info to generate the egg_info, without
doing a build.
* Allow calling cpp.jcc with --extra-setup-arg <arg> to pass additional
arguments to the setup() function call.
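
For illustration only, here is a minimal sketch of how a packaging script
might assemble a JCC command line using the three flags listed above. The
helper name jcc_command, the jar and module names, and the "64m" heap value
are assumptions made for this example; the flags themselves are the ones
proposed in the patch, and --jar and --python are existing JCC options.

```python
import sys


def jcc_command(jar, module, maxheap=None, egg_info=False, extra_setup_args=()):
    """Build an argv list for `python -m jcc` (hypothetical helper).

    Nothing is executed here; the list can be handed to subprocess later.
    """
    cmd = [sys.executable, "-m", "jcc", "--jar", jar, "--python", module]
    if maxheap is not None:
        # Cap the Java heap used while JCC runs, e.g. on a small VM.
        cmd += ["--maxheap", maxheap]
    if egg_info:
        # Only generate the egg_info, without doing a build.
        cmd.append("--egg-info")
    for arg in extra_setup_args:
        # Each value is passed through to the setup() call.
        cmd += ["--extra-setup-arg", arg]
    return cmd


if __name__ == "__main__":
    # Jar name, module name and heap size are made up for the example.
    print(" ".join(jcc_command("tika-app.jar", "tika",
                               maxheap="64m", egg_info=True)))
```

Returning a list instead of running JCC directly keeps the sketch easy to
inspect and lets the caller decide how to invoke it.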
No objections to these patches in principle but it would be easier for me to
integrate them if you could provide patches computed from the svn repository
of JCC: http://svn.apache.org/repos/asf/lucene/pylucene/trunk/jcc/
Your patches seem to be small enough so I should be able to do without but
it would be nicer if I didn't have to guess...
Also, please write small descriptions for these new command line flags to go
into JCC's __main__.py file:
http://svn.apache.org/repos/asf/lucene/pylucene/trunk/jcc/jcc/__main__.py
Changes that require more work:
* Can JCC please not fail completely if setuptools hasn't been patched? Can
it monkeypatch it instead, or at least fall back to non-shared mode?
This mess of setuptools patching was meant to be *temporary* until
setuptools' issue 43 was fixed. As you can see, I filed this bug 3 1/2 years
ago, http://bugs.python.org/setuptools/issue43, and my patch for issue 43
still hasn't been accepted, rejected, integrated, anything'ed... Dormant.
For over three years.
If one doesn't want support for shared mode:
- add a NO_SHARED environment variable during build
- don't use --shared with JCC during builds
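
As a rough sketch of the first option, a wrapper script could prepare the
environment for the JCC build like this. The helper name build_env and the
value "1" are assumptions; the advice above only asks for NO_SHARED to be
present in the environment during the build.

```python
import os


def build_env(shared=True, base=None):
    """Return a copy of the environment for building JCC (hypothetical helper).

    With shared=False, NO_SHARED is set so that the build skips shared mode.
    """
    env = dict(os.environ if base is None else base)
    if shared:
        # Make sure a stale NO_SHARED does not disable shared mode.
        env.pop("NO_SHARED", None)
    else:
        # The value "1" is an assumption; only the presence is asked for.
        env["NO_SHARED"] = "1"
    return env


if __name__ == "__main__":
    env = build_env(shared=False)
    # The result could then be passed to the build, for example:
    #   subprocess.check_call([sys.executable, "setup.py", "build"], env=env)
    print("NO_SHARED" in env)
```

The second option needs no code at all: simply leave --shared off the JCC
command line when building the wrapped library.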
* Why does JCC use non-standard command line arguments like --build and
--install? Can it be modified to make it easier to invoke from a
setup.py-style environment, such as exporting a setup() function as
setuptools does?
What standard are you referring to ?
The build/install/deploy story for Python extension modules keeps
evolving... Add Python 3.x support into the mix, and the mess is complete.
Seriously, though, I think that the right thing to do to better integrate
JCC with distutils/setuptools/distribute/pip/etc... is to make it into a
distutils 'compiler'. This requires some work, though, and I haven't done it
in all these years. Anyone with the itch to hack on distutils is welcome to
take that on.
Additionally, issue 43 is all about using the distutils/setuptools compiler
and linker invocation machinery for building a vanilla shared library (as
opposed to a Python extension). On Linux this is a bit cumbersome. On
Windows, a little less so. On Mac OS X, it just works.
The alternative would be to write a 'configure' script for that part of the
JCC build. A configure script would also solve the chicken/egg problem of
building that library on Windows (the first time, the build needs to be done
twice for the import library to be in the right place).
Currently, I'm leaning towards the configure script solution since none of
the projects mentioned above seems to have taken issue 43 on (by simply
integrating my patches) in all these years and PyLucene's issue 13 is
currently blocked:
https://issues.apache.org/jira/browse/PYLUCENE-13?focusedCommentId=13162273&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13162273
I have very little itch to dabble in configure scripts either so I've been
dragging my feet. If someone were to step forward with a patch for that,
I'd be delighted to rip out all this patching brittleness.
* Could JCC be used to generate dynamic proxies at runtime (with a
performance cost) in Python, to avoid the need for a compiler?
That is a whole different project. If I remember correctly, the JPype
project is (or was) taking that approach: http://jpype.sourceforge.net
* Could JCC generate a source distribution (sdist) that could be uploaded to
pypi?
You mean a source distribution that includes the Java sources of all the
libraries/classes wrapped ?
* "setup.py develop" is still broken in the current implementation
I'm not familiar with this 'develop' command, nor was I aware that it is
broken. What is it supposed to be doing and how is it broken ?
* JCC silently skips wrapping methods whose return type it doesn't know (for
example because I forgot to include a JAR file) which requires a lot of
debugging to track down and fix. This is doubly hard because it only seems to
work when installed, so I can't monkey patch it on the fly to investigate
problems, I have to remember to "setup.py install" each time.
A patch could be written to noisily emit a warning on all methods that are
skipped. Silently wrapping everything would simply wrap the entire JDK by
transitive closure and produce a huge library, assuming you'd have the
patience to watch it compile.
Skipping methods whose signatures contain types that are not on the
'wrap this' list (explicit or implicit) is by design. Not being able
to request a warning when that happens is a problem.
Thank you very much for your interest and contributions !
Andi..