Hi Maciej,

On Fri, 19 Jul 2019, Maciej Gawinecki wrote:

I have ported your Stempel stemmer [1] for the Polish language from Java
to Python [2]. I know you also have a Python wrapper for Lucene
(PyLucene), so I was curious whether you would be interested in a native
implementation of a single stemmer.

It has the same accuracy as the original version and only slightly better
performance than the wrapped version (compared with pyjini), but it uses
only one language (no need to switch between languages when debugging),
which was quite important in my NLP project. I understand that it
introduces the need to maintain two code bases, though.

PyLucene is not a port of Lucene to Python but a Python/C++ wrapper library auto-generated via JCC:
  http://lucene.apache.org/pylucene/jcc/
Users of PyLucene in fact embed an actual, unchanged Apache Java Lucene jar file and a JVM into their Python VM.

The Stempel stemmer is already part of PyLucene, since it is included in the wrapper generation (look for "stempel" in the Makefile):
  https://svn.apache.org/repos/asf/lucene/pylucene/tags/pylucene_7_7_1/Makefile

Your native port, which I'm sure is valid and useful, therefore does not fit that auto-wrapper model. There is little to no maintenance done on PyLucene proper, as all its useful code lives in Java Lucene and JCC. Adding native Python code to PyLucene would break that no-maintenance convenience.

Thank you for thinking of PyLucene for hosting it, though!

Andi..


Regards,
Maciej Gawinecki



[1]: https://github.com/apache/lucene-solr/tree/master/lucene/analysis/stempel/src/java/org
[2]: https://github.com/dzieciou/pystempel/tree/feature/1
