On Thu, 19 Mar 2020, Marc Jeurissen wrote:

PyLucene version: 8.1.1

Hi all,

When you have a custom tokenizer (class CustomTokenizer(PythonTokenizer)), you don't seem to be able to override any method besides incrementToken (so not end, reset, or close).

Is this correct?

Correct, the only native method in PythonTokenizer.java meant to be implemented in Python is incrementToken() since that is what Tokenizer.java documents as being the method to extend.

This doesn't mean that you can't add your own extension points. Just edit PythonTokenizer.java, add the native methods you wish to implement from Python, and rebuild extensions.jar and PyLucene. If you override reset() or close(), you probably still want to ensure that the parent versions are called from your own Python overrides, by casting your instance to the parent class with its .cast_() method, using something like:
  mytok.cast_(Tokenizer).reset()

Andi..


Thank you very much



Kind regards,
Marc Jeurissen

Bibliotheek UAntwerpen
Stadscampus - Ve35.303
Venusstraat 35 - 2000 Antwerpen
marc.jeuris...@uantwerpen.be
T +32 3 265 49 71



