On Thu, 8 Sep 2016, Dirk Rothe wrote:

On 05.09.2016 at 21:27, Andi Vajda <va...@apache.org> wrote:


On Mon, 5 Sep 2016, Dirk Rothe wrote:
A volunteer is requested to build and test PyLucene's trunk on Windows. If no one comes forward, I still intend to try to release PyLucene 6.2 in a few weeks.

Nice job!

I've successfully built PyLucene 6.2 on Windows. Most tests pass:
* skipped the three test_ICU* tests due to a missing "import icu"

Yes, for this you need to install PyICU: https://github.com/ovalhub/pyicu

I'm going to assume this would work for now.

* fixed test_PyLucene.py by ignoring open file handles (os.error) in shutil.rmtree() in Test_PyLuceneWithFSStore.tearDown()

Do you have a patch for me to apply ?

Yes, attached.

Thanks, applied.
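
(For the record, a minimal sketch of the kind of tearDown() change described in the bullet above; this is not the attached patch itself, and the STORE_DIR name is an assumption:)

import os, shutil
from unittest import TestCase

class Test_PyLuceneWithFSStore(TestCase):

    STORE_DIR = "testrepo"  # hypothetical; the real test defines its own path

    def tearDown(self):
        try:
            shutil.rmtree(self.STORE_DIR)
        except os.error:
            # on Windows, open file handles can make rmtree() fail;
            # ignore the error instead of failing the test
            pass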

* then failures like these in test_PythonDirectory.py:
[..]
Can't make sense of this one, sorry.

* and this one in test_PythonException.py
[..]
This one could be because you may not have built JCC in shared mode?
I vaguely remember there being a problem with proper cross-boundary exception propagation requiring JCC to be built in shared mode.

jcc.SHARED reports True, so that seems OK.
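
(For anyone following along, the check is just:)

import jcc
print(jcc.SHARED)  # True when JCC was built and loaded in shared mode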

I don't think these Windows glitches are really problematic; our production code runs only in Linux environments anyway. I'm more interested in whether porting around 3 kloc of Lucene interfaces from v3.6 goes smoothly.

I've hit the first problematic case with a custom PythonAnalyzer/PythonTokenizer, where I don't see how to pass the input to the Tokenizer implementation. I thought it might work like this, but PythonTokenizer no longer accepts an INPUT argument (as it did in v4.10 and v3.6):

class _Tokenizer(PythonTokenizer):
    def __init__(self, INPUT):
        super(_Tokenizer, self).__init__(INPUT)
        # prepare INPUT

    def incrementToken(self):
        # stuff into termAtt/offsetAtt/posIncrAtt
        pass

class Analyzer6(PythonAnalyzer):
    def createComponents(self, fieldName):
        return Analyzer.TokenStreamComponents(_Tokenizer())

The PositionIncrementTestCase is pretty similar, but it is initialized with static input. It would be a nice place for an example with dynamic input, I think.

This was our 3.6 approach:

class Analyzer3(PythonAnalyzer):
    def tokenStream(self, fieldName, reader):
        data = data_from_reader(reader)

        class _tokenStream(PythonTokenStream):
            def __init__(self):
                super(_tokenStream, self).__init__()
                # prepare termAtt/offsetAtt/posIncrAtt

            def incrementToken(self):
                # stuff from data into termAtt/offsetAtt/posIncrAtt
                pass

        return _tokenStream()

Any hints on how to get Analyzer6 working?

I've lost track of the countless API changes since 3.x.

The Lucene project does a good job of tracking them in the CHANGES.txt file, usually pointing at the issue that tracked each change, often with examples of how to accomplish the same thing in the new way, and the rationale behind the change.

You can also look at the PyLucene tests I just ported to 6.x. For example, in test_Analyzers.py, you can see that a Tokenizer no longer takes a reader at construction, but one can be set with setReader() after construction.
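
A minimal sketch of that pattern with a stock Lucene 6.x tokenizer (the input text is made up; class and package names are from the Lucene 6 analyzers-common module shipped with PyLucene):

import lucene
lucene.initVM()

from java.io import StringReader
from org.apache.lucene.analysis.core import WhitespaceTokenizer
from org.apache.lucene.analysis.tokenattributes import CharTermAttribute

tokenizer = WhitespaceTokenizer()          # no reader at construction time
tokenizer.setReader(StringReader("the quick brown fox"))

termAtt = tokenizer.addAttribute(CharTermAttribute.class_)
tokenizer.reset()                          # required before incrementToken()
while tokenizer.incrementToken():
    print(termAtt.toString())
tokenizer.end()
tokenizer.close()

Inside an Analyzer subclass, createComponents() just returns the TokenStreamComponents wrapping the tokenizer; the Analyzer machinery then calls setReader() on it for each input passed to tokenStream().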

Andi..


--dirk
