Hi,
Is there any way to get more details of a JavaError in PyLucene? It seems
that the method getJavaException() only returns the Exception message, but
not the full stack trace.
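
What I am hoping for is roughly the following. This is only a sketch: it
assumes that getJavaException() returns the wrapped java.lang.Throwable
(not just a string) and that its toString() and getStackTrace() methods
are callable from Python. The helper name describe_java_error is made up
for illustration.

```python
def describe_java_error(err):
    """Return the message and stack frames of a JavaError-like exception.

    Assumption: err.getJavaException() returns the wrapped Java Throwable,
    which exposes toString() and getStackTrace() to Python.
    """
    throwable = err.getJavaException()
    # First line: exception class and message, as Java would print it.
    lines = [str(throwable.toString())]
    # One indented line per stack frame, mimicking printStackTrace().
    for frame in throwable.getStackTrace():
        lines.append("    at " + str(frame.toString()))
    return "\n".join(lines)
```

It would be called from an "except lucene.JavaError as e:" block, e.g.
print(describe_java_error(e)).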

Background: we noticed some strange behaviour in PyLucene (JCC version 2.2)
when trying to index an RTF (rich text) document. PyLucene reported
 JavaError: java.lang.IllegalArgumentException: term length 55296 exceeds max term length 16383

and (more remarkably) all attempts to index further documents in that batch
job resulted in
 JavaError: java.lang.NullPointerException
 in writer.addDocument(doc)

The cause is obviously the content of the RTF document (very long strings),
which should have been converted to plain text before indexing.

Having looked at the Java source of Lucene, it seems the Exception is raised
in DocumentsWriter, where the token length is checked. I understand that it
doesn't make sense to index terms beyond a certain length, but I'd expect
that either those terms are silently ignored, or at least that indexing
further documents still works.

Has anyone encountered this yet or found a workaround? E.g. is it possible
to configure Lucene to ignore terms above a specified length altogether
(without raising an Exception)?
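
Failing a Lucene-side setting, one option would be to strip overlong
chunks from the text in Python before it reaches the analyzer. A minimal
sketch, assuming the analyzer never produces a token longer than a
whitespace-delimited chunk; the name drop_long_terms is made up, and the
limit is simply the one quoted in the error message above. Note that
Python counts code points while Java counts UTF-16 units, so text with
characters outside the BMP may need a lower limit.

```python
MAX_TERM_LENGTH = 16383  # limit quoted in the IllegalArgumentException above

def drop_long_terms(text, max_len=MAX_TERM_LENGTH):
    """Remove whitespace-delimited chunks longer than max_len characters.

    Runs of whitespace are collapsed to single spaces as a side effect.
    """
    return " ".join(chunk for chunk in text.split() if len(chunk) <= max_len)
```

The cleaned string would then be used as the field value in place of the
raw text.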

Our current solution/workaround is to catch the JavaError and close the
writer before adding further documents. That way the index seems to stay
in a valid state.
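
In outline, the workaround looks like the sketch below. It is simplified:
open_writer is a hypothetical zero-argument callable that returns a new
writer object with addDocument(doc) and close(), and error_type stands in
for lucene.JavaError so that the snippet does not depend on a running JVM.

```python
def add_documents(docs, open_writer, error_type=Exception):
    """Add docs one by one, replacing the writer after each failure.

    open_writer: hypothetical callable returning an object that provides
    addDocument(doc) and close(). Returns a list of (doc, error) pairs
    for the documents that could not be added.
    """
    failed = []
    writer = open_writer()
    try:
        for doc in docs:
            try:
                writer.addDocument(doc)
            except error_type as err:
                failed.append((doc, err))
                # Discard the possibly broken writer before continuing.
                writer.close()
                writer = open_writer()
    finally:
        writer.close()
    return failed
```

The failing document is skipped and reported, and the remaining documents
in the batch go through the freshly opened writer.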

I'm not sure whether this is a PyLucene-related question, so please excuse
me if it is not (I should probably post to the Lucene mailing list then).
Anyway, the JavaError question is certainly PyLucene-related.


Kind regards

Thomas Koch
--
OrbiTeam Software GmbH & Co. KG     
http://www.orbiteam.de

_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
