Hi list,
I am using nutch-0.8.1, which uses POI to parse MS Word files.
It works well with English documents, even when the .doc file is
pretty large.
But it throws a StringIndexOutOfBoundsException when the document (only
one page) is written in Chinese characters.
I tried to isolate the problem and found that if I use
HWPFDocument.getRange().text() to read a local Chinese file, it works.
But on the path nutch takes,
DocumentInputStream->CHPBinTable->ComplexFileTable->TextPieceTable...,
it eventually hits a StringIndexOutOfBoundsException because the
parameter passed to TextPiece.substring() is negative.
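To show how such a negative index can arise, here is a minimal sketch in
plain Java. It does not use POI at all: the class OffsetMismatchSketch and
its methods are hypothetical names made up for illustration, and the
byte-versus-character mix-up it demonstrates is only a guess at the cause,
not something confirmed in the POI code. The idea is that for single-byte
text a byte offset and a character offset are the same number, while for
text stored with two bytes per character they are not, so subtracting one
from the other can go below zero.

```java
import java.nio.charset.StandardCharsets;

public class OffsetMismatchSketch {

    /** Bytes needed to store the text with one or two bytes per character. */
    public static int storedLength(String text, boolean twoBytesPerChar) {
        return twoBytesPerChar
                ? text.getBytes(StandardCharsets.UTF_16LE).length
                : text.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    /** Hypothetical lookup: index inside a piece = requested position - piece start. */
    public static int relativeIndex(int requested, int pieceStart) {
        return requested - pieceStart;
    }

    /** Calls substring and reports the exception name instead of propagating it. */
    public static String safeSubstring(String text, int begin, int end) {
        try {
            return text.substring(begin, end);
        } catch (StringIndexOutOfBoundsException e) {
            return "<" + e.getClass().getSimpleName() + ">";
        }
    }

    public static void main(String[] args) {
        // Two consecutive two-character Chinese pieces, written as escapes
        // so the source file stays ASCII-only.
        String first = "\u4e2d\u6587";
        String second = "\u6587\u6863";

        int startInChars = first.length();            // piece start counted in characters
        int startInBytes = storedLength(first, true); // same start counted in bytes
        int wanted = startInChars;                    // first character of the second piece

        // Same units on both sides: the index is valid.
        int ok = relativeIndex(wanted, startInChars);
        System.out.println(safeSubstring(second, ok, ok + 2));

        // Character position minus byte offset: the index goes negative.
        int bad = relativeIndex(wanted, startInBytes);
        System.out.println(bad + " -> " + safeSubstring(second, bad, bad + 2));

        // Single-byte text: both counts agree, so the mix-up stays hidden.
        System.out.println(storedLength("ab", false) == "ab".length());
    }
}
```

If this guess is right, it would also explain why English documents parse
fine no matter how large they are.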
I am going to study this further, but I wonder whether anyone else has
had similar experiences?
Thanks,
TKDD
---------------------------------------------------------------------
Mailing List: http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta Poi Project: http://jakarta.apache.org/poi/