Re: UTF-8 and unit test failure for org.apache.analysis.ru.RussianStem in build with Kaffe

Ken Krugler Thu, 22 Sep 2005 08:04:45 -0700

Hi Barry,

>     Hello, it's those pesky Debian Lucene package maintainers again :-).
> Lucene currently builds and passes all but one unit test against
> Kaffe[0] 1.1.6.  In debugging the failure of the unit test for
> org.apache.analysis.ru.RussianStem, I enabled a build of the JUnit test
> reports.  A detailed account is listed in Debian Bug Report #272295[1],
> but in brief, the 7-character String of Cyrillic expected is matched for
> the first five characters, then an issue occurs and what appears to be a
> few thousand characters are spewed out and the unit test fails.  I have
> a tarball of the unit test reports temporarily stored on my FTP site[2]
> if anyone would care to take a look.
>     Given the recent thread about UTF-8[3], I thought I would present
> this to you guys to see if you might have any insight on the issue.
> Thanks in advance for your time in reading this message.

Without downloading the tarball and digging into it, one bit of feedback is that Cyrillic has numerous encodings. A common source of problems is that text encoded using 8859-5 (for example) is getting identified as KOI8-R (or vice versa), so the conversion to Unicode fails on some characters.
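
To make that failure mode concrete, here is a minimal Java sketch of the mix-up. It is only an illustration, not taken from the Lucene test or the bug report: the class name, the helper reinterpret(), and the sample word are all made up, and it assumes the runtime ships both the ISO-8859-5 and KOI8-R charsets (which may not hold for every VM).

```java
import java.nio.charset.Charset;

// Hypothetical demo class; not part of Lucene or its tests.
final class CyrillicCharsetMismatch {

    // Encode text with one charset, then decode the same bytes with another.
    // When the two names differ, this mimics mislabelled input data.
    static String reinterpret(String text, String writtenAs, String readAs) {
        byte[] raw = text.getBytes(Charset.forName(writtenAs));
        return new String(raw, Charset.forName(readAs));
    }

    public static void main(String[] args) {
        // Sample Cyrillic word, written with Unicode escapes so the result
        // does not depend on the encoding of this source file.
        String word = "\u0432\u043e\u0434\u0430";

        String matched = reinterpret(word, "KOI8-R", "KOI8-R");
        String mismatched = reinterpret(word, "ISO-8859-5", "KOI8-R");

        // Same charset on both sides: the text survives the round trip.
        System.out.println("matched charsets preserved text: " + word.equals(matched));
        // Different charsets: still Cyrillic letters, but the wrong ones.
        System.out.println("mismatched charsets preserved text: " + word.equals(mismatched));
    }
}
```

Both charsets are single-byte, so a mismatch here does not raise an error; it silently produces different Cyrillic letters, which is why this kind of problem tends to surface only when a test compares against an expected string.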

As to the bug report, the HTML is tagged as UTF-8, but it looks like the text coming from the DB is using one of the legacy Cyrillic encodings. So my browser isn't very happy :)


-- Ken


[0] - http://www.kaffe.org
[1] - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=272295
[2] - ftp://www.bytemason.org/lucene_reports_2005092001.tar.gz
[3] - http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200509.mbox/[EMAIL PROTECTED]


--
Ken Krugler
TransPac Software, Inc.
<http://www.transpac.com>
+1 530-470-9200
