To comment on the following update, log in, then open the issue:
http://www.openoffice.org/issues/show_bug.cgi?id=74034





------- Additional comments from [EMAIL PROTECTED] Thu Feb 22 08:40:44 +0000 2007 -------
I discussed MeCab integration with the MeCab developer, who gave me the
following helpful advice.

=====
First of all, MeCab is designed to be independent of any specific
character encoding. So it works correctly as long as the character
encoding of the input string is the same as that of the dictionary.

Thus, in principle, we can pass a UCS-2(BE|LE) string to MeCab through
the current interface, without having to create a new interface, if we
encode the MeCab dictionary in UCS-2(BE|LE). However, supporting a
UCS-2(BE|LE) dictionary would need a lot of modifications, because MeCab
uses "char *" strings and treats 0x00 as the end of a string.

In addition, the comment "All internal codes are represented in UCS2,"
in ucs.h implies that MeCab calls the *_to_ucs2 functions to determine
the type of the characters contained in unknown words. Known words and
unknown words are processed separately, and only the processing of
unknown words calls *_to_ucs2.
=====
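
The 0x00 problem mentioned in the advice above can be illustrated with a
small sketch. It is not MeCab code; the buffers and the helper name
visible_length are made up for this example. It only shows what any code
that treats text as a NUL-terminated "char *" sees for the same two
characters in UCS-2BE and in UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// U+3042 (HIRAGANA LETTER A) followed by U+0041 ("A"), encoded as UCS-2BE.
// The code unit for "A" is 0x00 0x41, so a zero byte appears inside the text.
// The last two bytes are a 16-bit terminator added for this example.
static const char kUcs2Be[] = {'\x30', '\x42', '\x00', '\x41', '\x00', '\x00'};

// The same two characters in UTF-8: no zero byte occurs inside the text.
// The literal is split so that "A" is not parsed as part of the hex escape.
static const char kUtf8[] = "\xE3\x81\x82" "A";

// Hypothetical helper: the length seen by code that scans for a 0x00 byte,
// which is how a NUL-terminated "char *" string is measured.
std::size_t visible_length(const char* s) {
    return std::strlen(s);
}
```

With these buffers, visible_length(kUcs2Be) stops at the zero byte inside
the code unit for "A" and reports 2 bytes out of 4, while
visible_length(kUtf8) reports all 4 bytes of the UTF-8 text.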

In fact, MeCab does not needlessly encode and decode every UTF-8 string
via UCS2. Patching MeCab to accept UCS-2 input seems to be very difficult
and, in my humble opinion, such patches would not affect OOo's
performance. In conclusion, the most practical approach is for OOo to
pass UTF-8 strings to MeCab. Of course, we should set
MECAB_USE_UTF8_ONLY = 1 in order to remove the unneeded conversion
tables.
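
As a rough sketch of what passing UTF-8 would involve on the caller's
side, the function below converts 16-bit code units to a UTF-8 string. It
assumes the caller holds the text as UCS-2/UTF-16 code units; the name
utf16_to_utf8 is hypothetical and is not an existing OOo or MeCab
function, and the actual call into MeCab is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (illustration only): encode UTF-16 code units as UTF-8.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Valid surrogate pair: combine into one code point above U+FFFF.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            // Unpaired surrogate: substitute U+FFFD (replacement character).
            cp = 0xFFFD;
        }
        if (cp < 0x80) {
            // 1-byte sequence (ASCII).
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            // 2-byte sequence.
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            // 3-byte sequence (covers most Japanese text).
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            // 4-byte sequence.
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The c_str() of the returned string contains no embedded zero bytes for
ordinary text, so it is suitable for an interface that takes a
NUL-terminated "char *", provided the dictionary is also in UTF-8.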

I'm sorry, but the legal issue has not been solved yet. The external
project pages explain how to integrate external source code, so I
canceled my mail to mh and will follow the instructions on the
external project website.


---------------------------------------------------------------------
Please do not reply to this automatically generated notification from
Issue Tracker. Please log onto the website and enter your comments.
http://qa.openoffice.org/issue_handling/project_issues.html#notification

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


