CJKTokenizer is bigram based: C1C2C3 ==> C1C2 C2C3, and this affects which results are returned when you search C2C1,
briefly: CJKTokenizer is better than StandardTokenizer for CJK, but I don't know how to implement bigram-based tokens in StandardTokenizer.
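To make the C1C2C3 ==> C1C2 C2C3 rule concrete, here is a minimal sketch in plain Java. It does not use Lucene and is not the CJKTokenizer source: the class name BigramSketch and the method bigrams are made up for illustration, and it assumes each character is a single Java char (no surrogate pairs, no mixed Latin text).

```java
import java.util.ArrayList;
import java.util.List;

public class BigramSketch {
    // Hypothetical helper, not part of Lucene: emits overlapping
    // two-character tokens, so "C1C2C3" becomes "C1C2", "C2C3".
    // Assumption: every character fits in one Java char.
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        if (text.length() == 1) {
            // A lone character has no neighbour, so keep it as its own token.
            tokens.add(text);
            return tokens;
        }
        for (int i = 0; i + 1 < text.length(); i++) {
            tokens.add(text.substring(i, i + 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Latin letters stand in for the CJK characters C1, C2, C3.
        System.out.println(bigrams("ABC")); // prints [AB, BC]
        System.out.println(bigrams("BA"));  // prints [BA]
    }
}
```

In this sketch the reversed pair "BA" produces the single token BA, which is not among the tokens AB and BC produced for "ABC". How a real index behaves also depends on the analyzer used at query time.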
Che Dong http://www.chedong.com/tech/lucene.html
From: Erik Hatcher <[EMAIL PROTECTED]>
Reply-To: "Lucene Users List" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Subject: Re: CJK Analyzer in lucene 1.3 final
Date: Fri, 27 Feb 2004 08:29:10 -0500
On Feb 27, 2004, at 7:12 AM, Ankur Goel wrote:
> Hi,
> In the lucene-1.3-final version's CHANGES.txt it is written that "Fix
> StandardTokenizer's handling of CJK characters (Chinese, Japanese and Korean
> ideograms)."
> Does it mean that for CJK characters we now do not need to use any separate
> analyzer, and the standard analyzer will be sufficient?

You tell us. Does it work for you?
An analyzer is a pretty personal decision based on your dataset, so it is impossible to answer your question directly.
Erik
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]