Hi,

Sorry for sending so many mails.

From: [EMAIL PROTECTED] (Craig Small)
Subject: Re: Status of new search engine
Date: Tue, 17 Dec 2002 22:16:51 +1100

> Ah google does it right, let's see then.
> Search for you, which if this email client doesn't mangle it should be
> ??? ?? (looks like question marks to me).
>
> Now, if I pick it up from the search page, I get
> http://search.debian.org/new/search.en.cgi?q=%E4%B9%85%E4%BF%9D%E7%94%B0+%E6%99%BA%E5%BA%83
> and results look sensible.
>
> I then searched ???????? which is something to do with security
> and got
> http://search.debian.org/new/search.en.cgi?q=%E3%82%BB%E3%82%AD%E3%83%A5%E3%83%AA%E3%83%86%E3%82%A3%E6%83%85%E5%A0%B1&ps=10&o=0&m=and&lang=
> with no results
>
> and
> http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=%E3%82%BB%E3%82%AD%E3%83%A5%E3%83%AA%E3%83%86%E3%82%A3%E6%83%85%E5%A0%B1&btnG=Google+Search
> with lots of results.
>
> I don't understand why its not giving the right results.

I said earlier that sentence analysis was probably the reason, but
that may be wrong. If it were the reason, a Japanese word that happens
to be delimited by whitespace or HTML tags should still be found.
However, that is not the case.

For example, http://www.debian.org/index.en.html contains the word
"News", and each translated page contains the translated word for
"News". I searched for "News" in English, Russian, and Greek, and it
worked well:

http://search.debian.org/new/search.cgi?q=News&ps=10&o=0&m=and&lang=
http://search.debian.org/new/search.cgi?q=%D0%9D%D0%BE%D0%B2%D0%BE%D1%81%D1%82%D0%B8&ps=10&o=0&m=and&lang=
http://search.debian.org/new/search.cgi?q=%CE%9D%CE%AD%CE%B1&ps=10&o=0&m=and&lang=

On the other hand, I searched for "News" in Japanese, Chinese, and
Korean, and there were no results:

http://search.debian.org/new/search.cgi?q=%E3%83%8B%E3%83%A5%E3%83%BC%E3%82%B9&ps=10&o=0&m=and&lang=
http://search.debian.org/new/search.cgi?q=%EC%83%88%EC%86%8C%EC%8B%9D&ps=10&o=0&m=and&lang=
http://search.debian.org/new/search.cgi?q=%E6%9C%80%E6%96%B0%E6%B6%88%E6%81%AF&ps=10&o=0&m=and&lang=

Note that some Japanese words, such as

http://search.debian.org/new/search.cgi?q=%E4%B9%85%E4%BF%9D%E7%94%B0&ps=10&o=0&m=and&lang=

and

http://search.debian.org/new/search.cgi?q=%E6%97%A5%E6%9C%AC%E8%AA%9E&ps=10&o=0&m=and&lang=

are written as &#****; expressions in the HTML (for example, in
http://www.debian.org/intl/index.ja.html ). This expression comes from
webwml/japanese/po/langs.ja.po . However, I could not find where my
name (%E4%B9%85%E4%BF%9D%E7%94%B0) comes from.

Thus I suspect that the pre-conversion of the input for the search
engine has some problem.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
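
P.S. For anyone who wants to check which word each percent-encoded query above stands for, and what the same word looks like in the &#****; form, here is a minimal sketch in Python. It assumes the q= parameter is UTF-8, as in the URLs above. The helper names query_term and ncr_form are made up for this illustration and have nothing to do with the search engine's own code.

```python
import html
from urllib.parse import parse_qs, urlparse


def query_term(url: str) -> str:
    """Return the decoded q= parameter of a search URL (UTF-8 assumed)."""
    params = parse_qs(urlparse(url).query, encoding="utf-8")
    return params["q"][0]


def ncr_form(text: str) -> str:
    """Write each character as a decimal numeric character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)


if __name__ == "__main__":
    base = "http://search.debian.org/new/search.cgi?q="
    tail = "&ps=10&o=0&m=and&lang="
    # Query strings copied from the URLs in this mail.
    queries = [
        "%E3%83%8B%E3%83%A5%E3%83%BC%E3%82%B9",
        "%EC%83%88%EC%86%8C%EC%8B%9D",
        "%E6%9C%80%E6%96%B0%E6%B6%88%E6%81%AF",
        "%E4%B9%85%E4%BF%9D%E7%94%B0",
        "%E6%97%A5%E6%9C%AC%E8%AA%9E",
    ]
    for q in queries:
        term = query_term(base + q + tail)
        # html.unescape() turns the &#...; form back into characters,
        # so both spellings denote the same text.
        assert html.unescape(ncr_form(term)) == term
        print(q, "->", term, "->", ncr_form(term))
```

This only shows that the query and the &#****; spelling denote the same characters. It says nothing about how the indexer actually converts the pages.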