- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: HonestQiao
Subject: Re: How to segment UTF-8 Chinese text when indexing a database for full-text search?
I reinstalled dps, but it still can't segment Chinese. Here is what I did:

www# wget -d http://www.dataparksearch.org/add-on/mandarin.freq.gz
www# gzip -d mandarin.freq.gz
www# wget -d http://www.dataparksearch.org/dpsearch-4.45-28012007.tar.gz
www# tar xzvf dpsearch-4.45-28012007.tar.gz
www# cd dpsearch-4.45-28012007
www# ./configure --prefix=/usr/local/dpsearch --with-extra-charsets=chinese --with-mysql
www# make && make install
www# cp ../mandarin.freq /usr/local/dpsearch/etc/

www# diff indexer.conf indexer.conf-dist
68,69c68,69
< #DBAddr mysql://foo:[EMAIL PROTECTED]/search/?dbmode=cache
< DBAddr mysql://search:[EMAIL PROTECTED]/search/?dbmode=single
---
> DBAddr mysql://foo:[EMAIL PROTECTED]/search/?dbmode=cache
164c164
< LocalCharset UTF-8
---
> #LocalCharset UTF-8
291d290
< LoadChineseList GB2312 mandarin.freq
706c705
< DefaultLang zh
---
> #DefaultLang en
837c836
< RemoteCharset UTF-8
---
> #RemoteCharset iso-8859-1
1027,1041d1025
<
< HTDBAddr mysql://search:[EMAIL PROTECTED]/db_test_com/
< HTDBLimit 512
<
< Limit t:tag
< Tag works
< HTDBList "SELECT SQL_NO_CACHE id FROM article"
< HTDBDoc "SELECT SQL_NO_CACHE concat(\
< 'HTTP/1.0 200 OK\\r\\n',\
< 'Content-type: text/html\\r\\n',\
< 'Last-Modified: ',FROM_UNIXTIME(a.lasttime,'%a, %d %b %Y %H:%i:%s GMT'),'\\r\\n',\
< '\\r\\n',\
< '<html><head><title>',b.body,'</title></head><body>TAG:',a.tag,' UID:',a.uid,' WORD:',b.body,'</body></html>') \
< FROM article as a LEFT JOIN content as b USING(id) WHERE a.id='$2'"
< Server htdb:/works/
\ No newline at end of file

www# diff search.htm search.htm-dist
17,20c17
< #DBAddr mysql://foo:[EMAIL PROTECTED]/search/?dbmode=cache
< DBAddr mysql://search:[EMAIL PROTECTED]/search/?dbmode=single
<
< LoadChineseList GB2312 mandarin.freq
---
> DBAddr mysql://foo:[EMAIL PROTECTED]/search/?dbmode=cache
32,33c29,30
< LocalCharset UTF-8
< BrowserCharset UTF-8
---
> LocalCharset iso-8859-1
> BrowserCharset iso-8859-1

www# cat langmap.conf
LangMapFile langmap/zh.utf8.lm

The indexer does fetch the data.
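For readers following the thread: the HTDBDoc query in the indexer.conf diff has MySQL concatenate what is meant to be a complete HTTP response (status line, headers, a blank line, then an HTML body) for each article row. The Python sketch below only mirrors that CONCAT(...) so you can see the document the indexer is handed. It is not part of DataparkSearch; the function name `build_htdb_document` is hypothetical, and it assumes `lasttime` holds Unix seconds.

```python
from datetime import datetime, timezone


def build_htdb_document(lasttime: int, tag: str, uid: int, body: str) -> str:
    """Hypothetical helper mirroring the CONCAT(...) in the HTDBDoc query above."""
    # FROM_UNIXTIME(a.lasttime, '%a, %d %b %Y %H:%i:%s GMT'): MySQL's %i/%s are
    # minutes/seconds, i.e. Python's %M/%S. This sketch renders the time in UTC.
    last_modified = datetime.fromtimestamp(lasttime, tz=timezone.utc).strftime(
        "%a, %d %b %Y %H:%M:%S GMT"
    )
    # Status line and headers, terminated by an empty line (CRLF CRLF).
    headers = (
        "HTTP/1.0 200 OK\r\n"
        "Content-type: text/html\r\n"
        f"Last-Modified: {last_modified}\r\n"
        "\r\n"
    )
    # Body: b.body is used both as the <title> and after WORD:, inserted verbatim.
    html = (
        f"<html><head><title>{body}</title></head>"
        f"<body>TAG:{tag} UID:{uid} WORD:{body}</body></html>"
    )
    return headers + html


if __name__ == "__main__":
    print(repr(build_htdb_document(0, "works", 7, "中文分词")))
```

Two things this makes visible: the Content-type header carries no charset parameter, so the document itself does not declare that the text is UTF-8; and MySQL's FROM_UNIXTIME uses the session time zone, so the literal "GMT" suffix is only accurate when that zone is UTC.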
But in the dict table, the words are not segmented. And when I use search.cgi, searches return no results unless I select "Search for: Substring".

My MSN is [EMAIL PROTECTED]. Can you help online? Thanks.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;topic_id=1170316385
