Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
> Thank you!
>
> The problem with search.cgi was really because of the changed format
> search.htm
> But I have problems with encodings (e.g. Cyrillic windows-1251 or UTF-8).
> I installed both versions of mnogosearch with separate bases, but with the
> same settings.
> The old version works fine, but the new one has problems.
>
> Encoding settings:
> indexer.conf
> RemoteCharset windows-1251
> LocalCharset UTF-8
>
> search.htm
> string BrowserCharset= "windows-1251";
> string LocalCharset= "UTF-8";
>

Please start investigating the problem by checking the data in the database.
It's important to make sure that indexer collects data in true utf8.

What does this query return?

SELECT word, hex(word) FROM bdict WHERE word RLIKE '^[^a-z]$' LIMIT 30;

>
> 1) The New version requires that the base encoding by default coincided with
> LocalCharset:
> ALTER DATABASE `mnogosearch_new` DEFAULT CHARACTER SET utf8 COLLATE
> utf8_unicode_ci;
>
> Otherwise, you get the message in stderr:
> An error occurred!
> DB: MySQL driver: #1267: Illegal mix of collations
> (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='
>
>
> 2) With the same settings in indexer.conf and search.htm the search in the
> Cyrillic is not working in the new version of mnogosearch.
> Setting of BrowserCharset= "UTF-8" does not change anything.
>
> Your search - "агент" - did not match any documents.
>
> Debug log:
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start UdmFind
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start Prepare
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Stop Prepare
> 0.00
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start FindWords
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start FindWordsDB for
> mysql://mnogosearch_new:***@localhost/mnogosearch_new/?dbmode=blob&SetNames=UTF-8
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start loading limits
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} WHERE limit loaded. 149
> URLs found
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Stop loading limits
> 0.01 (149 URLs found)
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start fetching words
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start search for
> 'агенСM-^B'
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start fetching
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Stop FindWordsDB:
> 0.01
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start UdmQueryConvert
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Stop UdmQueryConvert:
> 0.00
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start Excerpts
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Stop Excerpts:
> 0.00
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start WordInfo
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Stop WordInfo:
> 0.00
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Stop UdmFind:
> 0.01
>
>
> 3) When searching for words in the Latin, the base gives the text fragments
> in the correct Cyrillic, but the header of each retrieved document is always
> issued in the wrong encoding:
> navigator : 405
> Results 1-10 of 99 ( 0.021 seconds)
> ?“?»?°?????°?? [ 15.095% Popularity: 0.89705 ]
> ... сети Интернет по адресу: http://navigator***.ru Прежде чем приобрести ...
>
>
>
> I would be very grateful for help with solving the last two problems.
>
> Generally, when we install programs, they have the possibility of issuing
> various warning messages.
> It would be nice if a new version of mnogosearch will warn about occurred
> serious changes.
> I set up our old CMS to the new server and there are possible experiments.
> But if a new version of mnogosearch will installed as one of the updates to
> the server under working loads, then there would be a complete disaster.
>
>
>
> Regarding to a long hang of mnogosearch indexing.
> I found that this is due to the very slow network retrieval of large PDF
> documents.
> I tried to set minimum limits of timeouts, but it does not help.
> MaxNetErrors 10
> ReadTimeOut 10s
> DocTimeOut 30s
>
> For example, I tried to set a time limit of 300s indexing, but indexing took
> 1360s. Moreover, the document was not indexed.
> /usr/local/bin/indexer -ob -v6 -N 1 -c 300
> /usr/local/etc/mnogosearch/indexer.conf 2> /var/log/mnogosearch.log
> ------------------
> Done (1360 seconds, 1 documents, 11049522 bytes, 7.93 Kbytes/sec.)
>
> I sent you the log of attempt of indexing this one document.
>
> When I set:
> Disallow *.pdf
> indexing is fast.
>
> Why is setting of time limits doesn't help? How can avoid such lockups of the
> indexing process?
>

Reply: <http://www.mnogosearch.org/board/message.php?id=21777>
_______________________________________________
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general
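
For reading the hex(word) column that the query above returns, a small helper along the following lines may be useful. It is only a sketch and not part of mnogosearch: the function name classify_hex and its three labels are hypothetical, and it assumes that a "wrong" value is either raw windows-1251 bytes or UTF-8 that was encoded a second time through a latin1-style connection.

```python
def classify_hex(hex_word):
    """Guess how the bytes behind a MySQL HEX() value are encoded.

    Returns a (label, text) pair where label is one of
    "utf8", "double-encoded" or "not-utf8" (hypothetical labels).
    """
    raw = bytes.fromhex(hex_word)
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not UTF-8 at all; show it as windows-1251 as a best guess.
        return ("not-utf8", raw.decode("cp1251", errors="replace"))

    # UTF-8 that was re-encoded via a single-byte charset decodes cleanly,
    # but mapping it back to bytes yields a second valid UTF-8 string.
    for codec in ("latin-1", "cp1252"):
        try:
            inner = text.encode(codec).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue
        if inner != text:
            return ("double-encoded", inner)

    return ("utf8", text)


if __name__ == "__main__":
    # Sample values built locally; real input would come from the query.
    samples = [
        "агент".encode("utf-8").hex(),
        "агент".encode("cp1251").hex(),
        "агент".encode("utf-8").decode("latin-1").encode("utf-8").hex(),
    ]
    for value in samples:
        print(value.upper(), classify_hex(value))
```

If the words come back labelled "utf8", the index itself is probably fine and the connection or template charset is the next thing to check; the other two labels would point at the indexing side.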