Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
> Thank you!
> 
> The problem with search.cgi was indeed caused by the changed format of
> search.htm.
> But I have problems with encodings (e.g. Cyrillic windows-1251 or UTF-8).
> I installed both versions of mnogosearch with separate databases, but with
> the same settings.
> The old version works fine, but the new one has problems.
> 
> Encoding settings:
> indexer.conf
>   RemoteCharset windows-1251
>   LocalCharset UTF-8
> 
> search.htm
>   string BrowserCharset= "windows-1251";
>   string LocalCharset= "UTF-8";
> 

Please start investigating the problem by checking the data
in the database. It's important to make sure that indexer
stores the data as real UTF-8.

What does this query return:

SELECT word, hex(word) FROM bdict WHERE word RLIKE '^[^a-z]$' LIMIT 30;

?
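
As a rough guide to reading the hex(word) column, here is a small Python
sketch. The sample word and the two failure patterns are illustrative
assumptions, not output from this database. It prints the bytes that correct
UTF-8 gives for a Russian word, next to the bytes for unconverted
windows-1251 and for windows-1251 bytes that were treated as latin1 and
converted to UTF-8 a second time.

```python
# Illustrative only: the sample word is an assumption, not data from bdict.
word = "агент"


def hex_bytes(data: bytes) -> str:
    """Format bytes the way MySQL's HEX() does: uppercase, no separators."""
    return data.hex().upper()


# Correct storage: each Russian letter is two bytes, starting with D0 or D1.
true_utf8 = hex_bytes(word.encode("utf-8"))

# Unconverted windows-1251: one byte per letter, in the C0..FF range.
raw_cp1251 = hex_bytes(word.encode("cp1251"))

# windows-1251 bytes mistaken for latin1 and then re-encoded as UTF-8:
# every letter turns into a C3 xx pair.
double_encoded = hex_bytes(
    word.encode("cp1251").decode("latin-1").encode("utf-8")
)

for label, value in [
    ("true UTF-8", true_utf8),
    ("raw windows-1251", raw_cp1251),
    ("latin1 -> UTF-8 double encoding", double_encoded),
]:
    print(f"{label:32} {value}")
```

If the hex values in bdict look like the first line (D0xx and D1xx pairs), the
indexer side is probably fine and the problem is more likely on the search
side.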

> 
> 1) The new version requires the database default character set to match
> LocalCharset:
> ALTER DATABASE `mnogosearch_new` DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;
> 
> Otherwise, you get the message in stderr:
> An error occurred!
> DB: MySQL driver: #1267: Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='
> 
> 
> 2) With the same settings in indexer.conf and search.htm, Cyrillic search
> does not work in the new version of mnogosearch.
> Setting BrowserCharset= "UTF-8" does not change anything.
> 
> Your search - "агент" - did not match any documents.
> 
> Debug log:
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start UdmFind
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start Prepare
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Stop  Prepare                 0.00
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start FindWords
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start FindWordsDB for mysql://mnogosearch_new:***@localhost/mnogosearch_new/?dbmode=blob&SetNames=UTF-8
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start loading limits
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} WHERE limit loaded. 149 URLs found
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Stop  loading limits          0.01 (149 URLs found)
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start fetching words
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start search for 'агенСM-^B'
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start fetching
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Stop  FindWordsDB:            0.01
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start UdmQueryConvert
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Stop  UdmQueryConvert:        0.00
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start Excerpts
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Stop  Excerpts:               0.00
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start WordInfo
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Stop  WordInfo:               0.00
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Stop  UdmFind:                0.01
> 
> 
> 3) When searching for Latin-script words, the text excerpts are shown in
> correct Cyrillic, but the title of each retrieved document is always
> displayed in the wrong encoding:
> navigator : 405       
> Results 1-10 of 99 ( 0.021 seconds)
>       ?“?»?°?????°??   [ 15.095% Popularity: 0.89705 ]
> ... сети Интернет по адресу: http://navigator***.ru Прежде чем приобрести ...
> 
> 
> 
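
The garbled title in item 3 can be reproduced with a short Python sketch. It
assumes, purely as a guess, that the original title is "Главная", that it is
stored as UTF-8, that the bytes are then read as latin1 (Python's cp1252 codec
stands in for MySQL's latin1 here), and that the result is converted to
windows-1251 for the browser. None of this is confirmed by the report; the
sketch only checks whether that chain gives the same pattern of question
marks.

```python
# Assumption: the real title is not given in the report; "Главная" is a guess.
guessed_title = "Главная"

# Title exactly as it appears in the reported search results.
observed = "?“?»?°?????°??"


def misread_as_latin1_then_cp1251(text: str) -> str:
    """Simulate UTF-8 bytes read as latin1, then sent out as windows-1251."""
    stored = text.encode("utf-8")                        # bytes in the database
    misread = stored.decode("cp1252", errors="replace")  # wrong charset on read
    sent = misread.encode("cp1251", errors="replace")    # unmappable -> "?"
    return sent.decode("cp1251")                         # what the browser shows


garbled = misread_as_latin1_then_cp1251(guessed_title)
print(garbled)
print("matches the reported title:", garbled == observed)
```

If the two strings match, the title is probably being read through a latin1
column or connection, which would also fit the latin1_swedish_ci error in
item 1.
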
> I would be very grateful for help with solving the last two problems.
> 
> Generally, installers are able to print warning messages.
> It would be nice if a new version of mnogosearch warned about serious
> changes like these.
> I am moving our old CMS to a new server, so I can afford to experiment.
> But if a new version of mnogosearch were installed as a routine update on
> a server under production load, it would be a complete disaster.
> 
> 
> 
> Regarding the long hang during mnogosearch indexing:
> I found that it is caused by very slow network retrieval of large PDF
> documents.
> I tried setting minimal timeouts, but that does not help.
> MaxNetErrors 10
> ReadTimeOut 10s
> DocTimeOut 30s
> 
> For example, I tried to limit indexing to 300s, but it took 1360s, and the
> document was still not indexed.
> /usr/local/bin/indexer -ob -v6 -N 1 -c 300 /usr/local/etc/mnogosearch/indexer.conf 2> /var/log/mnogosearch.log
> ------------------
> Done (1360 seconds, 1 documents, 11049522 bytes,  7.93 Kbytes/sec.)
> 
> I sent you the log of the attempt to index this one document.
> 
> When I set: 
> Disallow *.pdf
> indexing is fast.
> 
> Why doesn't setting time limits help? How can I avoid such lockups of the
> indexing process?
> 
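
One general reason a read timeout does not bound total download time is that
it limits only the wait for each piece of data, so a slow but steady transfer
never trips it. The Python sketch below simulates this with the figures from
the report. The function name, the chunk size and the one-chunk-per-second
pacing are assumptions for illustration; it is not mnogosearch code and says
nothing about how indexer actually applies ReadTimeOut or DocTimeOut.

```python
def simulate_download(chunk_delays, chunk_size, read_timeout, doc_timeout=None):
    """Simulate a transfer and return (status, elapsed_seconds, bytes_received).

    chunk_delays: seconds the server waits before sending each chunk.
    read_timeout: limit on the wait for any single chunk.
    doc_timeout:  optional limit on the whole transfer.
    """
    elapsed = 0.0
    received = 0
    for delay in chunk_delays:
        wait = min(delay, read_timeout)
        # The overall deadline is checked against total elapsed time.
        if doc_timeout is not None and elapsed + wait > doc_timeout:
            return ("document timeout", float(doc_timeout), received)
        # The per-read limit only looks at the gap before this chunk.
        if delay > read_timeout:
            return ("read timeout", elapsed + read_timeout, received)
        elapsed += delay
        received += chunk_size
    return ("complete", elapsed, received)


# Report: 11049522 bytes in 1360 s, about 7.93 Kbytes/sec.
# Assumed pacing: one 8125-byte chunk every second.
delays = [1.0] * 1360

per_read_only = simulate_download(delays, 8125, read_timeout=10)
with_deadline = simulate_download(delays, 8125, read_timeout=10, doc_timeout=30)

print(per_read_only)
print(with_deadline)
```

With only the per-read limit, the simulated transfer runs for the full 1360
seconds because no single wait exceeds 10 seconds. With an overall limit of
30 seconds it stops after 243750 bytes.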

Reply: <http://www.mnogosearch.org/board/message.php?id=21777>

_______________________________________________
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general
