Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
Hmm. It seems your MySQL client is not configured correctly.
It uses latin1 as the connection character set, while the
display is obviously utf8, so it prints garbage instead of
Cyrillic letters.

You can check this using "show variables like 'character_set%';".
It seems character_set_connection is latin1.

In order to see Cyrillic letters, you can try:

- mysql --default-character-set=utf8
- or put default-character-set=utf8 into the [client] (or [mysql]) section of my.cnf
- or run "SET NAMES utf8" immediately after connecting

Note that this does not affect how the indexer works.
It only affects the "mysql" client.


> The results are the same for both bases.

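They are not. The hex codes are different.
The old database contains proper UTF-8 Cyrillic codes, while
the new database contains something different for the same
strings: UTF-8 bytes that were read as latin1 and then
converted to UTF-8 a second time (double encoding):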


This is wrong:

| 30летних               | 3330C390C2BBC390C2B5C391E2809AC390C2BDC390C2B8C391E280A6 |

This is correct:
| 30летних               | 3330D0BBD0B5D182D0BDD0B8D185 |
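
To see where the wrong bytes come from, here is a minimal Python sketch. It assumes that MySQL's latin1 behaves like cp1252 and that the five bytes cp1252 leaves undefined pass through as the same code point. The C2 81 in the new database's `3230C391C281` row is consistent with this. The sketch takes correct hex values from the old database, reads the UTF-8 bytes as latin1, encodes them to UTF-8 again, and compares the result with the hex values from the new database. The function names are made up for this illustration.

```python
def mysql_latin1_decode(data: bytes) -> str:
    # Assumption: MySQL's latin1 is cp1252, and the bytes cp1252 leaves
    # undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) map to the same code point.
    chars = []
    for b in data:
        try:
            chars.append(bytes([b]).decode("cp1252"))
        except UnicodeDecodeError:
            chars.append(chr(b))
    return "".join(chars)


def double_encode(word: str) -> bytes:
    # UTF-8 bytes misread as latin1, then encoded to UTF-8 a second time.
    return mysql_latin1_decode(word.encode("utf-8")).encode("utf-8")


# (old database hex, new database hex) pairs copied from the thread.
PAIRS = [
    ("3330D0BBD0B5D182D0BDD0B8D185",
     "3330C390C2BBC390C2B5C391E2809AC390C2BDC390C2B8C391E280A6"),
    ("3230D181", "3230C391C281"),
    ("313037D180D195", "313037C391E282ACC391E280A2"),
]

for good_hex, bad_hex in PAIRS:
    word = bytes.fromhex(good_hex).decode("utf-8")
    # Prints True when double encoding reproduces the new database's bytes.
    print(good_hex, double_encode(word).hex().upper() == bad_hex)
```

All three pairs print True. This is consistent with the new database having received text that was already UTF-8 over a connection that treated it as latin1.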



Try adding "SetNames=utf8" to the DBAddr string in indexer.conf for the
new database, like this:

DBAddr mysql://root@localhost/test/?SetNames=utf8

then clean the database and crawl and index again.
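
After re-indexing, you can run the same "SELECT word, hex(word) ..." query again and check the hex values. Below is a rough Python heuristic, not a mnoGoSearch tool. Its function names are hypothetical, and it makes the same assumption that MySQL's latin1 is cp1252. It tries to undo one latin1-to-UTF-8 conversion and reports a value as double-encoded only if that succeeds and changes the text. Correctly stored Cyrillic cannot be mapped back to single latin1 bytes, so it reports False.

```python
def mysql_latin1_encode(text: str) -> bytes:
    # Assumption: MySQL's latin1 is cp1252; code points up to U+00FF that
    # cp1252 cannot encode are written back as the same byte value.
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            if ord(ch) > 0xFF:
                raise  # e.g. real Cyrillic: not representable in latin1
            out.append(ord(ch))
    return bytes(out)


def looks_double_encoded(hex_value: str) -> bool:
    # hex_value is one cell of the hex(word) column.
    text = bytes.fromhex(hex_value).decode("utf-8")
    try:
        repaired = mysql_latin1_encode(text).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False  # a latin1 step cannot be undone, so not stored twice
    return repaired != text


for hex_value in ["3330D0BBD0B5D182D0BDD0B8D185",
                  "3330C390C2BBC390C2B5C391E2809AC390C2BDC390C2B8C391E280A6"]:
    print(hex_value, looks_double_encoded(hex_value))
```

The value from the old database prints False, and the value from the new database prints True. Pure ASCII words such as 2009 always give False, so check rows that contain letters.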


> 
> mysql> use mnogosearch_new;
> Reading table information for completion of table and column names
> You can turn off this feature to get a quicker startup with -A
> Database changed
> mysql> SELECT word, hex(word) FROM bdict WHERE word NOT RLIKE '^[a-z0-9?#_]*$' LIMIT 30;
> +------------------------------+----------------------------------------------------------+
> | word                         | hex(word)                                                |
> +------------------------------+----------------------------------------------------------+
> | 000в                        | 303030C390C2B2                                           |
> | 099в                        | 303939C390C2B2                                           |
> | 107рѕ                      | 313037C391E282ACC391E280A2                               |
> | 10млн                     | 3130C390C2BCC390C2BBC390C2BD                             |
> | 11в                         | 3131C390C2B2                                             |
> | 18в                         | 3138C390C2B2                                             |
> | 1970Ñ…                       | 31393730C391E280A6                                       |
> | 1980г                       | 31393830C390C2B3                                         |
> | 1в                          | 31C390C2B2                                               |
> | 1Ñ€                          | 31C391E282AC                                             |
> | 2001г                       | 32303031C390C2B3                                         |
> | 2002рі                     | 32303032C391E282ACC391E28093                             |
> | 2004г                       | 32303034C390C2B3                                         |
> | 2006г                       | 32303036C390C2B3                                         |
> | 2008г                       | 32303038C390C2B3                                         |
> | 2009г                       | 32303039C390C2B3                                         |
> | 2009рі                     | 32303039C391E282ACC391E28093                             |
> | 2011г                       | 32303131C390C2B3                                         |
> | 2012рі                     | 32303132C391E282ACC391E28093                             |
> | 20Ñ                         | 3230C391C281                                             |
> | 30летних               | 3330C390C2BBC390C2B5C391E2809AC390C2BDC390C2B8C391E280A6 |
> | 3летний                | 33C390C2BBC390C2B5C391E2809AC390C2BDC390C2B8C390C2B9     |
> | 40в                         | 3430C390C2B2                                             |
> | 41в                         | 3431C390C2B2                                             |
> | 48в                         | 3438C390C2B2                                             |
> | 599в                        | 353939C390C2B2                                           |
> | 59в                         | 3539C390C2B2                                             |
> | 600в                        | 363030C390C2B2                                           |
> | 60в                         | 3630C390C2B2                                             |
> | 90Ñ…                         | 3930C391E280A6                                           |
> +------------------------------+----------------------------------------------------------+
> 30 rows in set (0,00 sec)
> 
> 
> 
> mysql> use mnogosearch;
> Reading table information for completion of table and column names
> You can turn off this feature to get a quicker startup with -A
> Database changed
> mysql> SELECT word, hex(word) FROM bdict WHERE word NOT RLIKE '^[a-z0-9?#_]*$' LIMIT 30;
> +------------------------------+------------------------------+
> | word                         | hex(word)                    |
> +------------------------------+------------------------------+
> | 000в                        | 303030D0B2                   |
> | 099в                        | 303939D0B2                   |
> | 107рѕ                      | 313037D180D195               |
> | 10млн                     | 3130D0BCD0BBD0BD             |
> | 11в                         | 3131D0B2                     |
> | 18в                         | 3138D0B2                     |
> | 1970Ñ…                       | 31393730D185                 |
> | 1980г                       | 31393830D0B3                 |
> | 1в                          | 31D0B2                       |
> | 1Ñ€                          | 31D180                       |
> | 2001г                       | 32303031D0B3                 |
> | 2002рі                     | 32303032D180D196             |
> | 2004г                       | 32303034D0B3                 |
> | 2006г                       | 32303036D0B3                 |
> | 2008г                       | 32303038D0B3                 |
> | 2009г                       | 32303039D0B3                 |
> | 2009рі                     | 32303039D180D196             |
> | 2011г                       | 32303131D0B3                 |
> | 2012рі                     | 32303132D180D196             |
> | 20Ñ                         | 3230D181                     |
> | 30летних               | 3330D0BBD0B5D182D0BDD0B8D185 |
> | 3летний                | 33D0BBD0B5D182D0BDD0B8D0B9   |
> | 40в                         | 3430D0B2                     |
> | 41в                         | 3431D0B2                     |
> | 48в                         | 3438D0B2                     |
> | 599в                        | 353939D0B2                   |
> | 59в                         | 3539D0B2                     |
> | 600в                        | 363030D0B2                   |
> | 60в                         | 3630D0B2                     |
> | 90Ñ…                         | 3930D185                     |
> +------------------------------+------------------------------+
> 30 rows in set (0,00 sec)
> 

Reply: <http://www.mnogosearch.org/board/message.php?id=21781>

_______________________________________________
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general
