Isn't a large part of the problem that the search strings are URL-encoded UTF-8, so each Cyrillic character arrives as two percent-escaped bytes (e.g. %D0%93), whereas Analog seems prepared to handle only the single-byte escapes of ISO-8859? -Bill
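
To illustrate the point, here is a minimal Python sketch, not anything Analog itself does, that decodes one of the escaped search terms from the referrer in the quoted log lines below. It assumes the escapes are UTF-8 and, as a guess based on the one report row that looks like single-byte Cyrillic, falls back to windows-1251 when the bytes are not valid UTF-8. The function name decode_search_term is made up for this example.

```python
from urllib.parse import unquote_to_bytes


def decode_search_term(escaped: str) -> str:
    """Decode a percent-escaped query term (hypothetical helper).

    Assumption: terms are UTF-8; if the bytes are not valid UTF-8,
    guess windows-1251, the charset the site itself is served in.
    """
    # In a query string "+" stands for a space; a literal plus is %2B.
    raw = unquote_to_bytes(escaped.replace("+", " "))
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("cp1251", errors="replace")


# Fragment of the referrer query from the log entries quoted below.
escaped = (
    "%D0%93%D1%80%D1%83%D0%B4%D0%BD%D0%BE%D0%B5"
    "+%D0%B2%D1%81%D0%BA%D0%B0%D1%80%D0%BC%D0%BB%D0%B8%D0%B2%D0%B0%D0%BD%D0%B8%D0%B5"
)

print(decode_search_term(escaped))

# Reading the same bytes one byte per character, as an ISO-8859-1
# consumer would, gives two characters for every Cyrillic letter.
print(len(unquote_to_bytes(escaped.replace("+", " ")).decode("latin-1")))
```

The try/except order matters: windows-1251 assigns a character to nearly every byte value, so it would accept UTF-8 input without complaint and produce garbage, which is why UTF-8 is attempted first.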


KEVIN ZEMBOWER wrote:

> [I have no idea how the things I'm pasting in here will appear in your email readers. Hope it still makes sense. Thanks for your patience.]
>
> One of the web sites I run is for our field office in Moscow, Russia (http://www.fzr.ru). Their content is entirely in Cyrillic. So are the search terms that folks type into the Russian Google to find it. My Search Query and Search Word reports look like this:
> #pages: %pages: search term
> ------: ------: -----------
> 2: 9.09%: Ð%92Ñ%80еРнÑ%8bе поÑ%81леРÑ%81Ñ%82вРÑ%8f пÑ%80овеРенРÑ%8f пРÑ%80Ñ%81Рнга
> 2: 9.09%: Ð%9fолезнÑ%8bе пÑ%80РвÑ%8bÑ%87кРРнавÑ%8bкÐ
> 2: 9.09%: Ñ%8dÑ%84Ñ%84екÑ%82РвнÑ%8bе меÑ%82оРÑ%8b леÑ%87енРÑ%8f гоноÑ%80еÐ
> 2: 9.09%: Ñ%85оРÑ%83Ñ%80ока СÐ%9fÐ%98Ð%94
> 2: 9.09%: êàê ïîëüçîâàòüñÿ ïðîòèâîçà÷àòî÷íûìè òàáëåòêàìè
> 2: 9.09%: Ñ%84емРнал
> 2: 9.09%: Ð%92о влагалРÑ%89е
> 2: 9.09%: СÐ%9fÐ%98Ð%94 Ñ%80азÑ%80абоÑ%82ка Ñ%83Ñ%80оков в наÑ%87алÑ%8cной Ñ%88коле
> 1: 4.55%: related:www.infoshare.ru/
> 1: 4.55%: Ñ%84онÐ
> 1: 4.55%: ноÑ%81 нÑ%8eÑ%85аÑ%82Ñ%8c амÑ%84еÑ%82амРн
> 1: 4.55%: мÑ%83зÑ%8bка в Ñ%88коле меÑ%82оРÑ%8b обÑ%83Ñ%87енРÑ%8f
> 1: 4.55%: гÑ%80Ñ%83Рное вÑ%81каÑ%80млРванРе РРеÑ%82а
> 1: 4.55%: каÑ%81каРнÑ%8bй меÑ%82оÐ
> I think this is what the actual log entries look like:
> host-212-158-203-161.bulldogdsl.com - - [27/Oct/2004:18:53:56 -0400] "GET /style.css HTTP/1.1" 404 215 "http://www.google.ru/search?q=cache:c_ArIHZtASQJ:www.healthyrussia.ru/doc.php%3Fae%3D607%26ar%3D4++site:www.healthyrussia.ru+%D0%93%D1%80%D1%83%D0%B4%D0%BD%D0%BE%D0%B5+%D0%B2%D1%81%D0%BA%D0%B0%D1%80%D0%BC%D0%BB%D0%B8%D0%B2%D0%B0%D0%BD%D0%B8%D0%B5+%22%D0%97%D0%B4%D0%BE%D1%80%D0%BE%D0%B2%D0%B0%D1%8F+%D0%A0%D0%BE%D1%81%D1%81%D0%B8%D1%8F%22&hl=ru" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
> host-212-158-203-161.bulldogdsl.com - - [27/Oct/2004:18:53:56 -0400] "GET /calendar.css HTTP/1.1" 404 218 "http://www.google.ru/search?q=cache:c_ArIHZtASQJ:www.healthyrussia.ru/doc.php%3Fae%3D607%26ar%3D4++site:www.healthyrussia.ru+%D0%93%D1%80%D1%83%D0%B4%D0%BD%D0%BE%D0%B5+%D0%B2%D1%81%D0%BA%D0%B0%D1%80%D0%BC%D0%BB%D0%B8%D0%B2%D0%B0%D0%BD%D0%B8%D0%B5+%22%D0%97%D0%B4%D0%BE%D1%80%D0%BE%D0%B2%D0%B0%D1%8F+%D0%A0%D0%BE%D1%81%D1%81%D0%B8%D1%8F%22&hl=ru" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
> host-212-158-203-161.bulldogdsl.com - - [27/Oct/2004:18:53:56 -0400] "GET /vsem.css HTTP/1.1" 404 214 "http://www.google.ru/search?q=cache:c_ArIHZtASQJ:www.healthyrussia.ru/doc.php%3Fae%3D607%26ar%3D4++site:www.healthyrussia.ru+%D0%93%D1%80%D1%83%D0%B4%D0%BD%D0%BE%D0%B5+%D0%B2%D1%81%D0%BA%D0%B0%D1%80%D0%BC%D0%BB%D0%B8%D0%B2%D0%B0%D0%BD%D0%B8%D0%B5+%22%D0%97%D0%B4%D0%BE%D1%80%D0%BE%D0%B2%D0%B0%D1%8F+%D0%A0%D0%BE%D1%81%D1%81%D0%B8%D1%8F%22&hl=ru" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
> host-212-158-203-161.bulldogdsl.com - - [27/Oct/2004:18:53:56 -0400] "GET /imgs/logo.gif HTTP/1.1" 404 219 "http://www.google.ru/search?q=ca%D0%B0%D0%BD%D0%B8%D0%B5+%22%D0%97%D0%B4%D0%BE%D1%80%D0%BE%D0%B2%D0%B0%D1%8F+%D0%A0%D0%BE%D1%81%D1%81%D0%B8%D1%8F%22&hl=ru" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
>
> I'd like to change the display of the report so that my Russian-reading colleagues can read what the queries are, but I'm not having any luck. To display the Cyrillic characters on their web site, I added these lines to the Apache httpd.conf file:
> #This enables pages to be correctly displayed in Cyrillic
> AddCharset windows-1251 .html .htm .shtml
> AddLanguage ru .ru
> LanguagePriority en ru
>
> However, adding these exact lines to the section where the web statistics report is located didn't improve or change the display.
>
> For all I know, the setting is on the client, and someone with the correct settings would see the report displayed correctly. This report is normally password protected, but if anyone would like to test this possibility, I could move it into a publicly accessible area temporarily.
>
> Any suggestions on what I have to change to display the terms correctly?
>
> Thanks for all your suggestions and advice.
>
> -Kevin Zembower
>
>
>
> -----
> E. Kevin Zembower
> Internet Systems Group manager
> Johns Hopkins University
> Bloomberg School of Public Health
> Center for Communications Programs
> 111 Market Place, Suite 310
> Baltimore, MD 21202
> 410-659-6139
>
> +------------------------------------------------------------------------
> | TO UNSUBSCRIBE from this list:
> | http://lists.meer.net/mailman/listinfo/analog-help
> |
> | Usenet version: news://news.gmane.org/gmane.comp.web.analog.general
> | List archives: http://www.analog.cx/docs/mailing.html#listarchives
> +------------------------------------------------------------------------
>


