On Fri, 3 Dec 2004 01:41:41 +0100
Christian Biere <[EMAIL PROTECTED]> wrote:

> Does that mean, there's a problem with such results when not using ICU? ICU
> is actually only used to match queries and canonicalize (outgoing) queries
> if I remember correctly. It shouldn't affect viewing at all.

It might just be my misunderstanding, though. I can't pin it down, since
there are several combinations: GTK1 with or without ICU, GTK2 with or
without ICU. I also can't remember exactly which version of gtkg it was, but
formerly everything that seemed to be Japanese (or Chinese) was shown as
underscores. Then one day readable text suddenly appeared: "Oh, I can read
these characters, it's Japanese!".

Following your suggestion, I recompiled gtkg without the ICU library and ran
it for a while; I can confirm that it doesn't affect viewing. I have to admit
my guess was wrong. I used to wonder why the ICU library was required when
libiconv and libintl were already there.

However, where outgoing queries are concerned, gtkg with GTK2 and ICU
automatically drops my search queries (some Japanese, Chinese, Russian)
with '(WARNING): dropping invalid local query ""'.

> Which GUI did you use, GTK1 or GTK2?

Mainly GTK2, but I get similar results with GTK1 as well. This might
complicate the subject, but there is another problem in GTK1: when I pull up
the "Information about selected file" panel from the bottom of the search
results pane, even entries that are displayed completely in the search
results pane become blank (i.e. file name, SHA1, size...) in the file
details panel.

> All peers must use UTF-8 and only UTF-8 encoded queries and results. There's
> probably still quite a number of improperly encoded ones on the network

Indeed, and I don't know how many there are either. The reason I asked this
question is that I worry about the next stage, i.e. "The time has come, we
need to ban...".

> The underscore is probably created by gtk-gnutella itself due to a conversion
> problem (invalid or unexpected encoding). We don't use the official unicode
> replacement character there because it would often unnecessarily enforce
> UTF-8 (instead of plain ASCII) encoding of string and it's much more 
> inconvenient to handle in filenames (at least in a terminal).

OK, I see.

> If a string is not UTF-8 encoded, gtk-gnutella can only guess the encoding
> used, which means it falls back to the locale character set, boldly assuming that
> the user is rather interested in search results from users/machines using the
> same locale settings.

Ah, my problem might be around here. If there were a feature that let me
check the filenames I am currently sharing (I know the number of files and
their size are shown, and LimeWire has all of these features), or one that
emitted a notification when I try to share a file whose filename is invalidly
encoded, encoding problems would be reduced for those who, like me, are
annoyed by bogus strings. Admittedly, that would apply only to outgoing hits
from the local DB in gtkg, though...
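
As a rough illustration of the kind of check described above, the following
Python sketch walks a shared directory and lists names whose raw bytes are not
valid UTF-8. It is not gtkg code; the function names `is_valid_utf8` and
`find_badly_encoded_names` are hypothetical, and it assumes a filesystem that
stores filenames as plain bytes (as on most Unix systems).

```python
import os


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def find_badly_encoded_names(shared_dir):
    """List paths (as bytes) under shared_dir whose names are not valid UTF-8.

    Hypothetical helper: walking with a bytes path makes os.walk return
    the raw filename bytes instead of decoded strings.
    """
    bad = []
    for dirpath, dirnames, filenames in os.walk(os.fsencode(shared_dir)):
        for name in dirnames + filenames:
            if not is_valid_utf8(name):
                bad.append(os.path.join(dirpath, name))
    return bad
```

Run against the shared directory, this would flag, for example, a filename
stored in EUC-JP or Shift_JIS, since neither encoding of typical Japanese text
forms a valid UTF-8 byte sequence.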

> What means "unreadable"? Only underscores and question marks, or what?

Almost all underscores, plus a few ASCII characters produced by the invalid
conversion and some ordinary ASCII characters; there are no question marks.
I've since noticed that these underscored search results come from Shareaza,
which is available only on Windows (95, 98, ME, NT, 2000, XP). Naturally,
there are some exceptions where they come from LimeWire as well. These
exceptions confuse me all the more :-(
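
The behaviour described in this thread (accept valid UTF-8, otherwise fall
back to the locale character set and substitute underscores where conversion
fails) can be sketched in Python as follows. This is only an illustration
under stated assumptions, not gtk-gnutella's actual implementation: the
function name `to_display_string`, the error-handler name "underscore", and
the default `euc_jp` locale charset are all hypothetical choices.

```python
import codecs


def _underscore(exc):
    """Decode-error handler: emit one '_' per undecodable span."""
    if isinstance(exc, UnicodeDecodeError):
        return ("_", exc.end)
    raise exc


# Register the handler under a made-up name so bytes.decode() can use it.
codecs.register_error("underscore", _underscore)


def to_display_string(raw: bytes, locale_charset: str = "euc_jp") -> str:
    """Decode a filename received from the network for display.

    Valid UTF-8 is used as-is. Anything else is assumed (perhaps wrongly)
    to be in the local charset; bytes that do not fit become underscores.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(locale_charset, errors="underscore")
```

With an EUC-JP locale, a Shift_JIS filename passed through this function comes
out mostly as underscores with the occasional stray ASCII character, which is
consistent with the symptom described above.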

> LimeWire (due to Java) uses UTF-16 internally and emits only UTF-8 encoded
> search results - I'm not sure whether composed or decomposed. During my
> tests I didn't notice too many broken results; that is, most results with
> (probably) Japanese filenames don't contain any characters that imply a
> conversion error.

My search queries are 'limewire' and 'japanese', which bring up many Japanese
filenames.

> gtk-gnutella will only convert strings that are not valid UTF-8 encoded. I
> don't know your locale settings. If you used EUC and the remote peer sends
> ShiftJIS (which is illegal and a bug in the remote peer), the conversion
> fails and you'll see a broken string (with a lot of underscores or
> question marks). 

> For gtk-gnutella it's optimal to use a locale with UTF-8 encoding
> (and if necessary override the language setting). 

My locale is ja_JP.EUC (LANG, LC_ALL). There are two versions of libiconv
on my system; I'm using a locally installed GNU libiconv 1.9.2 to enable the
extra encodings. I'm a bit hesitant to force my whole environment to UTF-8,
since my system doesn't have such a locale. Well, OK, I'll have to write a
wrapper script.
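
Such a wrapper could look roughly like the Python sketch below. It is only an
outline under assumptions: the locale name `ja_JP.UTF-8` is a guess and must
actually be installed on the system, the executable is assumed to be on the
PATH as `gtk-gnutella`, and the helper names are hypothetical. LC_ALL is
removed because it would otherwise override LC_CTYPE; LANG is left alone so
messages keep their current language.

```python
import os
import sys

# Assumption: a UTF-8 locale with this name is installed on the system.
UTF8_LOCALE = "ja_JP.UTF-8"


def utf8_environment(env, utf8_locale=UTF8_LOCALE):
    """Return a copy of env that uses UTF-8 for character handling only."""
    new_env = dict(env)
    new_env.pop("LC_ALL", None)        # LC_ALL takes precedence over LC_CTYPE
    new_env["LC_CTYPE"] = utf8_locale  # switch only the character encoding
    return new_env


def main(argv):
    """Replace this process with gtk-gnutella, passing arguments through."""
    env = utf8_environment(os.environ)
    os.execvpe("gtk-gnutella", ["gtk-gnutella", *argv[1:]], env)
```

Calling `main(sys.argv)` from a small launcher script would start gtkg with
the adjusted environment while the rest of the session stays on ja_JP.EUC.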

Thank you.

-- 
Daichi


_______________________________________________
Gtk-gnutella-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/gtk-gnutella-devel