Re: Which does "A" in "DSA" mean, "Alert" or "Advisory"?
Hi,

From: Gerfried Fuchs <[EMAIL PROTECTED]>
Subject: Re: Which does "A" in "DSA" mean, "Alert" or "Advisory"?
Date: Thu, 20 May 2004 16:14:17 +0200

> > I also changed "SuSE Security Alert" to "SuSE Security Announcement".
>
> Where? Have you checked if that naming was correct at the time of the
> writing? I guess they call it even differently these days, because they
> are now SUSE and not SuSE anymore, too. I guess we shall stick with the
> naming that they used at the time of the writing

Here is the page linked as "SuSE Security Announcement" from
security/1999/19990330.wml (previously "SuSE Security Alert").

http://seclists.org/lists/bugtraq/1999/Mar/0216.html

This page has a title of "Bugtraq: SuSE Security Announcement - XFree86".
The page contains a PGP-signed message which includes a title of
"SuSE Security Announcement" (though I have not validated the signature),
so I think it is unlikely that the page was modified later.

I have not checked further.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
Debian web page validation
Hi, Alfie is kind enough to run Debian web page validation pages. http://people.debian.org/~alfie/validate/ However, a validation page for Japanese http://people.debian.org/~alfie/validate/ja seems to have a problem for a few weeks, like: /home/alfie/bin/validate.sh: /home/alfie/extra/bin/validate: No such file or directory Pages for other languages don't have this problem. Alfie or someone, could you please fix this situation? --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: Which does "A" in "DSA" mean, "Alert" or "Advisory"?
Hi,

From: Matt Kraai <[EMAIL PROTECTED]>
Subject: Re: Which does "A" in "DSA" mean, "Alert" or "Advisory"?
Date: Fri, 14 May 2004 03:08:32 -0700

> The security team is the authoritative source, so the correct term
> is "advisory". Please change "alert" to "advisory" where you find
> it.

Done. The following is a list of the changed files.

  english/index.wml
  english/MailingLists/desc/devel/debian-security.wml
  english/News/weekly/2000/23/index.wml
  english/News/weekly/2001/11/index.wml
  english/News/weekly/2001/34/index.wml
  english/security/index.wml
  english/security/1997/index.wml
  english/security/1998/index.wml
  english/security/1999/index.wml
  english/security/2000/index.wml
  english/security/2001/index.wml
  english/security/2002/index.wml
  english/security/2003/index.wml
  english/security/2004/index.wml
  english/security/undated/index.wml

I also changed "SuSE Security Alert" to "SuSE Security Announcement".

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
Re: Which does "A" in "DSA" mean, "Alert" or "Advisory"?
Hi,

From: Tomohiro KUBOTA <[EMAIL PROTECTED]>
Subject: Which does "A" in "DSA" mean, "Alert" or "Advisory"?
Date: Wed, 12 May 2004 12:49:18 +0900 (LDT)

> For example, http://www.debian.org/security/index.en.html shows
> a list of "alerts". On the other hand, page for each item (ex.
> http://www.debian.org/security/2004/dsa-502.en.html) has a title
> of "Debian Security Advisory" which implies "A" in "DSA" means
> "advisory".

My last mail seems not to have been very clear. In short, my concern is:

 - Are these usages of the terms "alert" and "advisory" intended?
   Or are they just vacillation?
 - If intended, how is "alert" or "advisory" chosen for each page?
 - If it is vacillation, I propose unifying these terms. However, I have
   no idea which ("alert" or "advisory") should be used.

Any comments?

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
Which does "A" in "DSA" mean, "Alert" or "Advisory"?
Hi, Japanese translation team found that the term "advisory" has been translated into two Japanese words from pages to pages. Thus, we would like to decide a unified Japanese word. However, we also found during the unification work that English pages have ambiguity around usage of terms of "alert" and "advisory". For example, http://www.debian.org/security/index.en.html shows a list of "alerts". On the other hand, page for each item (ex. http://www.debian.org/security/2004/dsa-502.en.html) has a title of "Debian Security Advisory" which implies "A" in "DSA" means "advisory". IMO, Japanese translation team is able to assign different Japanese word for each of "advisory" and "alert", if each of English page intentionally chooses "advisory" or "alert". However, I doubt. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
what is "redaction"?
Hello, When I translate Debian webpages into Japanese I found a sentence which I cannot understand. webwml/english/devel/todo/items/60dwn.wml reads: Debian Weekly News is looking for contributors. They need some people who keep in touch with mailing lists and news websites to help in the weekly redaction of what's happening in the Debian world. ~ What is "redaction"? My dictionary doesn't have the word. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: [tux-master@web.de: Fixed a few wrong URLs in Japanese translation of www.debian.org]
Hi, From: Jens Seidel <[EMAIL PROTECTED]> Subject: [EMAIL PROTECTED]: Fixed a few wrong URLs in Japanese translation of www.debian.org] Date: Tue, 16 Mar 2004 16:41:14 +0100 > I wrote a small script which compares URLs in english/ with URLs in > translations and found a few wrong URLs. > > Please check the attached patch, it's possible that a few of my changes > are related to outdated translations or minor changes to reflect > Japanese style. > It's hard for me to compare because the order of links changed. > Nevertheless I'm sure I will many find more errors after refining my > script (but first I want to see a few commits). Thank you for this check. I found it is useful. Some of diffs have been fixed. How about putting the script in webwml/ directory so that translators can check it? --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: I don't understand News/2000/20000216.wml
Hi,

From: Jens Seidel <[EMAIL PROTECTED]>
Subject: Re: I don't understand News/2000/20000216.wml
Date: Mon, 15 Mar 2004 01:42:35 +0100

> As far as I understand it means something similar to
> "Scan.pm y2k problem Date: fixed missing message's date
> (was incorrect in 2000 or later)."

I found the following description in the changelog of im:

> 135 (2000/01/05) mew-dist release
>
> * Y2K fixes for broken year and no Date: field.
>   SAITO Tetsuya <[EMAIL PROTECTED]>

Though the version (135) is very different from the one in the Debian
page's description (1:100-3), I found the following mail (in Japanese),
and I think it is the original bug report.

http://www.mew.org/ml/mew-dist-1.94/msg01382.html

Here is a digest of the mail:

  My name is Hosono. When a message doesn't have a "Date:" field
  and the date (I - Kubota - guess it means the date when im is
  invoked) is in 2000, a problem occurs. The year is displayed
  as "100", like the following:

  31 100/01/01 Re: [linux-users:63039] Re: PCMCIA...

  Here I post a tiny patch to fix this problem. This patch has
  already been applied to the im package of the Kondara MNU/Linux
  distribution.

With this, I think the meaning of the sentence is clear. I propose to
rewrite the sentence in the wml file as follows:

  Scan.pm y2k problem: Messages without "Date:" fields will be
  processed wrongly in 2000 or later.

> KUBOTA, what about changing the indentation of the colons? The Japanese
> file looks in konqueror similar to:
> package: im
> version: 1:100-3
> architectures: all
> issue : Scan.pm y2k problem Date: filed missing message's

Please view the page in a Japanese-enabled environment. The colons
are already well aligned. I imagine your browser cannot handle
double-width characters.

> PS: Three weeks ago I sent you two mails related to wrong URLs in Japanese
> translation. You never replied ... I tried to fix most of these already
> but it would be easier if I could speak Japanese :-))

Was the message sent to me directly, or to debian-www? I could not find
a direct message to me. I am afraid I lost the message among spam.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
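Illustration (not part of the original message): the mail does not show the faulty code, so the following Python sketch only assumes a common cause of a year shown as "100", namely printing a "years since 1900" counter directly. The function name `format_scan_date` and its `buggy` flag are hypothetical.

```python
def format_scan_date(year, month, day, buggy=True):
    """Format a date as YY/MM/DD the way a scan listing might.

    Assumption: the buggy variant prints "years since 1900" directly,
    which is 99 for 1999 but 100 for 2000.
    """
    years_since_1900 = year - 1900
    shown = years_since_1900 if buggy else year % 100
    return f"{shown:02d}/{month:02d}/{day:02d}"


print(format_scan_date(1999, 12, 31))             # 99/12/31
print(format_scan_date(2000, 1, 1))               # 100/01/01
print(format_scan_date(2000, 1, 1, buggy=False))  # 00/01/01
```

Under this assumption, the output for 2000 matches the "100/01/01" line in the digest above.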
I don't understand News/2000/20000216.wml
Hi,

While updating the Japanese translation of Debian webwml, I found a
possible typo in an English page. The page is News/2000/20000216.wml.

--
package : im
version : 1:100-3
architectures: all
issue: Scan.pm y2k problem Date: filed missing message's date
(was incorrect in 2000 or later).
--

In the "issue", I imagine "filed" should be "field". Am I right?

Also, I don't understand the whole "issue" sentence. Where is the
subject, where is the verb, and is this grammatically correct?
I may be misreading the sentence, in which case "filed" might be correct.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
dsa-455.wml: native expression of a personal name.
Hello, I added a native expression of a personal name (reporter of the original vulnerability) in webwml/security/2004/dsa-455.wml. In case of Japanese, there are often several possible candidates of Kanji names which share one reading. (Latin alphabet expression is just a transliteration and drops any information other than reading). Of course only one out of them is "correct" for each person. In case of "Yuuichi Teranishi", there are many possible candidates of Kanji expression of "Yuuichi" and of course only one of them is correct and other candidates are wrong. (Of course there are other "Yuuichi"s whose Kanji expressions are different.) On the other hand, there is virtually one candidate for "Teranishi". Here I explain how to find the correct Kanji expression for *this* Yuuichi. Now, the following URL shows a report of this vulnerability. http://mail.gnome.org/archives/xml/2004-February/msg00070.html The text has his mail address and GPG public key. Next, I found the following URL: http://emacs-w3m.namazu.org/ml/msg06080.html The text has the same mail address and points the same URL for the GPG public key. Thus these two messages are written by the same person. The latter has Kanji expression. It is what I searched for. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Bug#235590: www.debian.org: translation statistics pages lack distrib.po
Package: www.debian.org Version: 20040301 Severity: normal http://www.debian.org/devel/website/stats/ pages have statistics on po/*.po files. However, it lacks distrib.po. I imagine webwml/english/po/Makefile lacks something on distrib.po. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Bug#227273: packages.debian.org: charset mismatch (always in UTF-8?)
Hi,

From: Hideki Yamane <[EMAIL PROTECTED]>
Subject: Bug#227273: packages.debian.org: charset mismatch (always in UTF-8?)
Date: Mon, 16 Feb 2004 23:30:51 +0900

> # Sorry, I thought that it is not such a difficult thing.
>   Because Debian web contents use ISO-2022-JP.

That works because Debian web pages are converted into ISO-2022-JP
*at the very final stage* of the generation. Thus, if the conversion
is also made the very final step on packages.debian.org, the pages
will be OK.

However, even in this case, if someone were to add some extra item
at the last stage of the generation process, Japanese pages would
easily be broken.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
Bug#227273: packages.debian.org: charset mismatch (always in UTF-8?)
Hi,

From: Hideki Yamane <[EMAIL PROTECTED]>
Subject: Bug#227273: marked as done (packages.debian.org: charset mismatch (always in UTF-8?))
Date: Mon, 16 Feb 2004 18:27:31 +0900

> http://packages.debian.org/unstable/x11/9menu.ja.html is nice,
> it works without mojibake. good.
>
> but, see, description of 9menu in list page
> http://packages.debian.org/unstable/x11/index.ja.html caused mojibake.
> wrong characters are there.

I checked the index.ja.html and found:

1. The starting escape sequence is missing for many descriptions.
   Explanation of starting escape sequences:
   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=227273&msg=31

2. For example, 0x3C in the Japanese "state" of ISO-2022-JP is wrongly
   regarded as ASCII "<" and then converted into "&lt;".

I now STRONGLY insist that ISO-2022-JP should not be used, because
it is difficult to fix all such problems. I recommend EUC-JP. Both
of the above problems will be solved automatically (and more importantly,
we won't have to consider such problems when we want to modify
or improve the generating scripts).

Hideki, do you prefer ISO-2022-JP even now? It is Japanese people
who suffer from such endless problems! If we stick to ISO-2022-JP,
both maintainers and users of Japanese web pages will suffer from such
problems forever.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
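Illustration (not part of the original message): a minimal Python sketch of the second problem described above, using Python's built-in `iso2022_jp` and `euc_jp` codecs. The sample character U+30FC is my own choice, and the Latin-1 round trip only mimics a byte-oriented escaping script; it is not how the packages.debian.org scripts are known to work.

```python
import html

# KATAKANA-HIRAGANA PROLONGED SOUND MARK; its JIS X 0208 code is 0x21 0x3C.
text = "\u30fc"

iso2022 = text.encode("iso2022_jp")
euc = text.encode("euc_jp")

# In ISO-2022-JP the second byte of this character is 0x3C, the same
# value as ASCII "<".
print(iso2022)
print(b"<" in iso2022)

# In EUC-JP both bytes have the high bit set, so they cannot be confused
# with ASCII markup characters.
print(euc.hex(" "))
print(all(b >= 0x80 for b in euc))

# A byte-oriented HTML escaper (mimicked here by treating the bytes as
# Latin-1 text) rewrites the 0x3C byte and corrupts the character.
escaped = html.escape(iso2022.decode("latin-1")).encode("latin-1")
print(escaped)
```

The same escaping applied to the EUC-JP bytes would leave them untouched, which is the property the message relies on when it recommends EUC-JP.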
Please give him a CVS COMMIT account (Re: There are 'three' conferences in that announcement.)
Hello,

I am the Japanese translation coordinator. Please give him a commit
account for the Debian webwml repository. He has translated Debian
Weekly News into Japanese for more than a year. If he has an account,
the Japanese pages will be able to catch up with the English pages more
quickly. Also, by reading the English version of DWN regularly and
carefully for translation, he will be able to find typos in the English
pages. (Since the Japanese language is very different from English or
other European languages, Japanese people have to read English pages
*very* carefully to translate them. From some points of view, even
Chinese is similar to English while Japanese is not.)

From: Nobuhiro IMAI <[EMAIL PROTECTED]>
Subject: Re: There are 'three' conferences in that announcement.
Date: Sat, 07 Feb 2004 04:04:27 +0900 (JST)

> > You have to check with your translation coordinator, they will have to
> > request it from the webmaster team.
>
> Thanks, I'll do so.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
Bug#227273: www.debian.org: Japanese DDTP files are provided with EUC-JP endoding.
From: Frank Lichtenheld <[EMAIL PROTECTED]>
Subject: Bug#227273: www.debian.org: Japanese DDTP files are provided with EUC-JP endoding.
Date: Thu, 29 Jan 2004 01:25:26 +0100

> On Thu, Jan 29, 2004 at 12:16:19AM +0100, Frank Lichtenheld wrote:
> > I used the Perl module Text::Iconv which itself uses iconv(3)
> > This module seems to suck or I am too dumb to use it. If I convert the
> > raw Japanese Packages file with iconv(1) (which probably uses iconv(3),
> > too) all escape sequences seem to be generated correctly, if I use
> > Text::Iconv->convert, only the very first one is.
>
> Correction, it only forgets the very last escape sequence since this one is
> not generated by iconv(3). It "forgets" to clear the state at the end of
> the conversion which I found out in comparison with iconv(1) that handles
> this case correctly. I prepared a patch and will file a bug against the
> package.

I tested on gluck (packages.debian.org):

(a)
  $ echo -en '\xa4\xa2' | iconv -f EUC-JP -t ISO-2022-JP | od -t x1
  0000000 1b 24 42 24 22 1b 28 42
  0000010

The last three bytes are the closing escape sequence. Thus iconv(1)
works well. Next, I wrote the following script:

(b)
  #!/usr/bin/perl
  use Text::Iconv;

  # Converter from EUC-JP to ISO-2022-JP.
  $conv = Text::Iconv->new("EUC-JP", "ISO-2022-JP");

  # Read all of standard input into one string.
  $a = "";
  while (<>) {
      $a .= $_;
  }

  # Convert the whole string in a single call and print the result.
  $b = $conv->convert($a);
  print $b;

Then

(c)
  $ echo -ne '\xa4\xa2' | ./a.pl | od -t x1
  0000000 1b 24 42 24 22
  0000005

In this case, the closing escape sequence is missing. However, if the
source string has some characters following the JIS X 0208 Japanese
characters, like:

(d)
  $ echo -e '\xa4\xa2' | ./a.pl | od -t x1
  0000000 1b 24 42 24 22 1b 28 42 0a
  0000011

(e)
  $ echo -ne '\xa4\xa2\x41' | ./a.pl | od -t x1
  0000000 1b 24 42 24 22 1b 28 42 41
  0000011

then the closing escape sequence is added.

Explanation:

In the case of (e), it is clear that the closing escape sequence is needed.
In the case of (d), it is also needed because ISO-2022-JP requires that
the "state" be ASCII when a Line Feed appears.

In the case of (c), Text::Iconv does not know whether the following string
will be Japanese or ASCII. Adding a closing escape sequence would be
redundant if Japanese followed. I imagine this is why Text::Iconv
does not add the closing escape sequence in this case.

I think the safest way is to use Text::Iconv to convert the whole web
page at one time. (Or at least the whole line, i.e. a logical line which
ends with a Line Feed code, at one time.)

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
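Illustration (not part of the original message): the difference between cases (a) and (c) can be reproduced with an analogous sketch in Python. This uses Python's own `iso2022_jp` codec and its incremental encoder, not Text::Iconv or iconv(3), so it only shows the general principle that a stateful encoder emits the closing escape sequence once it is told the input is complete.

```python
import codecs

text = "\u3042"  # Hiragana A, i.e. \xa4\xa2 in EUC-JP

# One-shot conversion: the codec knows the input is complete and appends
# the closing escape sequence ESC ( B.
one_shot = text.encode("iso2022_jp")

# Chunked conversion: the encoder cannot know whether more Japanese text
# follows, so the stream stays in the Japanese state until the caller
# signals the end of input with final=True.
encoder = codecs.getincrementalencoder("iso2022_jp")()
chunk = encoder.encode(text)
tail = encoder.encode("", final=True)

print(one_shot.hex(" "))
print(chunk.hex(" "))
print(tail.hex(" "))
```

Converting a whole page (or at least a whole line) in one call, as suggested above, corresponds to the one-shot path here.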
Bug#227273: www.debian.org: Japanese DDTP files are provided with EUC-JP endoding.
Hi,

From: Hideki Yamane <[EMAIL PROTECTED]>
Subject: Bug#227273: www.debian.org: Japanese DDTP files are provided with EUC-JP endoding.
Date: Sun, 25 Jan 2004 01:10:41 +0900

> * <meta> tag is OK. That says "content="text/html; charset=ISO-2022-JP"".
> * It looks like contents is not valid ISO-2022-JP. I don't know why.
>   Frank, would you tell me the way how did you convert it from EUC-JP
>   to ISO-2022-JP ?

I checked http://packages.debian.org/unstable/misc/language-env.ja.html
and found that closing escape sequences are missing.

ISO-2022-JP is a "stateful" encoding. This means that a string consists
of escape sequences which determine the "state", and ordinary codes
whose meaning (the corresponding characters) depends on the "state".

For example, あ is:

  1B 24 42 24 22 1B 28 42

where 1B 24 42 (the starting three bytes) means "here starts JIS X 0208
Japanese", 24 22 (the following two bytes) is Japanese Hiragana A, and
the following 1B 28 42 means "here starts ASCII". In the Japanese state,
24 22 means Japanese Hiragana A, while in the ASCII state it means Dollar
and Double Quotation.

I said closing escape sequences are missing. This means the "here
starts ASCII" part is missing. Thus, all of the following ASCII
characters (including HTML tags) are regarded as Japanese, which
causes Mojibake.

I don't know what algorithm is used for generating the page, so
I have no idea what causes this broken page.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
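Illustration (not part of the original message): the byte sequence described above can be checked with Python's built-in `iso2022_jp` codec. This is only a sketch for inspecting the bytes; it says nothing about how the package pages were generated.

```python
# Hiragana A (U+3042) encoded as ISO-2022-JP.
encoded = "\u3042".encode("iso2022_jp")
hex_bytes = " ".join(f"{b:02X}" for b in encoded)
print(hex_bytes)

# Strip the opening escape sequence (first three bytes) and the closing
# one (last three bytes). Read as plain ASCII, the two remaining bytes
# are a dollar sign and a double quotation mark.
payload = encoded[3:-3]
as_ascii = payload.decode("ascii")
print(as_ascii)

# If the closing escape sequence is missing, a decoder stays in the
# Japanese state and reads the following ASCII bytes as Japanese too.
broken = encoded[:-3] + b'$"'
misread = broken.decode("iso2022_jp")
print(misread)
```

In the last step the two ASCII bytes `$"` are swallowed and shown as a second Hiragana A, which is the same mechanism that turns HTML tags into Mojibake on the broken page.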
"Debian Installer" in devel/debian-installer/index.wml
Hello,

While checking the Japanese translation of
devel/debian-installer/index.wml, I noticed one point.

"Debian-Installer" is (according to my understanding) a name
(proper noun) of a specific installer system (which will be the
successor of "boot-floppies"), not just a common noun which can
be substituted by "installer of Debian". However, Debian-Installer
is written in various ways, like "Debian Installer" and so on.
Because of this, translators have difficulty distinguishing
Debian installer as a common noun from Debian-Installer as
a proper noun. (Capitalization cannot always be a test, because
many programs have lowercase names, like "boot-floppies".)

Could someone clarify the distinction?

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
Bug#227273: packages.debian.org: charset mismatch (always in UTF-8?)
Hi,

> Hmm, I have difficulties to understand what you mean. I will try to
> formulate your report in my own words:
> Currently the Japanese pages are served in UTF-8 (is this right?),
> but you request that we serve it in iso-2022-jp instead, because
> UTF-8 causes problems in reading the pages.
>
> Have I understood you correctly?

I will explain. (I am the author of the "Mojibake" page which
Yamane-san introduced.)

For example, please see
http://packages.debian.org/stable/misc/language-env.ja.html

The HTML source of the page says the page is written in UTF-8.
(The 6th line of ). However, in reality, the page is written
in EUC-JP. Because of this inconsistency, web browsers will render
the page by assuming the page is UTF-8, and the result will be
Mojibake.

The main point of the problem is this inconsistency. Thus, at least,
this inconsistency must be fixed.

There are several ways to fix this problem.

(a) Change the encoding of the page to UTF-8 (to match the 6th line).
(b) Change the 6th line to EUC-JP (to match the real content).
(c) Change both the 6th line and the encoding of the page to some
    other encoding (for example ISO-2022-JP).

Yamane-san asks for solution (c). This is because ISO-2022-JP is the
best encoding for Japanese web pages: it has the least possibility of
Mojibake even when web browsers cannot understand the 6th line.

I agree that (c) is the best solution, but I don't think (a) and (b)
are unacceptable at all. I think EUC-JP will be acceptable (solution
(b)), because recent web browsers are likely to understand the 6th line.
(However, UTF-8 (solution (a)) should be avoided if possible, because
some browsers such as w3m (popular in Japan) cannot handle UTF-8.)

In short, my opinion is:

  (c) is the best solution.
  (b) has no problem, too.
  (a) should be avoided if possible.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
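Illustration (not part of the original message): the inconsistency described above, EUC-JP content declared as UTF-8, can be simulated in a few lines of Python. The sample string is my own choice, and `errors="replace"` only approximates what a browser does with undecodable bytes.

```python
text = "\u65e5\u672c\u8a9e"  # "Japanese" written in Kanji

# What the server actually sends: EUC-JP bytes.
euc_bytes = text.encode("euc_jp")

# A browser trusting a wrong UTF-8 declaration cannot recover the text;
# undecodable bytes show up as U+FFFD replacement characters.
as_utf8 = euc_bytes.decode("utf-8", errors="replace")
print(as_utf8 == text)   # False

# When the declared encoding matches the real one, the text round-trips.
as_euc = euc_bytes.decode("euc_jp")
print(as_euc == text)    # True
```

Solutions (a), (b) and (c) above all amount to making the declared encoding and the real encoding agree, as in the second decode.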
Bug#207455: acknowledged by developer
Hi, From: Colin Watson <[EMAIL PROTECTED]> Subject: Re: Bug#207455: acknowledged by developer Date: Fri, 5 Sep 2003 01:20:12 +0100 > On Fri, Sep 05, 2003 at 08:32:59AM +0900, Tomohiro KUBOTA wrote: > > Note that I think UTF-8 environment will not be popular until several > > basic features (like manpages) will be UTF-8-ready. > > What's wrong with UTF-8 man pages? You can't *write* them in UTF-8, > true, but they should be perfectly readable in such locales. I cannot read Japanese manpage in ja_JP.UTF-8 locale. It is because groff cannot know what encoding the manpage source is written in. For example, Japanese manpage source is written in EUC-JP, while groff try to interpret it as UTF-8 in UTF-8 locales. Previously groff has a special workaround for UTF-8 only for ISO-8859-1 environment. I.e., with "UTF-8 device", groff interprets the input as ISO-8859-1 and outputs as UTF-8. Like this, there are points where ISO-8859-1/15 people may misunderstand that UTF-8 support is more mature than the real situation. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Bug#207455: acknowledged by developer
Hi, From: Frank Lichtenheld <[EMAIL PROTECTED]> Subject: Bug#207455: acknowledged by developer (Re: Bug#207455: packages.debian.org: HTML-encodes multi-byte characters as single bytes) Date: Thu, 4 Sep 2003 18:46:36 +0200 > 1. Take them literaly and specify a charset=ascii for the page > 2. Dito, but charset=iso-8859-1 > 3. Dito, but charset=utf-8 > 4. Use one of the three charsets but make a list of broken descriptions > that have to be converted > > Currently we do (2) but I would prefer to go to (3). As long as policy > doesn't mandate one encoding for the description it's our decision > anyway and I would prefer to give everyone the same chance to break > something ;) Yes, ISO-8859-1 is a *local* character encoding which is useful only for a part of European-language speaking people. Currently, ASCII is the only character range which is common in the world. Though migration into UTF-8 is welcome, please note that U+0020 - U+007E will continue the only common character range for a while. If the policy will mandate usage of UTF-8, then the policy will have to note that the contents must be comprehensible even when being read in ASCII environment, i.e., even when non-ASCII characters are removed. Indeed, in multibyte locales which are popular in east Asia, an 8bit character (for example ISO-8859-1) will break not only the character itself but also the next character. Even though we have to be careful to use UTF-8, it is much better than the current situation that Debian is biased to a part of a world (i.e., ISO-8859-1 usage). Note that I think UTF-8 environment will not be popular until several basic features (like manpages) will be UTF-8-ready. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: Package description page is not compliant to multibyte characters
Hello, From: Michael Bramer <[EMAIL PROTECTED]> Subject: Re: Package description page is not compliant to multibyte characters Date: Fri, 1 Aug 2003 10:25:27 +0200 > > > I heard a report that Japanese translation of package description > > > pages (by Debian Description Translation Project) is broken. > > > For example, > > > > > > http://ddtp.debian.org/packages.debian.org/stable/admin/apmd.ja.html > Now I fixed this on http://ddtp.debian.org/packages.debian.org/ Thank you for your effort, but I am afraid that another (bigger) problem occurs. Japanese pages seem to lack long-description. The Japanese page writes (please try the above apmd.ja.html): Package: (in ) (in ) Other packages related to (in ) I think it is clear even for non-Japanese-speakers that Japanese page lacks long description which should appear between and "Other packages related to". I checked apmd.{de,en,es,it,fr,pt_BR,ru} and found that Russian page also lacks long-description. In short, non-Latin-character languages seem to suffer this problem. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
non-ASCII in webwml/english/devel/website/tc.data
Hi, I found a non-ASCII character (\xc7) in webwml/english/devel/website/tc.data . Since the data are used for all translated pages in various character encodings, usage of non-ASCII character causes broken pages and should be avoided. (Especially, in multibyte encodings, illegal 8bit byte may break neighbor characters.) Can I assume the character \xc7 is from ISO-8859-9 (because it is Turkish), i.e., U+00C7 "LATIN CAPITAL LETTER C WITH CEDILLA" ? --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
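Illustration (not part of the original message): a short Python check of what the byte \xc7 means under different encodings. The numeric character reference at the end is only one possible ASCII-only spelling and is my assumption, not something proposed in the message.

```python
raw = b"\xc7"

# Interpreted as ISO-8859-9 (Turkish), the byte is C WITH CEDILLA.
char = raw.decode("iso8859_9")
print(f"U+{ord(char):04X}")

# The same lone byte is not a valid character in UTF-8 or EUC-JP.
rejected = []
for encoding in ("utf-8", "euc_jp"):
    try:
        raw.decode(encoding)
    except UnicodeDecodeError:
        rejected.append(encoding)
print(rejected)

# Assumption: an HTML numeric character reference is pure ASCII and so
# survives conversion to any of the page encodings.
reference = f"&#{ord(char)};"
print(reference, reference.isascii())
```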
Re: wml build errors in japanese/ and spanish/
Hi,

From: Frank Lichtenheld <[EMAIL PROTECTED]>
Subject: Re: wml build errors in japanese/ and spanish/
Date: Thu, 12 Jun 2003 10:34:53 +0200

> While we're at it, I think I identified and solved another build problem:
> in japanese/devel/wnpp/wnpp.wml you have to replace all occurences of
> en.html in the shebang line with ja.html . Otherwise the translated
> pages get not installed.

Thank you for the information. I fixed this, but the fix caused a much
more serious problem. For example,
http://www.debian.org/devel/wnpp/rfa_bypackage.ja.html
is written in the EUC-JP encoding, but the header says the page is in
the ISO-2022-JP encoding. This makes the page entirely unreadable.
(Just a row of random letters.)

Though I am not familiar with the build process of Debian web pages,
I imagine the Makefile has some problem and WMEPILOG is ignored.
(WMEPILOG for Japanese converts pages from EUC-JP to ISO-2022-JP.)

I have no idea how to fix this. I'd like to revert my modification
if there is no solution.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
Re: wml build errors in japanese/ and spanish/
Hi, From: Frank Lichtenheld <[EMAIL PROTECTED]> Subject: wml build errors in japanese/ and spanish/ Date: Wed, 11 Jun 2003 23:40:25 +0200 > japanese/News/weekly/2003/21 in lines 57,58 something seems > wrong. There is some English text and a unclosed just fix such errors but the Japanese text is so cryptic for me :) Thank you. I fixed the problem. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Package description page is not compliant to multibyte characters
Hi,

I heard a report that the Japanese translation of package description
pages (by the Debian Description Translation Project) is broken.
For example,

http://ddtp.debian.org/packages.debian.org/stable/admin/apmd.ja.html

(It might be difficult to see that the page is broken if you
cannot read Japanese.)

Analysis:

This page seems to be generated by a script
gluck.debian.org:/org/packages.debian.org/htmlscripts/pages.pl
and the first character of the long description of a package is
written in a larger font:

  $long_desc =~ /^([^&]|&[^;]+;)/;
  $first = $1;
  $rest = substr($long_desc,length($first));
  $package_page .= "$first$rest\n";

However, in multibyte encodings such as EUC-JP (Japanese), a character
may consist of multiple bytes. On the other hand, the expression [^&]
matches one *byte* rather than one *character*. Thus, when the first
character of the long description is a multibyte character, $first will
be the first byte of the multibyte character, not the entire multibyte
character.

Solution:

The right way is to make the script multibyte-compliant. It may be
difficult to support arbitrary encodings. However, it may be easy to
support a limited range of multibyte encodings which are possible
candidates for Debian web pages (such as "EUC-JP, EUC-KR, GB2312,
Big5, Big5HKSCS, and UTF-8").

I heard that there is another solution, like the following:

  <!--
  p.description {text-align: justify;}
  p.description:first-letter {font-size: 150%;}
  -->

and

  This is a long package description.

Though this solution is environment-dependent, at least this way
never makes the content unreadable.

However, yet another solution is to give up decorating the first
character with a larger font. I think this might be a good solution,
because "using a larger font for the first character" cannot be truly
universal. Imagine Arabic characters. How can the first character of
an Arabic word be made larger? Though we don't have an Arabic
translation yet, we may have one in the future.

Thus, my suggestion is to give up the decoration. However, I will
appreciate any other solutions which will stop breaking the contents.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
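Illustration (not part of the original message): the byte-versus-character problem analysed above can be reproduced in Python with the same regular expression. The sample description string is my own, and "decode first, then match" is only one way of making such a script multibyte-compliant; it is not the fix that was applied to pages.pl.

```python
import re

# Same pattern as the Perl one quoted above: one non-"&" unit, or one
# HTML entity such as "&amp;".
FIRST_UNIT_BYTES = re.compile(rb"^([^&]|&[^;]+;)")
FIRST_UNIT_CHARS = re.compile(r"^([^&]|&[^;]+;)")

long_desc = "\u65e5\u672c\u8a9e\u306e\u8aac\u660e"  # sample Japanese text
raw = long_desc.encode("euc_jp")

# Byte-wise matching cuts the first two-byte character in half.
first_bytes = FIRST_UNIT_BYTES.match(raw).group(1)
print(len(first_bytes))

# Decoding first and matching on characters keeps it whole.
first_char = FIRST_UNIT_CHARS.match(raw.decode("euc_jp")).group(1)
print(first_char == long_desc[0])
```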
Re: enable searching East Asian words at search.debian.org
Hi, From: [EMAIL PROTECTED] (Craig Small) Subject: Re: enable searching East Asian words at search.debian.org Date: Wed, 14 May 2003 16:19:29 +1000 > Yes, you are right I am confusing the two issues. I can easily, well > almost easily, make a special search.d.o set of binaries and they can > have little or no bearing on the packages... This must be easy, because you are willing to force all Japanese mnogosearch users to do this and I will agree on it. > It gets us back to the original problem that what is the license for > ipadic? And libchasen is broken. How this should be fixed, Nokubi-san? Does Kakashi or MeCab have an emulating layer (API) for Chasen? Or, any alternatives already available for ipadic? mnogosearch seems to use chasen_sparse_tostr() and chasen_getopt_argv(). > Why is it broken? > It won't work without some files that are not part of the package. > These files are nowhere to be seen and there is no documentation on > how these files are supposed to come about nor what format they are in. I don't understand why you say so strongly. Yes, it is a bug. However, did you document that Debian mnogosearch package is compiled with eliminating east Asian support? This is just as severe as that. Anyway, Nokubi-san is a maintainer of chasen packages and I hope he will fix this soon. > That can all be solved I'm sure, but its no use asking admins to put > libchasen on until this is fixed or a work-around is found. A work-around. "apt-get install libchasen-dev ipadic" instead of "apt-get install libchasen-dev". --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: enable searching East Asian words at search.debian.org
Hi,

From: [EMAIL PROTECTED] (Craig Small)
Subject: Re: enable searching East Asian words at search.debian.org
Date: Wed, 14 May 2003 11:36:28 +1000

> Great, ipadic is 3meg and consumes 12meg. I cannot expect people to
> download that with the default packages.
>
> So I'll release mnogo 3.2.10 with no chasen. It's broken anyway because
> it needs some rc files and other things.
>
> Now for the webiste, I'll get the other charsets going and we'll work
> on the JP problem separately.

Sorry, I don't understand the meaning or feeling of "Great" here.
Can you explain?

You are confusing two different aspects: one is providing the Debian
mnogosearch packages and the other is how search.debian.org is
constructed.

I accept that Japanese people cannot use the Debian mnogosearch package
as it is and are forced to recompile it, in order to save megabytes of
disk space for people who don't need Japanese. (Please write
instructions for recompilation in README.Debian.)

However, search.debian.org is a different topic. Since Japanese is
one of several languages for which more than 50% of the pages on
http://www.debian.org/ are translated, it is nonsense to exclude
these pages from the target of the search.

I don't understand at all why some Debian (and other
free-software-related) people tend to exclude Japanese and other Asian
languages from the range of support. Even people who are interested in
i18n and translation sometimes tend to do so!

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
Re: enable searching East Asian words at search.debian.org
Hi, From: [EMAIL PROTECTED] (Denis Barbier) Subject: Re: enable searching East Asian words at search.debian.org Date: Tue, 13 May 2003 14:12:00 +0200 > In this bugreport you tell that lynx-cur is right, but I have similar > results with lynx-cur 2.8.5-10. I tested lynx and lynx-cur and found that both of them are problematic. I tested them on mlterm and xterm in UTF-8 mode with the ja_JP.UTF-8 locale. I searched for the Russian word for "News". Although the search itself seems to work, all Cyrillic characters are displayed in Latin transliteration. I imagine these browsers are not locale-aware. Please test w3mmee. It should work well. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: enable searching East Asian words at search.debian.org
Hi, From: [EMAIL PROTECTED] Subject: Re: enable searching East Asian words at search.debian.org Date: Tue, 13 May 2003 15:32:49 +0900 > Now I'm trying to make another DFSG-free dictionary for ChaSen. If I > can do it, I'll move ipadic package to non-free and ITP the new one. Any prospects? > The another solution is to use libkakasi instead libchasen. It is > completely free. First, I imagine chasen is much better than kakasi because chasen analyzes the grammar of Japanese sentences while kakasi doesn't. Which is better, chasen without a dictionary (chasen itself is free) or kakasi? Or does chasen *need* a dictionary (though libchasen0 doesn't have a Depends: on ipadic)? Second, can we use kakasi for mnogosearch? If we have no solution, how about writing "please use google for searching CJK words in Debian site" at http://search.debian.org/ and admitting that free software cannot yet substitute for proprietary software? Finally, which solution do you suggest? Should we wait for a "free" alternative to ipadic? Or should ipadic be regarded as free? Or can we use kakasi? Or should we recognize that there is no free implementation of web search which supports languages including Japanese? --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: enable searching East Asian words at search.debian.org
Hi, From: [EMAIL PROTECTED] (Craig Small) Subject: Re: enable searching East Asian words at search.debian.org Date: Tue, 13 May 2003 15:24:15 +1000 > I'm working on 3.2.10 that will have the charset support. Do I also > need to include chasen? Without chasen, mnogosearch will not understand > a Japanese "word"? You are right. Also, --with-extra-charsets=all is needed (if 3.2.10's default setting leaves out the mapping tables for CJK, just as 3.2.8 does). > I'll get something uploaded soon, can you test it for me on a simple > set of pages (use the builtin if you like for no db) to see it does > work for your pages. If so I'll get it compiled on klecker. Yes, I will, of course. You mean, you will compile 3.2.10 and set up a test search page, and then I will test searching for CJK words? However, I tested the builtin database and it didn't work well. I didn't investigate builtin further because it won't be used in the real search page. Thus, if you'd like to test builtin, please check that your new build works well for English words. Then I will test various languages including Chinese, Japanese, and Korean. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: enable searching East Asian words at search.debian.org
Hi, From: [EMAIL PROTECTED] (Denis Barbier) Subject: Re: enable searching East Asian words at search.debian.org Date: Mon, 12 May 2003 19:09:54 +0200 > If now I run > $ export LANG=fr_FR.UTF-8 > $ xterm > go to search.debian.org in this window and cut'n'paste this word from > another window, I am redirected to > http://search.debian.org/?q=%C3%83%C2%A9lection&ps=10&o=0&m=all&g=fr > which means that e-acute has been converted twice, and no pages are > found. Am I doing something wrong? There might be two problems. One is whether cut'n'paste works well or not, and the other is whether the browser can handle encoding conversion correctly. I tested with galeon and w3mmee. (w3m doesn't support UTF-8.) Also, Internet Explorer on Windows works well. I'd like to reproduce your steps. Which browser did you use? --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
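The "converted twice" symptom quoted above can be reproduced in a few lines. This is only a sketch of one plausible cause, assuming that some component read the UTF-8 bytes as ISO-8859-1 and then encoded them as UTF-8 again; which component did so in Denis's setup is not known from this thread. The function names are made up for illustration.

```python
from urllib.parse import quote, unquote


def double_encode(text: str) -> str:
    """Simulate UTF-8 bytes being misread as ISO-8859-1 and re-encoded as UTF-8."""
    utf8_bytes = text.encode("utf-8")            # correct first conversion
    misread = utf8_bytes.decode("iso-8859-1")    # the wrong assumption
    return quote(misread.encode("utf-8"))        # second conversion, then URL-escaping


def repair(escaped: str) -> str:
    """Undo one level of the double conversion."""
    return unquote(escaped, encoding="utf-8").encode("iso-8859-1").decode("utf-8")


word = "élection"
print(quote(word))              # single conversion: %C3%A9lection
print(double_encode(word))      # double conversion: %C3%83%C2%A9lection
print(repair(double_encode(word)))
```

The doubly converted form matches the q= parameter in the quoted URL, which supports the reading that e-acute was converted twice somewhere between the paste and the request.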
Re: enable searching East Asian words at search.debian.org
Hi, From: [EMAIL PROTECTED] (Denis Barbier) Subject: Re: enable searching East Asian words at search.debian.org Date: Mon, 12 May 2003 23:55:03 +0200 > When done, mnogosearch from unstable has to be recompiled, I can volunteer > to provide a backport if that helps. I think there is no problem here, because Craig has already compiled version 3.2.7 in his home directory to test search.debian.org/new , though his build is "by eliminating east Asian character mapping tables and without chasen support". Version 3.2.7 was the latest version at that time (December 2002). --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: enable searching East Asian words at search.debian.org
Hi, From: [EMAIL PROTECTED] (Denis Barbier) Subject: Re: enable searching East Asian words at search.debian.org Date: Mon, 12 May 2003 13:45:08 +0200 > > For example, I can search an Russian word "Novosti" (of course in > > Cyrillic) > > The point is: how are Cyrillic words passed by the web browser to the > search engine? > Are they encoded in ISO-8859-5, KOI8-R or UTF-8 charsets? UTF-8, i.e., the same encoding as the search page. For example, the previous example: http://search.debian.org/?q=%D0%9D%D0%BE%D0%B2%D0%BE%D1%81%D1%82%D0%B8&ps=10&o=0&m=all&g= The first 6 bytes read: %D0%9D -> U+041D (CYRILLIC CAPITAL LETTER EN) %D0%BE -> U+043E (CYRILLIC SMALL LETTER O) %D0%B2 -> U+0432 (CYRILLIC SMALL LETTER VE) --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
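The byte-by-byte reading above can be checked mechanically. The following sketch decodes the q= value from the URL in this mail as UTF-8 and prints the code point and Unicode name of each character; only the Python standard library is used, and the variable names are illustrative.

```python
import unicodedata
from urllib.parse import unquote

# The q= parameter from the search URL quoted in the mail above.
query = "%D0%9D%D0%BE%D0%B2%D0%BE%D1%81%D1%82%D0%B8"

# Percent-escapes are undone and the resulting bytes are read as UTF-8.
word = unquote(query, encoding="utf-8")
print(word)

for ch in word:
    print(f"U+{ord(ch):04X} ({unicodedata.name(ch)})")
```

The first three lines of the loop output correspond to the three code points listed in the mail.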
Re: enable searching East Asian words at search.debian.org
Hi, From: [EMAIL PROTECTED] (Denis Barbier) Subject: Re: enable searching East Asian words at search.debian.org Date: Mon, 12 May 2003 09:54:58 +0200 > My understanding of Josip mail is that when investigating your > instructions about mnogosearch, he wondered how input text has > to be encoded when filling search form. This is a good question, > search page should tell which encoding to use when searching for > non-English words. Yes, I know. The solution is to write the search page in UTF-8, which has been available since last December, when Craig and I discussed this problem. For example, I can search for the Russian word "Novosti" (of course in Cyrillic), which means "News", on the English page at http://search.debian.org/ like this: http://search.debian.org/?q=%D0%9D%D0%BE%D0%B2%D0%BE%D1%81%D1%82%D0%B8&ps=10&o=0&m=all&g= and the page shows 112 results. I can also input Japanese words. However, there will be no results for Japanese words because of the problems I described. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: enable searching East Asian words at search.debian.org
Hi, From: Josip Rodin <[EMAIL PROTECTED]> Subject: Re: enable searching East Asian words at search.debian.org Date: Sun, 11 May 2003 19:44:17 +0200 > On Sun, May 11, 2003 at 11:09:44PM +0900, Tomohiro KUBOTA wrote: > > > c=`grep CHARSET ../.wmlrc | cut -d= -f2`; \ > > > iconv -f $c -t UTF-8 search.ja.html | perl -pe 's,^(\s* > > http-equiv="Content-Type" content="text/html; > > > charset=)\S+(">)$,$1UTF-8$2,' > search.ja.html > > > iconv: cannot open input file `euc-jp': No such file or directory > > > > Sorry I don't understand what you are doing. However, my "improvement" > > is not related to search.ja.html (or translation of search page) at all. > > Well, it's related if you want people to be able to actually input stuff > properly into the search engine. :) OK, I remember now. The search web page must be UTF-8. The current (English) version of the search page is already UTF-8 and has no problem with international search, I think. However, if you would like to supply translated search pages (though I think it is not an urgent problem), please note the following: I just read webwml/english/searchtmpl/Makefile and found that `grep CHARSET ../.wmlrc` might have a problem. webwml/japanese/.wmlrc has two lines which match 'grep CHARSET', namely '-D CHARSET=iso-2022-jp' and '-D CHARSET_WML=euc-jp'. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
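To make the suspected Makefile problem concrete, here is a small sketch in Python (not the Makefile's own shell). It assumes the relevant part of webwml/japanese/.wmlrc consists of exactly the two lines quoted above, and the helper names are hypothetical. A substring match returns both values, so `$c` would expand to two words and iconv would take the second one as an input file name, which would explain the "cannot open input file `euc-jp'" message.

```python
import re

# Assumed excerpt of webwml/japanese/.wmlrc: only the two lines quoted in the mail.
WMLRC = "-D CHARSET=iso-2022-jp\n-D CHARSET_WML=euc-jp\n"


def values_like_grep_cut(text: str) -> list:
    """Mimic `grep CHARSET | cut -d= -f2`: any line containing the substring matches."""
    return [line.split("=")[1] for line in text.splitlines()
            if "CHARSET" in line and "=" in line]


def value_for_key(text: str, key: str):
    """Match one exact `-D KEY=value` definition instead of a substring."""
    m = re.search(rf"^-D\s+{re.escape(key)}=(\S+)", text, flags=re.MULTILINE)
    return m.group(1) if m else None


print(values_like_grep_cut(WMLRC))          # two values, hence two words in $c
print(value_for_key(WMLRC, "CHARSET"))      # exactly one value
print(value_for_key(WMLRC, "CHARSET_WML"))  # exactly one value
```

Which of the two keys names the right source encoding for iconv is not settled in this thread; the sketch only shows that the match has to be anchored to a single key.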
Re: enable searching East Asian words at search.debian.org
Hi, From: Josip Rodin <[EMAIL PROTECTED]> Subject: Re: enable searching East Asian words at search.debian.org Date: Sun, 11 May 2003 14:33:38 +0200 > make: Entering directory `/org/www.debian.org/webwml/japanese/searchtmpl' > wml -q -D CUR_YEAR=2003 -o UNDEFuJA:[EMAIL PROTECTED] --prolog="/usr/bin/kcc > -e -" --epilog="../convert search.ja.html" search.wml > c=`grep CHARSET ../.wmlrc | cut -d= -f2`; \ > iconv -f $c -t UTF-8 search.ja.html | perl -pe 's,^(\s* http-equiv="Content-Type" content="text/html; charset=)\S+(">)$,$1UTF-8$2,' > > search.ja.html > iconv: cannot open input file `euc-jp': No such file or directory > copying search.ja.html to ../../../www/searchtmpl > make: Leaving directory `/org/www.debian.org/webwml/japanese/searchtmpl' Sorry, I don't understand what you are doing. However, my "improvement" is not related to search.ja.html (or the translation of the search page) at all. My intention is to enable searching for, for example, "Bunsho" (in Kanji), which means "documentation" in Japanese, from the search page. This should be possible, because there are many Japanese-translated pages on the Debian site and these pages should be searchable. It is not about translating the search page. (I guess you are trying to prepare a Japanese translation of the search page? I will look into this point later. However, please note that, for Japanese people, a search page in English which can search Japanese words is absolutely better than a search page in Japanese which cannot search Japanese words.) The problems are: (1) Though mnogosearch is based on UTF-8 (and should be able to process all the languages that Debian web pages are translated into), support for CJK languages is disabled by default. (Please read the ./configure --help output or the installation instructions of mnogosearch.) The default build simply drops the character code mapping tables between CJK encodings and UTF-8. This is why recompilation of mnogosearch is needed. 
(2) Japanese and Chinese don't use whitespace between "words", so indexing (i.e., reading all web pages and storing all "words" in the database for searching) doesn't work well. The chasen-related packages are needed to fix this. (I hope you have read my mails in which I wrote that chasen is needed -- please just go back in this thread.) --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: enable searching East Asian words at search.debian.org
Hi, No reply for more than one week. Someone please reply. There are Chinese, Japanese, and Korean translations of www.debian.org but search.debian.org cannot search words in these languages. Please do the following: 1. Install the libchasen-dev, libchasen0, and ipadic packages on klecker. 2. Add me ([EMAIL PROTECTED]) as a user of the postgresql database on klecker. 3. Create a postgresql database on klecker for which I have write permission. Then I can demonstrate the improvement (or bugfix, as I regard it) described in my last mail, which I quote in full below. From: Tomohiro KUBOTA <[EMAIL PROTECTED]> Subject: enable searching East Asian words at search.debian.org Date: Sat, 26 Apr 2003 09:45:48 +0900 (JST) > Hi, > > So far search.debian.org doesn't support East Asian languages > (Chinese, Japanese, and Korean). I.e., it cannot search Chinese, > Japanese, nor Korean words. > > I have recently researched this problem and I think I found > how to fix it. I tested at my personal machine without 24hr > internet connection and it works almost fine. > > 1. install libchasen-dev, libchasen0, and ipadic packages. > 2. recompile mnogosearch (version 3.2.8 or later) with > --enable-chasen --with-extra-charsets=all option for ./configure . > 3. invoke "indexer -C" and then "indexer" to rebuild the search database. > > Could someone do this? Or, can I have a database (postgresql) access > (write access) permission at klecker to prove this? > > > Explanation: > > Chasen packages are needed to extract words from Japanese texts. > Japanese texts don't use whitespaces between words. --enable-chasen > (since version 3.2.8) option for mnogosearch enables usage of chasen > from mnogosearch. > > Though mnogosearch is Unicode-based software and potentially supports > East Asian languages, support of these languages is disabled by default. > To enable this, --with-extra-charsets=all is needed. 
> > Since the current search database in search.debian.org doesn't have > any east Asian words, it is needed to rebuild the whole database. > (Of course it is enough to rebuild database only for *.{ja,ko,zh-cn, > zh-hk,zh-tw}.html pages but I don't know if it is possible to this.) --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
enable searching East Asian words at search.debian.org
Hi, So far search.debian.org doesn't support East Asian languages (Chinese, Japanese, and Korean). I.e., it cannot search for Chinese, Japanese, or Korean words. I have recently researched this problem and I think I have found how to fix it. I tested it on my personal machine (which has no 24-hour internet connection) and it works almost fine. 1. install the libchasen-dev, libchasen0, and ipadic packages. 2. recompile mnogosearch (version 3.2.8 or later) with the --enable-chasen --with-extra-charsets=all options for ./configure . 3. invoke "indexer -C" and then "indexer" to rebuild the search database. Could someone do this? Or, can I have write access to a (postgresql) database on klecker to demonstrate this? Explanation: The chasen packages are needed to extract words from Japanese texts. Japanese texts don't use whitespace between words. The --enable-chasen option for mnogosearch (available since version 3.2.8) enables mnogosearch to use chasen. Though mnogosearch is Unicode-based software and potentially supports East Asian languages, support for these languages is disabled by default. To enable it, --with-extra-charsets=all is needed. Since the current search database in search.debian.org doesn't contain any East Asian words, the whole database needs to be rebuilt. (Of course it would be enough to rebuild the database only for *.{ja,ko,zh-cn,zh-hk,zh-tw}.html pages but I don't know if it is possible to do this.) --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: What means english?
Hi, From: Josip Rodin <[EMAIL PROTECTED]> Subject: Re: What means english? Date: Sat, 19 Apr 2003 02:59:35 +0200 > > Though I don't think it is a good idea to mix American and British > > English, I don't think we *must* avoid to do that. The requirement > > should be minimal. > > Sorry, what did you mean by this? It's not overly clear :) I wanted to say that we don't need to prohibit mixing American and British English. My understanding is that the term > Only a small question: On devel/website/working exists the > following section "Use clear and simple English". means that we should use clear and simple English so that non-native speakers can understand the pages, or that we should avoid ambiguous or misleading expressions. How about changing the wording of the term as follows: "Use clear and simple expressions to avoid ambiguity or misunderstanding." ? --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
discriminatory expression in DWN?
Hi, I am one of the Japanese translators of the Debian web pages. When I was checking the translation of webwml/english/News/weekly/2003/14/index.wml , I found the following sentence: > Debian Universe". He admits that the current Debian installer > is ugly but also notes that some people believe that a not so easy installer ~~~ > will keep horde of unwashed masses away from Debian who aren't worthy of such ~ > a fine OS! In the article Jonathan describes in detail how the installer ~~ I think the sentence (underlined part) means that there are people who are so foolish that they should not use Debian, and that a difficult installer is a good thing because it can prevent such foolish people from using Debian. Is this interpretation correct? --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: What means english?
Hi, From: Josip Rodin <[EMAIL PROTECTED]> Subject: Re: What means english? Date: Fri, 18 Apr 2003 17:47:48 +0200 > No, I don't think we should take sides in that. We haven't had problems with > the current mix we use, and we should keep it that way. Though I don't think it is a good idea to mix American and British English, I don't think we *must* avoid doing that. The requirement should be minimal. On the other hand, it should be a minimal requirement not to use, for example, expressions, proverbs, and idioms which people (especially non-English-speaking people) may find difficult to read or even to look up in dictionaries. I.e., "every sane person can understand the meaning of the page" should be a minimal requirement. I imagine this requirement sometimes conflicts with the use of literary or witty expressions. As one of the Japanese translators of the Debian web pages, I often experience such difficulty. One example: http://lists.debian.org/debian-www/2003/debian-www-200301/msg00256.html --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
The maintainer still isn't listening?
Hi, I am checking the Japanese translation of Debian Weekly News in order to cvs commit it. However, the Japanese translation team had difficulty translating one English sentence. webwml/english/News/weekly/2003/06/index.wml says in the Qt3 paragraph: Several issues haven't been dealt with and the maintainer still isn't listening. None of the members of the Japanese team understand what "listening" means here. Is it a metaphor or an idiom? --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: lists.debian.org de-localization
Hi, (Remember, the topic is that http://lists.debian.org pages sometimes use 8-bit characters, which may break all content after the character when East Asian users browse the pages.) From: Josip Rodin <[EMAIL PROTECTED]> Subject: Re: lists.debian.org de-localization Date: Sun, 12 Jan 2003 04:14:45 +0100 > On Sun, Jan 12, 2003 at 10:38:52AM +0900, Tomohiro KUBOTA wrote: > > However, I don't think this can be a solution now because it will take a > > very long time that the version will be stable, then the stable version > > will be adopted into unstable/testing version of Debian distribution, then > > the distribution will become stable (released), and then the stable > > distribution will be adopted to master.debian.org . > > Actually, we use a non-.deb mhonarc on lists.d.o so this isn't a problem > per se. A new version of MHonArc (2.6.0) was released recently, which I think can solve all encoding-related problems by converting everything into UTF-8. > This, on the other hand, is a hassle to handle (backporting or installation > into subdirs). master.d.o is scheduled to be upgraded to woody after samosa. > That's all I know. Any new information? --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: Translation of Debian Hompage to Arabic
Hi, From: Ayman Negm <[EMAIL PROTECTED]> Subject: Re: Translation of Debian Hompage to Arabic Date: Thu, 6 Feb 2003 22:26:43 +0100 > I did as it descriped in the README but nothing changed :-( I am afraid that the Pics script (and Gimp 1.2) don't support multibyte encodings such as UTF-8. This is why the East Asian Pics are made by hand. (Since East Asian languages need several thousand characters, their encodings must be multibyte.) Well, I am a Japanese speaker and I made the Japanese Pics by using Gimp-1.3 with a semi-automatic Script-fu script. Can you try ISO-8859-6, a single-byte Arabic encoding? However, I don't think it supports the word-initial, word-final, word-medial, and isolated forms of glyphs. Otherwise I think you will have to make the Pics manually. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: "family name, personal name" in devel/people
Hi, From: Osamu Aoki <[EMAIL PROTECTED]> Subject: Re: "family name, personal name" in devel/people Date: Fri, 31 Jan 2003 13:16:23 -0800 > If Oohara Yuuma said "In Japan a family name comes before a personal > name in _Japanese_ _language_ context. I want to retain the same order > in English context too.", it was not stretching the fact. I am all for > respecting his defiance wish to the Japanese translation convention. Is the "Given Family" order tied to the English (or Western) languages? I don't know. However, I can say that the order is tied to English (or Western) people. I.e., Western people have names in "Given Family" order. If they visit some other country, they don't change their names. And vice versa. > Please remember that it is a well established convention for Japanese > name to be flipped in English or French context upon translation. That > was my point. I guess if translated into Hungarian or Chinese, we > should use surname first as the translation convention. I know that convention. However, the convention is a matter of Japanese people's custom, not of Western languages. Your example of Hungarian or Chinese proves it. Thus, it is Japanese people (not English speakers) who have the right to decide how to write Japanese people's names in English. > Boy, it is a hot topic. In my daily activities, defending my real first > name against anglocized name is a real challenge in a hostile > environment. I already gave up on my real fist name at the restaurant > or bar. :-) That is because the restaurant and the bar are located in a specific country. It is natural that the restaurant follows the convention of that country. I cannot imagine a restaurant which is located in all countries, or an "international restaurant". However, our project is international. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: future of developers list
Hi, From: Andrew Shugg <[EMAIL PROTECTED]> Subject: Re: future of developers list Date: Fri, 31 Jan 2003 21:10:38 +0800 > - one field for the name to be displayed in the Western convention > - one field for the name to be displayed in the local convention (and >optionally a different character set How about the current way ("Family, Given") instead of the "Western convention"? I think it is a good compromise, because the comma implies that the whole expression ("Family, Given") is not the complete native name, and it keeps us free of the name order flamewar. IMO, the reason why we need the "Western" field is that we cannot all read every script in the world, while all of us are expected to be able to read the ASCII alphabet. If my name were shown only in native Kanji, many people in the world could not even write my name. Likewise, I cannot read or write Arabic/Hebrew/Thai/Armenian characters. The second field (local convention) can of course also hold the Western convention, like any other local convention. However, if the "different character set" part is not achieved, I don't think this is worth the labor of implementing it. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: "family name, personal name" in devel/people
Hi, From: Osamu Aoki <[EMAIL PROTECTED]> Subject: Re: "family name, personal name" in devel/people Date: Fri, 31 Jan 2003 00:45:16 -0800 > > My family name is Oohara and my personal name is Yuuma. I am _not_ > > Yuuma Oohara -- in Japan a family name comes before a personal name. > > I know some minority of Japanese people wish to be called the same order > in English/French context as ithey do in Japanese context. That is fine > as preference but you are stretching too far and twisting the fact. Though I don't adopt this method myself, I understand what they insist on. For example, when I search Google for the name of the current Korean president: "Dae Jung Kim" 2130 hits "Kim Dae Jung" 98000 hits You know, "Kim" is the family name and "Dae Jung" is the given name, and the native Korean order is "Family Given". This means that Korean people preserve their name order even in alphabetic transliteration. Then, why not Japanese? I don't want to think that you think "Western culture is international and eastern culture is local, so eastern people must act in the western way in international projects such as Debian". I know this result shows only one aspect of this problem. This is just an example of what they say. My own opinion is that I don't want to discuss the substance of this problem. I just want to ask that each developer's way of thinking be respected. > All you had to say was "I prefer to be called Oohara Yuuma" > > I am sorry but I have to remind you that you are very likely to be > called officially "Yuuma Oohara" in the letters your government issues > in English or French. That may be where you have to fight :-) I do not > understand why you are so picky on this issue here ? England and France are only two countries in the world. They don't have the right to determine the international way of doing something. If we were developing an English or French localized distribution, you would be right. 
> > The NM application form insisted on the personal-name-family-name > > order, so it may be the cause of the bug. > > Form is form. Just follow the instruction. If this is a bug, or at least may cause confusion, let us think about an improvement. For example, we could add a note: "Please input your given name and family name in the form regardless of your native order of name." If we don't have to know which part is the given name and which is the family name, it is a good idea to have only one field for the whole name. In this case, the order is completely free. > > You will be surprised to find many US official forms use surname first, > given name second, and middle name initial last format. As long as each > entry is clearly marked, I see no issues. > > I see no major threat of cultural imperialism here either. So relax. > Japan exports enough pop-culture trashes these days. Video games, > anime, to name a few ... The name order will not trash Japanese > culture. > > > I don't see any point in splitting a developer's name into the family > > name and the personal name, but it is another issue. > > I guess many others see differently. > > -- > ~\^o^/~~~ ~\^.^/~~~ ~\^*^/~~~ ~\^_^/~~~ ~\^+^/~~~ ~\^:^/~~~ ~\^v^/~~~ + > Osamu Aoki <[EMAIL PROTECTED]> Cupertino CA USA, GPG-key: A8061F32 > .''`. Debian Reference: post-installation user's guide for non-developers > : :' : http://qref.sf.net and http://people.debian.org/~osamu > `. `' "Our Priorities are Our Users and Free Software" --- Social Contract
Is this character belongs to ISO-8859-2?
Hi, I found that webwml/english/devel/tc.data contains an 8-bit character, because I noticed a broken character in the Japanese translation of the page. Thus I changed the character in tc.data into an SGML entity. In doing so, I assumed that the 8-bit character is encoded in ISO-8859-2, where it is a lowercase "c" with acute accent, whereas the same byte is "ae" in ISO-8859-1. I have two reasons: - the CVS commit log says "Zoran Obradovic" and the problematic character corresponds to the last "c". - I imagine Slovene uses ISO-8859-2. Am I right? I cannot tell which is a natural human name, "Obradovi(c')" or "Obradovi(ae)". Zocky, please check http://www.debian.org/devel/website/translation_coordinators.html and the translations of that page. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
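The ambiguity described above comes from one byte having different meanings in the two encodings. The sketch below assumes the byte in question is 0xE6; the mail does not state the byte value, but 0xE6 is the code that is "c with acute" in ISO-8859-2 and "ae" in ISO-8859-1. The numeric character reference at the end is just one encoding-independent way to write the character; which entity was actually committed to tc.data is not stated in the mail.

```python
# Assumption: the problematic 8-bit character is the single byte 0xE6.
raw = b"Obradovi\xe6"

as_latin2 = raw.decode("iso-8859-2")   # Central European reading
as_latin1 = raw.decode("iso-8859-1")   # Western European reading
print(as_latin2)
print(as_latin1)

# Writing the character as a numeric reference removes the dependence
# on the page's 8-bit encoding.
last = as_latin2[-1]
print(f"Obradovi&#{ord(last)};")
```

With the entity in place, every translation of the page renders the same character regardless of its own charset.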
Re: future of developers list
Hi, From: Josip Rodin <[EMAIL PROTECTED]> Subject: future of developers list Date: Thu, 30 Jan 2003 16:15:37 +0100 > Another added benefit would be that we could then extend our LDAP database > schemas to include another field for people's names in native character set, > which would nicely fix the problem with Japanese names -- the non-Japanese > web pages would print the names of these developers in the western way, and > the Japanese web pages would print them in the Japanese way. I think another field for native characters is a good idea. However, why not UTF-8 rather than a native character set? (Or did you mean native representation, without intending to talk about character encoding?) I don't understand why "another field for native character (set)" will fix the name order problem. Though all Japanese people would agree on the "surname givenname" order in native characters, they disagree about the order in alphabetic transliteration. I think the current devel/people has the best format -- "surname, givenname" for alphabetic transliteration is what all Japanese people agree on. I hope other people will also agree on the format. If we had another field for native characters, I think it would be nice for devel/people to show both the alphabetic form and the native form, like devel/website/translation_coordinators . I think all translated pages (devel/people.*.html) should show the same form. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: "family name, personal name" in devel/people
Hi, From: Michael Stone <[EMAIL PROTECTED]> Subject: Re: "family name, personal name" in devel/people Date: Thu, 30 Jan 2003 07:32:49 -0500 > Well, this is simply an example of how you can't please all of the > people all of the time. You brought these examples up because you didn't > like how those developer's names appeared. You don't want them to > conform to the way the majority of developers do things (fair enough), > you don't want them to conform to the (admitedly ugly) uppercase > convention (fair enough), but you still want the parser to somehow > figure out the names (not fair) and complain about some sort of cultural > imperialism while you're at it (unjustified, and not the first time > you've done it.) The parser already has many exception handlers. I just wanted it to have two additional handlers. Since I recently read the parser to solve another problem, I knew it was easy to add two handlers. I don't think we can build a perfect parser. If you think you can, that belief is based on "cultural imperialism". However, it is a good idea to design a parser which reduces the number of exceptions, because that reduces the maintenance load, which is why Matt's improvement is worthwhile. Anyway, I'd like to praise Josip, who maintains such a script with a cluster of exception handlers. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: "family name, personal name" in devel/people
Hi, From: Josip Rodin <[EMAIL PROTECTED]> Subject: Re: "family name, personal name" in devel/people Date: Wed, 29 Jan 2003 10:33:58 +0100 > % grep-available -F Maintainer Shuzo -s Maintainer | sort -u > Maintainer: HATTA Shuzo <[EMAIL PROTECTED]> > Maintainer: Hatta Shuzo <[EMAIL PROTECTED]> > > % grep-available -F Maintainer Yuuma -s Maintainer | sort -u > Maintainer: Oohara Yuuma <[EMAIL PROTECTED]> > > Perhaps one of you could politely inform these two developers that they > might get the westerners to read their name right if they changed the > ordering? :) Well, I don't want to do this. I don't want anybody to do this. It is not a very good idea that non-westerners have to follow the customs of westerners while westerners don't need to follow those of non-westerners. Non-westerners already bear the cost of learning many western customs when we want to do something in international communities, and I want to reduce that load if possible. I think I can ask them to write their family names in uppercase; that is the most I can ask of them. I don't know whether they will accept even this idea. Please note that this *is* what I recently mentioned as a "10-year flamewar" and I *never* want to join it, and even asking them to write the family name in uppercase might rekindle the flamewar. (If I asked them to change the name order, I would certainly touch the core of the flamewar, and Japanese members of Debian might give up their activity as developers.) --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: "family name, personal name" in devel/people
Hi, From: Osamu Aoki <[EMAIL PROTECTED]> Subject: Re: "family name, personal name" in devel/people Date: Tue, 28 Jan 2003 19:48:35 -0800 > > > "Shuzo, Hatta", where "Hatta" is surname and "Shuzo" is given name. > > > "Yuuma, Oohara", where "Oohara" is surname and "Yuuma" is given name. > > I am wondering why these strange entry exist. Are they mistake of > original data entry? I never see Japanese name spelled that way. It is because the web page is automatically generated by a script from developers database on db.debian.org . The script assumes the first part of name is given name and the last part is family name. Since there are many names which don't follow the assumption, the script has many exception handlers. Josip added two exception handlers and the page will be fixed in the next build. You know, some Japanese people write names in their native order, "Family Given", and such expressions exist in db.debian.org database. ... but I checked the script (klecker:/org/www.debian.org/cron/people_scripts/people.pl) and I couldn't find additional handlers. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
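The behaviour described above (assume "Given Family" order, then override it for known exceptions) can be sketched roughly as follows. This is a Python illustration only, not the code in people.pl: the names FAMILY_FIRST and format_name are hypothetical, and the real script's exception handlers may work quite differently.

```python
# Hypothetical sketch of "Surname, Given name" formatting with an exception
# table. FAMILY_FIRST and format_name are illustrative names, not people.pl's.

# Names stored in the database in native "Family Given" order.
FAMILY_FIRST = {"Hatta Shuzo", "HATTA Shuzo", "Oohara Yuuma"}


def format_name(full_name):
    """Return 'Surname, Given name' for a name stored as one string."""
    parts = full_name.split()
    if len(parts) < 2:
        return full_name  # single-word name: nothing to reorder
    if full_name in FAMILY_FIRST:
        # Exception handler: the first part is the family name.
        surname, given = parts[0], " ".join(parts[1:])
    else:
        # Default assumption: given name(s) first, family name last.
        surname, given = parts[-1], " ".join(parts[:-1])
    return surname + ", " + given


for name in ("Josip Rodin", "Hatta Shuzo", "Oohara Yuuma"):
    print(format_name(name))
```

Keying the table on the full stored name keeps each exception explicit, at the cost of one new entry for every developer who writes the family name first.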
"family name, personal name" in devel/people
Hi, I imagine names in http://www.debian.org/devel/people have the unified format of "Surname, Given name". I found two exceptions: "Shuzo, Hatta", where "Hatta" is surname and "Shuzo" is given name. "Yuuma, Oohara", where "Oohara" is surname and "Yuuma" is given name. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
August 01-04 1999 is future?
Hi, I found that Linux World Conference and Expo on August 01-04, 1999 is listed as a "future event" in http://www.debian.org/events/1999/index.ja.html . This problem occurs only on the Japanese page. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
events pages not updated
Hi, I found Japanese translation of events pages per year (for example, http://www.debian.org/events/2002/index.ja.html) aren't updated these several days, even though I updated gettext items for "Past events" and "Future events" and I also updated several event items to use SGML entities instead of ISO-8859-1. The pages also sometimes fail to use localized date formats in po/ctime.ja.po . --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: translation of searchtmpl/search.wml
Hi, From: [EMAIL PROTECTED] (Craig Small) Subject: Re: translation of searchtmpl/search.wml Date: Tue, 21 Jan 2003 09:13:57 +1100 > It was before you replied to that mail (or maybe another?). Essentially > I was incorrect. mnogosearch *stores* its indexing in UTF-8 but needs > all those flags to do the indexing. > > Have you tried a little bit of indexing of Japanese pages to see if it > does seem to behave itself? Not yet. However, I believe that --with-extra-charsets is a necessary condition, though I am not sure that it is a necessary and sufficient condition (but I expect so). Please read: http://www.mnogosearch.org/board/message.php?id=6350 Note that Japanese (and Chinese) has "The Problem 2" (no spaces between words) and need the newer version of mnoGoSearch with ChaSen, as you wrote. However, I expect Korean will be fully fixed by --with-extra-charsets. I also expect that Japanese and Chinese words which occasionally appear independently (i.e., separated by spaces or HTML tags) will be able to be searched. I thought about testing it but I don't have enough time to study database, because I am entirely new on database. (Also, I could not test another etc/index..htm problem because search.cgi in klecker:~/public_html didn't work well and I don't know why. It may be because of apache's configuration.) --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: translation of searchtmpl/search.wml
Hi, From: [EMAIL PROTECTED] (Denis Barbier) Subject: Re: translation of searchtmpl/search.wml Date: Sun, 19 Jan 2003 21:46:34 +0100 > At first glance it sounds very good, but I am not sure this is the way > to go, because some strings are not handled by gettext, e.g. see > Catalan strings in webwml/english/template/debian/ctime.wml > There are also several Perl variables in date.pot, which will be > displayed according to current locale, and not UTF-8. > We could certainly play with CUR_LOCALE, but a simpler solution is > to post-process HTML files with iconv and change their charset > field in tags, see attached patch. I think your idea is better than mine. I also checked your patch works well, i.e., builds translated pages in UTF-8. BTW, do you have any idea why Craig thinks like the following mail? Maybe I am missing something http://lists.debian.org/debian-www/2003/debian-www-200301/msg00271.html --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
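The post-processing idea quoted above (convert the generated HTML to UTF-8 and rewrite its charset declaration) can be sketched roughly as follows. This is a Python illustration, not Denis's patch, which is not reproduced in this thread; the function name to_utf8_html and the regular expression are assumptions, and the source charset is assumed to be known for each language.

```python
import re


def to_utf8_html(raw, source_charset):
    """Transcode an HTML page to UTF-8 and update its charset declaration.

    Hypothetical helper: 'raw' is the page as bytes in 'source_charset'.
    """
    text = raw.decode(source_charset)
    # Rewrite every "charset=..." value (as found in a meta tag) to utf-8.
    text = re.sub(r'(charset\s*=\s*["\']?)[\w.:-]+', r'\g<1>utf-8',
                  text, flags=re.IGNORECASE)
    return text.encode("utf-8")


page = ('<meta http-equiv="Content-Type" '
        'content="text/html; charset=iso-8859-1"><p>Andr\xe9</p>'
        ).encode("iso-8859-1")
print(to_utf8_html(page, "iso-8859-1"))
```

Doing the conversion after the page is built sidesteps the question of which encoding each string source (gettext items, Perl variables, templates) produced, as long as the whole page is consistently in one legacy encoding.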
translation of searchtmpl/search.wml
Hi, As we already know, the web pages of searchtmpl/search.html must be encoded in UTF-8, and this is already done. http://lists.debian.org/debian-www/2002/debian-www-200212/msg00260.html However, though we can write the translated (or original) searchtmpl/search.wml in UTF-8, gettext items are converted into legacy encodings, which breaks the pages. I think I found a fix for this problem. Though it doesn't immediately enable the translated search page, I think it surely fixes one of the problems. I found that the target encoding of gettext is defined in webwml/english/template/debian/common_tags.wml . Thus, the best way is to redefine the CHARSET_WML and CHARSET variables before it. (These variables are defined in webwml//.wmlrc files.) The patch attached to this mail includes this modification. Translated search.wml files also need to be written in UTF-8. The following patch includes a note on this point. (The patch for Makefile is to disable conversion from EUC-JP to ISO-2022-JP for Japanese.) May I commit these modifications? PS. The content negotiation http://search.debian.org/new/index.en.cgi http://search.debian.org/new/index.fr.cgi seems not to work well. Though I am afraid I may be wrong, how about renaming /org/search.debian.org/etc/search..htm into /org/search.debian.org/etc/index..htm ? The source code of mnoGoSearch seems to substitute ".cgi" with ".htm" to find the configuration file (src/search.c). 
Index: english/searchtmpl/search.data === RCS file: /cvs/webwml/webwml/english/searchtmpl/search.data,v retrieving revision 1.30 diff -u -r1.30 search.data --- english/searchtmpl/search.data 30 Dec 2002 03:26:24 - 1.30 +++ english/searchtmpl/search.data 19 Jan 2003 14:02:03 - @@ -1,3 +1,4 @@ +$(CHARSET=UTF-8) $(CHARSET_WML=UTF-8) #include "../../english/searchtmpl/search.def" -$(CHARSET=UTF-8) $(CHARSET_WML=UTF-8) #use wml::debian::common_translation HOME="http://www.debian.org"; $(title=) #use wml::debian::languages Index: english/searchtmpl/Makefile === RCS file: /cvs/webwml/webwml/english/searchtmpl/Makefile,v retrieving revision 1.5 diff -u -r1.5 Makefile --- english/searchtmpl/Makefile 2 Nov 2002 23:36:01 - 1.5 +++ english/searchtmpl/Makefile 19 Jan 2003 14:05:58 - @@ -10,6 +10,11 @@ include $(WMLBASE)/Make.lang +ifeq "$(LANGUAGE)" "ja" + WMLOUTFILE = $(@F) + WMLPROLOG = + WMLEPILOG = +endif search.$(LANGUAGE).html: search.wml $(ENGLISHSRCDIR)/searchtmpl/search.data \ $(ENGLISHSRCDIR)/searchtmpl/search.def $(TEMPLDIR)/common_translation.wml \ Index: english/searchtmpl/search.wml === RCS file: /cvs/webwml/webwml/english/searchtmpl/search.wml,v retrieving revision 1.8 diff -u -r1.8 search.wml --- english/searchtmpl/search.wml 20 Dec 2002 00:23:53 - 1.8 +++ english/searchtmpl/search.wml 19 Jan 2003 14:11:37 - @@ -10,6 +10,8 @@ # Of course edit the blurb below (in your language directory) to suit. # Email debian-www@lists.debian.org if you want to understand how this # horror works. +# +# Translated files of this file must be encoded in UTF-8. #include "$(ENGLISHDIR)/searchtmpl/search.data"
Re: UPPERCASE surname?
Hi, From: Josip Rodin <[EMAIL PROTECTED]> Subject: Re: UPPERCASE surname? Date: Sat, 18 Jan 2003 13:39:04 +0100 > And I don't believe they exclude web pages that write e.g. "TOKUGAWA > Ieyasu", as Google ignores case when searching (obviously because many > people get case wrong, in general). > > I know there can be confusion with this but if we keep the uppercase, we > lose the consistency and at the same time don't provide any benefit to > people who already understand what uppercase means. In this case, "tokugawa" (Tokugawa, TOKUGAWA) is the surname. Therefore, this is consistent with my usage. And the intention of the statistics is just to compare surname-givenname order with givenname-surname order, *not* UPPERCASE with lowercase. Do you still want to force me not to use an UPPERCASE surname, though I said I don't care how other people write my name? I don't like flamewars. I just want you not to join the Japanese people's flamewar, which has continued for about 10 years. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: UPPERCASE surname?
From: Josip Rodin <[EMAIL PROTECTED]> Subject: Re: UPPERCASE surname? Date: Sat, 18 Jan 2003 01:32:58 +0100 > I think if the document is in English, then English style is just fine. > The uppercase stuff may be appropriate in e.g. French, but I don't believe > there's any rule in English that requires it. > > Also, writing whole words uppercase is a sign of yelling in electronic > communication channels which a lot of us are accustomed to. I see, but I will continue to write my name as "Tomohiro KUBOTA" in "From:" mail headers and so on. I won't force others to follow my way. > By default, the first part is the name, and the second part is the surname. > There would only be confusion if someone wrote "Kubota Tomohiro", and I > don't see why anyone would do that. Though we rarely need to know which part is the given name and which part is the surname, it is not safe to assume there is a "default". Though I always write my given name first and surname next in alphabetic transcription, I don't know about other Japanese people or other peoples. For example, the following are statistics for famous Japanese people, though the results don't exclude Japanese web pages:

--------------------------------------------------------------
surname    givenname    Google hits in    Google hits in
                        "s-g" order       "g-s" order
--------------------------------------------------------------
Natsume    Soseki         3410              1700
Kawabata   Yasunari       5410              5370
Matsui     Hideki          594              6030
Tokugawa   Ieyasu         6900              2010
Oda        Nobunaga       5660              1650
Ito        Hirobumi       1880               633
Ozawa      Seiji          3150             25100
(Mao       Tse-Tung      82700              1260)  * Chinese
--------------------------------------------------------------

I think you cannot "assume" givenname-surname order is the "default". --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
UPPERCASE surname?
Hi, I found that webwml/english/News/weekly/2003/02/index.wml was modified to de-capitalize a surname of a person (well, me). http://cvs.debian.org/webwml/english/News/weekly/2003/02/index.wml.diff?r1=1.10&r2=1.11&cvsroot=webwml I don't understand why uppercase surname is not accepted. Uppercase surname is widely used in academic world and it is useful to show which part is the surname. Especially, because of some confusion of Japanese (and other east Asian?) way of writing their name in Alphabets transcription (which Osamu Aoki mentioned recently), it is useful to capitalize surname (though it is not used by everyone). Though I am not enthusiastic enough to insist to modify the wml file again, my concern is how my name (and other capitalized names) should be handled when these names are newly written somewhere. My opinion is to respect their own way to write or the expression in the news source. PLEASE DON'T REQUIRE JAPANESE PEOPLE TO UNIFY THE WAY TO WRITE NAMES. Such an argument continues more than 10 years in Japan. Thus, the only realistic solution is never to meddle. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: Japanese name use single space between the last name and the first name?
Hi, From: Gerfried Fuchs <[EMAIL PROTECTED]> Subject: Re: Japanese name use single space between the last name and the first name? Date: Thu, 16 Jan 2003 10:57:27 +0100 > > I do not care which way to write, IMHO. But it has to be consistent. > > I was longing for consistency when I changed it in the DWN pages, too. > > So, do I need to revert the changes, or was it right? I think you can leave them. I will change Japanese names to use spaces to achieve consistency (however I regard this as rather low priority work). Any suggestions? --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: difference between "task" and "job"?
Hi, From: Tomohiro KUBOTA <[EMAIL PROTECTED]> Subject: difference between "task" and "job"? Date: Wed, 15 Jan 2003 16:14:18 +0900 (JST) > I am now reviewing a Japanese translation of Debian web page, > devel/join/nm-step4. > > The page has words like "task" and "job". In Japanese language, > there are no distinct words for these two words. Thus, I don't > understand the difference between them. We solved this translation problem with the help of a Japanese AM (Application Manager). --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: Japanese name use single space between the last name and the first name?
Hi, From: Osamu Aoki <[EMAIL PROTECTED]> Subject: Re: Japanese name use single space between the last name and the first name? Date: Wed, 15 Jan 2003 16:15:01 -0800 > I may be the one to be blamed to cause this. > > When I saw my name in document maintainer section in English with first > and my last name in one piece, I felt strange. I posted here and since > no one replied, I fixed that page. Ok. I posted the mail because the comment with the cvs commit may mislead non-Japanese people who don't know Japanese custom. (The comment can be read as saying that Japanese names use spaces everywhere, in every context, which is not true. I'd like the Chinese translation of DWN not to use spaces.) I think a space may be used for a Japanese name in English sentences. Also, I added spaces even in Japanese translations if the name is written independently in "()". However, I didn't add spaces when the names appear in ordinary sentences, because such an expression looks strange. > Very valid question. Most name references in the modern newspapers are > spaceless and all in Japanese characters (I just checked.) I have been > spelling my names with separated format since most of the document I sign > are government/bank document where they have box for each section. Also > my name tag in my elementary school days tends to separate them. Right. Your explanation is consistent with mine, and I expect non-Japanese members of this list will trust us. The key point seems to be whether the name appears independently (i.e., book author, signature on government/bank documents, name tags, and so on) or in ordinary sentences. > At any rate, mixed character group document is unconventional and I do > not know what is right. My intent of adding space in the English was to > clarify splits between first and last name. I understand your intent. However, I am afraid that many people will misunderstand that "Osamu" is "青木" and "Aoki" is "修", while the truth is the opposite. > I do not care which way to write, IMHO. But it has to be consistent. Ok. Since it is I who modify the English version of DWN for Japanese names (with a small semi-automatic Perl script), I can change the policy and the script from now on. > Also, getting opinion of Chinese person's preference in English context > may be interesting. I see most chinese names in Japanese web pages do > not use a space between last and first name in Japanese. I am also interested. Also, I can add items to my script for Chinese, Korean, Russian, Greek, Thai, and any other non-Latin-alphabet names. Suggestions are welcome. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Japanese name use single space between the last name and the first name?
Hi, I found the following updates:
> NeedToUpdate News/weekly/2002/19/index 1.8 1.9
> NeedToUpdate News/weekly/2002/24/index 1.7 1.8
> NeedToUpdate News/weekly/2002/26/index 1.10 1.11
> NeedToUpdate News/weekly/2002/40/index 1.8 1.9
> NeedToUpdate News/weekly/2002/44/index 1.9 1.10
which are commented: "Japanese name use single space between the last name and the first name". Either form (single space or no space) is OK in Japanese, though I don't know the typesetting rule for Japanese names in Kanji that appear in *English* sentences. IMO, a single space can be used when I write my name independently, for example to fill out a form. However, a single space looks odd when I write my name in a Japanese sentence, because Japanese sentences use few (or no) white spaces. (Look at the Japanese translations of DWN; they don't use white spaces.) Thus, though there are more Japanese names in DWN which don't use a white space between family name and individual name, I think they can be left as they are. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: search.debian.org is online
From: [EMAIL PROTECTED] (Craig Small) Subject: Re: search.debian.org is online Date: Thu, 16 Jan 2003 09:35:00 +1100 > On Wed, Jan 15, 2003 at 03:32:43PM +0900, Tomohiro KUBOTA wrote: > > > > I'd like the mnoGoSearch of search.debian.org to be recompiled > > with extra-charsets enabled, because it (I expect) immediately > > benefits Korean. (Note that Korean doesn't have the problem 2). > > Since it doesn't need the newer version of mnoGoSearch with ChaSen > > support (CVS version 3.2.8, to solve problem 2), it can be done now! > > Except we're using UTF-8, so it shouldn't matter, I think. mnoGoSearch uses Unicode internally for their indexing and searching in the current configuration, as you wrote. Thus, it needs to convert HTML files into Unicode before processing them and it needs converters. The default compilation of mnoGoSearch omits converters to Unicode from east Asian encodings (ISO-2022-JP, EUC-KR, Big5, GB2312), and this is why it cannot index nor search east Asian pages. Compilation with the ./configure option will enable this. Though Japanese and Chinese have further problem (problem 2), Korean should be solved by this. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
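The two problems discussed in this thread can be illustrated with a small sketch. It is written in Python with its standard codecs and has nothing to do with mnoGoSearch's actual implementation; the function name index_terms and the sample strings are made up for illustration. Step one needs a converter for each legacy charset (problem 1); step two, splitting on whitespace, works for Korean but not for Japanese (problem 2).

```python
def index_terms(raw, charset):
    """Decode a page fragment to Unicode, then split it into search terms.

    Hypothetical helper: without a converter for 'charset' the decode step
    fails, which corresponds to "problem 1" in this thread.
    """
    text = raw.decode(charset)
    # Naive word extraction: only works when words are separated by spaces.
    return text.split()


korean = "데비안 검색".encode("euc_kr")
japanese = "デビアンの検索".encode("iso2022_jp")

print(index_terms(korean, "euc_kr"))        # two separate terms
print(index_terms(japanese, "iso2022_jp"))  # one unsplit run ("problem 2")
```

Once the converter exists, Korean text yields usable terms immediately, while Japanese text still comes out as a single run that needs a morphological analyzer such as ChaSen to split.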
difference between "task" and "job"?
Hi, I am now reviewing a Japanese translation of the Debian web page devel/join/nm-step4. The page has words like "task" and "job". In Japanese, there are no distinct words for these two. Thus, I don't understand the difference between them. Are these words used interchangeably, just by chance? (In English class, Japanese people learn that English speakers like to use different words for one object when it is mentioned multiple times.) Or does the difference between them have some meaning? I guess "task" means the four fields (or classifications) of "jobs":
- Package Management
- Documentation
- Debugging and Testing
- Infrastructure
However, I don't understand what "Alternative demonstration tasks" means. Does it mean jobs which cannot be classified into the four fields, done as demonstrations to prove that the applicant can do them? Or just jobs which _can be_ classified into the four fields, but by which the applicant wants to prove his/her skill in a different way? --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: search.debian.org is online
Hi, From: Tomohiro KUBOTA <[EMAIL PROTECTED]> Subject: Re: search.debian.org is online Date: Sun, 12 Jan 2003 16:51:31 +0900 (JST) > > 1. handling of two-byte characters > > 2. extraction of words from sentences without whitespaces > > I think I found the reason of the problem 1. Though mnogosearch > supports multibyte languages, it doesn't support them by default. > To support them, recompilation is needed. > > > mnogosearch-3.2.7$ ./configure --help > . > --with-extra-charsets=CHARSET[,CHARSET,...] > Use additional non-default charsets: > none, all or a list from this set: > big5 gb2312 gbk japanese euc-kr gujarati tscii > . I'd like the mnoGoSearch of search.debian.org to be recompiled with extra-charsets enabled, because it (I expect) immediately benefits Korean. (Note that Korean doesn't have problem 2.) Since it doesn't need the newer version of mnoGoSearch with ChaSen support (CVS version 3.2.8, to solve problem 2), it can be done now! I think --with-extra-charsets=all or --with-extra-charsets=big5,gb2312,japanese,euc-kr is a good idea because it enables a sane "search results" page. > Note that "japanese" means Shift_JIS, which is not the encoding for > Debian Japanese web pages. Debian Japanese web pages are written > using ISO-2022-JP which seems not be supported by mnogosearch. While browsing the source code of mnoGoSearch, I found that version 3.2.6 seems to support the ISO-2022-JP encoding, which is used for Debian Japanese pages, though this is not documented. (Of course the "japanese" extra charset must be enabled at ./configure time.) --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: search.debian.org is online
Hi, From: Tomohiro KUBOTA <[EMAIL PROTECTED]> Subject: Re: search.debian.org is online Date: Mon, 30 Dec 2002 19:53:31 +0900 (JST) > 1. handling of two-byte characters > 2. extraction of words from sentences without whitespaces I think I found the cause of problem 1. Though mnogosearch supports multibyte languages, it doesn't support them by default. To support them, recompilation is needed.

mnogosearch-3.2.7$ ./configure --help
.
--with-extra-charsets=CHARSET[,CHARSET,...]
    Use additional non-default charsets:
    none, all or a list from this set:
    big5 gb2312 gbk japanese euc-kr gujarati tscii
.

Note that "japanese" means Shift_JIS, which is not the encoding for Debian Japanese web pages. Debian Japanese web pages are written using ISO-2022-JP, which seems not to be supported by mnogosearch. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: lists.debian.org de-localization
Hi, From: Josip Rodin <[EMAIL PROTECTED]> Subject: Re: lists.debian.org de-localization Date: Sun, 12 Jan 2003 04:14:45 +0100 > This, on the other hand, is a hassle to handle (backporting or installation > into subdirs). master.d.o is scheduled to be upgraded to woody after samosa. > That's all I know. This is good news. Then I will work on support for various encodings later. Anyway, I don't expect the new master.d.o to have the development version of MHonArc (with the encoding-assuming feature for raw 8bit headers), even if it comes from a non-Debian-package version. Thus I think we will need some method to handle raw 8bit headers. Here is a "filter" to convert 8bit characters (assumed to be KOI8-R) to "&#;" expression, which I wrote by imitating iso8859.pl, CharEnt.pm, and UTF8.pm . This filter is used for raw 7bit/8bit strings. Since the 7bit part of KOI8-R is identical to ASCII, it doesn't harm legal ASCII headers. The filter is to be installed into org/lists.debian.org/mhonarc/share/mhonarc/MHonArc/DEBIAN.pm and doesn't depend on the version of MHonArc or Debian.

## DEBIAN.pm by Tomohiro KUBOTA <[EMAIL PROTECTED]>
##
## CHARSETCONVERTER module that assumes the input string to be KOI8-R
## and converts it into &#xxx; expressions where xxx is the decimal Unicode
## codepoint.

package DEBIAN;

%US_ASCII_To_Ent = (
#--
# Hex Code  Entity Ref  # ISO external entity and description
#--
0x22, "&quot;", # ISOnum : Quotation mark
0x26, "&amp;",  # ISOnum : Ampersand
0x3C, "&lt;",   # ISOnum : Less-than sign
0x3E, "&gt;",   # ISOnum : Greater-than sign
);

%KOI8_R_To_Ent = (
#--
# Hex Code  Entity Ref  # ISO external entity and description
#--
0x80, "&#9472;", # BOX DRAWINGS LIGHT HORIZONTAL
0x81, "&#9474;", # BOX DRAWINGS LIGHT VERTICAL
0x82, "&#9484;", # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x83, "&#9488;", # BOX DRAWINGS LIGHT DOWN AND LEFT
0x84, "&#9492;", # BOX DRAWINGS LIGHT UP AND RIGHT
0x85, "&#9496;", # BOX DRAWINGS LIGHT UP AND LEFT
0x86, "&#9500;", # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x87, "&#9508;", # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x88, "&#9516;", # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x89, "&#9524;", # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x8a, "&#9532;", # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x8b, "&#9600;", # UPPER HALF BLOCK
0x8c, "&#9604;", # LOWER HALF BLOCK
0x8d, "&#9608;", # FULL BLOCK
0x8e, "&#9612;", # LEFT HALF BLOCK
0x8f, "&#9616;", # RIGHT HALF BLOCK
0x90, "&#9617;", # LIGHT SHADE
0x91, "&#9618;", # MEDIUM SHADE
0x92, "&#9619;", # DARK SHADE
0x93, "&#8992;", # TOP HALF INTEGRAL
0x94, "&#9632;", # BLACK SQUARE
0x95, "&#8729;", # BULLET OPERATOR
0x96, "&#8730;", # SQUARE ROOT
0x97, "&#8776;", # ALMOST EQUAL TO
0x98, "&#8804;", # LESS-THAN OR EQUAL TO
0x99, "&#8805;", # GREATER-THAN OR EQUAL TO
0x9a, "&#160;",  # NO-BREAK SPACE
0x9b, "&#8993;", # BOTTOM HALF INTEGRAL
0x9c, "&#176;",  # DEGREE SIGN
0x9d, "&#178;",  # SUPERSCRIPT TWO
0x9e, "&#183;",  # MIDDLE DOT
0x9f, "&#247;",  # DIVISION SIGN
0xa0, "&#9552;", # BOX DRAWINGS DOUBLE HORIZONTAL
0xa1, "&#9553;", # BOX DRAWINGS DOUBLE VERTICAL
0xa2, "&#9554;", # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0xa3, "&#1105;", # CYRILLIC SMALL LETTER IO
0xa4, "&#9555;", # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0xa5, "&#9556;", # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0xa6, "&#9557;", # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0xa7, "&#9558;", # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0xa8, "&#9559;", # BOX DRAWINGS DOUBLE DOWN AND LEFT
0xa9, "&#9560;", # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0xaa, "&#9561;", # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0xab, "&#9562;", # BOX DRAWINGS DOUBLE UP AND RIGHT
0xac, "
Encoding of db.debian.org page
Hi, I found that http://db.debian.org/ is written in ISO-8859-1, as the page says: I wish it were UTF-8, if the backend database works in UTF-8 or another encoding of Unicode. That would let users enter UTF-8 characters in the web forms, and developers could store their names in their own native forms. (In that case, the Debian developer database should have additional fields for ASCII forms of their names, because I cannot read, or even type on my keyboard, Arabic or Thai characters.) Please note that I know little about databases and have no experience using one.
Re: lists.debian.org de-localization
Hi, From: Tomohiro KUBOTA <[EMAIL PROTECTED]> Subject: Re: lists.debian.org de-localization Date: Tue, 07 Jan 2003 21:45:05 +0900 (JST) > I think more important problem is how to deal with raw 8bit mail > headers without encoding specification or encodings which are not > supported by the current set-up but used in Debian mailing lists > (GB2312, BIG5, and KOI8-R). I heard that the current development version of MHonArc has a feature to assume raw 8bit characters as some specified encoding . However, I don't think this can be a solution now because it will take a very long time that the version will be stable, then the stable version will be adopted into unstable/testing version of Debian distribution, then the distribution will become stable (released), and then the stable distribution will be adopted to master.debian.org . Anyway, I can write a KOI8-R -> SGML entity (or "&#;" expression) filter very easily. My plan is to assume raw 8bit characters to be KOI8-R Russian and I think this can be achieved easily. Remained problem is: how to handle unsupported encodings such as GB2312 and Big5. I found that the current set-up of lists.debian.org mhonarc converts GB2312 and Big5 into raw 8bit streams (or can be said 16bit streams because these encodings are multibyte) and they also cause encoding conflicts and loss of following "<" in "". Thus I'd like these encodings to be converted into "&#;" expressions. (Also, debian-esperanto people may want to use ISO-8859-3 and UTF-8.) I found master.debian.org:/org/lists.debian.org/mhonarc/share/mhonarc/MHonArc/UTF8.pm but I don't think this will work well because it depends on Unicode::MapUTF8 module which is available as libunicode-maputf8-perl package since Woody, where master.debian.org is Potato. Then, I might be able to write an original filter using libtext-unicode-perl but the package is also available since Woody. I don't know any other ways. Any suggestions? 
--- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
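The plan described in this message (treat raw 8bit header bytes as KOI8-R and emit numeric character references) can be sketched in a few lines. This is a Python illustration that relies on Python's built-in koi8_r codec instead of a hand-written table; the function name koi8r_to_ncr is hypothetical, and this is not the MHonArc filter itself.

```python
# Hypothetical sketch: KOI8-R bytes -> ASCII text with "&#NNNN;" references.

ASCII_SPECIALS = {'"': "&quot;", "&": "&amp;", "<": "&lt;", ">": "&gt;"}


def koi8r_to_ncr(raw):
    """Assume 'raw' is KOI8-R and return HTML-safe ASCII text."""
    out = []
    for ch in raw.decode("koi8_r"):
        if ch in ASCII_SPECIALS:
            out.append(ASCII_SPECIALS[ch])   # escape markup characters
        elif ord(ch) < 0x80:
            out.append(ch)                   # plain ASCII passes through
        else:
            out.append("&#%d;" % ord(ch))    # decimal Unicode codepoint
    return "".join(out)


print(koi8r_to_ncr("Привет <debian-russian>".encode("koi8_r")))
```

Because the 7bit half of KOI8-R is identical to ASCII, headers that are pure ASCII come out unchanged apart from the escaping of markup characters.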
Re: automatically-generated ISO-8859-1 characters in multibyte webpages
Hi, From: Josip Rodin <[EMAIL PROTECTED]> Subject: Re: automatically-generated ISO-8859-1 characters in multibyte webpages Date: Thu, 9 Jan 2003 14:22:19 +0100 > Oh, yeah, that's in another sub. I'll find it and have it use > from_utf8_or_iso88591_to_sgml() as well, of course. How about this patch? --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/

--- people.pl 2003-01-09 07:31:58.0 +0900
+++ people.pl.new 2003-01-11 09:05:07.0 +0900
@@ -442,15 +442,23 @@
 foreach (`ldapsearch -P 2 -x -h db.debian.org -b dc=debian,dc=org uid=\* cn mn sn labeledurl`) {
   chop; $line = $_;
   if ($line =~ /^(dn: )?uid=(.+),.+$/) { $name = $2; }
-  elsif ($line =~ /^cn(=|: )(.+)$/) { $ldap_cn = $2; }
+  elsif ($line =~ /^cn(=|: )(.+)$/) {
+    $ldap_cn = from_utf8_or_iso88591_to_sgml($2);
+  }
   elsif ($line =~ /^mn(=|: )(.+)$/) { next; }
-  elsif ($line =~ /^sn(=|: )(.+)$/) { $ldap_sn = $2; }
+  elsif ($line =~ /^sn(=|: )(.+)$/) {
+    $ldap_sn = from_utf8_or_iso88591_to_sgml($2);
+  }
   elsif ($line =~ /^(\w+):: (.+)$/) {
     use MIME::Base64;
     my $namepart = $1;
     my $worddata = decode_base64($2);
-    if ($namepart eq "cn") { $ldap_cn = $worddata; }
-    elsif ($namepart eq "sn") { $ldap_sn = $worddata; }
+    if ($namepart eq "cn") {
+      $ldap_cn = from_utf8_or_iso88591_to_sgml($worddata);
+    }
+    elsif ($namepart eq "sn") {
+      $ldap_sn = from_utf8_or_iso88591_to_sgml($worddata);
+    }
     elsif ($namepart ne "mn") {
       die "something went wrong, a non-name field is BASE64 encoded";
     }
Re: webwml/english/mirror/Mirrors.masterlist
Hi, From: Josip Rodin <[EMAIL PROTECTED]> Subject: Re: webwml/english/mirror/Mirrors.masterlist Date: Thu, 9 Jan 2003 14:31:01 +0100 > Haha. Good one. I think it's easiest to make mirror_list.pl replace "&\s" > with "&amp;\s", i.e. have it work properly with "foo & bar" but not mess > with entities. "foo&bar" shouldn't happen, and we don't have any mirrors at > AT&T. :) A good idea, but I think it is clearer to change "&" into "&amp;" manually in the Mirrors.masterlist file, because that leads to a more consistent principle for how "&" is handled in Mirrors.masterlist and in other files in the webwml CVS repository. However, I don't mind whether your idea or mine is adopted. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
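Josip's suggestion, escaping an ampersand only when whitespace follows it so that existing entity references are left alone, can be sketched as below. This is a Python illustration, not the code in mirror_list.pl; the function name escape_bare_ampersands is hypothetical.

```python
import re


def escape_bare_ampersands(text):
    """Escape '&' only when it is followed by whitespace.

    Entity references such as '&eacute;' and strings like 'AT&T' are left
    untouched, which is the trade-off discussed in the quoted mail.
    """
    return re.sub(r"&(?=\s)", "&amp;", text)


for sample in ("foo & bar", "Andr&eacute; & Co.", "AT&T"):
    print(escape_bare_ampersands(sample))
```

The lookahead keeps the whitespace out of the match, so only the ampersand itself is replaced.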
Translated pages unavailable for DWN 2003
Hi, I found that translated pages of DWN 2003 #01 are not available, though its title is found translated in index pages like http://www.debian.org/News/weekly/2003/index. where is "de", "fr", "ja", and so on. Indeed, http://www.debian.org/News/weekly/2003/01/index..html doesn't exist even for "en" and the language-less page 2003/01/index.html doesn't have language chooser in its footer. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
webwml/english/mirror/Mirrors.masterlist
Hi, As I wrote in the "Unidentified subject!" thread, I rewrote the Mirrors.masterlist file to use ASCII characters only (including SGML entity expressions, which themselves are written in ASCII). However, I found that these SGML entities themselves appear literally in the web page http://www.debian.org/mirror/official_sponsors.en.html against my intention. This is because "&" is processed by some scripts and changed into "&amp;". I think there are several possible solutions.
1) Modify the scripts not to change "&" into "&amp;", and leave Mirrors.masterlist as it is.
2) Restore Mirrors.masterlist and modify the scripts to change 8bit characters into SGML entities.
3) Modify Mirrors.masterlist to use UTF-8 and modify the scripts to change UTF-8 characters into SGML entities.
Now, 2) cannot be a solution because I found that Mirrors.masterlist uses not only ISO-8859-1 but also ISO-8859-2 characters. How about 1) or 3) ? PS. I found that http://www.debian.org/mirror/sponsors.en.html is too old. Why hasn't it been regenerated since Nov 11, 2002 ? --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: automatically-generated ISO-8859-1 characters in multibyte webpages
Hi, From: Josip Rodin <[EMAIL PROTECTED]> Subject: Re: automatically-generated ISO-8859-1 characters in multibyte webpages Date: Wed, 8 Jan 2003 15:02:21 +0100 > Sounds very good, thanks. Thank you very much for adopting my patch, but I found that the renewed devel/people web page still has several 8bit characters. There are seven 8bit characters and all of them are in "homepage" lines, like: Dahlqvist, Andr* (<a href="http://jota.sm.luth.se/~anedah-9/">home page</a>) where "*" is \xe9. I imagine that such lines are generated separately from canonical_names() and therefore aren't filtered by from_utf8_or_iso88591_to_sgml(). --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
webwml/english/mirror/Mirrors.masterlist (Re: [no subject])
Hi, Sorry for sending a funny mail. It seems that I mistakenly removed ":" after "Cc" in the mail header. (However, the content of the mail is as intended.) From: <[EMAIL PROTECTED]> Subject: Unidentified subject! Date: Thu, 09 Jan 2003 07:17:59 +0900 (JST) > Cc Josip Rodin <[EMAIL PROTECTED]> > Subject: webwml/english/mirror/Mirrors.masterlist > From: Tomohiro KUBOTA <[EMAIL PROTECTED]> > X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) > Mime-Version: 1.0 > Content-Type: Text/Plain; charset=us-ascii > Content-Transfer-Encoding: 7bit > > Hi, (snip). --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: automatically-generated ISO-8859-1 characters in mulbibyte webpages
Hi, From: Tomohiro KUBOTA <[EMAIL PROTECTED]> Subject: Re: automatically-generated ISO-8859-1 characters in mulbibyte webpages Date: Tue, 07 Jan 2003 21:29:24 +0900 (JST) > Anyway, though I don't know such a module, your way can be very easily > implemented. I think the easiest one is like following: > > $name =~ s/([\x80-\xff])/"&#".ord($1).";"/eg;

I wrote a new filter which
- assumes the input string is UTF-8 if it can be interpreted as such,
- assumes it is ISO-8859-1 if not.

Since the UTF-8 encoding is relatively strict, it is not likely that a string intended as ISO-8859-1 is wrongly assumed to be UTF-8. I confirmed that people.names has no octet stream which can be interpreted as UTF-8. (A lone 8bit byte cannot be valid UTF-8; in UTF-8, 8bit bytes always appear in sequences of two or more.)

With this filter, my concern is completely solved. Also, you don't need to think about future maintenance labor when a new maintainer uses 8bit characters for his/her name.

#!/usr/bin/perl

sub from_utf8_or_iso88591_to_sgml ($) {
    my $str = $_[0];
    my $strsave = $str;

    if ($str !~ /[\x80-\xff]/) {
        # return ASCII string for less machine-time consumption.
        return $str;
    }

    # 4-byte sequences: 3 bits from the lead byte, 6 bits from each
    # continuation byte, so the lead byte is weighted by 0x40000.
    $str =~ s/([\xf0-\xf7])([\x80-\xbf])([\x80-\xbf])([\x80-\xbf])/
        "&#" . ((ord($1)&0x7)*0x40000 + (ord($2)&0x3f)*0x1000 +
                (ord($3)&0x3f)*0x40 + (ord($4)&0x3f)) . ";"/eg;
    # 3-byte sequences.
    $str =~ s/([\xe0-\xef])([\x80-\xbf])([\x80-\xbf])/
        "&#" . ((ord($1)&0xf)*0x1000 + (ord($2)&0x3f)*0x40 +
                (ord($3)&0x3f)) . ";"/eg;
    # 2-byte sequences.
    $str =~ s/([\xc0-\xdf])([\x80-\xbf])/
        "&#" . ((ord($1)&0x1f)*0x40 + (ord($2)&0x3f)) . ";"/eg;

    if ($str !~ /[\x80-\xff]/) {
        # $str is UTF-8 compliant, assume UTF-8.
        return $str;
    } else {
        # $str is not UTF-8 compliant, assume ISO-8859-1.
        $strsave =~ s/([\x80-\xff])/"&#".ord($1).";"/eg;
        return $strsave;
    }
}

while (<>) {
    chomp($_);
    # chomp removed the newline, so put it back on output.
    print from_utf8_or_iso88591_to_sgml($_), "\n";
}
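For comparison, the same heuristic can be sketched in a few lines of Python. This is only an illustration of the idea (try strict UTF-8 first, otherwise assume ISO-8859-1, then emit numeric character references); it is not the filter used on the Debian machines, and the function name `to_ncr` is made up for this example.

```python
def to_ncr(raw: bytes) -> str:
    """Decode raw as UTF-8 if it is valid UTF-8, else as ISO-8859-1,
    and replace every non-ASCII character with a &#NNN; reference."""
    try:
        text = raw.decode("utf-8")       # strict: fails on a lone 8bit byte
    except UnicodeDecodeError:
        text = raw.decode("iso-8859-1")  # every byte maps to one character
    return "".join(c if ord(c) < 128 else f"&#{ord(c)};" for c in text)


print(to_ncr(b"Andr\xe9"))       # name stored as ISO-8859-1
print(to_ncr(b"Andr\xc3\xa9"))   # the same name stored as UTF-8
```

Both inputs come out as the same ASCII-only string, so the generated page no longer depends on which encoding the browser assumes.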
Re: lists.debian.org de-localization
Hi, From: Josip Rodin <[EMAIL PROTECTED]> Subject: Re: lists.debian.org de-localization Date: Tue, 7 Jan 2003 11:41:36 +0100 > Hm, but doesn't the section on character sets cover the mails themselves as > well? There are a bit under twenty thousand indices, which is a large amount > by itself, but the mails themselves are really problematic. I think the priority of the mails themselves is much lower than that of the list indices, because an 8bit character in just one mail affects the whole month's index of a list. Anyway, I don't think regeneration or modification is needed *now*, because our solution is not yet perfect. It would obviously be a waste of machine time to modify the pages both *now* and again in the future, when a better solution is available. I think a more important problem is how to deal with raw 8bit mail headers that have no encoding specification, and with encodings which are not supported by the current set-up but are used in Debian mailing lists (GB2312, BIG5, and KOI8-R). --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: automatically-generated ISO-8859-1 characters in mulbibyte webpages
Hi, From: Josip Rodin <[EMAIL PROTECTED]> Subject: Re: automatically-generated ISO-8859-1 characters in mulbibyte webpages Date: Tue, 7 Jan 2003 12:31:37 +0100 > Hmm, I see how the already-hardcoded ones need to be fixed immediately, but > shouldn't there be a Perl module which we could use to convert stuff > automatically? I never bothered to look for it, but the number of hardcoded > names just because of the character set shouldn't be growing, it's a pain to > maintain. I (weakly) prefer my way to your way because your way would declare that *all* 8bit characters in maintainer names are ISO-8859-1, even in the future. I think that is not a very good idea because I hope ISO-8859-1 will fade out and be replaced by UTF-8 in the future. Anyway, though I don't know such a module, your way can be very easily implemented. I think the easiest one is like following:

$name =~ s/([\x80-\xff])/"&#".ord($1).";"/eg;

--- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
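The concern about declaring every 8bit byte to be ISO-8859-1 can be made concrete with a short Python sketch of the same byte-by-byte substitution. The function name is hypothetical and the code only mirrors the one-line regex above for illustration.

```python
def latin1_bytes_to_ncr(raw: bytes) -> str:
    """Turn every byte >= 0x80 into &#NNN;, treating it as ISO-8859-1."""
    return "".join(chr(b) if b < 0x80 else f"&#{b};" for b in raw)


# Works when the name really is ISO-8859-1 ...
print(latin1_bytes_to_ncr(b"Andr\xe9"))
# ... but a UTF-8 encoded name turns into two unrelated characters.
print(latin1_bytes_to_ncr("Andr\u00e9".encode("utf-8")))
```

The second call produces two references (for 0xC3 and 0xA9) instead of one for the intended letter, which is what would happen once maintainers start storing their names in UTF-8.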
Re: lists.debian.org de-localization
Hi, From: Marco d'Itri <[EMAIL PROTECTED]> Subject: Re: lists.debian.org de-localization Date: Tue, 7 Jan 2003 01:10:29 +0100 > On Jan 06, Tomohiro KUBOTA <[EMAIL PROTECTED]> wrote: > > >> This is not needed, only spammers put raw latin-1 characters in mail > >> headers. > >The key point is that when we receive a mail with raw 8bit characters, > The key point is that we should not even accept mail with raw 8bit > characters in the headers. Though I agree with you, that is only an ideal solution. As Stephen said, there are people who use raw 8bit characters (intended to be KOI8-R). If you could force them to use "right" MUAs, I would fully agree with you. Anyway, in the current set-up of lists.debian.org, encodings such as GB2312 and BIG5 (used in debian-chinese-gb and debian-chinese-big5, respectively) are not supported and are processed just like raw 8bit characters. We also have to deal with them. I am now interested in MHonArc::UTF8.pm . I had thought that it converts all UTF-8 characters (besides ASCII) into &#; expressions and doesn't support East Asian encodings, but that was wrong. It seems to convert *from* all non-UTF-8 encodings *to* UTF-8, and it seems to support East Asian encodings too (because Unicode::MapUTF8 supports them). --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: lists.debian.org de-localization
Hi, From: Josip Rodin <[EMAIL PROTECTED]> Subject: Re: lists.debian.org de-localization Date: Mon, 6 Jan 2003 16:07:49 +0100 > Future only. Is there a pressing need to regenerate the old mails? > I would rather avoid it... I have an idea about an easy modification to old list pages. Add the following line to all http://lists.debian.org/*/*/threads.html , http://lists.debian.org/*/*/maillist.html , http://lists.debian.org/*/*/subject.html , and http://lists.debian.org/*/*/author.html besides following exceptions: for debian-chinese-gb, for debian-chinese-big5, for debian-russian, I expect that this can be done by much less machine time than to regenerate all above pages by using MHonArc, because this doesn't need to access any individual mails. I am also now reading MHonArc documents so that these above headers can be added to future pages also (or, if everything can be migrated into UTF-8, it will be much better). --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
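The idea of patching only the index pages could be sketched roughly as follows, in Python. Everything here is an assumption made for illustration: the exact line to add is not preserved in this message, so the sketch uses a standard `<meta http-equiv="Content-Type">` charset declaration, and the per-list charsets (GB2312, Big5, KOI8-R, otherwise ISO-8859-1) are taken from the encodings named elsewhere in this thread. The function and table names are hypothetical.

```python
import re

# Assumed per-list charsets; every other list gets the default.
LIST_CHARSETS = {
    "debian-chinese-gb": "gb2312",
    "debian-chinese-big5": "big5",
    "debian-russian": "koi8-r",
}
DEFAULT_CHARSET = "iso-8859-1"

HEAD_RE = re.compile(rb"<head(\s[^>]*)?>", re.IGNORECASE)
META_CHARSET_RE = re.compile(rb"<meta[^>]*charset\s*=", re.IGNORECASE)


def add_charset_meta(page: bytes, list_name: str) -> bytes:
    """Insert a charset declaration right after <head>, working on raw
    bytes so that the page itself never has to be decoded."""
    if META_CHARSET_RE.search(page):
        return page  # already declared, leave the page alone
    charset = LIST_CHARSETS.get(list_name, DEFAULT_CHARSET)
    meta = (
        '<meta http-equiv="Content-Type" '
        f'content="text/html; charset={charset}">'
    ).encode("ascii")
    return HEAD_RE.sub(lambda m: m.group(0) + b"\n" + meta, page, count=1)


page = b"<html><head><title>Thread index</title></head><body></body></html>"
print(add_charset_meta(page, "debian-russian").decode("ascii"))
```

Because only threads.html, maillist.html, subject.html and author.html would be touched, a pass like this reads four files per list per month and none of the individual mails.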
Re: lists.debian.org de-localization
Hi, From: Josip Rodin <[EMAIL PROTECTED]> Subject: Re: lists.debian.org de-localization Date: Mon, 6 Jan 2003 16:07:49 +0100 > On Mon, Jan 06, 2003 at 11:42:44PM +0900, Tomohiro KUBOTA wrote: > > Thank you for committing the modification of debian.rc . Does the change > > affect future archives only? Or all past and future archives? > > Future only. Is there a pressing need to regenerate the old mails? > I would rather avoid it... I also think it is not a good idea to regenerate all pages now. Part of the reason is that it is too heavy, but another part is that the solution is not very good yet. In the future, when MHonArc can handle all popular encodings and convert them into UTF-8, and when it can handle raw 8bit headers in some intelligent way, we may think about regenerating all pages as a very low priority background task. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: lists.debian.org de-localization
Hi, From: Josip Rodin <[EMAIL PROTECTED]> Subject: Re: lists.debian.org de-localization Date: Mon, 6 Jan 2003 15:09:47 +0100 > > (permission of klecker:/org/www.debian.org/cron/people_scripts/people.pl > > I have no idea how you came from mhonarc to people.pl, but okay. :) Ah, I was confused. It is from another thread on the debian-www list about a similar 8bit problem on http://www.debian.org/devel/people.ja.html . Please read the recent thread named "automatically-generated ISO-8859-1 characters in mulbibyte webpages" for details. Thank you for committing the modification of debian.rc . Does the change affect future archives only? Or all past and future archives? --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: lists.debian.org de-localization
Hi, From: Edmund GRIMLEY EVANS <[EMAIL PROTECTED]> Subject: Re: lists.debian.org de-localization Date: Mon, 6 Jan 2003 13:45:47 + > If the headers contain 8-bit octets and are valid as UTF-8, it's > fairly safe to assume that they really are UTF-8. Otherwise, you could > look for a Content-Type field or make it depend on the mailing list. A good idea, but I think people who use UTF-8 today are those who know character encodings well and don't send raw 8bit headers. > I thought some Japanese non-spammers use iso-2022-jp in headers, which > isn't 8-bit, but it isn't us-ascii, either. Am I out of date? Sometimes I read raw iso-2022-jp headers. However, fortunately, there are no Japanese mailing lists in Debian. (debian-japanese is an English mailing list.) Moreover, MHonArc does not seem to have a feature to convert Japanese into SGML entities or &#; expressions, so we cannot support Japanese headers anyway. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: lists.debian.org de-localization
Hi, From: Marco d'Itri <[EMAIL PROTECTED]> Subject: Re: lists.debian.org de-localization Date: Mon, 6 Jan 2003 13:34:17 +0100 > >Again, speaking about lists.debian.org, my original idea is to assume > >all 8bit raw characters to be ISO-8859-1, though I don't know this is > >technically possible or not. > This is not needed, only spammers put raw latin-1 characters in mail > headers. The key point is that when we receive a mail with raw 8bit characters, we don't have an easy and reliable method to tell whether the characters are from ISO-8859-1, KOI8-R, or some other character set. Anyway, in the debian-russian mailing list, raw 8bit characters in mail headers should be allowed, and they should be assumed to be KOI8-R when building lists.debian.org pages. In any case, using raw 8bit characters in lists.debian.org pages must be avoided (so that the pages are not broken), and thus raw 8bit characters in mail headers must be converted into something (or must be deleted). An easy way is to assume *all* raw 8bit characters to be KOI8-R and convert them into SGML entities. However, I don't know whether there are other languages where a certain number of non-spammer people use raw 8bit characters. If there are, they will complain about this idea. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: free in free beer? (News/2003/20030102.wml)
Hi, From: Osamu Aoki <[EMAIL PROTECTED]> Subject: Re: free in free beer? (News/2003/20030102.wml) Date: Sun, 5 Jan 2003 23:51:55 -0800 > This is "free bear". Translation shall be: Thank you. I committed the file. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: free in free beer? (News/2003/20030102.wml)
Hi, From: "James A. Treacy" <[EMAIL PROTECTED]> Subject: Re: free in free beer? (News/2003/20030102.wml) Date: Mon, 6 Jan 2003 00:40:49 -0500 > > > The Test Drive Program is a free service of HP. > > > > I think that the "free" here is "free beer", not "free speech". > > > Correct. I see. I'd like to ask the meaning of another "free" in the following sentence: When you register, you get a free shell account you can use to log into the wide variety of systems on the Test Drive network and try out the software and operating systems running on them. I think it (free shell account) is also "free beer". However, the original translator seems to think about "free speech", i.e., "a shell account with which you can run anything you like". --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: lists.debian.org de-localization
Hi, From: "Stephen J. Turnbull" <[EMAIL PROTECTED]> Subject: Re: lists.debian.org de-localization (Re: automatically-generated ISO-8859-1 characters in mulbibyte webpages) Date: Sun, 05 Jan 2003 16:10:02 +0900 > This is a fairly small sample (about 100 subscribers, 25 regular > posters). However, the Russian spam I've seen (isn't it funny how you > can identify spam even though you can't read the language it's written > in?) invariably fails either the addressee tests (implicit, too many), > the known spam software test, or the HTML-only test. So (FWIW) I've > disabled the 8-bit test and so far the Russian subscribers are happy. IMO, in such a case, allowing raw 8bit mails is better (i.e., its merit is larger than its demerit) than disabling them. Again, speaking about lists.debian.org, my original idea is to assume all 8bit raw characters to be ISO-8859-1, though I don't know this is technically possible or not. In this case, Russian people will be annoyed when browsing lists.debian.org pages. If it is possible to have an "assumed encoding" for each mailing list, that of the debian-russian list will be KOI8-R, that of debian-chinese-gb will be GB2312, and so on, and that of all others ISO-8859-1. I also hope there are some UTF-8 filters. (There seems to be a writer who uses a UTF-8 name (From:) in debian-esperanto.) However, I know nothing at all about MHonArc. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
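The "assumed encoding per mailing list" idea can be sketched as a small Python function. This is not MHonArc configuration; it only illustrates the decision order under stated assumptions: valid UTF-8 wins, then the list's assumed encoding, then ISO-8859-1 as the last resort. The table and function names are hypothetical, and the codec names are those of Python's standard library.

```python
# Assumed fallback encodings per list (hypothetical table).
ASSUMED_ENCODING = {
    "debian-russian": "koi8_r",
    "debian-chinese-gb": "gb2312",
    "debian-chinese-big5": "big5",
}


def header_to_ncr(raw: bytes, list_name: str) -> str:
    """Decode a raw 8bit header and return ASCII text with &#NNN; references."""
    candidates = ("utf-8", ASSUMED_ENCODING.get(list_name, "iso-8859-1"))
    for encoding in candidates:
        try:
            text = raw.decode(encoding)
            break
        except UnicodeDecodeError:
            continue
    else:
        # Nothing matched; ISO-8859-1 accepts any byte sequence.
        text = raw.decode("iso-8859-1")
    return "".join(c if ord(c) < 128 else f"&#{ord(c)};" for c in text)


raw = "\u0414\u0430".encode("koi8_r")        # two Cyrillic letters as KOI8-R
print(header_to_ncr(raw, "debian-russian"))  # decoded as KOI8-R
print(header_to_ncr(raw, "debian-devel"))    # same bytes, assumed ISO-8859-1
```

The two calls show why a single global assumption is not enough: the same bytes give Cyrillic references for debian-russian and accented Latin letters for any other list.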
free in free beer? (News/2003/20030102.wml)
Hi, Now I am reviewing a Japanese translation of 20030102.wml news page in Debian web site. The page says: > The Test Drive Program is a free service of HP. I think that the "free" here is "free beer", not "free speech". Debian always says "Debian is free software, in free speech meaning, not free beer meaning" and we always fight against "free-of-charge software" interpretation. Thus, I think that the word "free" without any comments must mean "free in free speech", not "free beer", when the word is spoken by Debian. If Debian wants to mention about "free beer", the word "free" has to have some comments. Now the Test Drive's "free" means "free beer", I think. Thus, I propose to write like following: The Test Drive Program is a free-of-charge service of HP. Any comments? --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: lists.debian.org de-localization
Hi, From: [EMAIL PROTECTED] (Denis Barbier) Subject: Re: lists.debian.org de-localization (Re: automatically-generated ISO-8859-1 characters in mulbibyte webpages) Date: Sun, 5 Jan 2003 15:33:41 +0100 > > Why not use iso_8859::str2sgml; instead of mhonarc::htmlize for iso-8859-1? [...] > Sounds like a very good idea. Who should I ask for this modification? (The permission of klecker:/org/www.debian.org/cron/people_scripts/people.pl is 755, owner=joy group=debwww, but I don't know whether klecker is the right place to do it, because I checked klecker just by chance. I also checked gluck (=www.debian.org) and master, but they don't have the file. Where can I find a document on how /org/* are processed?) --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: date of events
Hi, From: [EMAIL PROTECTED] (Denis Barbier) Subject: Re: date of events Date: Mon, 6 Jan 2003 01:18:09 +0100 > > I don't see any worthwhile advantage of this new setup over the old one. > > Or am I missing something? > > Tomohiro KUBOTA explained many times that having several encodings in > a single file is painful. With this new setup, the problem reported > here could also not occur. Right. I had to edit ctime.wml with a broken (in meaning of i18n) editor which doesn't regard any multibyte encodings (not to break 8bit encoding parts) but happens to display multibyte character well (to input multibyte character). Usage of gettext is a wonderful advantage. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: DWN #49 2002 typo?
From: Peter Karlsson <[EMAIL PROTECTED]> Subject: Re: DWN #49 2002 typo? Date: Sun, 5 Jan 2003 23:26:35 +0100 (CET) > Tomohiro KUBOTA: > > > I hope more plain words without fear of misunderstanding are used. > > Please don't use minor meaning of a word with multiple meanings. > > Isn't that why we are translating the pages to the different languages? > To make sure people understand the pages without needing to know the > finer aspects of the source language? That is an oversimplification. Since I am not a professional translator (and none of the members of the Japanese translation teams are), we sometimes fail to translate very difficult English. IMO, translation is useful because:
- it saves readers labor and time; it is like avoiding "reinvention of the wheel".
- many people would not read English because of such labor and time; thus translation is a good way to advertise Debian to those people.
- of course, there are many people who don't know the basics of English, and dictionaries cannot help them.
> I often look up words in my dictionary when I translate to make sure I > get it right. I most often do, but sometimes I don't. I translate to > make sure that the people that cannot read the English text well enough > can understand the translation. Sure, in this case, I should have consulted my dictionary. However, I am sometimes unsure whether such a "lower meaning" of a word is being used, because I don't understand why such a lower meaning must be used even when we have plainer words with the same meaning. In this case, "advise" did have a meaning of "tell" or "inform" in my dictionary, but it was the last meaning in the list of meanings. Since a dictionary lists meanings from more likely to less likely, I thought it was hardly possible that the last meaning was used here. Furthermore, there are clearer words like "tell" or "inform" to say the same thing. In this case, we were lucky because I found the mistranslation.
However, such a mistranslation may be missed if the mistranslation is self-consistent. To avoid this type of mistranslation, we would have to check *every* word against a dictionary. That is impossible. I think English writers are *now* trying to write plain English, because it is required in http://www.debian.org/devel/website/working . However, I fear that English writers sometimes don't know which kind of clarity and simplicity translators need. Thus, I ask now: please use the most common meaning of a word as far as possible. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: Norwegian Bokm(â)l
Hi, From: Tollef Fog Heen <[EMAIL PROTECTED]> Subject: Re: Norwegian Bokm(â)l Date: 05 Jan 2003 10:53:19 +0100 > | 2. modify "Bokm*l" to "Bokmâl". > > It is å, not â Thank you for your correction. I used correct one to modify language_names.wml but forgot to mention about that. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: Norwegian Bokm(â)l
Hi, From: Tomohiro KUBOTA <[EMAIL PROTECTED]> Subject: Norwegian Bokm(â)l Date: Sun, 05 Jan 2003 10:45:44 +0900 (JST) > 1. "Bokm*l" and "Nynorsk" to be translation items. > > or > > 2. modify "Bokm*l" to "Bokm&acirc;l". I did both, i.e., I modified webwml/english/template/debian/language_names.wml to have two additional items for Bokm*l and Nynorsk and gave the SGML entity expression as the default translation (i.e., msgid). --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Norwegian Bokm(â)l
Hi, I found that the page http://www.debian.org/international/l10n/po/index.ja.html is broken where "Norwegian Bokm*l" (* is a with circ) is written. This is because the 8bit character (ISO-8859-1) is regarded as the first byte of a multibyte character in Japanese EUC-JP. The Debian web pages only recently adopted gettext to translate items. Before then, "Norwegian Bokm*l" and "Norwegian Nynorsk" were items to be translated, and this problem didn't occur because I translated these words into Japanese. However, these translations are now lost and the page is broken. I think there are two possible solutions. 1. Make "Bokm*l" and "Nynorsk" translation items. or 2. Change "Bokm*l" to "Bokm&acirc;l". --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
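The breakage can be reproduced outside a browser with a short Python check. It is only a sketch: it uses the a-ring byte 0xE5 (the letter the follow-up in this thread identifies as the correct one) and Python's own `euc_jp` codec, which rejects the byte sequence outright, whereas a browser will typically try to continue and swallow the following bytes instead. The helper name is made up for the example.

```python
def decodes_cleanly(raw: bytes, encoding: str) -> bool:
    """Return True if raw is a valid byte sequence in the given encoding."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


raw_latin1 = b"Norwegian Bokm\xe5l"    # 0xE5 is a-ring in ISO-8859-1
as_entity = b"Norwegian Bokm&aring;l"  # pure ASCII, entity instead of raw byte

print(decodes_cleanly(raw_latin1, "iso-8859-1"))
print(decodes_cleanly(raw_latin1, "euc_jp"))
print(as_entity.decode("euc_jp") == as_entity.decode("iso-8859-1"))
```

In EUC-JP, 0xE5 is a lead byte that needs a second byte of 0xA1 or above, so the "l" after it cannot complete a character. The entity form is plain ASCII and therefore reads the same in an EUC-JP page as in an ISO-8859-1 page.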
lists.debian.org de-localization (Re: automatically-generated ISO-8859-1 characters in mulbibyte webpages)
Hi, From: Tomohiro KUBOTA <[EMAIL PROTECTED]> Subject: Re: automatically-generated ISO-8859-1 characters in mulbibyte webpages Date: Fri, 03 Jan 2003 09:06:43 +0900 (JST) > BTW, I found similar trouble in lists.debian.org pages. In thread-list > pages or date-list pages like > > http://lists.debian.org/debian-devel/2002/debian-devel-200212/threads.html, > > there are no charset specification. In such cases, web browsers will > assume these pages according to user preference. Naturally, Japanese > people configure web browsers to "assume Japanese encoding for pages > without charset specification". On the other hand, the thread-list > pages show senders' names in format, and threfore, a tag > follows the name. If the last letter of the name is 8bit, the tag > is broken. The result is that all following part are shown in > (italic) format. > > The test is easy: please configure your browser to "assume Japanese > encoding for pages without charset specification" and load the above > page. > > > However, in this case, the solution is a bit complicated. All mails > should have encoding information in MIME format. Thus, the best > solution would be to parse MIME. On the other hand, the simplest > makeshift solution is to add "charset=iso8859-1" for all pages > but there are mailing lists where most of 8bit characters are > cyrillic and so on. I found that MHonArc has a feature to solve this problem. http://www.mhonarc.org/MHonArc/doc/faq/mime.html#nonascii I checked /org/lists.debian.org/mhonarc/debian.rc and found that it seems to assume that all 8bit characters are ISO-8859-1. > > plain; mhonarc::htmlize; > us-ascii; mhonarc::htmlize; > iso-8859-1; mhonarc::htmlize; > iso-8859-2; iso_8859::str2sgml; iso8859.pl > iso-8859-3; iso_8859::str2sgml; iso8859.pl Why not use iso_8859::str2sgml; instead of mhonarc::htmlize for iso-8859-1? (Though I am new to MHonArc, I imagine that iso_8859::str2sgml converts ISO-8859 8bit characters into SGML entities like "&ouml;".) 
It would be nice if we could convert raw 8bit mail headers to SGML entities by assuming they are ISO-8859-1. (Such headers are illegal, but they sometimes occur and may break the lists.debian.org pages.) Since this may annoy Russian (and other non-ISO-8859-1) people who happen to use MUAs which generate illegal mail headers with 8bit characters without charset specification, I'd like to hear from people from various countries. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/
Re: automatically-generated ISO-8859-1 characters in mulbibyte webpages
Hi, From: [EMAIL PROTECTED] (Denis Barbier) Subject: Re: automatically-generated ISO-8859-1 characters in mulbibyte webpages Date: Thu, 2 Jan 2003 16:24:59 +0100 > I find only 18 names in people.names containing non-ASCII letters, > so /org/www.debian.org/cron/people_scripts/people.pl could contain > some extra elsif in its canonical_names function to replace > non-ASCII letters by HTML entities. Most names seem to be ISO-8859-1 > encoded. I implemented your idea. Here is a patch. I assumed all 8bit characters to be ISO-8859-1. Could someone apply this? --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/ people.pl.DIFF Description: Binary data
Re: DWN #49 2002 typo?
Hi, From: Peter Karlsson <[EMAIL PROTECTED]> Subject: Re: DWN #49 2002 typo? Date: Fri, 3 Jan 2003 18:55:08 +0100 (CET) > > It reads that Thomas advised Susan to rewrite the entire manual page; > > though the link[1] says that Susan has already rewritten the page. > > No, Thomas did not advise Susan, Thomas advised the readers. Advise can be > seen as a synonym for "announce" here. Thank you for the explanation. Now I understand. I hope plainer words, without risk of misunderstanding, are used. Please don't use a minor meaning of a word that has multiple meanings. In fact, the original Japanese translator rendered it as Thomas telling Susan to rewrite the entire manual pages, and several reviewers of the translation did not notice the mistranslation. --- Tomohiro KUBOTA <[EMAIL PROTECTED]> http://www.debian.or.jp/~kubota/