Re: Which does "A" in "DSA" mean, "Alert" or "Advisory"?

2004-05-24 Thread Tomohiro KUBOTA
Hi,

From: Gerfried Fuchs <[EMAIL PROTECTED]>
Subject: Re: Which does "A" in "DSA" mean, "Alert" or "Advisory"?
Date: Thu, 20 May 2004 16:14:17 +0200

> > I also changed "SuSE Security Alert" to "SuSE Security Announcement".
> 
>  Where? Have you checked if that naming was correct at the time of the
> writing? I guess they call it even differently these days, because they
> are now SUSE and not SuSE anymore, too. I guess we shall stick with the
> naming that they used at the time of the writing

Here is the page linked as "SuSE Security Announcement" from
security/1999/19990330.wml (previously "SuSE Security Alert"):

 http://seclists.org/lists/bugtraq/1999/Mar/0216.html

This page has a title of "Bugtraq: SuSE Security Announcement - XFree86".

The page contains a PGP-signed message that includes the title "SuSE
Security Announcement" (though I have not verified the signature), so I
think it is unlikely that the page was modified later.

I have not checked further.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/



Debian web page validation

2004-05-18 Thread Tomohiro KUBOTA
Hi,

Alfie kindly runs the Debian web page validation pages:

http://people.debian.org/~alfie/validate/

However, the validation page for Japanese

http://people.debian.org/~alfie/validate/ja

has had a problem for a few weeks:

/home/alfie/bin/validate.sh: /home/alfie/extra/bin/validate: No such file or directory

Pages for other languages don't have this problem.  Could Alfie or
someone else please fix this?

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/



Re: Which does "A" in "DSA" mean, "Alert" or "Advisory"?

2004-05-17 Thread Tomohiro KUBOTA
Hi,

From: Matt Kraai <[EMAIL PROTECTED]>
Subject: Re: Which does "A" in "DSA" mean, "Alert" or "Advisory"?
Date: Fri, 14 May 2004 03:08:32 -0700

> The security team is the authoritative source, so the correct term
> is "advisory".  Please change "alert" to "advisory" where you find
> it.

Done.  The following files were changed:

english/index.wml
english/MailingLists/desc/devel/debian-security.wml
english/News/weekly/2000/23/index.wml
english/News/weekly/2001/11/index.wml
english/News/weekly/2001/34/index.wml
english/security/index.wml
english/security/1997/index.wml
english/security/1998/index.wml
english/security/1999/index.wml
english/security/2000/index.wml
english/security/2001/index.wml
english/security/2002/index.wml
english/security/2003/index.wml
english/security/2004/index.wml
english/security/undated/index.wml


I also changed "SuSE Security Alert" to "SuSE Security Announcement".

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/



Re: Which does "A" in "DSA" mean, "Alert" or "Advisory"?

2004-05-13 Thread Tomohiro KUBOTA
Hi,

From: Tomohiro KUBOTA <[EMAIL PROTECTED]>
Subject: Which does "A" in "DSA" mean, "Alert" or "Advisory"?
Date: Wed, 12 May 2004 12:49:18 +0900 (LDT)

> For example, http://www.debian.org/security/index.en.html shows
> a list of "alerts".  On the other hand, page for each item (ex. 
> http://www.debian.org/security/2004/dsa-502.en.html) has a title
> of "Debian Security Advisory" which implies "A" in "DSA" means
> "advisory".

My last mail was perhaps not very clear.  In short, my questions are:

  - Is the use of both "alert" and "advisory" intentional, or is it
    just inconsistency?
  - If intentional, how is "alert" or "advisory" chosen for each page?
  - If it is just inconsistency, I propose unifying these terms.
    However, I have no idea which ("alert" or "advisory") should
    be used.

Any comments?

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/



Which does "A" in "DSA" mean, "Alert" or "Advisory"?

2004-05-11 Thread Tomohiro KUBOTA
Hi,

The Japanese translation team found that the term "advisory" has
been translated into two different Japanese words on different
pages, so we would like to settle on a single Japanese word.

During this unification work, however, we also found that the
English pages are inconsistent in their use of the terms "alert"
and "advisory".

For example, http://www.debian.org/security/index.en.html shows
a list of "alerts".  On the other hand, the page for each item (e.g.
http://www.debian.org/security/2004/dsa-502.en.html) has the title
"Debian Security Advisory", which implies that the "A" in "DSA"
means "advisory".

IMO, the Japanese translation team could assign a different
Japanese word to each of "advisory" and "alert" if each English
page chose between them intentionally.  However, I doubt that
this is the case.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/



what is "redaction"?

2004-04-06 Thread Tomohiro KUBOTA
Hello,

While translating Debian web pages into Japanese, I found a sentence
that I cannot understand.
webwml/english/devel/todo/items/60dwn.wml reads:

  Debian Weekly News is looking for contributors. They need some people who
  keep in touch with mailing lists and news websites to help in the weekly
  redaction of what's happening in the Debian world.
  ~~~~~~~~~

What is "redaction"?  My dictionary doesn't have the word.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/



Re: [tux-master@web.de: Fixed a few wrong URLs in Japanese translation of www.debian.org]

2004-03-18 Thread Tomohiro KUBOTA
Hi,

From: Jens Seidel <[EMAIL PROTECTED]>
Subject: [EMAIL PROTECTED]: Fixed a few wrong URLs in Japanese translation of www.debian.org]
Date: Tue, 16 Mar 2004 16:41:14 +0100

> I wrote a small script which compares URLs in english/ with URLs in
> translations and found a few wrong URLs.
> 
> Please check the attached patch, it's possible that a few of my changes
> are related to outdated translations or minor changes to reflect
> Japanese style.
> It's hard for me to compare because the order of links changed.
> Nevertheless I'm sure I will many find more errors after refining my
> script (but first I want to see a few commits).

Thank you for this check; it was useful.  I have fixed some of
the differences.

How about putting the script in the webwml/ directory so that
translators can run the check themselves?
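Jens's script itself is not included in this thread.  As a rough illustration of the idea, here is a minimal Python sketch that assumes links appear as href="..." attributes in the sources; the helper names and the regular expression are hypothetical.  It compares the link *sets* rather than their order, because the order of links can legitimately change in a translation.

```python
import re
from typing import Set, Tuple

# Assumption: links are written as href="..." or href='...' in the WML/HTML source.
HREF = re.compile(r"""href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)

def extract_urls(source: str) -> Set[str]:
    """Collect every href target in a page, ignoring the order of the links."""
    return set(HREF.findall(source))

def compare_urls(english: str, translation: str) -> Tuple[Set[str], Set[str]]:
    """Return (urls missing from the translation, urls only in the translation)."""
    en_urls = extract_urls(english)
    tr_urls = extract_urls(translation)
    return en_urls - tr_urls, tr_urls - en_urls

# Hypothetical sample pages: the translation reorders links and has one typo.
en = '<a href="https://www.debian.org/security/">x</a> <a href="../News/">y</a>'
ja = '<a href="../News/">y</a> <a href="https://www.debian.org/securty/">x</a>'
missing, extra = compare_urls(en, ja)
for url in sorted(missing):
    print("missing in translation:", url)
for url in sorted(extra):
    print("only in translation:", url)
```

A real check would still need a human to decide which differences are intentional, for example links that deliberately point to a translated page.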

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/



Re: I don't understand News/2000/20000216.wml

2004-03-16 Thread Tomohiro KUBOTA
Hi,

From: Jens Seidel <[EMAIL PROTECTED]>
Subject: Re: I don't understand News/2000/20000216.wml
Date: Mon, 15 Mar 2004 01:42:35 +0100

> As far as I understand it means something similar to
> "Scan.pm y2k problem Date: fixed missing message's date
>  (was incorrect in 2000 or later)."


I found the following description in the changelog of im:

> 135 (2000/01/05) mew-dist release
>
> * Y2K fixes for broken year and no Date: field.
> SAITO Tetsuya <[EMAIL PROTECTED]>

Although the version (135) is very different from the one given on
the Debian page (1:100-3), I found the following (Japanese) mail,
which I think is the original bug report.

   http://www.mew.org/ml/mew-dist-1.94/msg01382.html

Here is a digest of the mail:

   My name is Hosono.

   When a message doesn't have a "Date:" field and the date
   (I - Kubota - guess it means the date when im is invoked)
   is in 2000, a problem occurs.  The year is displayed as "100",
   like the following:

   31  100/01/01  Re: [linux-users:63039] Re: PCMCIA...

   Here I post a tiny patch to fix this problem.  This patch
   has already been applied to the im package of the Kondara
   MNU/Linux distribution.
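The digest does not say what produced "100".  One common cause of exactly this symptom in Perl code of that era, and only an assumption here, is printing the year element of localtime() directly: that element counts years since 1900, so it reads 99 in 1999 and 100 in 2000.  A minimal Python sketch of the buggy and fixed formatting, with hypothetical function names:

```python
import time

def perl_style_year_field(t: time.struct_time) -> int:
    """Mimic Perl's localtime(), whose year element counts years since 1900."""
    return t.tm_year - 1900

def format_date_buggy(t: time.struct_time) -> str:
    # Bug: prints the raw "years since 1900" value, giving "100" in 2000.
    return "%d/%02d/%02d" % (perl_style_year_field(t), t.tm_mon, t.tm_mday)

def format_date_fixed(t: time.struct_time) -> str:
    # Fix: add 1900 back before printing.
    return "%d/%02d/%02d" % (perl_style_year_field(t) + 1900, t.tm_mon, t.tm_mday)

new_year_2000 = time.strptime("2000-01-01", "%Y-%m-%d")
print(format_date_buggy(new_year_2000))  # 100/01/01
print(format_date_fixed(new_year_2000))  # 2000/01/01
```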

Now I think the meaning of the sentence is clear.  I propose
rewriting the sentence in the wml file as follows:

   Scan.pm y2k problem: Messages without "Date:" fields will
   be processed wrongly in 2000 or later.


> KUBOTA, what about changing the indentation of the colons? The Japanese
> file looks in konqueror similar to:
> package: im
> version: 1:100-3
> architectures: all
> issue  : Scan.pm y2k problem Date: filed missing message's

Please view the page in a Japanese-enabled environment; the colons
are already aligned there.  I suspect your browser cannot handle
double-width characters.
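Alignment like this only works when the renderer gives each Japanese character two columns.  A minimal Python sketch of width-aware padding using the standard `unicodedata.east_asian_width()` property; counting "W" and "F" characters as two columns and everything else as one is a simplifying assumption (ambiguous-width "A" characters vary by environment), and the helper names are hypothetical.

```python
import unicodedata

def display_width(text: str) -> int:
    """Count monospace columns: wide (W) and fullwidth (F) characters take two."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1 for ch in text)

def pad_label(label: str, width: int) -> str:
    """Pad a label with spaces up to the given display width."""
    return label + " " * max(0, width - display_width(label))

labels = ["package", "version", "architectures", "パッケージ"]
column = max(display_width(label) for label in labels)
for label in labels:
    print(pad_label(label, column) + ": ...")
```

A browser that treats every character as one column would misplace the colons after the Japanese label even though the source is aligned.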


> PS: Three weeks ago I sent you two mails related to wrong URLs in Japanese
> translation. You never replied ... I tried to fix most of these already
> but it would be easier if I could speak Japanese :-))

Did you send those messages to me directly, or to debian-www?  I could
not find any direct message to me.  I am afraid I lost them among spam.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/



I don't understand News/2000/20000216.wml

2004-03-14 Thread Tomohiro KUBOTA
Hi,

While updating the Japanese translation of Debian webwml,
I found a possible typo in an English page.

The page is News/2000/20000216.wml.

--
  package  : im
  version  : 1:100-3
  architectures: all
  issue: Scan.pm y2k problem Date: filed missing message's date
 (was incorrect in 2000 or later).
--

In the "issue", I imagine "filed" should be "field".  Am I right?

Also, I don't understand the whole "issue" sentence.  Where is the
subject, where is the verb, and is it grammatically correct?
I may be misreading the sentence, in which case "filed" might be
correct.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/



dsa-455.wml: native expression of a personal name.

2004-03-08 Thread Tomohiro KUBOTA
Hello,

I added the native (Kanji) form of a personal name (the reporter
of the original vulnerability) to webwml/security/2004/dsa-455.wml.

In Japanese, there are often several possible Kanji spellings that
share a single reading.  (The Latin-alphabet form is just a
transliteration and drops all information except the reading.)  Of
course, only one of them is "correct" for any given person.


In the case of "Yuuichi Teranishi", there are many possible Kanji
spellings of "Yuuichi", and only one of them is correct for him.  (Of
course, there are other "Yuuichi"s whose names are written with
different Kanji.)  On the other hand, there is virtually only one
possible spelling of "Teranishi".

Here I explain how to find the correct Kanji expression for
*this* Yuuichi.

The following URL shows a report of this vulnerability:

   http://mail.gnome.org/archives/xml/2004-February/msg00070.html

The text has his mail address and GPG public key.  Next, I found
the following URL:

   http://emacs-w3m.namazu.org/ml/msg06080.html

This message has the same mail address and points to the same URL
for the GPG public key, so the two messages were written by the same
person.  The latter includes the Kanji form of his name, which is
what I was looking for.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/



Bug#235590: www.debian.org: translation statistics pages lack distrib.po

2004-03-01 Thread Tomohiro KUBOTA
Package: www.debian.org
Version: 20040301
Severity: normal

The http://www.debian.org/devel/website/stats/ pages show
statistics for the po/*.po files.  However, distrib.po is missing.

I suspect webwml/english/po/Makefile is missing something for distrib.po.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/



Bug#227273: packages.debian.org: charset mismatch (always in UTF-8?)

2004-02-16 Thread Tomohiro KUBOTA
Hi,

From: Hideki Yamane <[EMAIL PROTECTED]>
Subject: Bug#227273: packages.debian.org: charset mismatch (always in UTF-8?)
Date: Mon, 16 Feb 2004 23:30:51 +0900

>  # Sorry, I thought that it is not such a difficult thing.
>Because Debian web contents use ISO-2022-JP.

That works for the Debian web pages because they are converted into
ISO-2022-JP *at the very final stage* of generation.  So if the
conversion is also the very last step on packages.debian.org, the
pages will be OK.  Even then, however, if someone were to add an extra
step at the end of the generation process, the Japanese pages could
easily break.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/



Bug#227273: packages.debian.org: charset mismatch (always in UTF-8?)

2004-02-16 Thread Tomohiro KUBOTA
Hi,

From: Hideki Yamane <[EMAIL PROTECTED]>
Subject: Bug#227273: marked as done (packages.debian.org: charset mismatch (always in UTF-8?))
Date: Mon, 16 Feb 2004 18:27:31 +0900

>  http://packages.debian.org/unstable/x11/9menu.ja.html is nice,
>  it works without mojibake. good.
> 
>  but, see, description of 9menu in list page
>  http://packages.debian.org/unstable/x11/index.ja.html caused mojibake. 
>  wrong characters are there.

I checked the index.ja.html and found:

1. The starting escape sequence is missing from many descriptions.

For an explanation of starting escape sequences, see:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=227273&msg=31

2. For example, the byte 0x3C in the Japanese "state" of ISO-2022-JP
   is wrongly regarded as ASCII "<" and then converted into "&lt;".
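Problem 2 can be reproduced with Python's standard `iso2022_jp` codec.  This minimal sketch assumes the page generator escapes raw ISO-2022-JP bytes; it searches the hiragana block for a character whose JIS X 0208 bytes contain 0x3C, escapes those bytes naively, and compares the result with escaping the decoded text before encoding.  The helper names are hypothetical.

```python
import html

def find_char_with_lt_byte() -> str:
    """Return a hiragana character whose ISO-2022-JP form contains byte 0x3C ('<')."""
    for cp in range(0x3041, 0x3094):
        ch = chr(cp)
        body = ch.encode("iso2022_jp")[3:-3]  # drop ESC $ B ... ESC ( B
        if b"<" in body:
            return ch
    raise LookupError("no hiragana with a 0x3C byte")

def escape_bytes_naively(data: bytes) -> bytes:
    """Byte-oriented escaping that ignores the ISO-2022-JP state (the bug)."""
    return data.replace(b"&", b"&amp;").replace(b"<", b"&lt;")

def escape_then_encode(text: str) -> bytes:
    """State-aware alternative: escape decoded characters, then encode."""
    return html.escape(text, quote=False).encode("iso2022_jp")

ch = find_char_with_lt_byte()
raw = ch.encode("iso2022_jp")
print(raw.hex(" "))
print(escape_bytes_naively(raw).decode("iso2022_jp", errors="replace") == ch)  # False
print(escape_then_encode(ch).decode("iso2022_jp") == ch)                        # True
```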


I now STRONGLY insist that ISO-2022-JP should not be used, because
it is difficult to fix all such problems.  I recommend EUC-JP.
Both of the above problems would then be solved automatically (and,
more importantly, we would not have to worry about such problems
whenever we modify or improve the generating scripts).
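The reason EUC-JP avoids both problems is that it is stateless and every byte of a Japanese character has the 8th bit set, so a byte below 0x80 is always a real ASCII character.  A minimal Python sketch of that property, reusing the same kind of naive byte-level escaping as a hypothetical generator step:

```python
def escape_bytes_naively(data: bytes) -> bytes:
    """Byte-oriented HTML escaping that knows nothing about character encodings."""
    return data.replace(b"&", b"&amp;").replace(b"<", b"&lt;")

def euc_jp_bytes_are_high(text: str) -> bool:
    """True if every byte of the EUC-JP form of each non-ASCII char is >= 0x80."""
    return all(
        byte >= 0x80
        for ch in text
        if ord(ch) > 0x7F
        for byte in ch.encode("euc_jp")
    )

hiragana = "".join(chr(cp) for cp in range(0x3041, 0x3094))
print(euc_jp_bytes_are_high(hiragana))  # True
raw = (hiragana + "<b>").encode("euc_jp")
# Escaping bytes is harmless here: no Japanese byte can be mistaken for "<" or "&".
print(escape_bytes_naively(raw).decode("euc_jp") == hiragana + "&lt;b>")  # True
```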

Hideki, do you still prefer ISO-2022-JP?  It is Japanese people
who suffer from such endless problems!  If we stick with ISO-2022-JP,
both the maintainers and the users of the Japanese web pages will
suffer from such problems forever.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/



Please give him a CVS COMMIT account (Re: There are 'three' conferences in that announcement.)

2004-02-09 Thread Tomohiro KUBOTA
Hello,

I am a Japanese translation coordinator.

Please give him a commit account for the Debian webwml repository.
He has translated Debian Weekly News into Japanese for more than
a year.  If he has an account, the Japanese pages will be able to
catch up with the English pages more quickly.  Also, since he reads
the English version of DWN regularly and carefully in order to
translate it, he will be able to find typos in the English pages.
(Since Japanese is very different from English and the other
European languages, Japanese people have to read English pages
*very* carefully to translate them.  From some points of view, even
Chinese is similar to English, while Japanese is not.)


From: Nobuhiro IMAI <[EMAIL PROTECTED]>
Subject: Re: There are 'three' conferences in that announcement.
Date: Sat, 07 Feb 2004 04:04:27 +0900 (JST)

> >  You have to check with your translation coordinator, they will have to
> > request it from the webmaster team.
> 
> Thanks, I'll do so.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Bug#227273: www.debian.org: Japanese DDTP files are provided with EUC-JP endoding.

2004-01-28 Thread Tomohiro KUBOTA
From: Frank Lichtenheld <[EMAIL PROTECTED]>
Subject: Bug#227273: www.debian.org: Japanese DDTP files are provided with EUC-JP endoding.
Date: Thu, 29 Jan 2004 01:25:26 +0100

> On Thu, Jan 29, 2004 at 12:16:19AM +0100, Frank Lichtenheld wrote:
> > I used the Perl module Text::Iconv which itself uses iconv(3)
> > This module seems to suck or I am to dump to use it. If I convert the
> > raw Japanese Packages file with iconv(1) (which probably uses iconv(3), 
> > too) all escape sequences seem to be generated correctly, if I use
> > Text::Iconv->convert, only the very first one is.
> 
> Correction, it only forgets the very last escape sequence since this one is 
> not generated by iconv(3). It "forgets" to clear the state at the end of 
> the conversion which I found out in comaprison with iconv(1) that handles 
> this case correctly. I prepared a patch and will file a bug against the 
> package.

I tested on gluck (packages.debian.org):

  (a)
  $ echo -en '\xa4\xa2' | iconv -f EUC-JP -t ISO-2022-JP | od -t x1
  0000000 1b 24 42 24 22 1b 28 42
  0000010

The last three bytes are the closing escape sequence, so iconv(1)
works correctly.  Next, I wrote the following script:

  (b)
  #!/usr/bin/perl
  use Text::Iconv;
  $conv = Text::Iconv->new("EUC-JP", "ISO-2022-JP");
  $a=""; while(<>){ $a .= $_; }
  $b = $conv->convert($a);
  print $b;

Then

  (c)
  $ echo -ne '\xa4\xa2' | ./a.pl | od -t x1
  0000000 1b 24 42 24 22
  0000005

In this case, the closing escape sequence is missing.  However, if the
source string has further characters after the JIS X 0208 Japanese
characters, as in:

  (d)
  $ echo -e '\xa4\xa2' | ./a.pl |od -t x1
  0000000 1b 24 42 24 22 1b 28 42 0a
  0000011

  (e)
  $ echo -ne '\xa4\xa2\x41' | ./a.pl |od -t x1
  0000000 1b 24 42 24 22 1b 28 42 41
  0000011

Then the closing escape sequence is added.

Explanation:
In case (e), the closing escape sequence is clearly needed.  In case
(d), it is also needed, because ISO-2022-JP requires the "state" to be
ASCII whenever a Line Feed appears.  In case (c), Text::Iconv does not
know whether the string that follows will be Japanese or ASCII, and
adding a closing escape sequence would be redundant if Japanese
followed.  I imagine this is why Text::Iconv does not add the closing
escape sequence in this case.
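Python's incremental encoders show the same distinction between an unfinished chunk and a finished one.  This sketch is only an analogy for the Text::Iconv behavior above, not its implementation: with `final=False` the standard `iso2022_jp` encoder leaves the JIS X 0208 state open as in case (c), and `final=True` flushes it with ESC ( B.  The helper name is hypothetical.

```python
import codecs

def encode_chunk(text: str, final: bool) -> bytes:
    """Encode text with a fresh incremental ISO-2022-JP encoder.

    final=False may leave the JIS X 0208 state open, like case (c);
    final=True flushes the state back to ASCII with ESC ( B.
    """
    encoder = codecs.getincrementalencoder("iso2022_jp")()
    return encoder.encode(text, final=final)

for label, text in (("(c)", "\u3042"), ("(d)", "\u3042\n"), ("(e)", "\u3042A")):
    print(label, encode_chunk(text, final=False).hex(" "))
print("(c) flushed", encode_chunk("\u3042", final=True).hex(" "))
```

Converting a whole page in one call with `final=True` (or with plain `str.encode`) always ends in the ASCII state, which matches the suggestion that follows.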

I think the safest approach is to use Text::Iconv to convert the whole
web page at once (or at least a whole logical line, i.e. one that ends
with a Line Feed, at a time).

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/



Bug#227273: www.debian.org: Japanese DDTP files are provided with EUC-JP endoding.

2004-01-26 Thread Tomohiro KUBOTA
Hi,

From: Hideki Yamane <[EMAIL PROTECTED]>
Subject: Bug#227273: www.debian.org: Japanese DDTP files are provided with EUC-JP endoding.
Date: Sun, 25 Jan 2004 01:10:41 +0900

>  * tag is OK. That says "content="text/html; charset=ISO-2022-JP"".
>  * It looks like contents is not valid ISO-2022-JP. I don't know why.
>Frank, would you tell me the way how did you convert it from EUC-JP
>to ISO-2022-JP ? 

I checked http://packages.debian.org/unstable/misc/language-env.ja.html
and found that closing escape sequences are missing.


ISO-2022-JP is a "stateful" encoding.  This means that a string consists
of escape sequences, which set the "state", and ordinary codes whose
meaning (the character they represent) depends on that "state".

For example, Hiragana A ("あ") is encoded as:

1B 24 42 24 22 1B 28 42

where 1B 24 42 (the first three bytes) means "here starts JIS X 0208
Japanese", 24 22 (the next two bytes) is Japanese Hiragana A, and the
final 1B 28 42 means "here starts ASCII".  In the Japanese state, 24 22
means Japanese Hiragana A, while in the ASCII state it means a Dollar
sign and a Double Quotation mark.
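The two readings of 24 22 can be checked with Python's standard codecs.  This small sketch decodes the full sequence with `iso2022_jp` and the bare two bytes as ASCII:

```python
sequence = bytes.fromhex("1b 24 42 24 22 1b 28 42")

print(sequence.decode("iso2022_jp"))           # あ (24 22 read in the JIS X 0208 state)
print(bytes.fromhex("24 22").decode("ascii"))  # $" (the same bytes in the ASCII state)
```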


As I said, the closing escape sequences are missing; that is, the
"here starts ASCII" part is absent.  Thus all of the following ASCII
characters (including HTML tags) are interpreted as Japanese, which
causes Mojibake.
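A minimal Python sketch of that failure, using a hypothetical `<p>` tag after the character: without the closing ESC ( B, the decoder stays in the JIS X 0208 state and consumes the tag bytes as Japanese.

```python
correct = b'\x1b$B$"\x1b(B<p>'  # あ, closing ESC ( B, then the tag
broken = b'\x1b$B$"<p>'         # the same, but the closing escape sequence is missing

print(correct.decode("iso2022_jp"))  # あ<p>
garbled = broken.decode("iso2022_jp", errors="replace")
# The bytes of "<p>" are read as JIS X 0208 codes, so no "<" survives.
print("<" in garbled)
```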

I don't know what algorithm is used to generate the page, so I have
no idea why this page is broken.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/



"Debian Installer" in devel/debian-installer/index.wml

2004-01-22 Thread Tomohiro KUBOTA
Hello,

While checking the Japanese translation of
devel/debian-installer/index.wml, I noticed one thing.

"Debian-Installer" is (as I understand it) the name (a proper noun)
of a specific installer system (which will be the successor of
"boot-floppies"), not just a common noun that could be replaced by
"installer of Debian".

However, Debian-Installer is written in various ways, such as
"Debian Installer".  Because of this, translators have difficulty
distinguishing "Debian installer" as a common noun from
"Debian-Installer" as a proper noun.  (Capitalization is not always
a reliable test, because many programs have lowercase names, like
"boot-floppies".)

Could someone clarify the distinction?

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/



Bug#227273: packages.debian.org: charset mismatch (always in UTF-8?)

2004-01-13 Thread Tomohiro KUBOTA
Hi,

> Hmm, I have difficulties to understand what you mean. I will try to
> formulate your report in my own words:
> Currently the Japanese pages are served in UTF-8 (is this right?),
> but you request that we serve it in iso-2022-jp instead, because
> UTF-8 causes problems in reading the pages.
> 
> Have I understood you correctly?

I will explain.  (I am the author of the "Mojibake" page which
Yamane-san introduced.)

For example, please see

   http://packages.debian.org/stable/misc/language-env.ja.html

The HTML source of the page declares that the page is written in
UTF-8 (in the charset declaration on its 6th line).  In reality,
however, the page is written in EUC-JP.  Because of this
inconsistency, web browsers render the page as UTF-8, and the
result is Mojibake.

The core of the problem is this inconsistency.  So, at the very
least, this inconsistency must be fixed.
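One way to catch this kind of inconsistency automatically is to check whether the page bytes decode cleanly in the charset the page declares.  This Python sketch assumes the declaration appears as `charset=...` in a meta tag; the regular expression and helper names are hypothetical.  A clean decode does not prove the label is right, but a failed decode proves that the label and the content disagree.

```python
import re
from typing import Optional

# Assumption: the declaration looks like charset=UTF-8 inside a meta tag.
CHARSET = re.compile(rb"""charset=["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE)

def declared_charset(page: bytes) -> Optional[str]:
    match = CHARSET.search(page)
    return match.group(1).decode("ascii") if match else None

def charset_is_consistent(page: bytes) -> bool:
    """Return False if the page cannot be decoded in its declared charset.

    A clean decode is not proof (ISO-8859-1 accepts any bytes), but a
    failure shows that the label and the content disagree.
    """
    charset = declared_charset(page)
    if charset is None:
        return False
    try:
        page.decode(charset)
    except (LookupError, UnicodeDecodeError):
        return False
    return True

meta = b'<meta http-equiv="Content-Type" content="text/html; charset=%s">\n'
body = "日本語".encode("euc_jp")
print(charset_is_consistent(meta % b"UTF-8" + body))   # False: EUC-JP bytes labelled UTF-8
print(charset_is_consistent(meta % b"EUC-JP" + body))  # True
```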

There are several ways to fix this problem.

(a) Change the encoding of the page to UTF-8 (to match the 6th line).
(b) Change the 6th line to EUC-JP (to match the real content).
(c) Change both the 6th line and the encoding of the page to some
other encoding (for example ISO-2022-JP).

Yamane-san asks for solution (c), because ISO-2022-JP is the best
encoding for Japanese web pages: it is the least likely to cause
Mojibake even when a web browser cannot understand the 6th line.  I
agree that (c) is the best solution, but I don't think (a) and (b)
are completely unacceptable.

I think EUC-JP (solution (b)) is acceptable, because recent web
browsers are likely to understand the 6th line.  (However, UTF-8
(solution (a)) should be avoided if possible, because some browsers,
such as w3m (popular in Japan), cannot handle UTF-8.)

In short, my opinion is:
(c) is the best solution.
(b) has no problem, too.
(a) should be avoided if possible.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/



Bug#207455: acknowledged by developer

2003-09-04 Thread Tomohiro KUBOTA
Hi,

From: Colin Watson <[EMAIL PROTECTED]>
Subject: Re: Bug#207455: acknowledged by developer
Date: Fri, 5 Sep 2003 01:20:12 +0100

> On Fri, Sep 05, 2003 at 08:32:59AM +0900, Tomohiro KUBOTA wrote:
> > Note that I think UTF-8 environment will not be popular until several
> > basic features (like manpages) will be UTF-8-ready.
> 
> What's wrong with UTF-8 man pages? You can't *write* them in UTF-8,
> true, but they should be perfectly readable in such locales.

I cannot read Japanese manpages in the ja_JP.UTF-8 locale, because
groff cannot know which encoding a manpage source is written in.  For
example, Japanese manpage sources are written in EUC-JP, while groff
tries to interpret them as UTF-8 in UTF-8 locales.

Previously, groff had a special workaround for UTF-8 that worked only
for ISO-8859-1 environments: with the "UTF-8 device", groff interprets
the input as ISO-8859-1 and outputs it as UTF-8.
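The effect of that workaround can be simulated in a few lines of Python.  This is a hypothetical model of the behavior described above, not groff code: bytes are read as ISO-8859-1 and written as UTF-8, which is exactly right for Latin-1 sources and produces Mojibake for EUC-JP sources.

```python
def groff_utf8_device(source: bytes) -> bytes:
    """Model of the described behavior: read bytes as ISO-8859-1, write UTF-8."""
    return source.decode("latin-1").encode("utf-8")

latin1_page = "café".encode("latin-1")
japanese_page = "日本語".encode("euc_jp")
print(groff_utf8_device(latin1_page).decode("utf-8"))    # café
print(groff_utf8_device(japanese_page).decode("utf-8"))  # six Latin-1 characters, not 日本語
```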

Because of cases like this, ISO-8859-1/15 users may believe that UTF-8
support is more mature than it really is.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/





Bug#207455: acknowledged by developer

2003-09-04 Thread Tomohiro KUBOTA
Hi,

From: Frank Lichtenheld <[EMAIL PROTECTED]>
Subject: Bug#207455: acknowledged by developer (Re: Bug#207455: packages.debian.org: HTML-encodes multi-byte characters as single bytes)
Date: Thu, 4 Sep 2003 18:46:36 +0200

> 1. Take them literaly and specify a charset=ascii for the page
> 2. Dito, but charset=iso-8859-1
> 3. Dito, but charset=utf-8
> 4. Use one of the three charsets but make a list of broken descriptions
> that have to be converted
> 
> Currently we do (2) but I would prefer to go to (3). As long as policy
> doesn't mandate one encoding for the description it's our decision
> anyway and I would prefer to give everyone the same chance to break
> something ;)

Yes, ISO-8859-1 is a *local* character encoding that is useful only
for some speakers of European languages.

Currently, ASCII is the only character range that is common worldwide.
Although migration to UTF-8 is welcome, please note that
U+0020 - U+007E will remain the only common character range for
a while.

If policy mandates the use of UTF-8, it will also have to state that
the contents must remain comprehensible when read in an ASCII
environment, i.e., even when non-ASCII characters are removed.

Indeed, in the multibyte locales that are popular in East Asia, an
8-bit character (for example, from ISO-8859-1) breaks not only itself
but also the following character.
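A minimal Python sketch of this effect, using a hypothetical stray ISO-8859-1 byte 0xC7 ("Ç") in front of Japanese text that is otherwise valid EUC-JP: the decoder pairs the stray byte with the first byte of the next character, so that character is lost too.

```python
# Hypothetical text: one ISO-8859-1 byte ("Ç") followed by Japanese in EUC-JP.
stray_byte = b"\xc7"
japanese = "あ".encode("euc_jp")  # a4 a2

decoded = (stray_byte + japanese).decode("euc_jp", errors="replace")
# 0xC7 pairs with 0xA4 to form a different character and 0xA2 is left over,
# so the following "あ" disappears along with the stray byte.
print("あ" in decoded)  # False
```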

Even though UTF-8 must be used carefully, it is much better than the
current situation, in which Debian is biased toward one part of the
world (i.e., ISO-8859-1 usage).

Note that I think UTF-8 environments will not become popular until
several basic features (like manpages) are UTF-8-ready.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/





Re: Package description page is not compliant to multibyte characters

2003-08-01 Thread Tomohiro KUBOTA
Hello,

From: Michael Bramer <[EMAIL PROTECTED]>
Subject: Re: Package description page is not compliant to multibyte characters
Date: Fri, 1 Aug 2003 10:25:27 +0200

> > > I heard a report that Japanese translation of package description
> > > pages (by Debian Description Translation Project) is broken.
> > > For example,
> > > 
> > > http://ddtp.debian.org/packages.debian.org/stable/admin/apmd.ja.html

> Now I fixed this on http://ddtp.debian.org/packages.debian.org/

Thank you for your effort, but I am afraid another (bigger) problem
has appeared: the Japanese pages seem to lack the long description.

The Japanese page reads (please try the above apmd.ja.html):

Package:  (in )
 (in )
Other packages related to  (in )

I think it is clear, even to non-Japanese speakers, that the Japanese
page lacks the long description, which should appear between 
and "Other packages related to".

I checked apmd.{de,en,es,it,fr,pt_BR,ru} and found that the Russian
page also lacks the long description.  In short, languages written in
non-Latin scripts seem to suffer from this problem.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




non-ASCII in webwml/english/devel/website/tc.data

2003-06-15 Thread Tomohiro KUBOTA
Hi,

I found a non-ASCII character (\xc7) in webwml/english/devel/website/tc.data.
Since the data are used for all translated pages in various character
encodings, a non-ASCII character causes broken pages and should be
avoided.  (In multibyte encodings in particular, an illegal 8-bit
byte may break neighboring characters.)

Can I assume the character \xc7 is from ISO-8859-9 (because it is
Turkish), i.e., U+00C7 "LATIN CAPITAL LETTER C WITH CEDILLA"?
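If that assumption holds, the byte can be replaced by a numeric character reference, which is plain ASCII and means the same thing in every page encoding.  A minimal Python sketch with hypothetical helper names; the sample line is made up and is not the real tc.data content, and ISO-8859-9 is the assumed source encoding.

```python
from typing import List

def non_ascii_offsets(data: bytes) -> List[int]:
    """Return the offsets of bytes outside 7-bit ASCII."""
    return [i for i, byte in enumerate(data) if byte > 0x7F]

def to_ascii_with_char_refs(data: bytes, assumed_encoding: str = "iso8859_9") -> str:
    """Decode with an assumed legacy encoding and emit numeric character references."""
    text = data.decode(assumed_encoding)
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

sample = b"Name \xc7"  # hypothetical line, not the real tc.data content
print(non_ascii_offsets(sample))        # [5]
print(to_ascii_with_char_refs(sample))  # Name &#199;
```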

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: wml build errors in japanese/ and spanish/

2003-06-12 Thread Tomohiro KUBOTA
Hi,

From: Frank Lichtenheld <[EMAIL PROTECTED]>
Subject: Re: wml build errors in japanese/ and spanish/
Date: Thu, 12 Jun 2003 10:34:53 +0200

> While we're at it, I think I identified and solved another build problem:
> in japanese/devel/wnpp/wnpp.wml you have to replace all occurences of
> en.html in the shebang line with ja.html . Otherwise the translated
> pages get not installed.

Thank you for the information.  I fixed this, but the fix caused
a much more serious problem.

For example,

   http://www.debian.org/devel/wnpp/rfa_bypackage.ja.html

is written in EUC-JP, but the header says the page is in ISO-2022-JP.
This makes the page entirely unreadable (just a row of random
letters).

Although I am not familiar with the build process of the Debian web
pages, I suspect the Makefile has a problem and WMEPILOG is being
ignored.  (For Japanese, WMEPILOG converts pages from EUC-JP
to ISO-2022-JP.)  I have no idea how to fix this.
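For readers unfamiliar with that step, here is a minimal Python stand-in for such an epilog conversion.  It is a sketch under the assumption that the epilog simply re-encodes the finished page, not the actual WMEPILOG command; the helper name and sample page are hypothetical.  Converting the complete page in one call guarantees that the output ends in the ASCII state.

```python
def euc_jp_to_iso2022_jp(page: bytes) -> bytes:
    """Convert a complete page in one call, so the encoder ends in the ASCII state."""
    return page.decode("euc_jp").encode("iso2022_jp")

sample = "<p>日本語のページ</p>\n".encode("euc_jp")
converted = euc_jp_to_iso2022_jp(sample)
print(converted.endswith(b"\x1b(B</p>\n"))  # True: ASCII restored before the closing tag
```

If this step is skipped, as seems to happen here, the page keeps its EUC-JP bytes while the header still announces ISO-2022-JP.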

I'd like to revert my change if there is no solution.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: wml build errors in japanese/ and spanish/

2003-06-11 Thread Tomohiro KUBOTA
Hi,

From: Frank Lichtenheld <[EMAIL PROTECTED]>
Subject: wml build errors in japanese/ and spanish/
Date: Wed, 11 Jun 2003 23:40:25 +0200

> japanese/News/weekly/2003/21 in lines 57,58 something seems
> wrong. There is some English text and a unclosed  just fix such errors but the Japanese text is so cryptic for me :)

Thank you.  I fixed the problem.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Package description page is not compliant to multibyte characters

2003-06-08 Thread Tomohiro KUBOTA
Hi,

I heard a report that the Japanese translations of package description
pages (by the Debian Description Translation Project) are broken.
For example,

http://ddtp.debian.org/packages.debian.org/stable/admin/apmd.ja.html

(It might be difficult to understand the page is broken if you
cannot read Japanese.)

Analysis:

This page seems to be generated by the script
gluck.debian.org:/org/packages.debian.org/htmlscripts/pages.pl,
and the first character of the long description of a package
is written in a larger font:

$long_desc =~ /^([^&]|&[^;]+;)/;
$first = $1;
$rest = substr($long_desc,length($first));
$package_page .= "$first$rest\n";

However, in multibyte encodings such as EUC-JP (Japanese),
a character may consist of multiple bytes, while the expression
[^&] matches one *byte* rather than one *character*.  Thus, when
the first character of the long description is a multibyte
character, $first will be only the first byte of that character,
not the entire character.
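The same regular expression behaves differently on bytes and on decoded text.  This Python sketch assumes, as the analysis above says, that pages.pl applies the pattern to undecoded EUC-JP bytes; the sample description is hypothetical.

```python
import re

# The pattern from pages.pl, applied to undecoded bytes (the buggy case).
FIRST_BYTES = re.compile(rb"^([^&]|&[^;]+;)")
# The same pattern applied to decoded text matches one whole character.
FIRST_TEXT = re.compile(r"^([^&]|&[^;]+;)")

description = "日本語の説明"
raw = description.encode("euc_jp")

first_byte = FIRST_BYTES.match(raw).group(1)
first_char = FIRST_TEXT.match(description).group(1)
print(len(first_byte))  # 1: half of a two-byte EUC-JP character
print(first_char)       # 日
```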

Solution:

The right way is to make the script multibyte-aware.  It may be
difficult to support arbitrary encodings, but it should be easy to
support a limited set of multibyte encodings that are plausible
candidates for Debian web pages (such as
"EUC-JP, EUC-KR, GB2312, Big5, Big5HKSCS, and UTF-8").
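A minimal Python sketch of that approach, with hypothetical names: decode with the page's encoding, match the first character or entity on the decoded text, and cut the original bytes at that character's encoded length.  Slicing the raw bytes is safe only because all of the listed encodings are stateless; it would not be safe for ISO-2022-JP.

```python
import re
from typing import Tuple

FIRST_CHAR = re.compile(r"([^&]|&[^;]+;)")

# The candidate encodings listed above, spelled as Python codec names.
CANDIDATE_ENCODINGS = ("euc_jp", "euc_kr", "gb2312", "big5", "big5hkscs", "utf-8")

def split_first_char(raw: bytes, encoding: str) -> Tuple[bytes, bytes]:
    """Split raw bytes into (first character or entity, rest) without cutting a character."""
    if encoding not in CANDIDATE_ENCODINGS:
        raise ValueError("unsupported encoding: " + encoding)
    match = FIRST_CHAR.match(raw.decode(encoding))
    if match is None:  # empty description
        return b"", raw
    cut = len(match.group(1).encode(encoding))
    return raw[:cut], raw[cut:]

first, rest = split_first_char("日本語の説明".encode("euc_jp"), "euc_jp")
print(first.decode("euc_jp"), rest.decode("euc_jp"))  # 日 本語の説明
```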

I heard of another solution, like the following:

  <style type="text/css">
  <!--
  p.description {text-align: justify;}
  p.description:first-letter {font-size: 150%;}
  -->
  </style>

and

  <p class="description">This is a long package description.</p>

Although this solution is environment-dependent, at least it never
makes the content unreadable.


However, yet another solution is to give up the decoration of
using a larger font for the first character.  I think this
might be a good solution, because "using a larger font for
the first character" cannot be truly universal.  Imagine
Arabic script: how could the first character of an Arabic
word be made larger?  Although we don't have an Arabic
translation yet, we may have one in the future.

So my suggestion is to give up the decoration.  However, I
would welcome any other solution that stops the content from
breaking.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: enable searching East Asian words at search.debian.org

2003-05-14 Thread Tomohiro KUBOTA
Hi,

From: [EMAIL PROTECTED] (Craig Small)
Subject: Re: enable searching East Asian words at search.debian.org
Date: Wed, 14 May 2003 16:19:29 +1000

> Yes, you are right I am confusing the two issues.  I can easily, well
> almost easily, make a special search.d.o set of binaries and they can
> have little or no bearing on the packages...

Then this should be easy, since you are willing to make all Japanese
mnogosearch users do the same, and I agree with that.


> It gets us back to the original problem that what is the license for
> ipadic?  And libchasen is broken.

How should this be fixed, Nokubi-san?  Does Kakashi or MeCab have an
emulation layer (API) for Chasen?  Or are there any alternatives to
ipadic already available?  mnogosearch seems to use
chasen_sparse_tostr() and chasen_getopt_argv().


> Why is it broken?
> It won't work without some files that are not part of the package.
> These files are nowhere to be seen and there is no documentation on
> how these files are supposed to come about nor what format they are in.

I don't understand why you put it so strongly.  Yes, it is a bug.
However, did you document that the Debian mnogosearch package is
compiled without East Asian support?  That is just as severe.

Anyway, Nokubi-san is a maintainer of chasen packages and I hope he
will fix this soon.


> That can all be solved I'm sure, but its no use asking admins to put
> libchasen on until this is fixed or a work-around is found.

A work-around: run "apt-get install libchasen-dev ipadic" instead
of "apt-get install libchasen-dev".

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: enable searching East Asian words at search.debian.org

2003-05-13 Thread Tomohiro KUBOTA
Hi,

From: [EMAIL PROTECTED] (Craig Small)
Subject: Re: enable searching East Asian words at search.debian.org
Date: Wed, 14 May 2003 11:36:28 +1000

> Great, ipadic is 3meg and consumes 12meg.  I cannot expect people to
> download that with the default packages.
> 
> So I'll release mnogo 3.2.10 with no chasen. It's broken anyway because
> it needs some rc files and other things.
>
> Now for the webiste, I'll get the other charsets going and we'll work
> on the JP problem separately.

Sorry, I don't understand the meaning or tone of "Great" here.
Can you explain?

You are confusing two different aspects: one is providing Debian
mnogosearch packages and the other is how search.debian.org is
constructed.

I agree that Japanese users cannot use the Debian mnogosearch package
as-is and will have to recompile it, in order to save megabytes of disk
space for people who don't need Japanese.  (Please add recompilation
instructions to README.Debian.)

However, search.debian.org is a different topic.  Since Japanese
is one of several languages for which more than 50% of the pages on
http://www.debian.org/ have been translated, it makes no sense to
exclude these pages from the search.

I don't understand at all why some people in Debian (and in other
free-software-related projects) tend to exclude Japanese and other Asian
languages from the range of support.  Even people who are interested in
i18n and translation sometimes do!

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: enable searching East Asian words at search.debian.org

2003-05-13 Thread Tomohiro KUBOTA
Hi,

From: [EMAIL PROTECTED] (Denis Barbier)
Subject: Re: enable searching East Asian words at search.debian.org
Date: Tue, 13 May 2003 14:12:00 +0200

> In this bugreport you tell that lynx-cur is right, but I have similar
> results with lynx-cur 2.8.5-10.

I tested lynx and lynx-cur and found that both of them are problematic.

I ran them on mlterm and xterm in UTF-8 mode under the ja_JP.UTF-8
locale and searched for the Russian word for "News".  Although the
search itself seems to work, all Cyrillic characters are displayed
as Latin-alphabet transliterations.  I imagine they are not
locale-aware.

Please test w3mmee.  It should work well.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: enable searching East Asian words at search.debian.org

2003-05-13 Thread Tomohiro KUBOTA
Hi,

From: [EMAIL PROTECTED]
Subject: Re: enable searching East Asian words at search.debian.org
Date: Tue, 13 May 2003 15:32:49 +0900

> Now I'm trying to make another DFSG-free dictionary for ChaSen. If I
> can do it, I'll move ipadic package to non-free and ITP the new one.

Any prospects?


> The another solution is to use libkakasi instead libchasen. It is
> completely free.

First, I imagine chasen is much better than kakasi because chasen
analyzes the grammar of Japanese sentences while kakasi doesn't.
Which is better, chasen without a dictionary (chasen itself is free)
or kakasi?  Or does chasen *need* a dictionary (even though libchasen0
doesn't have a Depends: on ipadic)?

Second, can we use kakasi for mnogosearch?

If we don't have a solution, how about writing "please use Google to
search for CJK words on the Debian site" at http://search.debian.org/
and admitting that free software cannot yet substitute for proprietary
software?


Finally, which solution do you suggest?  Should we wait for a "free"
alternative to ipadic?  Should ipadic be regarded as free?  Can we
use kakasi?  Or should we recognize that there is no free web-search
implementation that supports languages including Japanese?

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: enable searching East Asian words at search.debian.org

2003-05-13 Thread Tomohiro KUBOTA
Hi,

From: [EMAIL PROTECTED] (Craig Small)
Subject: Re: enable searching East Asian words at search.debian.org
Date: Tue, 13 May 2003 15:24:15 +1000

> I'm working on 3.2.10 that will have the charset support.  Do I also
> need to include chasen? Without chasen, mnogosearch will not understand
> a Japanese "word"?

You are right.  Also, --with-extra-charsets=all is needed (if 3.2.10's
default settings omit the CJK mapping tables, just like 3.2.8).


> I'll get something uploaded soon, can you test it for me on a simple
> set of pages (use the builtin if you like for no db) to see it does
> work for your pages.  If so I'll get it compiled on klecker.

Yes, of course I will.  You mean you will compile 3.2.10 and set up
a test search page, and then I will test searching for CJK words?

However, I tested the builtin database and it didn't work well.  I didn't
investigate builtin further because it won't be used for the real
search page.  So, if you'd like to test with builtin, please first check
that your new build works for English words.  Then I will test
various languages including Chinese, Japanese, and Korean.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: enable searching East Asian words at search.debian.org

2003-05-12 Thread Tomohiro KUBOTA
Hi,

From: [EMAIL PROTECTED] (Denis Barbier)
Subject: Re: enable searching East Asian words at search.debian.org
Date: Mon, 12 May 2003 19:09:54 +0200

> If now I run
>   $ export LANG=fr_FR.UTF-8
>   $ xterm
> go to search.debian.org in this window and cut'n'paste this word from
> another window, I am redirected to
>   http://search.debian.org/?q=%C3%83%C2%A9lection&ps=10&o=0&m=all&g=fr
> which means that e-acute has been converted twice, and no pages are
> found.  Am I doing something wrong?

There might be two problems.  One is whether cut'n'paste works
correctly, and the other is whether the browser handles encoding
conversion correctly.

I tested with galeon and w3mmee (w3m doesn't support UTF-8).
Internet Explorer on Windows also works well.

I'd like to reproduce what you did.  Which browser did you use?

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: enable searching East Asian words at search.debian.org

2003-05-12 Thread Tomohiro KUBOTA
Hi,

From: [EMAIL PROTECTED] (Denis Barbier)
Subject: Re: enable searching East Asian words at search.debian.org
Date: Mon, 12 May 2003 23:55:03 +0200

> When done, mnogosearch from unstable has to be recompiled, I can volunteer
> to provide a backport if that helps.

I don't think this is a problem, because Craig has already compiled
version 3.2.7 in his home directory to test search.debian.org/new,
though his build omits the East Asian character mapping tables and
has no chasen support.

Version 3.2.7 was the latest version at that time (December 2002).

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: enable searching East Asian words at search.debian.org

2003-05-12 Thread Tomohiro KUBOTA
Hi,

From: [EMAIL PROTECTED] (Denis Barbier)
Subject: Re: enable searching East Asian words at search.debian.org
Date: Mon, 12 May 2003 13:45:08 +0200

> > For example, I can search an Russian word "Novosti" (of course in
> > Cyrillic)
> 
> The point is: how are Cyrillic words passed by the web browser to the
> search engine?
> Are they encoded in ISO-8859-5, KOI8-R or UTF-8 charsets?

UTF-8, i.e., the same encoding as the search page.  For example,
take the previous example:

http://search.debian.org/?q=%D0%9D%D0%BE%D0%B2%D0%BE%D1%81%D1%82%D0%B8&ps=10&o=0&m=all&g=

The first 6 bytes read:

%D0%9D -> U+041D (CYRILLIC CAPITAL LETTER EN)
%D0%BE -> U+043E (CYRILLIC SMALL LETTER O)
%D0%B2 -> U+0432 (CYRILLIC SMALL LETTER VE)
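The same decoding can be checked mechanically.  The following is a minimal Python 3 sketch using the standard urllib.parse functions; it only confirms that the query bytes are valid UTF-8 and spell the Russian word, and says nothing about how mnogosearch itself parses the request.

```python
from urllib.parse import urlsplit, parse_qs, unquote_to_bytes

url = ("http://search.debian.org/?q=%D0%9D%D0%BE%D0%B2%D0%BE%D1%81%D1%82%D0%B8"
       "&ps=10&o=0&m=all&g=")
query = urlsplit(url).query

# Raw bytes of the q parameter: percent-escapes undone, no charset applied yet.
raw_q = query.split("&")[0].partition("=")[2]
q_bytes = unquote_to_bytes(raw_q)

# parse_qs decodes the percent-escapes as UTF-8 by default.
word = parse_qs(query)["q"][0]

print(len(q_bytes))  # 14 bytes: 7 Cyrillic letters, 2 bytes each
print(word)          # Новости
for ch in word[:3]:
    print(f"U+{ord(ch):04X}")  # U+041D, U+043E, U+0432
```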

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: enable searching East Asian words at search.debian.org

2003-05-12 Thread Tomohiro KUBOTA
Hi,

From: [EMAIL PROTECTED] (Denis Barbier)
Subject: Re: enable searching East Asian words at search.debian.org
Date: Mon, 12 May 2003 09:54:58 +0200

> My understanding of Josip mail is that when investigating your
> instructions about mnogosearch, he wondered how input text has
> to be encoded when filling search form.  This is a good question,
> search page should tell which encoding to use when searching for
> non-English words.

Yes, I know.  The solution is to write the search page in UTF-8,
which has been in place since last December, when Craig and I
discussed this problem.

For example, I can search for a Russian word "Novosti" (of course in
Cyrillic) (which means "News") at http://search.debian.org/ English
page like:

http://search.debian.org/?q=%D0%9D%D0%BE%D0%B2%D0%BE%D1%81%D1%82%D0%B8&ps=10&o=0&m=all&g=

and the page shows 112 results.

Also, I can input Japanese words.  However, there will be no results
for Japanese words because of the problems I described.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: enable searching East Asian words at search.debian.org

2003-05-11 Thread Tomohiro KUBOTA
Hi,

From: Josip Rodin <[EMAIL PROTECTED]>
Subject: Re: enable searching East Asian words at search.debian.org
Date: Sun, 11 May 2003 19:44:17 +0200

> On Sun, May 11, 2003 at 11:09:44PM +0900, Tomohiro KUBOTA wrote:
> > > c=`grep CHARSET ../.wmlrc | cut -d= -f2`; \
> > >   iconv -f $c -t UTF-8 search.ja.html | perl -pe 's,^(\s* http-equiv="Content-Type" content="text/html; charset=)\S+(">)$,$1UTF-8$2,' > search.ja.html
> > > iconv: cannot open input file `euc-jp': No such file or directory
> > 
> > Sorry I don't understand what you are doing.  However, my "improvement"
> > is not related to search.ja.html (or translation of search page) at all.
> 
> Well, it's related if you want people to be able to actually input stuff
> properly into the search engine. :)

OK, now I remember.  The search web page must be in UTF-8.

The current (English) version of the search page is already UTF-8 and
has no problem with international searches, I think.

However, if you would like to supply translated search pages (though
I think it is not an urgent problem), I just read
webwml/english/searchtmpl/Makefile and found that
`grep CHARSET ../.wmlrc` might have a problem.  webwml/japanese/.wmlrc
has two lines that match 'grep CHARSET': '-D CHARSET=iso-2022-jp'
and '-D CHARSET_WML=euc-jp'.  So $c holds both values, and since it is
unquoted, iconv receives euc-jp as the name of an input file, which is
exactly the "cannot open input file `euc-jp'" error above.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: enable searching East Asian words at search.debian.org

2003-05-11 Thread Tomohiro KUBOTA
Hi,

From: Josip Rodin <[EMAIL PROTECTED]>
Subject: Re: enable searching East Asian words at search.debian.org
Date: Sun, 11 May 2003 14:33:38 +0200

> make: Entering directory `/org/www.debian.org/webwml/japanese/searchtmpl'
> wml -q -D CUR_YEAR=2003 -o UNDEFuJA:[EMAIL PROTECTED] --prolog="/usr/bin/kcc -e -" --epilog="../convert search.ja.html" search.wml
> c=`grep CHARSET ../.wmlrc | cut -d= -f2`; \
>   iconv -f $c -t UTF-8 search.ja.html | perl -pe 's,^(\s* http-equiv="Content-Type" content="text/html; charset=)\S+(">)$,$1UTF-8$2,' > search.ja.html
> iconv: cannot open input file `euc-jp': No such file or directory
> copying search.ja.html to ../../../www/searchtmpl
> make: Leaving directory `/org/www.debian.org/webwml/japanese/searchtmpl'

Sorry I don't understand what you are doing.  However, my "improvement"
is not related to search.ja.html (or translation of search page) at all.

My intention is to enable searching for, for example, "Bunsho" (in Kanji),
which means "documentation" in Japanese, on the search page.  This should
be possible, because there are many Japanese-translated pages on the Debian
site and these pages should be searchable.  My goal is not a translation
of the search page.  (I guess you are trying to prepare a Japanese
translation of the search page?  I will look into this point later.
However, please note that, for Japanese people, a search page in English
that can search Japanese words is far better than a search page in
Japanese that cannot search Japanese words.)

The problems are:

(1) Though mnogosearch is based on UTF-8 (and should be able to process
all the languages Debian web pages are translated into), support for CJK
languages is disabled by default (see the ./configure --help output or
the mnogosearch installation instructions).  The default build simply
drops the character-code mapping tables between CJK encodings and UTF-8.
This is why mnogosearch needs to be recompiled.

(2) Japanese and Chinese don't use whitespace between "words", so
indexing (i.e., reading all web pages and storing all "words" in the
database for searching) doesn't work well.  The chasen-related packages
are needed to fix this.  (I explained in earlier mails in this thread
why chasen is needed; please look back through it.)
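Problem (2) can be seen with a tiny Python illustration.  It is not how mnogosearch tokenizes text and it does not show what chasen does (morphological analysis); it only demonstrates that whitespace splitting finds no word boundaries in Japanese.  The sample phrase is made up for the example.

```python
# Whitespace tokenization works for English but not for Japanese.
english = "Debian documentation is translated"
japanese = "文書を翻訳する"  # hypothetical sample: "translate the documentation", no spaces

print(english.split())   # ['Debian', 'documentation', 'is', 'translated']
print(japanese.split())  # ['文書を翻訳する']: the whole phrase is a single "word"

# An index built from whitespace-separated tokens therefore never
# contains 文書 ("Bunsho", documentation) as a searchable word.
index = set(japanese.split())
print("文書" in index)    # False
```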

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: enable searching East Asian words at search.debian.org

2003-05-05 Thread Tomohiro KUBOTA
Hi,

There has been no reply for more than a week.  Could someone please reply?
There are Chinese, Japanese, and Korean translations of www.debian.org,
but search.debian.org cannot search for words in these languages.

Please do the following:

1. Install the libchasen-dev, libchasen0, and ipadic packages on klecker.
2. Add me ([EMAIL PROTECTED]) as a PostgreSQL database user on klecker.
3. Create a PostgreSQL database on klecker that I have write permission for.

Then I can demonstrate the improvement (which I regard as a bugfix)
described in my last mail, quoted in full below.


From: Tomohiro KUBOTA <[EMAIL PROTECTED]>
Subject: enable searching East Asian words at search.debian.org
Date: Sat, 26 Apr 2003 09:45:48 +0900 (JST)

> Hi,
> 
> So far search.debian.org doesn't support East Asian languages
> (Chinese, Japanese, and Korean).  That is, it cannot search for
> Chinese, Japanese, or Korean words.
> 
> I have recently researched this problem and I think I have found
> how to fix it.  I tested it on my personal machine (which doesn't have
> a 24-hour Internet connection) and it works almost perfectly.
> 
>  1. install libchasen-dev, libchasen0, and ipadic packages.
>  2. recompile mnogosearch (version 3.2.8 or later), passing the
> --enable-chasen and --with-extra-charsets=all options to ./configure.
>  3. invoke "indexer -C" and then "indexer" to rebuild the search database.
> 
> Could someone do this?  Or could I be given write access to a
> (PostgreSQL) database on klecker to demonstrate this?
> 
> 
> Explanation:
> 
> The chasen packages are needed to extract words from Japanese text,
> because Japanese text doesn't use whitespace between words.  The
> --enable-chasen option of mnogosearch (available since version 3.2.8)
> lets mnogosearch use chasen.
> 
> Though mnogosearch is Unicode-based software and potentially supports
> East Asian languages, support for these languages is disabled by default.
> To enable it, --with-extra-charsets=all is needed.
> 
> Since the current search database on search.debian.org doesn't contain
> any East Asian words, the whole database needs to be rebuilt.
> (Of course, rebuilding the database only for *.{ja,ko,zh-cn,zh-hk,zh-tw}.html
> pages would be enough, but I don't know whether that is possible.)

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




enable searching East Asian words at search.debian.org

2003-04-25 Thread Tomohiro KUBOTA
Hi,

So far search.debian.org doesn't support East Asian languages
(Chinese, Japanese, and Korean).  That is, it cannot search for
Chinese, Japanese, or Korean words.

I have recently researched this problem and I think I have found
how to fix it.  I tested it on my personal machine (which doesn't have
a 24-hour Internet connection) and it works almost perfectly.

 1. install libchasen-dev, libchasen0, and ipadic packages.
 2. recompile mnogosearch (version 3.2.8 or later), passing the
--enable-chasen and --with-extra-charsets=all options to ./configure.
 3. invoke "indexer -C" and then "indexer" to rebuild the search database.

Could someone do this?  Or could I be given write access to a
(PostgreSQL) database on klecker to demonstrate this?


Explanation:

The chasen packages are needed to extract words from Japanese text,
because Japanese text doesn't use whitespace between words.  The
--enable-chasen option of mnogosearch (available since version 3.2.8)
lets mnogosearch use chasen.

Though mnogosearch is Unicode-based software and potentially supports
East Asian languages, support for these languages is disabled by default.
To enable it, --with-extra-charsets=all is needed.

Since the current search database on search.debian.org doesn't contain
any East Asian words, the whole database needs to be rebuilt.
(Of course, rebuilding the database only for *.{ja,ko,zh-cn,zh-hk,zh-tw}.html
pages would be enough, but I don't know whether that is possible.)

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: What means english?

2003-04-20 Thread Tomohiro KUBOTA
Hi,

From: Josip Rodin <[EMAIL PROTECTED]>
Subject: Re: What means english?
Date: Sat, 19 Apr 2003 02:59:35 +0200

> > Though I don't think it is a good idea to mix American and British
> > English, I don't think we *must* avoid to do that.  The requirement
> > should be minimal.
> 
> Sorry, what did you mean by this? It's not overly clear :)

What I wanted to say is that we don't need to prohibit mixing American
and British English.

My understanding is that the guideline

> Only a small question: On devel/website/working exists the
> following section "Use clear and simple English".

means that we should use clear and simple English so that non-native
English speakers can understand the pages; in other words, we should
avoid ambiguous or misleading expressions.

How about rewording the guideline as follows:
"Use clear and simple expressions to avoid ambiguity or misunderstanding."?

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




discriminatory expression in DWN?

2003-04-18 Thread Tomohiro KUBOTA
Hi,

I am one of the Japanese translators of the Debian web pages.
When I was checking the translation of
webwml/english/News/weekly/2003/14/index.wml, I found
the following sentence:

> Debian Universe".  He admits that the current Debian installer
> is ugly but also notes that some people believe that a not so easy installer
   ~~~
> will keep horde of unwashed masses away from Debian who aren't worthy of such
  ~
> a fine OS!  In the article Jonathan describes in detail how the installer
  ~~

I think the sentence (underlined part) means that there are people who
are so foolish that they should not use Debian, and that a difficult
installer is a good thing because it can prevent such foolish people
from using Debian.  Is this interpretation correct?

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: What means english?

2003-04-18 Thread Tomohiro KUBOTA
Hi,

From: Josip Rodin <[EMAIL PROTECTED]>
Subject: Re: What means english?
Date: Fri, 18 Apr 2003 17:47:48 +0200

> No, I don't think we should take sides in that. We haven't had problems with
> the current mix we use, and we should keep it that way.

Though I don't think it is a good idea to mix American and British
English, I don't think we *must* avoid doing that.  The requirement
should be minimal.

On the other hand, a minimal requirement should be, for example, to
avoid expressions, proverbs, and idioms that people (especially
non-native English speakers) may find difficult to read, or even to
look up in dictionaries.  That is, "every reasonable reader can
understand the meaning of the page" should be a minimal requirement.
I imagine this requirement sometimes conflicts with the use of literary
or witty expressions.

As one of the Japanese translators of the Debian web pages, I often
face such difficulties.  One example:
http://lists.debian.org/debian-www/2003/debian-www-200301/msg00256.html

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




The maintainer still isn't listening?

2003-02-17 Thread Tomohiro KUBOTA
Hi,

I am checking the Japanese translation of Debian Weekly News
in order to commit it to CVS.  However, the Japanese translation
team had difficulty translating one English sentence.

webwml/english/News/weekly/2003/06/index.wml says in the
Qt3 paragraph:

  Several issues haven't been dealt with and the maintainer
  still isn't listening.

None of the members of the Japanese team understand what
"listening" means here.  Is it a metaphor or an idiom?

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: lists.debian.org de-localization

2003-02-12 Thread Tomohiro KUBOTA
Hi,

(As a reminder, the topic is that http://lists.debian.org pages sometimes
contain 8-bit characters, which can break all content after that
character when East Asian users browse the pages.)

From: Josip Rodin <[EMAIL PROTECTED]>
Subject: Re: lists.debian.org de-localization
Date: Sun, 12 Jan 2003 04:14:45 +0100

> On Sun, Jan 12, 2003 at 10:38:52AM +0900, Tomohiro KUBOTA wrote:
> > However, I don't think this can be a solution now because it will take a
> > very long time that the version will be stable, then the stable version
> > will be adopted into unstable/testing version of Debian distribution, then
> > the distribution will become stable (released), and then the stable
> > distribution will be adopted to master.debian.org .
> 
> Actually, we use a non-.deb mhonarc on lists.d.o so this isn't a problem
> per se.

A new version of MHonArc (2.6.0) was released recently; I think it
can solve all encoding-related problems by converting everything to
UTF-8.


> This, on the other hand, is a hassle to handle (backporting or installation
> into subdirs). master.d.o is scheduled to be upgraded to woody after samosa.
> That's all I know. 

Any new information?

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: Translation of Debian Hompage to Arabic

2003-02-06 Thread Tomohiro KUBOTA
Hi,

From: Ayman Negm <[EMAIL PROTECTED]>
Subject: Re: Translation of Debian Hompage to Arabic
Date: Thu, 6 Feb 2003 22:26:43 +0100

> I did as it descriped in the README but nothing changed :-(

I am afraid that the Pics script (and Gimp 1.2) don't support
multibyte encodings such as UTF-8.  This is why the East Asian Pics
are made by hand.  (Since East Asian languages need several
thousand characters, their encodings must be multibyte.)  I am a
Japanese speaker, and I made the Japanese Pics using Gimp 1.3
with a semi-automatic Script-Fu script.

Can you try ISO-8859-6, a single-byte Arabic encoding?  However,
I don't think it supports the initial, final, medial, and isolated
forms of glyphs.  If that doesn't work, I think you will have to
make the Pics manually.
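The point about glyph forms can be checked with a small Python 3 sketch using the built-in iso8859_6 codec.  ISO-8859-6 gives each Arabic letter one code and has no codes for its positional variants, which Unicode lists separately as "presentation forms".  Whether the Pics script or Gimp would shape the letters by itself is an open question this sketch does not answer.

```python
beh = "\u0628"          # ARABIC LETTER BEH (the abstract letter)
beh_initial = "\ufe91"  # ARABIC LETTER BEH INITIAL FORM (a presentation form)

# ISO-8859-6 has a single code for the letter...
print(beh.encode("iso-8859-6"))  # b'\xc8'

# ...but no code for its initial/medial/final/isolated glyphs, so picking
# the right shape is left entirely to the text renderer.
try:
    beh_initial.encode("iso-8859-6")
    encodable = True
except UnicodeEncodeError:
    encodable = False
print(encodable)  # False
```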

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: "family name, personal name" in devel/people

2003-01-31 Thread Tomohiro KUBOTA
Hi,

From: Osamu Aoki <[EMAIL PROTECTED]>
Subject: Re: "family name, personal name" in devel/people
Date: Fri, 31 Jan 2003 13:16:23 -0800

> If Oohara Yuuma said "In Japan a family name comes before a personal
> name in _Japanese_ _language_ context.  I want to retain the same order
> in English context too.", it was not stretching the fact.  I am all for
> respecting his defiance wish to the Japanese translation convention.  

Is the "Given Family" order tied to the English (or Western) languages?
I don't know.  However, I can say that the order is tied to English-speaking
(or Western) people: Western people have names in "Given Family" order,
and when they visit another country, they don't change their names.
The same should hold the other way around.


> Please remember that it is a well established convention for Japanese
> name to be flipped in English or French context upon translation.  That
> was my point.  I guess if translated into Hungarian or Chinese, we
> should use surname first as the translation convention.  

I know that convention.  However, the convention is a custom of Japanese
people, not a property of Western languages; your own example of
Hungarian or Chinese shows this.  Thus, it is Japanese people (not English
speakers) who have the right to decide how Japanese people's names are
written in English.


> Boy, it is a hot topic.  In my daily activities, defending my real first
> name against anglocized name is a real challenge in a hostile
> environment.  I already gave up on my real fist name at the restaurant
> or bar.  :-)

That is because restaurants and bars are located in a specific
country, and it is natural for them to follow that country's
conventions.  I cannot imagine a restaurant located in all countries,
an "international restaurant".  Our project, however, is
international.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: future of developers list

2003-01-31 Thread Tomohiro KUBOTA
Hi,

From: Andrew Shugg <[EMAIL PROTECTED]>
Subject: Re: future of developers list
Date: Fri, 31 Jan 2003 21:10:38 +0800

>  - one field for the name to be displayed in the Western convention
>  - one field for the name to be displayed in the local convention (and
>optionally a different character set

How about the current format ("Family, Given") instead of the "Western
convention"?  I think it is a good compromise: the comma signals that the
expression ("Family, Given") is not the name as written natively, so it
stays out of the name-order flamewar.

IMO, the reason we need a "Western" field is that none of us can read
every script in the world, while all of us are expected to be able to
read the ASCII alphabet.  If my name were shown only in Kanji, many
people in the world could not even write it.  Likewise, I can neither
read nor write Arabic/Hebrew/Thai/Armenian characters.

The second field (local convention) can of course also hold names in the
Western convention, like any other local convention.  However, unless the
"different character set" part is achieved, I don't think this is worth
the effort to implement.
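The two-field idea can be sketched as a small Python data structure.  Everything here is hypothetical: the class and field names are illustrative rather than the real LDAP schema, and the sample names are placeholders.  It shows a romanized "Family, Given" listing form that sorts by family name, with an optional native-script form (stored as UTF-8 text) shown alongside it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeveloperName:
    """Hypothetical record; field names are illustrative, not the LDAP schema."""
    family: str                   # romanized family name
    given: str                    # romanized given name
    native: Optional[str] = None  # native script, in native order

    def listing(self) -> str:
        # "Family, Given": the comma marks a listing form, not a natural order.
        return f"{self.family}, {self.given}"

    def display(self) -> str:
        # Append the native form when one is recorded.
        if self.native:
            return f"{self.listing()} ({self.native})"
        return self.listing()


people = [
    DeveloperName("Yamada", "Taro", "山田太郎"),  # placeholder names
    DeveloperName("Doe", "John"),
]
for p in sorted(people, key=lambda p: p.family):
    print(p.display())
# Doe, John
# Yamada, Taro (山田太郎)
```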

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: "family name, personal name" in devel/people

2003-01-31 Thread Tomohiro KUBOTA
Hi,

From: Osamu Aoki <[EMAIL PROTECTED]>
Subject: Re: "family name, personal name" in devel/people
Date: Fri, 31 Jan 2003 00:45:16 -0800

> > My family name is Oohara and my personal name is Yuuma.  I am _not_
> > Yuuma Oohara -- in Japan a family name comes before a personal name.
> 
> I know some minority of Japanese people wish to be called the same order
> in English/French context as ithey do in Japanese context.  That is fine
> as preference but you are stretching too far and twisting the fact.

Though I don't follow this practice myself, I understand their point.
For example, when I search Google for the name of the current Korean president:

   "Dae Jung Kim"   2130 hits
   "Kim Dae Jung"  98000 hits

As you know, "Kim" is the family name, "Dae Jung" is the given name, and
the native Korean order is "Family Given".

This suggests that Korean people preserve their name order even in Latin
transliteration.  Then why not Japanese people?  I don't want to believe
that you think "Western culture is international and Eastern culture is
local, so Eastern people must act in the Western way in international
projects such as Debian".

I know this result shows only one aspect of this problem.  It is
just an example supporting what they say.

My own position is that I don't want to argue the substance of this
problem.  I just want to ask that we respect each developer's way of
thinking.


> All you had to say was "I prefer to be called Oohara Yuuma"
> 
> I am sorry but I have to remind you that you are very likely to be
> called officially "Yuuma Oohara" in the letters your government issues
> in English or French.  That may be where you have to fight :-) I do not
> understand why you are so picky on this issue here ?

England and France are just two countries in the world.  They don't
have the right to determine the international way of doing things.  If
we were developing an English- or French-localized distribution, you
would be right.


> > The NM application form insisted on the personal-name-family-name
> > order, so it may be the cause of the bug.
> 
> Form is form.  Just follow the instruction.  

If this is a bug, or at least may cause confusion, let us think
about improvements.

For example, we could add a note: "Please enter your given name and
family name in the corresponding fields, regardless of the native
order of your name."

If we don't need to know which part is the given name and which is the
family name, it is a good idea to have only one field for the whole
name.  In that case, the order is completely free.

> 
> You will be surprised to find many US official forms use surname first,
> given name second, and middle name initial last format.  As long as each
> entry is clearly marked, I see no issues.  
> 
> I see no major threat of cultural imperialism here either.  So relax.
> Japan exports enough pop-culture trashes these days.  Video games,
> anime, to name a few ...  The name order will not trash Japanese
> culture. 



> 
> > I don't see any point in splitting a developer's name into the family
> > name and the personal name, but it is another issue.
> 
> I guess many others see differently.




Is this character belongs to ISO-8859-2?

2003-01-30 Thread Tomohiro KUBOTA
Hi,

I found that webwml/english/devel/tc.data contains an 8-bit character,
after noticing a broken character on the Japanese translation page.

So I replaced the character in tc.data with an SGML entity.
In doing so, I assumed the 8-bit character is ISO-8859-2, where it is
a lowercase "c" with an acute accent; in ISO-8859-1 the same byte
is "ae".

I have two reasons:
 - the CVS commit log says "Zoran Obradovic" and the problematic
   character corresponds to the last "c".
 - I imagine Slovene uses ISO-8859-2.

Am I right?  I cannot tell which is the natural personal name,
"Obradovi(c')" or "Obradovi(ae)".
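The two readings come from a single byte.  Since lowercase "c" with acute is 0xE6 in ISO-8859-2, and 0xE6 is "ae" in ISO-8859-1, the stray byte must be 0xE6.  A minimal Python 3 sketch of the ambiguity, plus one possible entity (a numeric character reference; which entity was actually committed is not stated above):

```python
byte = b"\xe6"  # the byte implied by the two readings described above

as_latin2 = byte.decode("iso-8859-2")  # 'ć' LATIN SMALL LETTER C WITH ACUTE
as_latin1 = byte.decode("iso-8859-1")  # 'æ' LATIN SMALL LETTER AE

# A numeric character reference keeps the file 7-bit clean, so the name
# survives whatever charset each translated page is served in.
entity = f"&#{ord(as_latin2)};"        # '&#263;'
print(as_latin2, as_latin1, entity)
```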

Zocky, please check
http://www.debian.org/devel/website/translation_coordinators.html
and the translated versions of that page.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: future of developers list

2003-01-30 Thread Tomohiro KUBOTA
Hi,

From: Josip Rodin <[EMAIL PROTECTED]>
Subject: future of developers list
Date: Thu, 30 Jan 2003 16:15:37 +0100

> Another added benefit would be that we could then extend our LDAP database
> schemas to include another field for people's names in native character set,
> which would nicely fix the problem with Japanese names -- the non-Japanese
> web pages would print the names of these developers in the western way, and
> the Japanese web pages would print them in the Japanese way.

I think another field for the native script is a good idea.  However,
why not UTF-8 rather than a native character set?  (Or did you mean the
native written form, without intending to talk about character encodings?)

I don't understand how "another field for native character (set)" would
fix the name order problem.  Though all Japanese people would agree on
the "surname givenname" order in native script, they disagree about the
order in Latin transliteration.  I think the current devel/people has
the best format: "surname, givenname" in Latin transliteration is
something all Japanese people agree on.  I hope other people will also
agree on this format.

If we add another field for native characters, I think it would be nice
for devel/people to show both the alphabetic and the native expression,
as devel/website/translation_coordinators does.  I think all translated
pages (devel/people.*.html) should use the same expression.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: "family name, personal name" in devel/people

2003-01-30 Thread Tomohiro KUBOTA
Hi,

From: Michael Stone <[EMAIL PROTECTED]>
Subject: Re: "family name, personal name" in devel/people
Date: Thu, 30 Jan 2003 07:32:49 -0500

> Well, this is simply an example of how you can't please all of the
> people all of the time. You brought these examples up because you didn't
> like how those developer's names appeared. You don't want them to
> conform to the way the majority of developers do things (fair enough),
> you don't want them to conform to the (admittedly ugly) uppercase
> convention (fair enough), but you still want the parser to somehow
> figure out the names (not fair) and complain about some sort of cultural
> imperialism while you're at it (unjustified, and not the first time
> you've done it.)

The parser already has many exception handlers.  I just wanted it
to have two more.  Since I had recently read the parser to solve
another problem, I knew it was easy to add two handlers.

I don't think we can build a perfect parser.  If you think you can,
that thinking is itself based on "cultural imperialism".  However, it
is a good idea to design a parser that reduces the number of
exceptions, because that reduces the maintenance load, which is why
Matt's improvement is worthwhile.  Anyway, I'd like to praise Josip,
who maintains such a script with a cluster of exception handlers.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: "family name, personal name" in devel/people

2003-01-29 Thread Tomohiro KUBOTA
Hi,

From: Josip Rodin <[EMAIL PROTECTED]>
Subject: Re: "family name, personal name" in devel/people
Date: Wed, 29 Jan 2003 10:33:58 +0100

> % grep-available -F Maintainer Shuzo -s Maintainer | sort -u
> Maintainer: HATTA Shuzo <[EMAIL PROTECTED]>
> Maintainer: Hatta Shuzo <[EMAIL PROTECTED]>
> 
> % grep-available -F Maintainer Yuuma -s Maintainer | sort -u
> Maintainer: Oohara Yuuma <[EMAIL PROTECTED]>
> 
> Perhaps one of you could politely inform these two developers that they
> might get the westerners to read their name right if they changed the
> ordering? :)

Well, I don't want to do this, and I don't want anybody to do it.
It is not a good idea that non-westerners have to follow the customs
of westerners while westerners don't need to follow those of
non-westerners.  Non-westerners already pay the cost of learning many
western customs whenever we want to take part in international
communities, and I want to reduce that load where possible.

At most, I could ask them to write their family names in uppercase,
and I don't know whether they would accept even that.  Please note
that this *is* what I recently called a "10-year flamewar"; I *never*
want to join it, and even asking people to write their family names
in uppercase might stir it up.  (If I asked them to change the name
order, I would certainly provoke the core of that flamewar, and
Japanese members of Debian might give up their activity as
developers.)

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: "family name, personal name" in devel/people

2003-01-28 Thread Tomohiro KUBOTA
Hi,

From: Osamu Aoki <[EMAIL PROTECTED]>
Subject: Re: "family name, personal name" in devel/people
Date: Tue, 28 Jan 2003 19:48:35 -0800

> > > "Shuzo, Hatta", where "Hatta" is surname and "Shuzo" is given name.
> > > "Yuuma, Oohara", where "Oohara" is surname and "Yuuma" is given name.
> 
> I am wondering why these strange entry exist.  Are they mistake of
> original data entry?  I never see Japanese name spelled that way.

This is because the web page is automatically generated by a script
from the developers' database on db.debian.org.  The script assumes
that the first part of a name is the given name and the last part is
the family name.  Since many names don't follow this assumption, the
script has many exception handlers.  Josip added two exception
handlers, and the page will be fixed in the next build.

You know, some Japanese people write their names in the native order,
"Family Given", and such expressions exist in the db.debian.org database.
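The approach described above can be sketched in Python: assume "Given Family" order, and let a table of exceptions override that assumption for individual names. This is not people.pl, which is written in Perl. The function name, the EXCEPTIONS table, and the idea of keying it by the full name string are hypothetical. The table holds the two names reported in this thread.

```python
# Hypothetical exception table: full name as stored -> (surname, given name).
EXCEPTIONS = {
    "Hatta Shuzo": ("Hatta", "Shuzo"),
    "Oohara Yuuma": ("Oohara", "Yuuma"),
}

def surname_first(full_name: str) -> str:
    """Format a name as "Surname, Given name".

    By default the last word is taken as the surname and the rest as the
    given name; names listed in EXCEPTIONS override that assumption.
    """
    if full_name in EXCEPTIONS:
        surname, given = EXCEPTIONS[full_name]
    else:
        *given_parts, surname = full_name.split()
        given = " ".join(given_parts)
    return f"{surname}, {given}" if given else surname

print(surname_first("Tomohiro Kubota"))  # Kubota, Tomohiro
print(surname_first("Hatta Shuzo"))      # Hatta, Shuzo
```

Every irregular name needs another table entry, and that is the maintenance load discussed in this thread.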

... but I checked the script
(klecker:/org/www.debian.org/cron/people_scripts/people.pl) and
I couldn't find the additional handlers.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




"family name, personal name" in devel/people

2003-01-28 Thread Tomohiro KUBOTA
Hi,

I understand that names in http://www.debian.org/devel/people use the
unified format "Surname, Given name".

I found two exceptions:

"Shuzo, Hatta", where "Hatta" is surname and "Shuzo" is given name.

"Yuuma, Oohara", where "Oohara" is surname and "Yuuma" is given name.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




August 01-04 1999 is future?

2003-01-24 Thread Tomohiro KUBOTA
Hi,

I found that the Linux World Conference and Expo of August 01-04, 1999
is listed as a "future event" in
http://www.debian.org/events/1999/index.ja.html .
This problem affects only the Japanese page.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




events pages not updated

2003-01-21 Thread Tomohiro KUBOTA
Hi,

I found that the Japanese translations of the per-year events pages
(for example, http://www.debian.org/events/2002/index.ja.html) have
not been updated for several days, even though I updated the gettext
items for "Past events" and "Future events" and also changed several
event items to use SGML entities instead of ISO-8859-1 characters.
The pages also sometimes fail to use the localized date formats in
po/ctime.ja.po .

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: translation of searchtmpl/search.wml

2003-01-20 Thread Tomohiro KUBOTA
Hi,

From: [EMAIL PROTECTED] (Craig Small)
Subject: Re: translation of searchtmpl/search.wml
Date: Tue, 21 Jan 2003 09:13:57 +1100

> It was before you replied to that mail (or maybe another?).  Essentially
> I was incorrect.  mnogosearch *stores* its indexing in UTF-8 but needs
> all those flags to do the indexing.
> 
> Have you tried a little bit of indexing of Japanese pages to see if it
> does seem to behave itself?

Not yet.  However, I believe that --with-extra-charsets is a necessary
condition, though I am not sure that it is also sufficient (I expect
it is).  Please read:
http://www.mnogosearch.org/board/message.php?id=6350

Note that Japanese (and Chinese) have "Problem 2" (no spaces between
words) and need the newer version of mnoGoSearch with ChaSen, as you
wrote.  However, I expect Korean will be fully fixed by --with-extra-charsets.
I also expect that Japanese and Chinese words which happen to appear
on their own (i.e., separated by spaces or HTML tags) will become
searchable.
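Problem 2 can be seen with plain whitespace tokenization. The Python sketch below is only an illustration and assumes nothing about mnoGoSearch's actual tokenizer. The sample sentences are arbitrary. Splitting on whitespace gives usable words for English and for space-separated Korean. A Japanese sentence, however, comes back as a single token, so a query for one word inside it cannot match without a segmenter such as ChaSen.

```python
samples = {
    "English": "search the Debian web pages",
    "Korean": "데비안 웹 페이지",
    "Japanese": "デビアンのウェブページを検索する",
}

for lang, sentence in samples.items():
    tokens = sentence.split()  # whitespace-only tokenization
    print(lang, len(tokens), tokens)

# A word embedded in the Japanese sentence is not a token of its own.
print("ウェブ" in samples["Japanese"].split())  # False
```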

I thought about testing it, but I don't have enough time to learn about
databases, since I am entirely new to them.  (I also could not test the
other etc/index..htm problem because search.cgi in klecker:~/public_html
didn't work, and I don't know why.  It may be because of Apache's
configuration.)

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: translation of searchtmpl/search.wml

2003-01-20 Thread Tomohiro KUBOTA
Hi,

From: [EMAIL PROTECTED] (Denis Barbier)
Subject: Re: translation of searchtmpl/search.wml
Date: Sun, 19 Jan 2003 21:46:34 +0100

> At first glance it sounds very good, but I am not sure this is the way
> to go, because some strings are not handled by gettext, e.g. see
> Catalan strings in webwml/english/template/debian/ctime.wml
> There are also several Perl variables in date.pot, which will be
> displayed according to current locale, and not UTF-8.
> We could certainly play with CUR_LOCALE, but a simpler solution is
> to post-process HTML files with iconv and change their charset
> field in  tags, see attached patch.

I think your idea is better than mine.  I also checked that your patch
works, i.e., it builds the translated pages in UTF-8.
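Denis's approach can be sketched in Python: post-process the generated HTML and rewrite its charset declaration. Python's codecs stand in for iconv here. The function name and the regular expression for the meta charset are assumptions, not part of the attached patch.

```python
import re

# Matches the charset value inside the first <meta ... charset=...> tag.
CHARSET_RE = re.compile(r"(<meta[^>]*charset=)([\w-]+)", re.IGNORECASE)

def recode_html(data: bytes, source_charset: str) -> bytes:
    """Convert an HTML page from source_charset to UTF-8 and fix its meta charset."""
    text = data.decode(source_charset)
    text = CHARSET_RE.sub(r"\1UTF-8", text, count=1)
    return text.encode("utf-8")

page = ('<html><head><meta http-equiv="Content-Type" '
        'content="text/html; charset=EUC-JP"></head><body>検索</body></html>')
print(recode_html(page.encode("euc-jp"), "euc-jp").decode("utf-8"))
```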


BTW, do you have any idea why Craig thinks as he wrote in the following
mail?  Maybe I am missing something.
http://lists.debian.org/debian-www/2003/debian-www-200301/msg00271.html

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




translation of searchtmpl/search.wml

2003-01-19 Thread Tomohiro KUBOTA
Hi,

As we already know, the searchtmpl/search.html pages must be encoded
in UTF-8, and this has already been done.

  http://lists.debian.org/debian-www/2002/debian-www-200212/msg00260.html

However, even though we can write the translated (or original)
searchtmpl/search.wml in UTF-8, gettext items are converted into
legacy encodings, which breaks the pages.

I think I found a fix for this problem.  Although it doesn't immediately
enable translated search pages, I think it fixes one of the problems.

I found that the target encoding of gettext is defined in
webwml/english/template/debian/common_tags.wml .  So the best way
is to redefine the CHARSET_WML and CHARSET variables before that file
is processed.  (These variables are defined in webwml//.wmlrc files.)
The patch attached to this mail includes this modification.

Translated search.wml files also need to be written in UTF-8.  The
following patch adds a note about this.

(The Makefile patch disables the conversion from EUC-JP to
ISO-2022-JP for Japanese.)


May I commit these modifications?


PS. The content negotiation for
  http://search.debian.org/new/index.en.cgi
  http://search.debian.org/new/index.fr.cgi
does not seem to work well.  I may be wrong, but how about
renaming /org/search.debian.org/etc/search..htm to
/org/search.debian.org/etc/index..htm ?  The source code of
mnoGoSearch (src/search.c) seems to replace ".cgi" with ".htm" to
find the configuration file.
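The naming rule described above can be illustrated with a tiny Python sketch. It mirrors only the behaviour as described here, where ".cgi" is replaced by ".htm". It is not mnoGoSearch's C code. The helper name is hypothetical, and so is the assumption that the file is looked up in the etc/ directory.

```python
from pathlib import PurePosixPath

def template_for(script_name: str, etc_dir: str = "/org/search.debian.org/etc") -> str:
    """Return the template path that would be looked up for a CGI script name."""
    name = PurePosixPath(script_name).name  # e.g. "index.en.cgi"
    if name.endswith(".cgi"):
        name = name[: -len(".cgi")] + ".htm"
    return str(PurePosixPath(etc_dir) / name)

print(template_for("/new/index.en.cgi"))  # /org/search.debian.org/etc/index.en.htm
```

Under this rule, a request for index.fr.cgi would look for index.fr.htm, so the search.*.htm files would never be picked up.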
Index: english/searchtmpl/search.data
===
RCS file: /cvs/webwml/webwml/english/searchtmpl/search.data,v
retrieving revision 1.30
diff -u -r1.30 search.data
--- english/searchtmpl/search.data  30 Dec 2002 03:26:24 -  1.30
+++ english/searchtmpl/search.data  19 Jan 2003 14:02:03 -
@@ -1,3 +1,4 @@
+$(CHARSET=UTF-8) $(CHARSET_WML=UTF-8)
 #include "../../english/searchtmpl/search.def"
 
 
 
-$(CHARSET=UTF-8) $(CHARSET_WML=UTF-8)
 #use wml::debian::common_translation HOME="http://www.debian.org";
 $(title=)
 #use wml::debian::languages
Index: english/searchtmpl/Makefile
===
RCS file: /cvs/webwml/webwml/english/searchtmpl/Makefile,v
retrieving revision 1.5
diff -u -r1.5 Makefile
--- english/searchtmpl/Makefile 2 Nov 2002 23:36:01 -   1.5
+++ english/searchtmpl/Makefile 19 Jan 2003 14:05:58 -
@@ -10,6 +10,11 @@
 
 include $(WMLBASE)/Make.lang
 
+ifeq "$(LANGUAGE)" "ja"
+   WMLOUTFILE = $(@F)
+   WMLPROLOG =
+   WMLEPILOG =
+endif
 
 search.$(LANGUAGE).html: search.wml $(ENGLISHSRCDIR)/searchtmpl/search.data \
   $(ENGLISHSRCDIR)/searchtmpl/search.def $(TEMPLDIR)/common_translation.wml \
Index: english/searchtmpl/search.wml
===
RCS file: /cvs/webwml/webwml/english/searchtmpl/search.wml,v
retrieving revision 1.8
diff -u -r1.8 search.wml
--- english/searchtmpl/search.wml   20 Dec 2002 00:23:53 -  1.8
+++ english/searchtmpl/search.wml   19 Jan 2003 14:11:37 -
@@ -10,6 +10,8 @@
 # Of course edit the blurb below (in your language directory) to suit.
 # Email debian-www@lists.debian.org if you want to understand how this
 # horror works.
+#
+# Translated files of this file must be encoded in UTF-8.
 
 #include "$(ENGLISHDIR)/searchtmpl/search.data"
 


Re: UPPERCASE surname?

2003-01-18 Thread Tomohiro KUBOTA
Hi,

From: Josip Rodin <[EMAIL PROTECTED]>
Subject: Re: UPPERCASE surname?
Date: Sat, 18 Jan 2003 13:39:04 +0100

> And I don't believe they exclude web pages that write e.g. "TOKUGAWA
> Ieyasu", as Google ignores case when searching (obviously because many
> people get case wrong, in general).
> 
> I know there can be confusion with this but if we keep the uppercase, we
> lose the consistency and at the same time don't provide any benefit to
> people who already understand what uppercase means.

In this case, "tokugawa" (Tokugawa, TOKUGAWA) is the surname, so this
is consistent with my usage.  And the statistics were only about the
order, surname-givenname or givenname-surname, *not* about UPPERCASE
or lowercase.

Are you still trying to stop me from writing my surname in UPPERCASE,
even though I said I don't care how other people write my name?  I
don't like flamewars.  I just want you not to join the Japanese
flamewar that has continued for about 10 years.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: UPPERCASE surname?

2003-01-17 Thread Tomohiro KUBOTA
From: Josip Rodin <[EMAIL PROTECTED]>
Subject: Re: UPPERCASE surname?
Date: Sat, 18 Jan 2003 01:32:58 +0100

> I think if the document is in English, then English style is just fine.
> The uppercase stuff may be appropriate in e.g. French, but I don't believe
> there's any rule in English that requires it.
> 
> Also, writing whole words uppercase is a sign of yelling in electronic
> communication channels which a lot of us are accustomed to.

I see, but I will continue to write my name as "Tomohiro KUBOTA" in
"From:" mail headers and so on.  I won't force others to follow my
way.


> By default, the first part is the name, and the second part is the surname.
> There would only be confusion if someone wrote "Kubota Tomohiro", and I
> don't see why anyone would do that.

Although we rarely need to know which part is the given name and which
is the surname, it is not safe to assume that there is a "default".
Although I always write my given name first and my surname second in
alphabetic transcription, I don't know about other Japanese people or
other peoples.

For example, here are Google hit counts for some famous Japanese people
(the results do not exclude Japanese web pages):

  ------------------------------------------------------------
  surname   givenname   Google hits in   Google hits in
                        "s-g" order      "g-s" order
  ------------------------------------------------------------
  Natsume   Soseki           3410             1700
  Kawabata  Yasunari         5410             5370
  Matsui    Hideki            594             6030
  Tokugawa  Ieyasu           6900             2010
  Oda       Nobunaga         5660             1650
  Ito       Hirobumi         1880              633
  Ozawa     Seiji            3150            25100
  (Mao      Tse-Tung        82700             1260)  * Chinese
  ------------------------------------------------------------

So I think you cannot "assume" that givenname-surname order is the "default".

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




UPPERCASE surname?

2003-01-17 Thread Tomohiro KUBOTA
Hi,

I found that webwml/english/News/weekly/2003/02/index.wml was modified
to de-capitalize a person's surname (well, mine).

http://cvs.debian.org/webwml/english/News/weekly/2003/02/index.wml.diff?r1=1.10&r2=1.11&cvsroot=webwml

I don't understand why an uppercase surname is not accepted.  Uppercase
surnames are widely used in the academic world, and they show which
part is the surname.  In particular, because of the confusion over how
Japanese (and other East Asian?) people write their names in alphabetic
transcription (which Osamu Aoki mentioned recently), capitalizing the
surname is useful (though not everyone does it).
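The uppercase marker can also be used mechanically. The following Python sketch uses a hypothetical helper, not code used on the Debian web site. It treats a single all-uppercase word as the surname, whatever its position. It returns None when no word (or more than one word) is marked, because the order cannot be inferred in that case.

```python
from typing import Optional, Tuple

def split_by_uppercase_surname(name: str) -> Optional[Tuple[str, str]]:
    """Return (surname, given name) if exactly one word is written in uppercase."""
    words = name.split()
    # Candidate surnames: alphabetic words of two or more letters, all uppercase.
    marked = [i for i, w in enumerate(words)
              if len(w) > 1 and w.isalpha() and w.isupper()]
    if len(marked) != 1:
        return None  # no marker, or ambiguous: the order cannot be inferred
    i = marked[0]
    return words[i], " ".join(words[:i] + words[i + 1:])

print(split_by_uppercase_surname("Tomohiro KUBOTA"))  # ('KUBOTA', 'Tomohiro')
print(split_by_uppercase_surname("HATTA Shuzo"))      # ('HATTA', 'Shuzo')
print(split_by_uppercase_surname("Hatta Shuzo"))      # None
```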

I am not enthusiastic enough to insist on modifying the wml file again,
but my concern is how my name (and other capitalized names) should be
handled when they are newly written somewhere.  My opinion is that we
should respect each person's own way of writing it, or the form used in
the news source.

PLEASE DON'T REQUIRE JAPANESE PEOPLE TO UNIFY THE WAY THEY WRITE NAMES.
This argument has continued for more than 10 years in Japan, so the
only realistic solution is never to meddle.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: Japanese name use single space between the last name and the first name?

2003-01-16 Thread Tomohiro KUBOTA
Hi,

From: Gerfried Fuchs <[EMAIL PROTECTED]>
Subject: Re: Japanese name use single space between the last name and the first name?
Date: Thu, 16 Jan 2003 10:57:27 +0100

> > I do not care which way to write, IMHO.  But it has to be consistent.
> 
>  I was longing for consistency when I changed it in the DWN pages, too.
> 
>  So, do I need to revert the changes, or was it right?

I think you can leave them.  I will change the Japanese names to use
spaces for consistency (although I regard this as rather low-priority
work).

Any suggestions?

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: difference between "task" and "job"?

2003-01-16 Thread Tomohiro KUBOTA
Hi,

From: Tomohiro KUBOTA <[EMAIL PROTECTED]>
Subject: difference between "task" and "job"?
Date: Wed, 15 Jan 2003 16:14:18 +0900 (JST)

> I am now reviewing a Japanese translation of Debian web page,
> devel/join/nm-step4.
> 
> The page has words like "task" and "job".  In Japanese language,
> there are no distinct words for these two words.  Thus, I don't
> understand the difference between them.

We solved this translation problem with the help of a Japanese AM (Application Manager).

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: Japanese name use single space between the last name and the first name?

2003-01-15 Thread Tomohiro KUBOTA
Hi,

From: Osamu Aoki <[EMAIL PROTECTED]>
Subject: Re: Japanese name use single space between the last name and the first name?
Date: Wed, 15 Jan 2003 16:15:01 -0800

> I may be the one to be blamed to cause this.
> 
> When I saw my name in document maintainer section in English with first
> and my last name in one piece, I felt strange.  I posted here and since
> no one replied, I fixed that page.

OK.  I posted that mail because the comment on the CVS commit may
mislead non-Japanese people who don't know Japanese custom.  (The
comment can be read as saying that Japanese names use spaces everywhere
and in every context, which is not true.  I'd like the Chinese
translation of DWN not to use spaces.)

I think a space may be used for a Japanese name in English sentences.
I also added spaces in the Japanese translations when the name is
written on its own in "()".  However, I didn't add spaces when the
names appear in ordinary sentences, because such an expression is
clearly strange.


> Very valid question.  Most name references in the modern newspapers are
> spaceless and all in Japanese characters (I just checked.)  I have been
> spelling my names with separated format since most of the document I sign
> are government/bank document where they have box for each section. Also
> my name tag in my elementary school days tends to separate them. 

Right.  Your explanation is consistent with mine, and I expect the
non-Japanese members of this list will trust us.  The key point seems
to be whether the name appears on its own (e.g., as a book author, a
signature on government/bank documents, a name tag, and so on) or in
ordinary sentences.


> At any rate, mixed character group document is unconventional and I do
> not know what is right.  My intent of adding space in the English was to
> clarify splits between first and last name.

I understand your intent.  However, I am afraid that many people
will misunderstand that "Osamu" is "青木" and "Aoki" is "修",
while the truth is the opposite.

> I do not care which way to write, IMHO.  But it has to be consistent.

OK.  Since I am the one who edits Japanese names in the English version
of DWN (with a small semi-automatic Perl script), I can change the
policy and the script from now on.
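The policy discussed in this thread is simple to express: use a space when the name stands on its own (in "()", on a name tag), and no space inside a Japanese sentence. Below is a hypothetical Python sketch. It is not the Perl script mentioned above, and the function name and the standalone flag are illustrative.

```python
def kanji_name(surname: str, given: str, standalone: bool) -> str:
    """Join a Japanese name written in Kanji.

    standalone=True: the name appears on its own (e.g. inside "()"), so a
    single ASCII space separates the surname and the given name.
    standalone=False: the name is part of a Japanese sentence, so no space.
    """
    return f"{surname} {given}" if standalone else f"{surname}{given}"

print("Osamu Aoki (" + kanji_name("青木", "修", standalone=True) + ")")
print(kanji_name("青木", "修", standalone=False) + "さんが更新しました。")
```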


> Also, getting opinion of Chinese person's preference in English context
> may be interesting.  I see most chinese names in Japanese web pages do
> not use a space between last and first name in Japanese.

I am also interested.  I can also add entries to my script for
Chinese, Korean, Russian, Greek, Thai, and any other people whose
names are not written in the Latin alphabet.  Suggestions are welcome.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Japanese name use single space between the last name and the first name?

2003-01-15 Thread Tomohiro KUBOTA
Hi,

I found the following updates:

> NeedToUpdate  News/weekly/2002/19/index   1.8 1.9
> NeedToUpdate  News/weekly/2002/24/index   1.7 1.8
> NeedToUpdate  News/weekly/2002/26/index   1.10 1.11
> NeedToUpdate  News/weekly/2002/40/index   1.8 1.9
> NeedToUpdate  News/weekly/2002/44/index   1.9 1.10

with the commit comment:
"Japanese name use single space between the last name and the first name".

Either form (a single space or no space) is OK in Japanese, though I
don't know the typesetting rule for Japanese names in Kanji that
appear in *English* sentences.

In my opinion, a single space can be used when I write my name on its
own, for example to fill out a form.  However, a single space looks
odd when I write my name in a Japanese sentence, because Japanese
sentences use few (or no) spaces.  (Look at the Japanese translations
of DWN: they don't use spaces.)

So, although there are more Japanese names in DWN without a space
between the family name and the given name, I think they can be left
as they are.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: search.debian.org is online

2003-01-15 Thread Tomohiro KUBOTA
From: [EMAIL PROTECTED] (Craig Small)
Subject: Re: search.debian.org is online
Date: Thu, 16 Jan 2003 09:35:00 +1100

> On Wed, Jan 15, 2003 at 03:32:43PM +0900, Tomohiro KUBOTA wrote:
> > 
> > I'd like the mnoGoSearch of search.debian.org to be recompiled
> > with extra-charsets enabled, because it (I expect) immediately
> > benefits Korean.  (Note that Korean doesn't have the problem 2).
> > Since it doesn't need the newer version of mnoGoSearch with ChaSen
> > support (CVS version 3.2.8, to solve problem 2), it can be done now!
> 
> Except we're using UTF-8, so it shouldn't matter, I think.

mnoGoSearch uses Unicode internally for indexing and searching in the
current configuration, as you wrote.  So it must convert HTML files
into Unicode before processing them, and that needs converters.  The
default build of mnoGoSearch omits the converters from East Asian
encodings (ISO-2022-JP, EUC-KR, Big5, GB2312) to Unicode, which is
why it can neither index nor search East Asian pages.

Compiling with the --with-extra-charsets ./configure option will enable
them.  Although Japanese and Chinese have a further problem (problem 2),
this should solve Korean.
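Python's own codecs include all four encodings named above, so they can show what the missing converters do. This is only an analogy for mnoGoSearch's internal conversion, and it assumes each page's charset is known. The bytes differ from encoding to encoding, even for the same text in Big5 and GB2312. After decoding, the indexer sees the same Unicode text.

```python
# Page fragments in legacy encodings, built here by encoding known Unicode text.
pages = [
    ("big5", "中文"),
    ("gb2312", "中文"),
    ("euc-kr", "한국어"),
    ("iso2022_jp", "日本語"),
]

for charset, text in pages:
    raw = text.encode(charset)           # bytes as they would sit in the HTML file
    unicode_text = raw.decode(charset)   # the converter step an indexer needs
    assert unicode_text == text
    print(f"{charset:10} {raw!r} -> {unicode_text}")
```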

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




difference between "task" and "job"?

2003-01-15 Thread Tomohiro KUBOTA
Hi,

I am now reviewing a Japanese translation of the Debian web page
devel/join/nm-step4.

The page uses both "task" and "job".  Japanese has no distinct words
for these two, so I don't understand the difference between them.

Are these words used just by chance?  (In English classes, Japanese
people learn that English speakers like to use different words for
the same thing when it is mentioned several times.)  Or does the
difference between them carry some meaning?

I guess "task" refers to the four fields (or categories) of "jobs":

 - Package Management 
 - Documentation 
 - Debugging and Testing 
 - Infrastructure 

However, I don't understand what "Alternative demonstration tasks"
means.  Does it mean jobs which cannot be classified into the four
fields and which serve as demonstrations that the applicant can do
the work?  Or just jobs which _can be_ classified into the four
fields, where the applicant wants to prove his/her skill in a
different way?

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/





Re: search.debian.org is online

2003-01-15 Thread Tomohiro KUBOTA
Hi,

From: Tomohiro KUBOTA <[EMAIL PROTECTED]>
Subject: Re: search.debian.org is online
Date: Sun, 12 Jan 2003 16:51:31 +0900 (JST)

> > 1. handling of two-byte characters
> > 2. extraction of words from sentences without whitespaces
> 
> I think I found the reason of the problem 1.  Though mnogosearch
> supports multibyte languages, it doesn't support them by default.
> To support them, recompilation is needed.
> 
> 
> mnogosearch-3.2.7$ ./configure --help
>  .
>   --with-extra-charsets=CHARSET[,CHARSET,...]
>   Use additional non-default charsets:
>   none, all or a list from this set:
>   big5 gb2312 gbk japanese euc-kr gujarati tscii
>  .

I'd like mnoGoSearch on search.debian.org to be recompiled with extra
charsets enabled, because I expect this would immediately benefit
Korean.  (Note that Korean doesn't have problem 2.)  Since this doesn't
need the newer version of mnoGoSearch with ChaSen support (CVS version
3.2.8, which addresses problem 2), it can be done now!

I think --with-extra-charsets=all or
--with-extra-charsets=big5,gb2312,japanese,euc-kr is a good idea,
because either one enables a sane "search results" page.


> Note that "japanese" means Shift_JIS, which is not the encoding for
> Debian Japanese web pages.  Debian Japanese web pages are written
> using ISO-2022-JP which seems not be supported by mnogosearch.

While browsing the source code of mnoGoSearch, I found that
version 3.2.6 seems to support the ISO-2022-JP encoding used for
Debian Japanese pages, though this is not documented.
(Of course, the "japanese" extra charset must be enabled at
./configure time.)
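The bytes show why a Shift_JIS converter does not cover the Debian Japanese pages. This Python sketch uses Python's codecs for illustration only. Encoded as ISO-2022-JP, the text is pure 7-bit data framed by escape sequences. Encoded as Shift_JIS, it uses 8-bit bytes. A converter for one encoding therefore cannot read the other.

```python
text = "日本語"
jis = text.encode("iso2022_jp")
sjis = text.encode("shift_jis")

print(jis)   # 7-bit bytes between ESC $ B ... ESC ( B
print(sjis)  # 8-bit double-byte sequences
print(all(b < 0x80 for b in jis), all(b < 0x80 for b in sjis))  # True False
```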

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: search.debian.org is online

2003-01-12 Thread Tomohiro KUBOTA
Hi,

From: Tomohiro KUBOTA <[EMAIL PROTECTED]>
Subject: Re: search.debian.org is online
Date: Mon, 30 Dec 2002 19:53:31 +0900 (JST)

> 1. handling of two-byte characters
> 2. extraction of words from sentences without whitespaces

I think I found the cause of problem 1.  Although mnogosearch
supports multibyte languages, it doesn't enable them by default.
Recompilation is needed to support them.


mnogosearch-3.2.7$ ./configure --help
 .
  --with-extra-charsets=CHARSET[,CHARSET,...]
  Use additional non-default charsets:
  none, all or a list from this set:
  big5 gb2312 gbk japanese euc-kr gujarati tscii
 .


Note that "japanese" means Shift_JIS, which is not the encoding of
Debian Japanese web pages.  Debian Japanese web pages are written
in ISO-2022-JP, which does not seem to be supported by mnogosearch.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: lists.debian.org de-localization

2003-01-12 Thread Tomohiro KUBOTA
Hi,

From: Josip Rodin <[EMAIL PROTECTED]>
Subject: Re: lists.debian.org de-localization
Date: Sun, 12 Jan 2003 04:14:45 +0100

> This, on the other hand, is a hassle to handle (backporting or installation
> into subdirs). master.d.o is scheduled to be upgraded to woody after samosa.
> That's all I know. 

This is good news.  Then I will work on support for various encodings later.

Anyway, I don't expect the new master.d.o to have the development version
of MHonArc (with its feature for assuming an encoding for raw 8-bit
headers), even as a non-Debian-package installation.  So I think we will
need some other way to handle raw 8-bit headers.

Here is a "filter" that converts 8-bit characters (assumed to be KOI8-R)
into "&#;" expressions, which I wrote by imitating iso8859.pl, CharEnt.pm,
and UTF8.pm .  This filter is applied to raw 7-bit/8-bit strings.  Since
the 7-bit part of KOI8-R is identical to ASCII, it doesn't harm legal
ASCII headers.  The filter is to be installed as
org/lists.debian.org/mhonarc/share/mhonarc/MHonArc/DEBIAN.pm and doesn't
depend on the version of MHonArc or Debian.
##  DEBIAN.pm by Tomohiro KUBOTA <[EMAIL PROTECTED]>
##
##  CHARSETCONVERTER module that assume input string to be KOI8-R
##  and convert it into &#xxx; expression where xxx is decimal Unicode
##  codepoint.

package DEBIAN;

%US_ASCII_To_Ent = (
  #--
  # Hex Code  Entity Ref  # ISO external entity and description
  #--
0x22,   "&quot;",   # ISOnum : Quotation mark
0x26,   "&amp;",    # ISOnum : Ampersand
0x3C,   "&lt;",     # ISOnum : Less-than sign
0x3E,   "&gt;",     # ISOnum : Greater-than sign
);

%KOI8_R_To_Ent = (
  #--
  # Hex Code  Entity Ref  # ISO external entity and description
  #--
0x80,   "&#9472;",  # BOX DRAWINGS LIGHT HORIZONTAL
0x81,   "&#9474;",  # BOX DRAWINGS LIGHT VERTICAL
0x82,   "&#9484;",  # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x83,   "&#9488;",  # BOX DRAWINGS LIGHT DOWN AND LEFT
0x84,   "&#9492;",  # BOX DRAWINGS LIGHT UP AND RIGHT
0x85,   "&#9496;",  # BOX DRAWINGS LIGHT UP AND LEFT
0x86,   "&#9500;",  # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x87,   "&#9508;",  # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x88,   "&#9516;",  # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x89,   "&#9524;",  # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x8a,   "&#9532;",  # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x8b,   "&#9600;",  # UPPER HALF BLOCK
0x8c,   "&#9604;",  # LOWER HALF BLOCK
0x8d,   "&#9608;",  # FULL BLOCK
0x8e,   "&#9612;",  # LEFT HALF BLOCK
0x8f,   "&#9616;",  # RIGHT HALF BLOCK
0x90,   "&#9617;",  # LIGHT SHADE
0x91,   "&#9618;",  # MEDIUM SHADE
0x92,   "&#9619;",  # DARK SHADE
0x93,   "&#8992;",  # TOP HALF INTEGRAL
0x94,   "&#9632;",  # BLACK SQUARE
0x95,   "&#8729;",  # BULLET OPERATOR
0x96,   "&#8730;",  # SQUARE ROOT
0x97,   "&#8776;",  # ALMOST EQUAL TO
0x98,   "&#8804;",  # LESS-THAN OR EQUAL TO
0x99,   "&#8805;",  # GREATER-THAN OR EQUAL TO
0x9a,   "&#160;",   # NO-BREAK SPACE
0x9b,   "&#8993;",  # BOTTOM HALF INTEGRAL
0x9c,   "&#176;",   # DEGREE SIGN
0x9d,   "&#178;",   # SUPERSCRIPT TWO
0x9e,   "&#183;",   # MIDDLE DOT
0x9f,   "&#247;",   # DIVISION SIGN
0xa0,   "&#9552;",  # BOX DRAWINGS DOUBLE HORIZONTAL
0xa1,   "&#9553;",  # BOX DRAWINGS DOUBLE VERTICAL
0xa2,   "&#9554;",  # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0xa3,   "&#1105;",  # CYRILLIC SMALL LETTER IO
0xa4,   "&#9555;",  # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0xa5,   "&#9556;",  # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0xa6,   "&#9557;",  # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0xa7,   "&#9558;",  # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0xa8,   "&#9559;",  # BOX DRAWINGS DOUBLE DOWN AND LEFT
0xa9,   "&#9560;",  # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0xaa,   "&#9561;",  # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0xab,   "&#9562;",  # BOX DRAWINGS DOUBLE UP AND RIGHT
0xac,   "

Encoding of db.debian.org page

2003-01-11 Thread Tomohiro KUBOTA
Hi,

I found that http://db.debian.org/ is written in ISO-8859-1, as
the page says:

   

I wish it were UTF-8, if the backend database works in UTF-8 or another
Unicode encoding.  That would let users enter UTF-8 characters in the
web forms, so developers could store their names in their own native
scripts.  (In that case, the Debian developer database should also have
fields for ASCII versions of the names, because I cannot read, or even
type on my keyboard, Arabic or Thai characters.)
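The limitation is easy to demonstrate. In the following Python sketch, the samples are arbitrary illustrations. ISO-8859-1 can encode a plain ASCII name. It cannot encode the "c" with acute accent in Obradović, nor Japanese or Thai text. UTF-8 can encode all of them.

```python
samples = ["Denis Barbier", "Zoran Obradović", "青木修", "ภาษาไทย"]

def encodable(text: str, charset: str) -> bool:
    """Return True if text can be represented in the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

for s in samples:
    print(s, encodable(s, "iso-8859-1"), encodable(s, "utf-8"))
```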

Please note that I know little about databases and have no experience
using them.







Re: lists.debian.org de-localization

2003-01-11 Thread Tomohiro KUBOTA
Hi,

From: Tomohiro KUBOTA <[EMAIL PROTECTED]>
Subject: Re: lists.debian.org de-localization
Date: Tue, 07 Jan 2003 21:45:05 +0900 (JST)

> I think more important problem is how to deal with raw 8bit mail
> headers without encoding specification or encodings which are not
> supported by the current set-up but used in Debian mailing lists
> (GB2312, BIG5, and KOI8-R).

I heard that the current development version of MHonArc has a feature
to treat raw 8-bit characters as a specified encoding.  However, I
don't think this can be a solution now, because it will take a very
long time for that version to become stable, then to be adopted into
the unstable/testing Debian distribution, then for that distribution
to be released as stable, and finally for the stable distribution to
be installed on master.debian.org .

Anyway, I can easily write a KOI8-R -> SGML entity (or "&#;" expression)
filter.  My plan is to assume that raw 8-bit characters are KOI8-R
(Russian), and I think this can be done easily.

The remaining problem is how to handle unsupported encodings such as
GB2312 and Big5.  I found that the current mhonarc set-up on
lists.debian.org passes GB2312 and Big5 through as raw 8-bit streams
(or 16-bit streams, since these encodings are multibyte), and they
also cause encoding conflicts and the loss of the following "<" in "".
So I'd like these encodings to be converted into "&#;" expressions.

(Also, debian-esperanto people may want to use ISO-8859-3 and UTF-8.)

I found
master.debian.org:/org/lists.debian.org/mhonarc/share/mhonarc/MHonArc/UTF8.pm,
but I don't think it will work well, because it depends on the
Unicode::MapUTF8 module, which is only available (as the
libunicode-maputf8-perl package) since Woody, whereas master.debian.org
runs Potato.

Alternatively, I might be able to write my own filter using
libtext-unicode-perl, but that package is also only available since Woody.


I don't know of any other way.  Any suggestions?

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: automatically-generated ISO-8859-1 characters in mulbibyte webpages

2003-01-10 Thread Tomohiro KUBOTA
Hi,

From: Josip Rodin <[EMAIL PROTECTED]>
Subject: Re: automatically-generated ISO-8859-1 characters in mulbibyte webpages
Date: Thu, 9 Jan 2003 14:22:19 +0100

> Oh, yeah, that's in another sub. I'll find it and have it use
> from_utf8_or_iso88591_to_sgml() as well, of course.

How about this patch?

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
--- people.pl   2003-01-09 07:31:58.0 +0900
+++ people.pl.new   2003-01-11 09:05:07.0 +0900
@@ -442,15 +442,23 @@
   foreach (`ldapsearch -P 2 -x -h db.debian.org -b dc=debian,dc=org uid=\* cn mn sn labeledurl`) {
 chop; $line = $_;
 if ($line =~ /^(dn: )?uid=(.+),.+$/) { $name = $2; }
-elsif ($line =~ /^cn(=|: )(.+)$/) { $ldap_cn = $2; }
+elsif ($line =~ /^cn(=|: )(.+)$/) {
+  $ldap_cn = from_utf8_or_iso88591_to_sgml($2);
+}
 elsif ($line =~ /^mn(=|: )(.+)$/) { next; }
-elsif ($line =~ /^sn(=|: )(.+)$/) { $ldap_sn = $2; }
+elsif ($line =~ /^sn(=|: )(.+)$/) {
+  $ldap_sn = from_utf8_or_iso88591_to_sgml($2);
+}
 elsif ($line =~ /^(\w+):: (.+)$/) {
   use MIME::Base64;
   my $namepart = $1;
   my $worddata = decode_base64($2);
-  if ($namepart eq "cn") { $ldap_cn = $worddata; }
-  elsif ($namepart eq "sn") { $ldap_sn = $worddata; }
+  if ($namepart eq "cn") {
+   $ldap_cn = from_utf8_or_iso88591_to_sgml($worddata);
+  }
+  elsif ($namepart eq "sn") {
+   $ldap_sn = from_utf8_or_iso88591_to_sgml($worddata);
+  }
   elsif ($namepart ne "mn") {
 die "something went wrong, a non-name field is BASE64 encoded";
   }
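The "::" branch in the patch exists because LDIF output from ldapsearch base64-encodes values that are not plain ASCII and marks them with a double colon. A minimal sketch of just that parsing step, using the real MIME::Base64 module; parse_ldif_line is a hypothetical name, and LDIF line folding is ignored here:

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);

# Hypothetical helper: split one unfolded LDIF line into
# (attribute, raw value bytes).  "attr: value" is plain text,
# "attr:: value" carries base64-encoded bytes.
sub parse_ldif_line {
    my ($line) = @_;
    if ($line =~ /^(\w+):: (.*)$/) {
        return ($1, decode_base64($2));
    }
    if ($line =~ /^(\w+): (.*)$/) {
        return ($1, $2);
    }
    return;    # not an attribute line
}

my ($attr, $value) = parse_ldif_line("cn:: QW5kcsOp");
# $value holds the raw UTF-8 bytes "Andr\xc3\xa9"; they still have to go
# through a UTF-8/ISO-8859-1 filter before reaching the web page.
```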


Re: webwml/english/mirror/Mirrors.masterlist

2003-01-09 Thread Tomohiro KUBOTA
Hi,

From: Josip Rodin <[EMAIL PROTECTED]>
Subject: Re: webwml/english/mirror/Mirrors.masterlist
Date: Thu, 9 Jan 2003 14:31:01 +0100

> Haha. Good one. I think it's easiest to make mirror_list.pl replace "&\s"
> with "&amp;\s", i.e. have it work properly with "foo & bar" but not mess
> with entities. "foo&bar" shouldn't happen, and we don't have any mirrors at
> AT&T. :)

A good idea, but I think it is clearer to change "&" to "&amp;" manually
in the Mirrors.masterlist file, because that leads to a more consistent
rule for how "&" is handled in Mirrors.masterlist and other files in the
webwml CVS repository.

However, I don't mind which idea is adopted.
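Josip's idea can be generalized: escape only those "&" characters that do not already begin an entity reference, so "foo & bar" and "AT&T" are fixed while "&eacute;" is left alone. A minimal sketch; escape_bare_amp is a hypothetical name, and the entity pattern is a simplification of the full SGML grammar:

```perl
use strict;
use warnings;

# Hypothetical helper: turn bare "&" into "&amp;" but leave existing
# references such as "&eacute;", "&#233;" or "&#xE9;" untouched.
sub escape_bare_amp {
    my ($text) = @_;
    $text =~ s/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)/&amp;/g;
    return $text;
}

print escape_bare_amp("Andr&eacute; & AT&T"), "\n";
# prints "Andr&eacute; &amp; AT&amp;T"
```

Running it twice is harmless, because "&amp;" itself matches the entity pattern.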

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Translated pages unavailable for DWN 2003

2003-01-09 Thread Tomohiro KUBOTA
Hi,

I found that the translated pages of DWN 2003 #01 are not available,
though their titles appear translated in index pages like
http://www.debian.org/News/weekly/2003/index.<lang>
where <lang> is "de", "fr", "ja", and so on.

Indeed, http://www.debian.org/News/weekly/2003/01/index.<lang>.html
doesn't exist even for "en", and the language-less page 2003/01/index.html
doesn't have a language chooser in its footer.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




webwml/english/mirror/Mirrors.masterlist

2003-01-09 Thread Tomohiro KUBOTA
Hi,

As I wrote in the "Unidentified subject!" thread, I rewrote the
Mirrors.masterlist file to use ASCII characters only (including
SGML entity expressions, which are themselves written in ASCII).

However, I found that these SGML entities appear literally in
the webpage

   http://www.debian.org/mirror/official_sponsors.en.html

against my intention.  This is because "&" is processed by some
scripts and changed into "&amp;".

I can think of several possible solutions.

1) Modify the scripts not to change "&" into "&amp;", and leave
   the Mirrors.masterlist as it is.

2) Restore the Mirrors.masterlist and modify the scripts to change
   8bit characters into SGML entities.

3) Modify the Mirrors.masterlist to use UTF-8 and modify the scripts
   to change UTF-8 characters into SGML entities.

However, 2) cannot be a solution, because I found that Mirrors.masterlist
uses not only ISO-8859-1 but also ISO-8859-2 characters.

How about 1) or 3) ?


PS. I found

   http://www.debian.org/mirror/sponsors.en.html

is too old.  Why hasn't it been regenerated since Nov 11, 2002?

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: automatically-generated ISO-8859-1 characters in mulbibyte webpages

2003-01-09 Thread Tomohiro KUBOTA
Hi,

From: Josip Rodin <[EMAIL PROTECTED]>
Subject: Re: automatically-generated ISO-8859-1 characters in mulbibyte webpages
Date: Wed, 8 Jan 2003 15:02:21 +0100

> Sounds very good, thanks.

Thank you very much for adopting my patch, but I found that the regenerated
devel/people webpage still has several 8bit characters.

There are seven 8bit characters, and all of them are in "home page"
lines, such as:

Dahlqvist, Andr* (<a href="http://jota.sm.luth.se/~anedah-9/">home page</a>)

where "*" is \xe9.

I suspect that these lines are generated separately from canonical_names()
and therefore aren't filtered by from_utf8_or_iso88591_to_sgml().

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




webwml/english/mirror/Mirrors.masterlist (Re: [no subject])

2003-01-08 Thread Tomohiro KUBOTA
Hi,

Sorry for sending a malformed mail.  It seems that I accidentally removed
the ":" after "Cc" in the mail header.  (The content of the mail, however,
is as intended.)

From: <[EMAIL PROTECTED]>
Subject: Unidentified subject!
Date: Thu, 09 Jan 2003 07:17:59 +0900 (JST)

> Cc Josip Rodin <[EMAIL PROTECTED]>
> Subject: webwml/english/mirror/Mirrors.masterlist
> From: Tomohiro KUBOTA <[EMAIL PROTECTED]>
> X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI)
> Mime-Version: 1.0
> Content-Type: Text/Plain; charset=us-ascii
> Content-Transfer-Encoding: 7bit
> 
> Hi,
(snip).

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: automatically-generated ISO-8859-1 characters in mulbibyte webpages

2003-01-07 Thread Tomohiro KUBOTA
Hi,

From: Tomohiro KUBOTA <[EMAIL PROTECTED]>
Subject: Re: automatically-generated ISO-8859-1 characters in mulbibyte webpages
Date: Tue, 07 Jan 2003 21:29:24 +0900 (JST)

> Anyway, though I don't know such a module, your way can be very easily
> implemented.  I think the easiest one is like following:
> 
>   $name =~ s/([\x80-\xff])/"&#".ord($1).";"/eg;

I wrote a new filter which
  - assumes the input string is UTF-8 if it can be interpreted as such,
  - assumes it is ISO-8859-1 if not.

Since the UTF-8 encoding is relatively strict, it is unlikely that an
ISO-8859-1 string will be wrongly taken for UTF-8.  I confirmed that
people.names has no octet stream which can be interpreted as UTF-8.
(An isolated 8bit byte is never valid UTF-8; in UTF-8, 8bit bytes
always appear in multi-byte sequences.)

With this filter, my concern is completely resolved.  You also won't need
to worry about future maintenance work when a new maintainer uses 8bit
characters in his/her name.

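#!/usr/bin/perl
use strict;
use warnings;

# Convert a byte string to pure ASCII with "&#NNNN;" references.
# If every 8bit byte is part of a well-formed UTF-8 sequence, decode it
# as UTF-8; otherwise treat each 8bit byte as one ISO-8859-1 character.
sub from_utf8_or_iso88591_to_sgml ($) {
    my $str = $_[0];
    my $strsave = $str;
    if ($str !~ /[\x80-\xff]/) {
        # Plain ASCII: return it unchanged to save machine time.
        return $str;
    }
    # 4-byte sequences carry 3 + 6 + 6 + 6 payload bits.
    $str =~ s/([\xf0-\xf7])([\x80-\xbf])([\x80-\xbf])([\x80-\xbf])/
        "&#" .
        ((ord($1) & 0x07) * 0x40000 +
         (ord($2) & 0x3f) * 0x1000 +
         (ord($3) & 0x3f) * 0x40 +
         (ord($4) & 0x3f)) . ";"/eg;
    # 3-byte sequences carry 4 + 6 + 6 payload bits.
    $str =~ s/([\xe0-\xef])([\x80-\xbf])([\x80-\xbf])/
        "&#" .
        ((ord($1) & 0x0f) * 0x1000 +
         (ord($2) & 0x3f) * 0x40 +
         (ord($3) & 0x3f)) . ";"/eg;
    # 2-byte sequences carry 5 + 6 payload bits.
    $str =~ s/([\xc0-\xdf])([\x80-\xbf])/
        "&#" .
        ((ord($1) & 0x1f) * 0x40 +
         (ord($2) & 0x3f)) . ";"/eg;
    if ($str !~ /[\x80-\xff]/) {
        # Every 8bit byte was part of a UTF-8 sequence: assume UTF-8.
        return $str;
    }
    # Stray 8bit bytes remain, so this is not UTF-8: assume ISO-8859-1
    # and convert the untouched copy byte by byte.
    $strsave =~ s/([\x80-\xff])/"&#" . ord($1) . ";"/eg;
    return $strsave;
}

# Filter: read lines from the named files or stdin and print them converted.
while (<>) {
    chomp;
    print from_utf8_or_iso88591_to_sgml($_), "\n";
}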
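With a Perl that has the core Encode module (5.8 or later, newer than what Potato ships), the same "UTF-8 if valid, otherwise ISO-8859-1" rule can lean on a strict decoder instead of hand-written byte patterns. This is a swapped-in technique, not the filter above; utf8_or_latin1_to_ncr is a hypothetical name:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode $bytes as strict UTF-8 if possible,
# otherwise as ISO-8859-1, then emit ASCII with "&#NNNN;" references.
sub utf8_or_latin1_to_ncr {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps
    # $bytes intact so it can be reused for the fallback.
    my $chars = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    $chars = decode('ISO-8859-1', $bytes) unless defined $chars;
    $chars =~ s/([^\x00-\x7f])/'&#' . ord($1) . ';'/ge;
    return $chars;
}

print utf8_or_latin1_to_ncr("Andr\xc3\xa9"), "\n";   # UTF-8 input
print utf8_or_latin1_to_ncr("Andr\xe9"), "\n";       # ISO-8859-1 input
# both lines print "Andr&#233;"
```

Encode's "UTF-8" (with the hyphen) is its strict variant, which is what makes the validity test meaningful here.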



Re: lists.debian.org de-localization

2003-01-07 Thread Tomohiro KUBOTA
Hi,

From: Josip Rodin <[EMAIL PROTECTED]>
Subject: Re: lists.debian.org de-localization
Date: Tue, 7 Jan 2003 11:41:36 +0100

> Hm, but doesn't the section on character sets cover the mails themselves as
> well? There are a bit under twenty thousand indices, which is a large amount
> by itself, but the mails themselves are really problematic.

I think the mails themselves have much lower priority than the lists,
because a single 8bit character in one mail affects the index pages for
a whole month of a list.

Anyway, I don't think regeneration or modification is needed *now*,
because our solution is not yet perfect.  It would obviously be a waste
of machine time to modify the pages both *now* and again in the future,
when a better solution becomes available.

I think more important problem is how to deal with raw 8bit mail
headers without encoding specification or encodings which are not
supported by the current set-up but used in Debian mailing lists
(GB2312, BIG5, and KOI8-R).

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: automatically-generated ISO-8859-1 characters in mulbibyte webpages

2003-01-07 Thread Tomohiro KUBOTA
Hi,

From: Josip Rodin <[EMAIL PROTECTED]>
Subject: Re: automatically-generated ISO-8859-1 characters in mulbibyte webpages
Date: Tue, 7 Jan 2003 12:31:37 +0100

> Hmm, I see how the already-hardcoded ones need to be fixed immediately, but
> shouldn't there be a Perl module which we could use to convert stuff
> automatically? I never bothered to look for it, but the number of hardcoded
> names just because of the character set shouldn't be growing, it's a pain to
> maintain.

I (weakly) prefer my way to yours, because yours would declare that
*all* 8bit characters in maintainer names are ISO-8859-1, even in the
future.  I don't think that is a very good idea, because I hope ISO-8859-1
will fade out and be replaced by UTF-8.

Anyway, though I don't know of such a module, your way can be implemented
very easily.  I think the simplest version is the following:

  $name =~ s/([\x80-\xff])/"&#".ord($1).";"/eg;
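To make the concern concrete: applied to a name that is already UTF-8, this one-liner turns each byte of a multibyte character into its own reference. A tiny sketch, where latin1_ncr is a hypothetical wrapper around exactly that substitution:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the one-liner: every 8bit byte is taken
# to be one ISO-8859-1 character.
sub latin1_ncr {
    my ($name) = @_;
    $name =~ s/([\x80-\xff])/"&#".ord($1).";"/eg;
    return $name;
}

print latin1_ncr("Andr\xe9"), "\n";      # ISO-8859-1 input: "Andr&#233;"
print latin1_ncr("Andr\xc3\xa9"), "\n";  # UTF-8 input: "Andr&#195;&#169;"
```

The second result renders as "AndrÃ©" in a browser, which is the mojibake a UTF-8 name would get under an ISO-8859-1-only rule.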

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: lists.debian.org de-localization

2003-01-06 Thread Tomohiro KUBOTA
Hi,

From: Marco d'Itri <[EMAIL PROTECTED]>
Subject: Re: lists.debian.org de-localization
Date: Tue, 7 Jan 2003 01:10:29 +0100

> On Jan 06, Tomohiro KUBOTA <[EMAIL PROTECTED]> wrote:
> 
>  >> This is not needed, only spammers put raw latin-1 characters in mail
>  >> headers.
>  >The key point is that when we receive a mail with raw 8bit characters,
> The key point is that we should not even accept mail with raw 8bit
> characters in the headers.

I agree with you, but that is an ideal solution.  As Stephen said,
there are people who use raw 8bit characters (intended to be KOI8-R).
If you could force them to use "correct" MUAs, I would fully agree with you.

Anyway, in the current set-up of lists.debian.org, encodings such as
GB2312 and BIG5 (used in debian-chinese-gb and debian-chinese-big5,
respectively) are not supported and are processed just like raw 8bit
characters.  We also have to deal with them.

I am now interested in MHonArc::UTF8.pm.  I had thought that it
converts all UTF-8 characters (except ASCII) into &#; expressions and
doesn't support East Asian encodings, but I was wrong.  It seems to
convert *from* all non-UTF-8 encodings *to* UTF-8, and it seems to
support East Asian encodings too (because Unicode::MapUTF8 supports
them).

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: lists.debian.org de-localization

2003-01-06 Thread Tomohiro KUBOTA
Hi,

From: Josip Rodin <[EMAIL PROTECTED]>
Subject: Re: lists.debian.org de-localization
Date: Mon, 6 Jan 2003 16:07:49 +0100

> Future only. Is there a pressing need to regenerate the old mails?
> I would rather avoid it...

I have an idea about an easy modification to old list pages.

Add the following line to all 
http://lists.debian.org/*/*/threads.html ,
http://lists.debian.org/*/*/maillist.html ,
http://lists.debian.org/*/*/subject.html , and
http://lists.debian.org/*/*/author.html

   

except for the following:

for debian-chinese-gb,

   

for debian-chinese-big5,

   

for debian-russian,

   

I expect that this takes much less machine time than regenerating
all of the above pages with MHonArc, because it doesn't need to read
any individual mails.
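A minimal sketch of that one-time edit on a single page's HTML, assuming the line to add is a Content-Type <meta> declaration of the form used in the code (an assumption) and that every list not named among the exceptions defaults to ISO-8859-1 (also an assumption); add_charset_meta is a hypothetical name, and reading and rewriting the threads.html, maillist.html, subject.html and author.html files around it is left out:

```perl
use strict;
use warnings;

# Assumed per-list charsets for the exceptions; every other list
# falls back to ISO-8859-1 (an assumption).
my %charset_for = (
    'debian-chinese-gb'   => 'gb2312',
    'debian-chinese-big5' => 'big5',
    'debian-russian'      => 'koi8-r',
);

# Hypothetical helper: insert a charset declaration right after <head>,
# unless the page already declares one.  No mail is read or regenerated.
sub add_charset_meta {
    my ($html, $list) = @_;
    return $html if $html =~ /<meta[^>]*charset=/i;
    my $cs = exists $charset_for{$list} ? $charset_for{$list} : 'iso-8859-1';
    my $meta = qq{<meta http-equiv="Content-Type" content="text/html; charset=$cs">};
    $html =~ s/(<head[^>]*>)/$1\n$meta/i;
    return $html;
}
```

The early return makes the edit safe to repeat over pages that were already patched.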

I am also reading the MHonArc documentation so that the headers above
can be added to future pages as well (or, if everything can be migrated
to UTF-8, that would be much better).

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: lists.debian.org de-localization

2003-01-06 Thread Tomohiro KUBOTA
Hi,

From: Josip Rodin <[EMAIL PROTECTED]>
Subject: Re: lists.debian.org de-localization
Date: Mon, 6 Jan 2003 16:07:49 +0100

> On Mon, Jan 06, 2003 at 11:42:44PM +0900, Tomohiro KUBOTA wrote:
> > Thank you for commiting the modification of debian.rc .  Does the change
> > affect future archives only?  Or all past and future archives?
> 
> Future only. Is there a pressing need to regenerate the old mails?
> I would rather avoid it...

I also think it is not a good idea to regenerate all pages now.
Part of the reason is that it is too heavy, but another part is
that the solution is not very good yet.  In the future, when MHonArc
can handle all popular encodings and convert them into UTF-8, and when
it can handle raw 8bit headers in some intelligent way, we may consider
regenerating all pages as a very low-priority background task.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: lists.debian.org de-localization

2003-01-06 Thread Tomohiro KUBOTA
Hi,

From: Josip Rodin <[EMAIL PROTECTED]>
Subject: Re: lists.debian.org de-localization
Date: Mon, 6 Jan 2003 15:09:47 +0100

> > (permission of klecker:/org/www.debian.org/cron/people_scripts/people.pl
> 
> I have no idea how you came from mhonarc to people.pl, but okay. :)

Ah, I was confused.  It comes from another thread on the debian-www list
about a similar 8bit problem on http://www.debian.org/devel/people.ja.html .

Please read the recent thread named "automatically-generated ISO-8859-1
characters in mulbibyte webpages" for details.


Thank you for committing the modification of debian.rc.  Does the change
affect future archives only, or all past and future archives?

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: lists.debian.org de-localization

2003-01-06 Thread Tomohiro KUBOTA
Hi,

From: Edmund GRIMLEY EVANS <[EMAIL PROTECTED]>
Subject: Re: lists.debian.org de-localization
Date: Mon, 6 Jan 2003 13:45:47 +

> If the headers contain 8-bit octets and are valid as UTF-8, it's
> fairly safe to assume that they really are UTF-8. Otherwise, you could
> look for a Content-Type field or make it depend on the mailing list.

A good idea, but I think the people who use UTF-8 today are those who
know character encodings well and don't send raw 8bit headers.


> I thought some Japanese non-spammers use iso-2022-jp in headers, which
> isn't 8-bit, but it isn't us-ascii, either. Am I out of date?

I sometimes see raw iso-2022-jp headers.  Fortunately, however,
there are no Japanese mailing lists in Debian.  (debian-japanese
is an English mailing list.)  Moreover, MHonArc does not seem to be able
to convert Japanese into SGML entities or &#; expressions, so we could
not support Japanese headers anyway.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: lists.debian.org de-localization

2003-01-06 Thread Tomohiro KUBOTA
Hi,

From: Marco d'Itri <[EMAIL PROTECTED]>
Subject: Re: lists.debian.org de-localization
Date: Mon, 6 Jan 2003 13:34:17 +0100

>  >Again, speaking about lists.debian.org, my original idea is to assume
>  >all 8bit raw characters to be ISO-8859-1, though I don't know this is
>  >technically possible or not.
> This is not needed, only spammers put raw latin-1 characters in mail
> headers.

The key point is that when we receive a mail with raw 8bit characters,
we have no easy and reliable way to tell whether the characters are
from ISO-8859-1, KOI8-R, or some other character set.
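A quick way to see the ambiguity is to decode the same byte under both assumptions. A small sketch using the core Encode module; byte_as is a hypothetical name:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: the numeric reference a single raw byte would
# get under a given encoding assumption.
sub byte_as {
    my ($byte, $encoding) = @_;
    return '&#' . ord(decode($encoding, $byte)) . ';';
}

for my $enc ('ISO-8859-1', 'KOI8-R') {
    printf "%-10s %s\n", $enc, byte_as("\xc1", $enc);
}
# 0xC1 is "A with acute" (&#193;) in ISO-8859-1 but Cyrillic small "a"
# (&#1072;) in KOI8-R; nothing in the byte itself says which is meant.
```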

Anyway, in the debian-russian mailing list, raw 8bit characters in mail
headers should be allowed, and they should be assumed to be KOI8-R
when building lists.debian.org pages.

In any case, raw 8bit characters must be avoided in lists.debian.org
pages (so that the pages are not broken); thus raw 8bit characters in
mail headers must be converted into something else (or deleted).

An easy way is to assume that *all* raw 8bit characters are KOI8-R and
convert them into SGML entities.  However, I don't know whether there are
other languages in which a certain number of non-spammers use raw
8bit characters.  If there are, they will complain about this idea.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/






Re: free in free beer? (News/2003/20030102.wml)

2003-01-06 Thread Tomohiro KUBOTA
Hi,

From: Osamu Aoki <[EMAIL PROTECTED]>
Subject: Re: free in free beer? (News/2003/20030102.wml)
Date: Sun, 5 Jan 2003 23:51:55 -0800

> This is "free bear".  Translation shall be:

Thank you.  I committed the file.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: free in free beer? (News/2003/20030102.wml)

2003-01-06 Thread Tomohiro KUBOTA
Hi,

From: "James A. Treacy" <[EMAIL PROTECTED]>
Subject: Re: free in free beer? (News/2003/20030102.wml)
Date: Mon, 6 Jan 2003 00:40:49 -0500

> > > The Test Drive Program is a free service of HP.
> > 
> > I think that the "free" here is "free beer", not "free speech".
> > 
> Correct.

I see.  I'd like to ask about the meaning of another "free" in the
following sentence:

   When you
   register, you get a free shell account you can use to log into the
   wide variety of systems on the Test Drive network and try out the
   software and operating systems running on them.

I think it (free shell account) is also "free beer".  However, the
original translator seems to have read it as "free speech", i.e.,
"a shell account with which you can run anything you like".

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: lists.debian.org de-localization

2003-01-06 Thread Tomohiro KUBOTA
Hi,

From: "Stephen J. Turnbull" <[EMAIL PROTECTED]>
Subject: Re: lists.debian.org de-localization (Re: automatically-generated 
ISO-8859-1 characters in mulbibyte webpages)
Date: Sun, 05 Jan 2003 16:10:02 +0900

> This is a fairly small sample (about 100 subscribers, 25 regular
> posters).  However, the Russian spam I've seen (isn't it funny how you
> can identify spam even though you can't read the language it's written
> in?) invariably fails either the addressee tests (implicit, too many),
> the known spam software test, or the HTML-only test.  So (FWIW) I've
> disabled the 8-bit test and so far the Russian subscribers are happy.

IMO, in such a case, allowing raw 8bit mails is better than rejecting
them (i.e., the benefit outweighs the cost).

Again, speaking about lists.debian.org, my original idea is to assume
that all raw 8bit characters are ISO-8859-1, though I don't know whether
this is technically possible.  In this case, Russian people will be
annoyed when browsing lists.debian.org pages.

If it is possible to have an "assumed encoding" for each mailing list,
then that of debian-russian will be KOI8-R, that of debian-chinese-gb
will be GB2312, and so on, with ISO-8859-1 for all others.
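A minimal sketch of such a per-list "assumed encoding", using the core Encode module; the list-to-encoding table follows the examples above and is otherwise an assumption, and raw_header_to_ncr is a hypothetical name (not an MHonArc setting):

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumed encoding per list; 'euc-cn' and 'big5-eten' are Encode's
# names for GB2312 and Big5.  Unlisted lists fall back to ISO-8859-1.
my %assumed_encoding = (
    'debian-russian'      => 'koi8-r',
    'debian-chinese-gb'   => 'euc-cn',
    'debian-chinese-big5' => 'big5-eten',
);

# Hypothetical helper: turn a raw 8bit header from list $list into
# ASCII with "&#NNNN;" references, so it cannot break the HTML index.
sub raw_header_to_ncr {
    my ($bytes, $list) = @_;
    my $enc = exists $assumed_encoding{$list}
            ? $assumed_encoding{$list} : 'iso-8859-1';
    my $chars = decode($enc, $bytes);
    $chars =~ s/([^\x00-\x7f])/'&#' . ord($1) . ';'/ge;
    return $chars;
}
```

Multibyte encodings such as GB2312 and Big5 are decoded as whole characters, so their second bytes can no longer swallow a following "<".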

I also hope there are some UTF-8 filters.  (There seems to be a poster
in debian-esperanto who uses a UTF-8 name in From:.)

However, I know nothing about MHonArc.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




free in free beer? (News/2003/20030102.wml)

2003-01-05 Thread Tomohiro KUBOTA
Hi,

Now I am reviewing a Japanese translation of 20030102.wml
news page in Debian web site.

The page says:

> The Test Drive Program is a free service of HP.

I think that the "free" here is "free beer", not "free speech".

Debian always says "Debian is free software, in the free speech
sense, not the free beer sense", and we always fight against the
"free-of-charge software" interpretation.  Thus, I think that
the word "free" without any qualification must mean "free as in free
speech", not "free beer", when Debian uses it.
If Debian wants to mention "free beer", the word "free"
has to be qualified.

Here, I think the Test Drive's "free" means "free beer".  Thus,
I propose writing the following:

   The Test Drive Program is a free-of-charge service of HP.

Any comments?

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: lists.debian.org de-localization

2003-01-05 Thread Tomohiro KUBOTA
Hi,

From: [EMAIL PROTECTED] (Denis Barbier)
Subject: Re: lists.debian.org de-localization (Re: automatically-generated 
ISO-8859-1 characters in mulbibyte webpages)
Date: Sun, 5 Jan 2003 15:33:41 +0100

> > Why not use iso_8859::str2sgml; instead of mhonarc::htmlize for iso-8859-1?
[...]
> Sounds like a very good idea.

Whom should I ask to make this modification?

(klecker:/org/www.debian.org/cron/people_scripts/people.pl has mode
755, owner=joy, group=debwww, but I don't know whether klecker is the
right place to do this, because I found the file on klecker only by
chance.  I also checked gluck (=www.debian.org) and master, but they
don't have the file.  Where can I find documentation on how /org/* is
processed?)

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: date of events

2003-01-05 Thread Tomohiro KUBOTA
Hi,

From: [EMAIL PROTECTED] (Denis Barbier)
Subject: Re: date of events
Date: Mon, 6 Jan 2003 01:18:09 +0100

> > I don't see any worthwhile advantage of this new setup over the old one.
> > Or am I missing something?
> 
> Tomohiro KUBOTA explained many times that having several encodings in
> a single file is painful.  With this new setup, the problem reported
> here could also not occur.

Right.  I had to edit ctime.wml with an editor that is broken in the
i18n sense: it doesn't interpret the file as any multibyte encoding (so
it doesn't break the 8bit-encoded parts), yet it happens to display
multibyte characters well (so I can input them).

Using gettext is a wonderful advantage.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: DWN #49 2002 typo?

2003-01-05 Thread Tomohiro KUBOTA
From: Peter Karlsson <[EMAIL PROTECTED]>
Subject: Re: DWN #49 2002 typo?
Date: Sun, 5 Jan 2003 23:26:35 +0100 (CET)

> Tomohiro KUBOTA:
> 
> > I hope more plain words without fear of misunderstanding are used.
> > Please don't use minor meaning of a word with multiple meanings.
> 
> Isn't that why we are translating the pages to the different languages?
> To make sure people understand the pages without needing to know the
> finer aspects of the source language?

That is an oversimplification.  Since I am not a professional
translator (and neither is any member of the Japanese translation
team), we sometimes fail to translate very difficult English.

IMO, translation is useful because:

 - it saves readers labor and time; it is like avoiding "reinventing
   the wheel".
 - many people would not read English because of that labor and time;
   thus translation is a good way to advertise Debian to them.
 - of course, there are many people who don't know basic English,
   and dictionaries cannot help them.

> I often look up words in my dictionary when I translate to make sure I
> get it right. I most often do, but sometimes I don't. I translate to
> make sure that the people that cannot read the English text well enough
> can understand the translation.

Sure, in this case I should have consulted my dictionary.  However, I
sometimes doubt that such a minor meaning of a word is intended, because
I don't understand why a minor meaning would be used when plainer
words with the same meaning exist.

In this case, "advise" did have the meaning "tell" or "inform" in
my dictionary, but it was the last meaning in the list.
Since a dictionary lists meanings from most to least common, I thought
it very unlikely that the last meaning was intended here.  Furthermore,
there are clearer words like "tell" or "inform" to say the same thing.

In this case, we were lucky because I found the mistranslation.  However,
such a mistranslation may be missed if it is self-consistent.  To avoid
this type of mistranslation, we would have to look up *every* word in a
dictionary, which is impossible.

I think English writers are *now* trying to write plain English,
because it is required by http://www.debian.org/devel/website/working .
However, I fear that English writers sometimes don't know what kind
of clarity and simplicity translators need.  Thus, I ask: please use
the most common meaning of a word as far as possible.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: Norwegian Bokm(â)l

2003-01-05 Thread Tomohiro KUBOTA
Hi,

From: Tollef Fog Heen <[EMAIL PROTECTED]>
Subject: Re: Norwegian Bokm(â)l
Date: 05 Jan 2003 10:53:19 +0100

> | 2. modify "Bokm*l" to "Bokm&acirc;l".
> 
> It is å, not â

Thank you for your correction.  I used the correct one when modifying
language_names.wml but forgot to mention it.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: Norwegian Bokm(â)l

2003-01-04 Thread Tomohiro KUBOTA
Hi,

From: Tomohiro KUBOTA <[EMAIL PROTECTED]>
Subject: Norwegian Bokm(â)l
Date: Sun, 05 Jan 2003 10:45:44 +0900 (JST)

> 1. "Bokm*l" and "Nynorsk" to be translation items.
> 
> or
> 
> 2. modify "Bokm*l" to "Bokm&acirc;l".

I did both, i.e., I modified webwml/english/template/debian/language_names.wml
to have two additional items for Bokm*l and Nynorsk, and gave the SGML
entity expression as the default translation (i.e., msgid).

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Norwegian Bokm(â)l

2003-01-04 Thread Tomohiro KUBOTA
Hi,

I found the page

http://www.debian.org/international/l10n/po/index.ja.html

is broken where "Norwegian Bokm*l" (* is a with circ) is
written.  This is because the 8bit character (ISO-8859-1)
is regarded as the first byte of a multibyte character in
Japanese EUC-JP.

The Debian website adopted gettext for translating items
only recently.  Before then, "Norwegian Bokm*l" and
"Norwegian Nynorsk" were items to be translated, and
this problem didn't occur because I had translated these
words into Japanese.  Now, however, these translations
are lost and the page is broken.

I think there are two possible solutions.

1. Make "Bokm*l" and "Nynorsk" translation items.

or

2. modify "Bokm*l" to "Bokm&acirc;l".

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




lists.debian.org de-localization (Re: automatically-generated ISO-8859-1 characters in mulbibyte webpages)

2003-01-04 Thread Tomohiro KUBOTA
Hi,

From: Tomohiro KUBOTA <[EMAIL PROTECTED]>
Subject: Re: automatically-generated ISO-8859-1 characters in mulbibyte webpages
Date: Fri, 03 Jan 2003 09:06:43 +0900 (JST)

> BTW, I found similar trouble in lists.debian.org pages.  In thread-list
> pages or date-list pages like
> 
>   http://lists.debian.org/debian-devel/2002/debian-devel-200212/threads.html,
> 
> there is no charset specification.  In such cases, web browsers will
> guess the encoding of these pages according to user preference.
> Naturally, Japanese people configure web browsers to "assume Japanese
> encoding for pages without charset specification".  On the other hand,
> the thread-list pages show senders' names in <em> format, and therefore
> a tag </em> follows the name.  If the last letter of the name is 8bit,
> the tag is broken.  The result is that everything that follows is shown
> in <em> (italic) format.
>
> The test is easy: please configure your browser to "assume Japanese
> encoding for pages without charset specification" and load the above
> page.
>
>
> However, in this case, the solution is a bit complicated.  All mails
> should have encoding information in MIME format.  Thus, the best
> solution would be to parse MIME.  On the other hand, the simplest
> makeshift solution is to add "charset=iso8859-1" for all pages
> but there are mailing lists where most of 8bit characters are
> cyrillic and so on.


I found that MHonArc has a feature to solve this problem.

  http://www.mhonarc.org/MHonArc/doc/faq/mime.html#nonascii

I checked /org/lists.debian.org/mhonarc/debian.rc and found
that it seems to assume that all 8bit characters are ISO-8859-1.

> 
> plain;  mhonarc::htmlize;
> us-ascii;   mhonarc::htmlize;
> iso-8859-1; mhonarc::htmlize;
> iso-8859-2; iso_8859::str2sgml; iso8859.pl
> iso-8859-3; iso_8859::str2sgml; iso8859.pl

Why not use iso_8859::str2sgml; instead of mhonarc::htmlize for iso-8859-1?

(Though I am new to MHonArc, I imagine that iso_8859::str2sgml converts
ISO-8859 8bit characters into SGML entities like "&ouml;".)

It would be nice if we could convert raw 8bit mail headers to SGML
entities by assuming they are ISO-8859-1 (such headers are illegal, but
they do happen and may break the lists.debian.org pages).  Since this may
annoy Russian (and other non-ISO-8859-1) people whose MUAs happen to
generate illegal mail headers with 8bit characters and no charset
specification, I'd like to hear from people in various countries.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: automatically-generated ISO-8859-1 characters in mulbibyte webpages

2003-01-04 Thread Tomohiro KUBOTA
Hi,

From: [EMAIL PROTECTED] (Denis Barbier)
Subject: Re: automatically-generated ISO-8859-1 characters in mulbibyte webpages
Date: Thu, 2 Jan 2003 16:24:59 +0100

> I find only 18 names in people.names containing non-ASCII letters,
> so /org/www.debian.org/cron/people_scripts/people.pl could contain
> some extra elsif in its canonical_names function to replace
> non-ASCII letters by HTML entities.  Most names seem to be ISO-8859-1
> encoded.

I implemented your idea.  Here is a patch.  I assumed all 8bit
characters to be ISO-8859-1.

Could someone apply this?

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/


people.pl.DIFF
Description: Binary data


Re: DWN #49 2002 typo?

2003-01-03 Thread Tomohiro KUBOTA
Hi,

From: Peter Karlsson <[EMAIL PROTECTED]>
Subject: Re: DWN #49 2002 typo?
Date: Fri, 3 Jan 2003 18:55:08 +0100 (CET)

> > It reads that Thomas advised Susan to rewrite the entire manual page;
> > though the link[1] says that Susan has already rewritten the page.
> 
> No, Thomas did not advise Susan, Thomas advised the readers. Advise can be
> seen as a synonym for "announce" here.

Thank you for the explanation.  Now I understand.

I hope plainer words, with no risk of misunderstanding, will be used.
Please don't use a minor meaning of a word with multiple meanings.
In fact, the original Japanese translator rendered it as Thomas telling
Susan to rewrite the entire manual page, and several reviewers of the
translation did not catch the mistranslation.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/



