Re: Release announcement simplified Chinese translation update
Dongsheng Song writes:
> 2009/2/18 Vern Sun :
>> on Wed, 2009-02-18 at 02:43 +0800, Anthony Wong wrote:
>>> I suggest 1. to convert all existing Chinese WML files for the Debian
>>> website from Big5 to UTF-8
>>>
>>> Any comments?
>>>
>> Converting everything to UTF-8 may cause problems. Suppose two users
>> (one writing Simplified, one writing Traditional) each contribute a
>> translation:
>>
>> % cat foo.tc
>> 中國
>>
>> % cat foo.sc
>> 中国
>>
>> % enca foo.sc foo.tc
>> foo.sc: Universal transformation format 8 bits; UTF-8
>> foo.tc: Universal transformation format 8 bits; UTF-8
>>
>> Converting the Simplified contributor's translation from UTF-8 to
>> GB2312 works:
>> ~% iconv -f utf8 -t gb2312 foo.sc > foo.sc.gb
>>
>> But converting the Traditional contributor's translation from UTF-8
>> to GB2312 fails:
>> ~% iconv -f utf8 -t gb2312 foo.tc > foo.tc.gb
>> iconv: illegal input sequence at position 3
>>
>> Likewise, converting the Simplified contributor's translation from
>> UTF-8 to BIG5 also fails:
>> ~% iconv -f utf8 -t big5 foo.sc > foo.sc.big
>> iconv: illegal input sequence at position 3
>>
>> ~% iconv -f utf8 -t big5 foo.tc > foo.tc.big
>>
>
> I don't understand why we keep clinging to GB2312/Big5. Wouldn't it be
> better to use UTF-8 directly?
> sc <=> tc conversion should only convert the content; there is no need
> for the extra step of converting to an obsolete encoding.
>
> ---
> Dongsheng Song

Vern Sun's concern is this: with GB2312/BIG5, the encoding itself tells
you whether a document is Simplified or Traditional, and therefore
whether a Simplified/Traditional conversion is needed before the
encoding conversion. UTF-8, by contrast, can hold both at once.

That said, with UTF-8 it should still be possible to tell whether the
characters are Simplified or Traditional, and the encoding conversion
step can be skipped, so this should not be a problem.

Moreover, when posting to international mailing lists, please prefer
English so that non-Chinese speakers can follow.

To non-Chinese speakers: the concern discussed above is that using UTF-8
might lead to premature encoding conversion, where converting
Traditional Chinese characters to GB2312 fails, because UTF-8 can hold
both character sets. This is a valid concern, but it shouldn't be a
problem if the character set is checked beforehand. Where possible, no
encoding conversion is needed at all, as UTF-8 handles both character
sets uniformly.

Regards,
Deng Xiyue

--
To UNSUBSCRIBE, email to debian-www-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
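The "check the character set beforehand" idea can be sketched in Python. This is only an illustration under stated assumptions, not part of the Debian website build: the helper names `can_encode` and `classify` are hypothetical, and the test simply asks whether Python's own `gb2312` and `big5` codecs accept the text, which mirrors the iconv failures shown in the quoted shell session.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def classify(text: str) -> str:
    """Rough guess at the script, based on which legacy charset accepts the text.

    Hypothetical helper: a real check would need a proper
    Simplified/Traditional character table, since many characters
    are shared by both scripts.
    """
    fits_gb = can_encode(text, "gb2312")
    fits_big5 = can_encode(text, "big5")
    if fits_gb and fits_big5:
        return "either"
    if fits_gb:
        return "simplified"
    if fits_big5:
        return "traditional"
    return "unknown"


if __name__ == "__main__":
    # The two sample strings from the foo.sc / foo.tc example above.
    for sample in ("中国", "中國"):
        print(sample, classify(sample))
```

A check like this only says which legacy encoding would accept the text; it does not convert between Simplified and Traditional wording, which is a separate step.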
Re: Release announcement simplified Chinese translation update
Deng Xiyue writes:
> Anthony Wong writes:
>
>> 2009/2/17 Arne Goetje
>>>
>>> Matt Kraai wrote:
>>> > On Mon, Feb 16, 2009 at 11:58:20PM +0800, Arne Goetje wrote:
>>> >> Matt Kraai wrote:
>>> >>> We already appear to use a single source version for all three Chinese
>>> >>> translations: Big5. Whether it's possible to change to UTF-8 is for
>>> >>> someone more familiar with Chinese to say. It's not sufficient to
>>> >>> just switch the encoding of this file, though:
>>> >>>
>>> >>> $ make
>>> >>> cd . && wml -q -D CUR_YEAR=2009
>>> >>>   -o undef...@ucnucnhkucntw:20090214.zh-cn.html@g+w
>>> >>>   -o undef...@uhkucnhkuhktw:20090214.zh-hk.html@g+w
>>> >>>   -o undef...@utwucntwuhktw:20090214.zh-tw.html@g+w
>>> >>>   --prolog=../../bin/fix_big5.pl 20090214.wml
>>> >>> * Converting: [zh_CN.GB2312], /usr/bin/iconv: illegal input sequence at position 233
>>> >>> make: *** [20090214.zh-cn.html] Error 1
>>> >>>
>>> >> Doesn't surprise me. A number of characters which are present in Big5
>>> >> are not present in GB2312 (and vice versa). Using iconv to convert those
>>> >> characters will lead to such errors.
>>> >>
>>> >> zh-autoconvert might give better results.
>>> >>
>>> >> Else, if you can give me the link to the source, then I can take a look.
>>> >
>>> > Sure, it's available in the webwml CVS module at
>>> > chinese/News/2009/20090214.wml. You can find instructions for
>>> > accessing the repository at
>>> >
>>> > http://www.debian.org/devel/website/using_cvs
>>> >
>>>
>>> OK, attached are the results for review.
>>>
>>> Build-Depends: zh-autoconvert
>>>
>>> To convert from Big5 into GB2312:
>>>   autob5 -o gb < 20090214.wml > 20090214_gb2312.wml
>>> To convert from Big5 into UTF-8:
>>>   autob5 -o utf8 < 20090214.wml > 20090214_zht_utf8.wml
>>> To convert from Big5 into simplified Chinese UTF-8:
>>>   autob5 -o gb < 20090214.wml | autogb -o utf8 > 20090214_zhs_utf8.wml
>>>
>>> I used the latter two commands to generate the attached files.
>>>
>>> The difference between iconv and zh-autoconvert is that iconv simply
>>> tries to convert the codepoints one to one, whereas zh-autoconvert uses
>>> a dictionary to map traditional characters to their simplified
>>> counterparts. Since the database is quite old, it may not work for
>>> simplified <-> traditional mappings where simplified characters have
>>> been added later (GBK) or where the document contains HKSCS characters,
>>> which use the Big5 Private Use Area. Those characters cannot be converted.
>>> I have long wanted to create a new library where a full
>>> Unicode-compatible mapping takes place. Unfortunately I don't have the
>>> time for that. But if there are any volunteers out there, I'm willing to
>>> coordinate such a project.
>>>
>>> Cheers
>>> Arne
>>
>> Hi all,
>>
>> I have been thinking that using Big5 as the primary encoding for both
>> the TC (Traditional Chinese) and SC (Simplified Chinese) versions of
>> the Debian website is detrimental to user contributions. To summarize
>> the current situation of the Chinese versions of the Debian website:
>> translations must be done in Big5 WML files. The TC version is
>> basically converted directly from WML to HTML, but to generate the SC
>> versions, the Big5 files must first be converted to GB2312. It is done
>> this way because of the one-to-many SC-TC mapping problem. To deal with
>> terms that differ between TC and SC for the same meaning, like 文件 and
>> 檔案, we use a simple mapping table written in Perl, and for some terms
>> that are rarely used, inline WML substitution syntax is used, like
>> [CN:文件:][HKTW:檔案:].
>>
>> This creates a hurdle for SC users who want to submit translations to
>> Debian, because they write in SC but then have to use whatever method
>> they can find to convert it to Big5 for submission. There is also the
>> possibility that the converted Big5 file may not contain proper TC
>> words/phrases. It also gives people the impression that SC
>> contributors are treated like "second-class citizens" (am I too
>> sensitive?). Not to mention that Big5 and GB2312 are both
>> considered
Re: Release announcement simplified Chinese translation update
> the translators use and the burden for them to use Big5 is lifted.
>
> For MediaWiki's Chinese conversion system, please see:
>
> 1. http://meta.wikimedia.org/wiki/Automatic_conversion_between_simplified_and_traditional_Chinese
> 2. http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/ZhConversion.php?revision=47314&view=markup
> 3. http://zh.wikipedia.org/wiki/MediaWiki:Conversiontable/zh-hant
> 4. http://zh.wikipedia.org/wiki/MediaWiki:Conversiontable/zh-hans
>
>
> Any comments?
>
> --
> Anthony

Migrating to UTF-8 is a great way to ease encoding conversion.
However, I'm a little concerned about MediaWiki's approach to automatic
dialect handling, which is complicated and possibly error-prone. It
would be good if the inline diversion solution currently in use could
be retained. In addition, several diversions that are synonyms could be
unified, as in the example given above.

Ideas?

Regards,
Deng Xiyue
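To make the inline diversion idea concrete, here is a minimal Python sketch of how markup like [CN:文件:][HKTW:檔案:] could be resolved for one target region. It is an illustration only and does not reproduce how WML itself processes slices: the function name `resolve`, the regular expression, and the assumption that a tag is a run of two-letter region codes (CN, HK, TW) are all hypothetical, and nested diversions are not handled.

```python
import re

# Matches one inline diversion of the form [TAG:text:], where TAG is
# assumed to be a run of uppercase two-letter region codes such as
# CN, HKTW or CNHK. This pattern is an assumption for illustration.
_DIVERSION = re.compile(r"\[([A-Z]+):(.*?):\]", re.DOTALL)


def resolve(text: str, region: str) -> str:
    """Keep diversions whose tag names `region`; drop all others."""

    def pick(match: re.Match) -> str:
        tag, body = match.group(1), match.group(2)
        # Split "HKTW" into {"HK", "TW"}.
        regions = {tag[i:i + 2] for i in range(0, len(tag), 2)}
        return body if region in regions else ""

    return _DIVERSION.sub(pick, text)


if __name__ == "__main__":
    source = "[CN:文件:][HKTW:檔案:]"
    for region in ("CN", "HK", "TW"):
        print(region, resolve(source, region))
```

Text outside the brackets passes through unchanged, so a single UTF-8 source file could carry both wordings for the few terms that differ while sharing everything else.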
Re: Release announcement simplified Chinese translation update
Matt Kraai writes:
> On Mon, Feb 16, 2009 at 01:08:42AM +0800, Deng Xiyue wrote:
>> I didn't know that translations of the release announcement are put on
>> the web using the local locale's encoding instead of UTF-8. I have
>> checked the simplified Chinese page[1], which is encoded in GB2312.
>> Maybe that is required by policy. I don't know which is the preferred
>> way to store this in the SVN repository, so I'm attaching both the
>> UTF-8 version and the GB2312 version for reference. These two versions
>> can be converted to each other with the "iconv" utility.
>
> You seem to have sent two copies of the UTF-8-encoded version.
>
> According to the Chinese .wmlrc, the character set used for the
> Chinese translation is Big5. I tried to convert the file directly,
> which failed:
>
> $ iconv -f UTF-8 -t BIG5 lenny-announcement.zh_CN.wml.utf8
> Debian GNU/Linux 5.0 iconv: illegal input sequence at position 43
>
> I was able to convert it by going through GB2312, however:
>
> $ iconv -f UTF-8 -t GB2312 lenny-announcement.zh_CN.wml.utf8 | iconv -f GB2312 -t BIG5
> ...
>
> I committed this version. Would you please inspect the results and
> let me know if there are problems?

If all Chinese translations are converted to BIG5 by default, that is a
problem. Simplified Chinese, as used in zh_CN locales, is supposed to be
converted to the GB* encodings (GB2312, GBK, and GB18030, where GB18030
is the current standard encoding), whereas Traditional Chinese, as used
in zh_TW, zh_HK, zh_SG or other Traditional Chinese locales, is supposed
to be converted to BIG5 encodings. Maybe the .wmlrc needs an update.

Moreover, the garbled characters around the "vendors" link still remain
in zh_CN, although they don't show up in zh_TW. CCing the simplified
Chinese list for further insights.

Sincerely,
Deng Xiyue
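The per-locale rule described above (GB* for zh_CN, BIG5 for Traditional Chinese locales) could be expressed as a small lookup. The following Python sketch is hypothetical: the table `TARGET_ENCODING` and the function `encode_for_locale` are names invented for illustration, the choice of GB18030 for zh_CN follows the remark that it is the current standard, and the encoding names are those of Python's built-in codecs rather than anything read from the Chinese .wmlrc.

```python
# Hypothetical mapping from locale to legacy output encoding, following
# the rule stated in the message above. Not taken from the real .wmlrc.
TARGET_ENCODING = {
    "zh_CN": "gb18030",  # Simplified Chinese: GB* family
    "zh_TW": "big5",     # Traditional Chinese
    "zh_HK": "big5",     # Traditional Chinese
}


def encode_for_locale(text: str, locale: str) -> bytes:
    """Encode UTF-8 source text into the legacy encoding for `locale`.

    Raises ValueError when the text contains characters that the
    target encoding cannot represent, instead of a bare codec error.
    """
    encoding = TARGET_ENCODING[locale]
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as err:
        bad = text[err.start:err.end]
        raise ValueError(
            f"{bad!r} cannot be represented in {encoding} for {locale}"
        ) from err


if __name__ == "__main__":
    print(encode_for_locale("中国", "zh_CN"))
    print(encode_for_locale("中國", "zh_TW"))
```

One caveat with this choice: GB18030 covers all of Unicode, so encoding Traditional text for zh_CN would succeed silently. Detecting that the wrong script was supplied needs a separate check and cannot rely on the encoder failing.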
Re: Release announcement simplified Chinese translation update
Alexander Reichle-Schmehl writes:
> Hi!
>
> Deng Xiyue wrote:
>> Congratulations on Lenny! I'm glad to see that the release
>> announcement has been sent out. However, I hope it is still worthwhile
>> to update the translation to the latest version (rev 154) for future
>> web reference. In particular, this version is supposed to fix a
>> partially garbled character which didn't show up in the source and
>> might be caused by a tag adjacent to a Chinese character. Finally,
>> thanks for the excellent release announcement.
>
> Could you please contact debian-www@lists.debian.org with that request?
> I have problems with the encoding of that file (which I think must be
> "big5" for our website), and the "trick" I was told last night to
> convert your file doesn't work any more :(
>

CCing debian-www@lists.debian.org as suggested.

I didn't know that translations of the release announcement are put on
the web using the local locale's encoding instead of UTF-8. I have
checked the simplified Chinese page[1], which is encoded in GB2312.
Maybe that is required by policy. I don't know which is the preferred
way to store this in the SVN repository, so I'm attaching both the
UTF-8 version and the GB2312 version for reference. These two versions
can be converted to each other with the "iconv" utility.

[1] http://www.debian.org/News/2009/20090214.zh-cn.html

Also, the filled paragraphs in my last update render poorly, with
unnecessary spaces at line breaks, so I have rejoined all the lines in
this version.

Thanks for considering.
> > Best regards, > Alexander Sincerely, Deng Xiyue Debian GNU/Linux 5.0 发布 2009-02-14 #use wml::debian::news Debian 计划高兴地宣布在经历了 22 个月坚持不懈地开发之后,Debian GNU/Linux 5.0 版(代号Lenny)正式发布了。Debian GNU/Linux 是一个自由的操作系统,它支持 12 种处理器架构并带有KDE、Gnome、Xfce 和 LXDE 桌面环境。同时它和 FHS v2.3 兼容,其软件针对 LSB 3.2 版开发。 Debian GNU/Linux 可以在多种计算机上运行,从掌上机和手持系统到超级计算机,在它们之间的几乎任何机型都可以。它支持十二种架构:Sun SPARC (sparc)、HP Alpha (alpha)、Motorola/IBM PowerPC (powerpc)、Intel IA-32 (i386)、IA-64 (ia64)、HP PA-RISC (hppa)、MIPS (mips, mipsel)、ARM (arm, armel)、IBM S/390 (s390) 以及 AMD64 和 INTEL EM64T (amd64)。 Debian GNU/Linux 5.0 Lenny 加入了对 Marvell 的 Orion 平台的支持,它被用于许多存储设备。所支持的存储设备包括 QNAP Turbo Station、HP mv2120 和 Buffalo Kurobox Pro。另外,Lenny 现在支持多种网络笔记本电脑(Netbook),特别是华硕出品的 Eee PC。Lenny 还包含了用于 Emdebian 的编译工具,用它可以对 Debian 源代码包进行交叉编译并收缩,使之适用于嵌入式 ARM 系统。 Debian GNU/Linux 5.0 Lenny 包括了新的 ARM EABI 移植,armel。这个新的移植可以更有效地使用现代和未来的 ARM 处理器。因此,老的 ARM 移植 (arm) 就过时了。 本次发布包括了多种升级过的软件包,比如 K 桌面环境 3.5.10 (KDE)、升级的 GNOME 桌面环境 2.22.2、Xfce 4.4.2 桌面环境、LXDE 0.3.2.1、GNUstep 桌面 7.3、X.Org 7.3、OpenOffice.org 2.4.1、GIMP 2.4.7、Iceweasel 3.0.6 (去除品牌版本的 Mozilla Firefox)、Icedove 2.0.0.19 (去除品牌版本的 Mozilla Thunderbird)、PostgreSQL 8.3.6、MySQL 5.1.30 和 5.0.51a、GNU 编译器集合 (GCC) 4.3.2、Linux 内核 2.6.26 版、Apache 2.2.9、Samba 3.2.5、Python 2.5.2 和 2.4.6、Perl 5.10.0、PHP 5.2.6、Asterisk 1.4.21.2、Emacs 22、Inkscape 0.46、Nagios 3.06、Xen Hypervisor 3.2.1 (dom0 及 domU 支持)、OpenJDK 6b11,以及超过 23,000 个其他完全可用的软件包(从 12,000 个源码包编译而成)。 由于集成 X.Org 7.3,X server 可以对绝大多数硬件进行自动配置。新引入的软件包可以完全地支持 NTFS 文件系统,并且能够直接使用绝大多数多媒体按键。通过 swfdec 或 Gnash 插件可以支持 Adobe® Flash® 格式的文件。对笔记本电脑的支持得到了全面提升,比如 CPU 频率自动调节的原生支持。新加入的几个游戏可用来消磨闲暇时光,包括解谜游戏以及第一人称射击游戏。还有值得一提的是新增加的 goplay,它是一个图形化游戏管理器,提供了过滤器、搜索、抓屏以及对 Debian 中的游戏进行介绍等功能。 由 Debian GNU/Linux 5.0 新加入和更新版本的 OpenJDK,GNU Java 编译器,GNU Java 字节码解释器,Classpath 和其他自由版本的 Sun 的 Java 科技使我们可以在 Debian 的 main 软件仓库下发布基于 Java 的应用程序了。 系统安全方面的改进有在安装后第一次启动之前就安全所有安全更新、减少标准安装 setuid root 可执行文件和打开的端口,以及使用 GCC 加强特性编译多个对安全有严格要求的软件包。其他多种软件也都有特别的改进,比如 PHP 现在已经使用 Suhosin 的加强补丁编译。 
对于非英语母语的用户,包管理系统目前已经支持软件包描述的翻译,如果已经翻译,它会自动显示用户母语版本的软件包描述。 Debian GNU/Linux 可以用多种介质进行安装,比如 DVD、CD、USB 闪存和软驱以及网络。GNOME 是缺省的桌面环境,包含在第一张 CD 中。其他桌面环境 — KDE、XFce 或 LXDE — 则可以通过两张新的替代 CD 镜像进行安装。同样还提供 Debian GNU/Linux 5.0 的多架构安装 CD 和 DVD,用它们可以从一张光盘上安装多种架构的电脑;同时本发行还提供蓝光光盘,这就让在一张安装盘上提供一个架构的全部软件成为可能。 除常规的安装,Debian GNU/Linux 也可以直接使用而无需安装。这种特殊的镜像也称为 live 镜像,可用于 CD、USB 闪存以及多种形式的网络启动。起步阶段只提供 amd64 和 i386 架构的 Live 镜像。 Debian GNU/Linux 5.0 的安装过程也已经从多个方面进行了改进:在多个其他改进中,重新加入了对从多张 CD 或 DVD 进行安装的支持,某些设备需要的固件可以从可移动介质上加载,还支持通过布莱叶显示器进行安装。安装程序的启动过程同样也得到很多关注:可以在图形界面上选择安装前端和桌面环境,同时也能选择专家或救援模式。Debian GNU/Linux 的安装系统目前已经被翻译成 63 种语言。 Debian GNU/Linux 现在已经可以通过 bittorrent (推荐使用)、jigdo 或 HTTP 方式下载;更多信息请参见 光盘上的 Debian GNU/Linux。很快多个 供应商 就将开始提供 DVD、CD-ROM 以及蓝光光盘。 对绝大多数的配置来说,从先前版本即 Debian GNU/Linux 4.0 (代号 Etch) 升级到 Debian GNU/Linux 5.0 可以由 aptitude 包管理工具自动进行,一定程度上也可以用 apt-get 包管理工具进行。一如往常,Debian GNU/Linux 系统可以平稳无痛地升级,没有任何必须的当机时间,但强烈建议事先阅读发行注记,以便了解可能的问题,并获取详细的安装和升级指示。本发行注记将在发行之后的数周内进行更新,并将翻译成其他语言。 献给 在此特将 Debian GNU/Linux 5.0 Lenny 献给 Thiemo Seufer,他是一位 Debian 开发者,在2008年12月26日的一场不幸的车祸中去世。Thiemo 通过多种方式参与了 Debian。他维护着多个软件包并且是 MIPS 的 Debian 移植的主要支持者。他还是我们内核组及 Debian 安装程序组的成员。他的贡献远远超出Debian 项目。他还致力于 Linux 内核的 MIPS 移植以及 qemu 的 MIPS 仿真等工作,同时还参与多个小项目,这里难以一一述及。 我们将永远怀念 Thiemo 的工作、奉献、广泛的技术知识以及同其他人分享的能力。Thiemo 的贡献不会被遗忘。Thiemo 工作的高标准我们永难企及。