Re: [drlvm] proposals for VM internationalization
Hello all.

Salikh Zakirov wrote:
> far below are results of my experiments with Log4cxx's ResourceBundle. (I've managed to find it in Log4cxx documentation after carefully rereading your original post). The good news is that it does localization (severely limited).
>
> The prototype has following good properties
> * The unlocalized message is used as the message key

The message key should be message pattern (not a message), because some parameters may be in this message. E.g.:
Message pattern with integer parameter: %d
or
Message pattern with one parameter: {0}

> * No extra entities were introduced (like non-printable message keys)

What about very long message pattern (e.g. see help message from VM)? For these cases messageId key should be used.

> * The localizable messages are marked by _() notation and can be extracted from the source code automatically

To my mind solution is graceful for localizable messages extraction. But should we care about this? Once, these messages should be gathered and put into properties file.

I propose the following solution: Modify VM's LoggerString class. The first parameter of composite message should be message key. If it equals empty string then the message should not be localized. E.g.:
WARN("" << "Not localizable message with two parameters: " << 1 << " and " << 10)
WARN("localizable message with two parameters: %d and %d" << 1 << 10)
or
WARN("localizable message with two parameters: {0} and {1}" << 1 << 10)

> The things that I have not implemented yet (to save time and make at least something available):
> * loading the system locale value
> * reading the locale-specific localization file
> * converting the localized messages to locale-specific encoding

What do you mean there?

> * converting the unlocalized messages from source encoding (US-ASCII) to UTF-16 (wchar_t[])

There is big question. We can use there char[] strings. Log4cxx automatically converts char* to wchar_t*. Also, we can use utf8 coding for wide characters.

> The issues that I have encountered but haven't yet worked out a solution:
> * PropertyResourceBundle.getString().c_str() returns the pointer to the stack location. To make it work, I had to use wcsdup(), thus introducing an unacceptable memory leak. I think there must be some way to get the pointer to original bundle contents, but haven't figured out how to achieve it.

May be that's the way:
LOG4CXX_DECODE_WCHAR(chstr, wchrstr);
LOG4CXX_ENCODE_CHAR(charstr, chstr);
charstr.c_str()

> * PropertyResourceBundle expects the good property format, so the unlocalized messages needs to be mangled to property-compatible form (in the patch below, the only transformation replaced spaces ' ' with underscores '_', but it needs to be generalized).

I agree with you.

> Given the number of issues PropertyResourceBundle introduces, and the number of services it provides (parsing property-format and constructing in-memory hashmap), I think that it would be easier to reimplement the functionality without using PropertyResourceBundle, and change the storage on-disk file format to allow unmangled messages be the keys.

In conclusion there are my suggestions for VM's internationalization:
1. Extend log4cxx::helpers::PropertyResourceBundle class which should allow lazy (on demand) load of properties.
2. Extend log4cxx::helpers::Properties class to allow string with spaces as a key.
3. Choose model:
   a. _("message key") – localizable; "message" – not localizable
   b. "message key" – localizable; "" – not localizable.
4. Decide between two variants: printf format specifications or {number} should be used inside message pattern for parameters.

Thanks,
Dmitry.
Re: [drlvm] proposals for VM internationalization
Salikh Zakirov wrote:
> Geir Magnusson Jr wrote:
>> I'll state the obvious... there is another thread going on about how do to similar things with Classlib. Maybe you can find common ground for message bundles and such...
>>
>> geir
>
> 1. The launcher already packages some translations in property-format, it makes me believe that launcher localization was once completed at IBM. Though I wasn't able to find anything about localization in launcher sources.
>
> Tim, Mark, could you provide more information about localization already implemented in classlib natives?

There is support for getting localized messages from resource files in the Harmony port library functions. See:
http://svn.apache.org/viewvc/incubator/harmony/enhanced/classlib/trunk/doc/vm_doc/html/hynls_8c.html?view=co

Regards,
Tim

> 2. As far as I can see, the only common thing that natives l10n can have with java l10n is translation files. Generally, this is a good goal, as it would make the translators job more straightforward, keeping the number of formats and message systems at minimum.
>
> 3. I personally consider the property-based design of l10n in Java inferior, because it requires the keys to be property-name-compatible (e.g. no spaces), and it often results in developers choosing to introduce short localization key names bearing no meaning. For example, see the harmony_*.properties in classlib: EXEL051=...
>
> Should the localization system fail, the only thing that user will get is EXEL051. The developers reading the code which prints localizable message, has no clue too. To find out the value of message, one needs to consult default localization file.
>
> Furthermore, when introducing new localizable message, one needs to edit 3(!) different places: add the message code, add the key, and add the printable message to default localization file. This particular design choice is ineffective in using developers' time, is less robust and less maintainable.
>
> And if the key names are used in construction of unlocalized messages, then it introduces runtime cost of mangling the unlocalized message to some property-name-compatible form.

--
Tim Ellison ([EMAIL PROTECTED])
IBM Java technology centre, UK.

- Terms of use : http://incubator.apache.org/harmony/mailing.html
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: [drlvm] proposals for VM internationalization
Vladimir Gorr wrote:
> Continue this discussion?

Vladimir,

far below are results of my experiments with Log4cxx's ResourceBundle. (I've managed to find it in Log4cxx documentation after carefully rereading your original post). The good news is that it does localization (severely limited).

The prototype has following good properties
* The unlocalized message is used as the message key
* No extra entities were introduced (like non-printable message keys)
* The localizable messages are marked by _() notation and can be extracted from the source code automatically

The things that I have not implemented yet (to save time and make at least something available):
* loading the system locale value
* reading the locale-specific localization file
* converting the localized messages to locale-specific encoding
* converting the unlocalized messages from source encoding (US-ASCII) to UTF-16 (wchar_t[])

The issues that I have encountered but haven't yet worked out a solution:
* PropertyResourceBundle.getString().c_str() returns the pointer to the stack location. To make it work, I had to use wcsdup(), thus introducing an unacceptable memory leak. I think there must be some way to get the pointer to original bundle contents, but haven't figured out how to achieve it.
* PropertyResourceBundle expects the good property format, so the unlocalized messages needs to be mangled to property-compatible form (in the patch below, the only transformation replaced spaces ' ' with underscores '_', but it needs to be generalized).

Given the number of issues PropertyResourceBundle introduces, and the number of services it provides (parsing property-format and constructing in-memory hashmap), I think that it would be easier to reimplement the functionality without using PropertyResourceBundle, and change the storage on-disk file format to allow unmangled messages be the keys.
===
From: Salikh Zakirov [EMAIL PROTECTED]
Date: Thu, 13 Jul 2006 12:06:05 +0400
Subject: [PATCH] Dummy l10n implemenation based on Log4cxx

---
 vm/include/l10n.h              |   31 +++
 vm/port/include/loggerstring.h |    9 +
 vm/vmcore/src/init/l10n.cpp    |   66 
 vm/vmcore/src/init/vm_main.cpp |    2 +
 4 files changed, 108 insertions(+), 0 deletions(-)

diff --git a/vm/include/l10n.h b/vm/include/l10n.h
new file mode 100755
index 000..bb3edfe
--- /dev/null
+++ b/vm/include/l10n.h
@@ -0,0 +1,31 @@
+#ifndef _L10N_H
+#define _L10N_H
+
+#include <string>
+#include <log4cxx/helpers/propertyresourcebundle.h>
+#include <log4cxx/helpers/exception.h>
+#include <wchar.h>
+#include "cxxlog.h"
+
+extern log4cxx::helpers::ResourceBundlePtr l10n_resource_bundle;
+
+inline const wchar_t* _(const wchar_t* message)
+{
+    if (!l10n_resource_bundle) return message;
+    try {
+        wchar_t* mangled = wcsdup(message);
+        wchar_t* c = mangled;
+        while (*c) {
+            if (*c == L' ') *c = L'_';
+            c++;
+        }
+        std::wstring localized = l10n_resource_bundle->getString(mangled);
+        free(mangled);
+        return wcsdup(localized.c_str()); // FIXME: leak
+    } catch (log4cxx::helpers::MissingResourceException&) {}
+    return message;
+}
+
+void init_l10n();
+
+#endif // _L10N_H
diff --git a/vm/port/include/loggerstring.h b/vm/port/include/loggerstring.h
old mode 100644
new mode 100755
index 1efe5d2..1eae5c1
--- a/vm/port/include/loggerstring.h
+++ b/vm/port/include/loggerstring.h
@@ -41,6 +41,15 @@ public:
         return (const char*)logger_string.c_str();
     }
 
+    LoggerString& operator<<(const wchar_t* message) {
+        const wchar_t* c = message;
+        while (*c) {
+            logger_string += (char)*c;
+            c++;
+        }
+        return *this;
+    }
+
     LoggerString& operator<<(const char* message) {
         logger_string += message;
         return *this;
diff --git a/vm/vmcore/src/init/l10n.cpp b/vm/vmcore/src/init/l10n.cpp
new file mode 100755
index 000..c8fd746
--- /dev/null
+++ b/vm/vmcore/src/init/l10n.cpp
@@ -0,0 +1,66 @@
+#include <apr_env.h>
+#include <assert.h>
+#include <fstream>
+#include <string.h>
+
+#include "cxxlog.h"
+#include "l10n.h"
+#include "platform_lowlevel.h"
+
+#include <log4cxx/helpers/locale.h>
+
+using namespace log4cxx;
+using namespace log4cxx::helpers;
+
+ResourceBundlePtr l10n_resource_bundle;
+
+void init_l10n()
+{
+    INFO2("info", "starting l10n initialization");
+
+    /*
+    apr_pool_t *pool;
+    apr_pool_create(&pool, 0); assert(pool);
+    char *lang = NULL;
+
+    apr_env_get(&lang, "LANG", pool);
+    if (!lang) lang = "C";
+
+    char *encoding = strchr(lang,'.');
+    if (encoding != NULL) {
+        *encoding = '\0';
+        encoding += 1;
+    }
+    char *region = strchr(lang,'_');
+    if (region != NULL) {
+        *region = '\0';
+        region += 1;
+    }
+    INFO2("info", "lang = " << lang << ", region = " << region
+        << ", encoding = " << encoding);
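The message above suggests dropping PropertyResourceBundle in favour of a store keyed by the unmangled message. A minimal sketch of that idea follows, assuming a plain in-memory map; `MessageCatalog`, `add` and `translate` are hypothetical names and not part of DRLVM or log4cxx. Because the returned pointer refers either to the caller's own string or to storage owned by the catalog, no wcsdup() copy is needed and nothing leaks.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical catalog (illustrative only, not a DRLVM or log4cxx class).
// The unlocalized message itself is the key, so no key mangling is needed.
class MessageCatalog {
public:
    // Register or replace the translation of a message.
    void add(const std::wstring& message, const std::wstring& localized) {
        translations_[message] = localized;
    }

    // Return the translation, or the original pointer when none is known.
    // A returned translation points into storage owned by the catalog and
    // stays valid until that entry is replaced or the catalog is destroyed.
    const wchar_t* translate(const wchar_t* message) const {
        assert(message != nullptr);
        std::map<std::wstring, std::wstring>::const_iterator it =
            translations_.find(message);
        return it == translations_.end() ? message : it->second.c_str();
    }

private:
    std::map<std::wstring, std::wstring> translations_;
};
```

With a single global catalog, the `_()` function from the patch could in principle forward to `translate()`; how the catalog would be filled from a file on disk is left open here.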
Re: [drlvm] proposals for VM internationalization
Geir Magnusson Jr wrote:
> I'll state the obvious... there is another thread going on about how do to similar things with Classlib. Maybe you can find common ground for message bundles and such...
>
> geir

1. The launcher already packages some translations in property-format, it makes me believe that launcher localization was once completed at IBM. Though I wasn't able to find anything about localization in launcher sources.

Tim, Mark, could you provide more information about localization already implemented in classlib natives?

2. As far as I can see, the only common thing that natives l10n can have with java l10n is translation files. Generally, this is a good goal, as it would make the translators job more straightforward, keeping the number of formats and message systems at minimum.

3. I personally consider the property-based design of l10n in Java inferior, because it requires the keys to be property-name-compatible (e.g. no spaces), and it often results in developers choosing to introduce short localization key names bearing no meaning. For example, see the harmony_*.properties in classlib: EXEL051=...

Should the localization system fail, the only thing that user will get is EXEL051. The developers reading the code which prints localizable message, has no clue too. To find out the value of message, one needs to consult default localization file.

Furthermore, when introducing new localizable message, one needs to edit 3(!) different places: add the message code, add the key, and add the printable message to default localization file. This particular design choice is ineffective in using developers' time, is less robust and less maintainable.

And if the key names are used in construction of unlocalized messages, then it introduces runtime cost of mangling the unlocalized message to some property-name-compatible form.
Re: [drlvm] proposals for VM internationalization
Salikh Zakirov wrote:
> Geir Magnusson Jr wrote:
>> I'll state the obvious... there is another thread going on about how do to similar things with Classlib. Maybe you can find common ground for message bundles and such...
>>
>> geir
>
> 1. The launcher already packages some translations in property-format, it makes me believe that launcher localization was once completed at IBM. Though I wasn't able to find anything about localization in launcher sources.

Who cares what was once completed at IBM? They had their reasons, their uses... This is Apache Harmony :) We can do what we feel is best (including keeping what was donated...)

> Tim, Mark, could you provide more information about localization already implemented in classlib natives?
>
> 2. As far as I can see, the only common thing that natives l10n can have with java l10n is translation files. Generally, this is a good goal, as it would make the translators job more straightforward, keeping the number of formats and message systems at minimum.

+1

> 3. I personally consider the property-based design of l10n in Java inferior, because it requires the keys to be property-name-compatible (e.g. no spaces), and it often results in developers choosing to introduce short localization key names bearing no meaning. For example, see the harmony_*.properties in classlib: EXEL051=...
>
> Should the localization system fail, the only thing that user will get is EXEL051.

Don't we have far bigger problems if the localization system in the JVM fails?

> The developers reading the code which prints localizable message, has no clue too. To find out the value of message, one needs to consult default localization file.
>
> Furthermore, when introducing new localizable message, one needs to edit 3(!) different places: add the message code, add the key, and add the printable message to default localization file. This particular design choice is ineffective in using developers' time, is less robust and less maintainable.
>
> And if the key names are used in construction of unlocalized messages, then it introduces runtime cost of mangling the unlocalized message to some property-name-compatible form.

I understand what you are saying, and certainly agree that if we can find some way to use meaningful keys, so much the better. I guess the question is what does that cost us, versus the likelihood that the localization system will fail...

geir
[drlvm] proposals for VM internationalization
Hi Harmony community.

I'd like to discuss with you a design for the VM native code internationalization (attached below). We'd like to consider this approach for the DRLVM first of all. However, I suppose it can be suitable for other parts of the Harmony project. Please let me know your opinions/objections.

Thanks,
Vladimir.

---
Internationalization design

*1. Introduction*

The VM's output needs to be internationalized in order to provide localized versions of our product. The key idea is to use the ResourceBundle class from Apache log4cxx, which allows bundles of localized messages to be stored and used efficiently.

The document describes:
· ResourceBundle naming conventions for bundles with localized messages.
· Structure of a ResourceBundle file. MessageId (keys for localized messages in a ResourceBundle) development guidelines.
· Requirements.
· How it works inside VM.

*Definitions:*
I18n – internationalization
L10n – localization
L7d – localized

*2. ResourceBundle naming conventions for bundles with localized messages.*

We propose to use the ResourceBundle class from Apache log4cxx as the storage for localized messages. Initially, all ResourceBundles are files. After the VM starts, at the initialization stage of the VM's logging subsystem, the logging system chooses the appropriate set of ResourceBundles according to the values of the environment variables LC_ALL, LC_MESSAGES, and LANG. The chosen ResourceBundles are then used for printing localized messages from the VM.

E.g., if the environment variable LANG is equal to ru_RU, then the following set of ResourceBundles should be used (see naming conventions below):
· java_ru_RU.properties
· java_ru.properties
· java.properties

Each file that represents a ResourceBundle should have the following name:
*java_language_country_variant.properties*
where:
_language is a language, e.g. _ru (Russian language). It may be empty.
_country is a country, e.g. _RU (Russian Federation). It may be empty.
_variant is a variant. It may be empty.
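The lookup order described in section 2 can be sketched as follows. This is only an illustration under stated assumptions: `detect_locale` and `bundle_candidates` are hypothetical helpers, not log4cxx APIs, and the precedence LC_ALL, then LC_MESSAGES, then LANG is the usual POSIX convention rather than something the proposal spells out.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: read the locale name from the environment, checking
// LC_ALL, LC_MESSAGES and LANG in that order (assumed POSIX precedence).
std::string detect_locale() {
    const char* variables[] = {"LC_ALL", "LC_MESSAGES", "LANG"};
    for (const char* variable : variables) {
        const char* value = std::getenv(variable);
        if (value && *value) return value;
    }
    return "C";
}

// Hypothetical helper: turn a locale value such as "ru_RU.UTF-8" into the
// bundle file names to try, most specific first.
std::vector<std::string> bundle_candidates(const std::string& locale,
                                           const std::string& base = "java") {
    // Drop the encoding (".UTF-8") and modifier ("@euro") parts, if present.
    std::string name = locale.substr(0, locale.find_first_of(".@"));
    // "C" and "POSIX" mean no localization: only the default bundle applies.
    if (name == "C" || name == "POSIX") name.clear();

    std::vector<std::string> files;
    while (!name.empty()) {
        files.push_back(base + "_" + name + ".properties");
        // Strip the last "_part" to get the next, less specific name.
        std::string::size_type cut = name.rfind('_');
        name = (cut == std::string::npos) ? std::string() : name.substr(0, cut);
    }
    files.push_back(base + ".properties");
    return files;
}
```

Calling `bundle_candidates(detect_locale())` with LANG set to ru_RU yields the three file names listed in the proposal; a loader would then consult them in that order and fall back to java.properties.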
The main ResourceBundle file (with messages in English) should be java.properties.

*3. Structure of ResourceBundle file. MessageId development guidelines.*

The structure of a ResourceBundle file should be the following:
MessageId1=localized message1
MessageId2=localized message2
….
Where:
MessageId{i} – an ASCII string in English. It should consist of the VM subcomponent name (e.g. init, port, gc) and a short description of the message. E.g. init.help is the localized help message from the init subcomponent of the VM.
Localized message{i} – the localized message.

A localized message can contain parameters. E.g. the localized message pattern:
This is message on English with two parameters: parameter number one – {0}, and parameter number two – {1}. We can print it again and in back order: {1}, {0}.
If the first parameter is the integer value 1 and the second is the string two, the message for the pattern above should be:
This is message on English with two parameters: parameter number one – 1, and parameter number two – two. We can print it again and in back order: two, 1.

*4. Requirements.*

- All localized messages may be printed through the Apache log4cxx logger.
- Parameters may be present in localized messages.
- The VM-I18N subsystem should automatically detect the user's locale according to the values of environment variables.
- Minimize performance impact.
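The {0}, {1} substitution described in section 3 could look roughly like the sketch below. It assumes parameters have already been converted to strings; `format_pattern` is a hypothetical helper, not something log4cxx provides, and placeholders that do not match an argument are left in the output unchanged.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: replace "{0}", "{1}", ... in a message pattern with
// the corresponding argument. Unknown or malformed placeholders are copied
// through unchanged.
std::string format_pattern(const std::string& pattern,
                           const std::vector<std::string>& args) {
    std::string out;
    std::size_t i = 0;
    while (i < pattern.size()) {
        bool replaced = false;
        if (pattern[i] == '{') {
            std::size_t close = pattern.find('}', i + 1);
            if (close != std::string::npos && close > i + 1) {
                // Parse the decimal index between the braces.
                std::size_t index = 0;
                bool digits = true;
                for (std::size_t k = i + 1; k < close; ++k) {
                    if (pattern[k] < '0' || pattern[k] > '9') { digits = false; break; }
                    index = index * 10 + static_cast<std::size_t>(pattern[k] - '0');
                }
                if (digits && index < args.size()) {
                    out += args[index];
                    i = close + 1;
                    replaced = true;
                }
            }
        }
        if (!replaced) out += pattern[i++];
    }
    return out;
}
```

Because arguments are addressed by index, a translation is free to reorder or repeat them, which is the property the "back order" example in the proposal relies on.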
Re: [drlvm] proposals for VM internationalization
Hi, Vladimir

Log4c and log4cpp are both good tools, but if our requirements are just message internationalization, maybe log4cxx is overkill? After all, as a complete log framework, it provides supports to i18n, category, layout

And if we talk about ResourceBundle only, I'd suggest consider ICU4C as a candidate, which provides many i18n features including ResourceBundle[1] support to c as well as c++, and more important, it has been included as Harmony dependencies. Of course, if you think VM needs more complicated log mechanism support, this will be another story.

[1]http://icu.sourceforge.net/apiref/icu4c/

--
Paulex Yang
China Software Development Lab
IBM
Re: [drlvm] proposals for VM internationalization
Vladimir Gorr wrote:
> Internationalization design
> *1. Introduction*
> ...
> The key idea is to use ResourceBundle class from apache log4cxx which allow to store and effective use bundles with localized messages.

Why not use GNU gettext -- de facto standard i18n system on GNU/Linux systems?

I think the developers' API can be designed to allow a wide range of i18n implementations, just like we did with logging.
(* DRLVM logging system was designed in such a way, that its implementation could be rewritten completely from scratch. It was in fact rewritten once to use log4cxx. No client code modifications were required *)

I think we could devise a simple localization API, which even could be dummy to get us started, like

---8<--- vm/include/l10n.h
#define _(x) (x)
inline void init_l10n() {}
---8<---

Scan over the DRLVM code, mark the translatable strings with _(), and then evolve the l10n system independently of the development efforts.

> MessageId1=localized message1
> MessageId2=localized message2
> Where:
> MessageId{i} – ASCII string on English language. It should consist of vm's subcomponent name ( e.g. init, port, gc.) and short description of message. E.g. init.help is localized help message from init subcomponent of VM.

The gettext has an advantage, that the unlocalized messages are used as the keys for the translation, thus, the developers do not need to care about l10n at all.

On the other hand, in the system you propose, to create a message, one will need to
1) come up with the message identifier
2) add the message identifier and its unlocalized text to the resource file
and, most annoyingly,
3) consult resource file each time s/he wants to know, what message is printed, because in most cases, the message key will bear no meaning.
(* Compare with the issue we've come across recently: SecurityException: K00Ec *)
4) Add to this that most of the developers will not know where the localized messages are kept, and you'll get the situation when most of the messages are not localized in any way.

With gettext, localizing for developers is as easy as putting _() around your string message, and leaving *everything* else up to the translators. Even the source code scanning to extract messages that need to be translated is done automatically with 'xgettext'.

> Localized message can contain parameters. E.g. localized message pattern:
> This is message on English with two parameters: parameter number one – {0}, ...

with gettext, parameters in localized messages is a non-issue. You can use printf or cout with gettext without any restrictions. You even can teach your program to use correct plurals.
(* In slavic languages, there is two kind of plurals: 2-4 is dual plural, 5-9 is multiple plural, see the concrete example below *)

> - All localized messages may be printed through apache log4cxx logger.

gettext's job is to translate strings, and then it's up to developer to choose how to print the message, so this requirement is satisfied by gettext.

> - Minimize performance impact.
Below is the simple example of using gettext in a toy application to count apples:

---8<--- apples.c
#include <stdio.h>
#include <locale.h>
#include <libintl.h>

#define _(String) gettext(String)

int main()
{
    /* Look for catalogs under ./<locale>/LC_MESSAGES/apples.mo */
    bindtextdomain("apples", ".");
    textdomain("apples");
    /* "" selects the locale from the environment; NULL would only query it. */
    setlocale(LC_ALL, "");

    printf(_("internationalized message\n"));
    {
        int i;
        for (i = 0; i < 27; i++) {
            /* ngettext picks the plural form that matches i. */
            printf(ngettext("%d apple\n", "%d apples\n", i), i);
        }
    }
    return 0;
}
---8<---

The translators job then would be to fill in a template with translated messages, like

---8<--- ru/LC_MESSAGES/apples.po
msgid "internationalized message\n"
msgstr "русское сообщение\n"

msgid "%d apple\n"
msgid_plural "%d apples\n"
msgstr[0] "%d яблоко\n"
msgstr[1] "%d яблока\n"
msgstr[2] "%d яблок\n"
---8<---
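To make the three msgstr[] entries concrete, here is a small sketch of how a count maps to a plural index for Russian. It hard-codes the plural rule commonly listed for Russian in gettext catalogs, which is an assumption brought in for illustration; in real use gettext evaluates the rule from the catalog header, and `russian_plural_index` and `pick_plural` are hypothetical names.

```cpp
#include <array>
#include <string>

// Index into msgstr[] for a count n, using the plural rule commonly listed
// for Russian in gettext catalogs (assumed here, not taken from the thread).
unsigned russian_plural_index(unsigned long n) {
    // 1, 21, 31, ... but not 11
    if (n % 10 == 1 && n % 100 != 11) return 0;
    // 2-4, 22-24, ... but not 12-14
    if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20)) return 1;
    // 0, 5-20, 25-30, ...
    return 2;
}

// Hypothetical helper: choose one of three translated forms for n.
const std::string& pick_plural(unsigned long n,
                               const std::array<std::string, 3>& forms) {
    return forms[russian_plural_index(n)];
}
```

Given the forms from apples.po in order, `pick_plural` selects msgstr[0] for 1 and 21, msgstr[1] for 2 to 4 and 22 to 24, and msgstr[2] for the rest, including 11 to 14.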
Re: [drlvm] proposals for VM internationalization
On 7/13/06, Paulex Yang [EMAIL PROTECTED] wrote:
> Hi, Vladimir
>
> Log4c and log4cpp are both good tools, but if our requirements are just message internationalization, maybe log4cxx is overkill? After all, as a complete log framework, it provides supports to i18n, category, layout
>
> And if we talk about ResourceBundle only, I'd suggest consider ICU4C as a candidate, which provides many i18n features including ResourceBundle[1] support to c as well as c++, and more important, it has been included as Harmony dependencies. Of course, if you think VM needs more complicated log mechanism support, this will be another story.

Yes, this is such case. I suppose some of us will want to read the debug output from JIT in the Chinese language. Why not?

However you're right the ICU4C can be also used for some cases. Maybe it makes sense to combine these two mechanisms (one for user messages, other for internal needs).

Just the log4cxx is used as logging system for DRLVM and we think a little of efforts will need to internationalize the native code in this case.

Thanks,
Vladimir.

> [1]http://icu.sourceforge.net/apiref/icu4c/
Re: [drlvm] proposals for VM internationalization
Vladimir Gorr wrote:
> On 7/13/06, Paulex Yang [EMAIL PROTECTED] wrote:
>> Hi, Vladimir
>>
>> Log4c and log4cpp are both good tools, but if our requirements are just message internationalization, maybe log4cxx is overkill? After all, as a complete log framework, it provides supports to i18n, category, layout... And if we talk about ResourceBundle only, I'd suggest consider ICU4C as a candidate, which provides many i18n features including ResourceBundle[1] support to c as well as c++, and more important, it has been included as Harmony dependencies. Of course, if you think VM needs more complicated log mechanism support, this will be another story.
>
> Yes, this is such case. I suppose some of us will want to read the debug output from JIT in the Chinese language. Why not?

Cool! You are right I do want to ;-) !

> However you're right the ICU4C can be also used for some cases. Maybe it makes sense to combine these two mechanisms (one for user messages, other for internal needs).

Maybe, if the ICU4C usage can introduce less footprint/performance impact to DRLVM in non-debug mode. (I have no idea about this actually, just guess if ICU4C only cares about ResourceBundle while log4cxx has much more things in its mind...)

> Just the log4cxx is used as logging system for DRLVM and we think a little of efforts will need to internationalize the native code in this case.

I see, this is what I expected, thank you to clarify this. :)

> Thanks, Vladimir.
>
>> [1]http://icu.sourceforge.net/apiref/icu4c/
>>
>> Vladimir Gorr wrote:
>>> Hi Harmony community. I'd like to discuss with you a design for the VM native code internationalization (attached below). We'd like to consider this approach for the DRLVM first of all. However it can be suitable for other parts of Harmony project I suppose. Please let me know your opinions/objections.
>>>
>>> Thanks, Vladimir.
>>>
>>> --- Internationalization design
>>>
>>> *1. Introduction*
>>>
>>> The VM's output needs to be internationalized in order to provide localized versions of our product.
>>> The key idea is to use ResourceBundle class from apache log4cxx which allow to store and effective use bundles with localized messages.
>>>
>>> The document describes:
>>>
>>> · ResourceBundle naming conventions for bundles with localized messages.
>>> · Structure of ResourceBundle file. MessageId (keys for localized message in ResourceBundle) development guidelines.
>>> · Requirements.
>>> · How it works inside VM.
>>>
>>> ...
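The bundle lookup from section 2 of the design (LANG=ru_RU selecting java_ru_RU.properties, then java_ru.properties, then java.properties) could be sketched like this. The function names are hypothetical, and stripping the codeset and modifier suffixes (".UTF-8", "@euro") and treating "C"/"POSIX" as "no localization" are assumptions added here; the design text does not say how those cases are handled.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Pick the locale name using the usual POSIX precedence:
// LC_ALL overrides LC_MESSAGES, which overrides LANG.
std::string detect_locale_name() {
    const char* names[] = {"LC_ALL", "LC_MESSAGES", "LANG"};
    for (const char* name : names) {
        const char* value = std::getenv(name);
        if (value != nullptr && *value != '\0') return value;
    }
    return "";
}

// Build the bundle search order for a locale name, most specific first.
// The unsuffixed java.properties is always the last fallback.
std::vector<std::string> bundle_candidates(std::string locale) {
    // Drop codeset and modifier, e.g. "ru_RU.UTF-8" -> "ru_RU".
    std::size_t cut = locale.find_first_of(".@");
    if (cut != std::string::npos) locale.erase(cut);

    std::vector<std::string> files;
    if (locale != "C" && locale != "POSIX") {
        while (!locale.empty()) {
            files.push_back("java_" + locale + ".properties");
            std::size_t underscore = locale.rfind('_');
            if (underscore == std::string::npos) break;
            locale.erase(underscore);   // ru_RU -> ru
        }
    }
    files.push_back("java.properties");
    return files;
}
```

A caller would pass the result of detect_locale_name() to bundle_candidates() and consult the returned files in order until a message key is found.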
Re: [drlvm] proposals for VM internationalization
In my opinion, using gettext() for the i18n goals will involve too big a refactoring of the source code. I also disagree with the 'no-meaning' argument about the message key. All we need is to create sensible IDs for these messages.

> 4) Add to this that most of the developers will not know where the localized messages are kept, and you'll get the situation when most of the messages are not localized in any way.

I'm not sure that gettext() will eliminate this issue.

Thanks, Vladimir.

On 7/13/06, Salikh Zakirov [EMAIL PROTECTED] wrote:

Vladimir Gorr wrote:
> Internationalization design
> *1. Introduction*
> ...
> The key idea is to use ResourceBundle class from apache log4cxx which allow to store and effective use bundles with localized messages.

Why not use GNU gettext -- de facto standard i18n system on GNU/Linux systems?

I think the developers' API can be designed to allow a wide range of i18n implementations, just like we did with logging. (* The DRLVM logging system was designed in such a way that its implementation could be rewritten completely from scratch. It was in fact rewritten once to use log4cxx. No client code modifications were required. *)

I think we could devise a simple localization API, which could even be a dummy to get us started, like

---8<--- vm/include/l10n.h
#define _(x) (x)
inline void init_l10n() {}
---8<---

Scan over the DRLVM code, mark the translatable strings with _(), and then evolve the l10n system independently of the development efforts.

> MessageId1=localized message1
> MessageId2=localized message2
> Where: MessageId{i} – ASCII string on English language. It should consist of vm's subcomponent name ( e.g. init, port, gc.) and short description of message. E.g. init.help is localized help message from init subcomponent of VM.

gettext has the advantage that the unlocalized messages are used as the keys for the translation, so the developers do not need to care about l10n at all. On the other hand, in the system you propose, to create a message one will need to

1) come up with the message identifier
2) add the message identifier and its unlocalized text to the resource file
and, most annoyingly,
3) consult the resource file each time s/he wants to know what message is printed, because in most cases the message key will bear no meaning. (* Compare with the issue we've come across recently: SecurityException: K00Ec *)
4) Add to this that most of the developers will not know where the localized messages are kept, and you'll get the situation when most of the messages are not localized in any way.

With gettext, localizing for developers is as easy as putting _() around your string message, and leaving *everything* else up to the translators. Even the source code scanning to extract messages that need to be translated is done automatically with 'xgettext'.

> Localized message can contain parameters. E.g. localized message pattern: This is message on English with two parameters: parameter number one – {0}, ...

With gettext, parameters in localized messages are a non-issue. You can use printf or cout with gettext without any restrictions. You can even teach your program to use correct plurals. (* In Slavic languages there are two kinds of plurals: 2-4 is the dual plural, 5-9 is the multiple plural; see the concrete example below. *)

> - All localized messages may be printed through apache log4cxx logger.

gettext's job is to translate strings, and then it's up to the developer to choose how to print the message, so this requirement is satisfied by gettext.

> - Minimize performance impact.

Below is a simple example of using gettext in a toy application to count apples:

---8<--- apples.c
#include <stdio.h>
#include <locale.h>
#include <libintl.h>

#define _(String) gettext(String)

int main()
{
    /* "" takes the locale from the environment; NULL would only query it */
    setlocale(LC_ALL, "");
    /* look for catalogs under ./<locale>/LC_MESSAGES/apples.mo */
    bindtextdomain("apples", ".");
    textdomain("apples");

    printf(_("internationalized message\n"));
    {
        int i;
        for (i = 0; i < 27; i++) {
            /* ngettext picks the plural form that matches i */
            printf(ngettext("%d apple\n", "%d apples\n", i), i);
        }
    }
    return 0;
}
---8<---

The translator's job then would be to fill in a template with translated messages, like

---8<--- ru/LC_MESSAGES/apples.po
msgid "internationalized message\n"
msgstr "русское сообщение\n"

msgid "%d apple\n"
msgid_plural "%d apples\n"
msgstr[0] "%d яблоко\n"
msgstr[1] "%d яблока\n"
msgstr[2] "%d яблок\n"
---8<---

- Terms of use : http://incubator.apache.org/harmony/mailing.html To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
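The three msgstr[] entries in the Russian catalog are selected by a plural rule that a Russian .po file normally declares in its Plural-Forms header. The sketch below spells that rule out in C++ to show which index each count maps to; the function names are made up for illustration, and in a real program ngettext evaluates the rule from the catalog rather than from application code.

```cpp
#include <string>

// Index into msgstr[] for Russian, following the conventional rule
//   n%10==1 && n%100!=11 ? 0
//   : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2
unsigned russian_plural_index(unsigned long n) {
    if (n % 10 == 1 && n % 100 != 11) return 0;                  // 1, 21, 31, ...
    if (n % 10 >= 2 && n % 10 <= 4 &&
        (n % 100 < 10 || n % 100 >= 20)) return 1;               // 2-4, 22-24, ...
    return 2;                                                    // 0, 5-20, 25-30, ...
}

// Hypothetical helper that mimics what ngettext would return for the
// apples example once the Russian catalog is loaded.
std::string count_apples_ru(unsigned long n) {
    static const char* const forms[] = {"яблоко", "яблока", "яблок"};
    return std::to_string(n) + " " + forms[russian_plural_index(n)];
}
```

The loop in apples.c runs from 0 to 26, which is enough to exercise all three forms, including the 11-14 exception.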
Re: [drlvm] proposals for VM internationalization
Vladimir Gorr wrote:
> In my opinion, using gettext() for the i18n goals will involve too big a refactoring of the source code. I also disagree with the 'no-meaning' argument about the message key. All we need is to create sensible IDs for these messages.

I think this is a case of good intentions, which pave the well-known road. As soon as the message key is *not* the thing that is printed, it is inevitable that a key that is sensible to one engineer will have no meaning to another. Do you think K00Ec is sensible? I think not. Do you think the developer had no chance to choose a better key? IMHO, this kind of thing must be *enforced*, for example by making sure that the message key is what gets printed in the C locale (the default case for most developers).

>> 4) Add to this that most of the developers will not know where the localized messages are kept, and you'll get the situation when most of the messages are not localized in any way.
>
> I'm not sure that gettext() will eliminate this issue.

gettext effectively splits this issue into two *independent* tasks:

- the task of the developer is to code
- the task of the translator is to find translatable messages and translate them

The greatest advantage of this is that developers do not need to care about translations, besides putting in _() occasionally, and translators do not need to care about coding.
Re: [drlvm] proposals for VM internationalization
Vladimir Gorr wrote:
> I'd like to discuss with you a design for the VM native code internationalization (attached below). ... Please let me know your opinions/objections.

To make my point clearer, I would repeat my suggestion.

0) Agree on a design decision that the message key is the *unlocalized message itself*, rather than some intermediary constant.
1) Start l10n with the below patch (untested)
2) Start marking localizable strings with _() in the DRLVM source code. The interface is very simple and does not impose any restrictions.
3) Implement the localization in any way we like, be it icu4c, log4cxx or gettext. Or maybe even leave it configurable at compile time.

--- /dev/null
+++ b/vm/include/l10n.h
@@ -0,0 +1,8 @@
+#ifndef _L10N_H
+#define _L10N_H
+
+#define _(message) (message)
+
+void init_l10n();
+
+#endif // _L10N_H
diff --git a/vm/vmcore/src/init/vm_main.cpp b/vm/vmcore/src/init/vm_main.cpp
index 9db56e5..96e9a8c 100644
--- a/vm/vmcore/src/init/vm_main.cpp
+++ b/vm/vmcore/src/init/vm_main.cpp
@@ -42,6 +42,7 @@ #include "dll_jit_intf.h"
 #include "dll_gc.h"
 #include "em_intf.h"
 #include "port_filepath.h"
+#include "l10n.h"
 
 union Scalar_Arg {
     int i;
@@ -283,6 +284,7 @@ static int run_java_shutdown()
 
 void create_vm(Global_Env *p_env, JavaVMInitArgs* vm_arguments)
 {
+    init_l10n();
 #ifdef PLATFORM_POSIX
     init_linux_thread_system();
 #elif defined(PLATFORM_NT)
diff --git a/vm/vmcore/src/l10n.cpp b/vm/vmcore/src/l10n.cpp
new file mode 100644
index 000..d9d380a
--- /dev/null
+++ b/vm/vmcore/src/l10n.cpp
@@ -0,0 +1,4 @@
+#include "l10n.h"
+
+void init_l10n() {
+}
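One way the no-op _() from the patch might later be given a real backend, with the unlocalized message as the key and as the fallback, is sketched below. Everything except the _() spelling is hypothetical: l10n_catalog, l10n_add and l10n_translate are names invented for this illustration, and the in-memory map stands in for whichever storage (ICU4C, log4cxx bundles, gettext) is eventually chosen.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical in-memory catalog: unlocalized message -> translation.
static std::unordered_map<std::string, std::string>& l10n_catalog() {
    static std::unordered_map<std::string, std::string> catalog;
    return catalog;
}

// Hypothetical loader hook; a real init_l10n() would fill the catalog
// from locale-specific files instead.
void l10n_add(const std::string& message, const std::string& translation) {
    l10n_catalog()[message] = translation;
}

// Return the translation if one exists, otherwise the message itself.
// The returned pointer refers to storage owned by the catalog and stays
// valid until that entry is overwritten or erased, so no per-call copy
// (and no leak) is needed.
const char* l10n_translate(const char* message) {
    const auto& catalog = l10n_catalog();
    auto it = catalog.find(message);
    return it != catalog.end() ? it->second.c_str() : message;
}

#define _(message) l10n_translate(message)
```

With an empty catalog this behaves exactly like the dummy macro in the patch, so call sites marked with _() would not need to change when a backend is plugged in.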
Re: [drlvm] proposals for VM internationalization
On 7/13/06, Salikh Zakirov [EMAIL PROTECTED] wrote:
> Vladimir Gorr wrote:
>> I'd like to discuss with you a design for the VM native code internationalization (attached below). ... Please let me know your opinions/objections.
>
> To make my point clearer, I would repeat my suggestion.
>
> 0) Agree on a design decision that the message key is the *unlocalized message itself*, rather than some intermediary constant.
> 1) Start l10n with the below patch (untested)
> 2) Start marking localizable strings with _() in the DRLVM source code. The interface is very simple and does not impose any restrictions.
> 3) Implement the localization in any way we like, be it icu4c, log4cxx or gettext. Or maybe even leave it configurable at compile time.

There is an essential obstacle to using the *gettext* approach: with it, it is impossible to run the VM on the Windows platform, unless we count the CYGWIN environment. It is also not clear to me how we will merge the previous .po catalogs (already translated) with new ones (when new strings are added). In any case, manual work is needed for this.

IMO gettext is very convenient for generating the initial template of message catalogs. However, _() should be inserted for all strings (and then deleted?) to achieve this. It involves too much effort.

Therefore my preference is to use a more universal approach, namely ICU4C or LOG4CXX or a combination of them.

Any comments?

Thanks, Vladimir.
Re: [drlvm] proposals for VM internationalization
Salikh Zakirov wrote:
> Vladimir Gorr wrote:
>> Internationalization design
>> *1. Introduction*
>> ...
>> The key idea is to use ResourceBundle class from apache log4cxx which allow to store and effective use bundles with localized messages.
>
> Why not use GNU gettext -- de facto standard i18n system on GNU/Linux systems? I think the developers' API can be designed to allow a wide range of i18n implementations, just like we did with logging.

Isn't it under the GPL?
Re: [drlvm] proposals for VM internationalization
I'll state the obvious... there is another thread going on about how to do similar things with Classlib. Maybe you can find common ground for message bundles and such...

geir

Vladimir Gorr wrote:
> Hi Harmony community. I'd like to discuss with you a design for the VM native code internationalization (attached below). We'd like to consider this approach for the DRLVM first of all. However it can be suitable for other parts of Harmony project I suppose. Please let me know your opinions/objections.
>
> Thanks, Vladimir.
>
> --- Internationalization design
> ...
Re: [drlvm] proposals for VM internationalization
On 7/13/06, Vladimir Gorr [EMAIL PROTECTED] wrote:
> On 7/13/06, Salikh Zakirov [EMAIL PROTECTED] wrote:
>> Vladimir Gorr wrote:
>>> I'd like to discuss with you a design for the VM native code internationalization (attached below). ... Please let me know your opinions/objections.
>>
>> To make my point clearer, I would repeat my suggestion.
>>
>> 0) Agree on a design decision that the message key is the *unlocalized message itself*, rather than some intermediary constant.
>> 1) Start l10n with the below patch (untested)
>> 2) Start marking localizable strings with _() in the DRLVM source code. The interface is very simple and does not impose any restrictions.
>> 3) Implement the localization in any way we like, be it icu4c, log4cxx or gettext. Or maybe even leave it configurable at compile time.
>
> There is an essential obstacle to using the *gettext* approach: with it, it is impossible to run the VM on the Windows platform, unless we count the CYGWIN environment. It is also not clear to me how we will merge the previous .po catalogs (already translated) with new ones (when new strings are added). In any case, manual work is needed for this.
>
> IMO gettext is very convenient for generating the initial template of message catalogs. However, _() should be inserted for all strings (and then deleted?) to achieve this. It involves too much effort.
>
> Therefore my preference is to use a more universal approach,

Agree.

> namely ICU4C or LOG4CXX or a combination of them.

As Paulex mentioned, it depends on the requirement. If it is only for i18n, icu4c is preferred. Otherwise, if fine-grained control of logging is required, log4cxx may be the choice. Thanks!

> Any comments?
>
> Thanks, Vladimir.
-- Andrew Zhang China Software Development Lab, IBM
Re: [drlvm] proposals for VM internationalization
Geir Magnusson Jr wrote:
> Salikh Zakirov wrote:
>> Why not use GNU gettext -- de facto standard i18n system on GNU/Linux systems?
>
> Isn't it under the GPL?

The runtime part (libintl) is LGPL, so it allows linking to non-GPL programs. The tools are indeed GPL, but the Harmony project is not going to link with them either.
RE: [drlvm] proposals for VM internationalization
> -----Original Message-----
> From: Salikh Zakirov [mailto:[EMAIL PROTECTED]
> Sent: Thursday, July 13, 2006 10:32 AM
> To: harmony-dev@incubator.apache.org
> Subject: Re: [drlvm] proposals for VM internationalization
>
> Geir Magnusson Jr wrote:
>> Salikh Zakirov wrote:
>>> Why not use GNU gettext -- de facto standard i18n system on GNU/Linux systems?
>>
>> Isn't it under the GPL?
>
> The runtime part (libintl) is LGPL, so it allows linking to non-GPL programs. The tools are indeed GPL, but the Harmony project is not going to link with them either.

Do you mean there won't be any runtime dependencies? We can't distribute LGPL-ed binaries.

Geir
Re: [drlvm] proposals for VM internationalization
Vladimir Gorr wrote:

Have you looked at the patch I've sent? It doesn't use gettext. It just proposes a way to move forward towards a properly localized DRLVM.

I think we will be able to use ICU4C java-like localization in the following way:

* extract localizable strings from .cpp files using xgettext
* process the resulting .po files and autogenerate resource bundles from them
* use resource bundles for translation and distribution

> There is an essential obstacle to using the *gettext* approach: with it, it is impossible to run the VM on the Windows platform, unless we count the CYGWIN environment.

I do not insist on using gettext, but will answer just for the record: there exists a project to port libintl to native Windows using MinGW [1]

> It is also not clear to me how we will merge the previous .po catalogs (already translated) with new ones (when new strings are added).

Again, I do not insist on using gettext; however, gettext has a tool exactly for this task: msgmerge [2]

> However, _() should be inserted for all strings (and then deleted?) to achieve this. It involves too much effort.

The effort could be big, but it is needed for any localization system we use. The task of classifying the messages into translatable (visible to the user on a day-to-day basis) and non-translatable (internal errors and debug logging) is needed anyway, because we do not want to overload translators with the useless work of translating every string in the project. IMHO, the _() marker is visually the prettiest way to mark localizable strings (compared to // NON-NLS comments and resource bundle constants).

> Therefore my preference is to use a more universal approach, namely ICU4C or LOG4CXX or a combination of them.

I've looked through the Log4cxx manual and haven't found anything concerning either localization or internationalization. By the way, DRLVM already uses Log4cxx.

ICU4C provides both internationalization and localization services [3]. Its native system uses ResourceBundles and looks similar to the Java localization system, and it suffers from the same drawback: the message keys are constants, which are never printed, but have to be defined and referenced in multiple places. The developer overhead to make a localizable message is as high as

* define a new constant in some file
* add a message to the default resource bundle

and involves editing multiple files. I have no doubt that this overhead is significantly higher than putting three characters around the string in the _() way.

-- Salikh.

[1] http://gnuwin32.sourceforge.net/packages/libintl.htm
[2] http://www.gnu.org/software/gettext/manual/html_mono/gettext.html#SEC36
[3] http://icu.sourceforge.net/userguide/localizing.html
Re: [drlvm] proposals for VM internationalization
Andrew Zhang wrote:
> Vladimir Gorr wrote:
>> namely, ICU4C or LOG4CXX or combination of them.

log4cxx is already used in DRLVM. It does not provide localization services.

> If only for i18n, icu4c is preferred.

So, would the following solution be acceptable to all?

1 mark the localizable strings with _() in .cpp files
2 write a tool to extract localizable messages from .cpp files and autogenerate ICU4C .txt resource bundles.
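The extraction half of step 2 could be prototyped roughly as below. This is a sketch under simplifying assumptions: the function name extract_marked_messages is invented here, only a single string literal directly inside _() is recognised, and comments, adjacent-literal concatenation and preprocessor tricks are ignored (xgettext handles those properly). Writing the result out in ICU4C's .txt bundle format is deliberately left out.

```cpp
#include <regex>
#include <set>
#include <string>
#include <vector>

// Collect the string literals wrapped in _("...") from C++ source text.
// Returns each distinct message once, in order of first appearance,
// with escape sequences left exactly as written in the source.
std::vector<std::string> extract_marked_messages(const std::string& source) {
    // \b keeps identifiers such as foo_("x") from matching;
    // the capture group accepts escaped characters like \" and \n.
    static const std::regex marker(
        R"re(\b_\(\s*"((?:[^"\\]|\\.)*)"\s*\))re");

    std::vector<std::string> messages;
    std::set<std::string> seen;
    for (std::sregex_iterator it(source.begin(), source.end(), marker), end;
         it != end; ++it) {
        std::string text = (*it)[1].str();
        if (seen.insert(text).second) messages.push_back(text);
    }
    return messages;
}
```

Strings that are not wrapped in _() are skipped, which matches the idea in the thread that only user-visible messages get marked and handed to translators.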
Re: [drlvm] proposals for VM internationalization
Magnusson, Geir wrote:
> Do you mean there won't be any runtime dependencies? We can't distribute LGPL-ed binaries.

In this case, libintl is definitely out of the question. However, I like the simplicity of the _() interface. I think we can use it with ICU4C too.
Re: [drlvm] proposals for VM internationalization
I don't know if this is being considered, but

1) Classlib has lots of java internationalization needs, and some native internationalization needs
2) DRLVM has lots of native internationalization needs, and some java needs (kernel classes).

So it seems clear to me we need to at least try for a common approach.

geir

Salikh Zakirov wrote:
> Andrew Zhang wrote:
>> Vladimir Gorr wrote:
>>> namely, ICU4C or LOG4CXX or combination of them.
>
> log4cxx is already used in DRLVM. It does not provide localization services.
>
>> If only for i18n, icu4c is preferred.
>
> So, would the following solution be acceptable to all?
>
> 1 mark the localizable strings with _() in .cpp files
> 2 write a tool to extract localizable messages from .cpp files and autogenerate ICU4C .txt resource bundles.
Re: [drlvm] proposals for VM internationalization
Geir Magnusson Jr wrote:
> I don't know if this is being considered, but
>
> 1) Classlib has lots of java internationalization needs, and some native internationalization needs
> 2) DRLVM has lots of native internationalization needs, and some java needs (kernel classes).

FWIW, as far as I can figure from both the [drlvm] and [classlib] discussions, the topic is *localization*, i.e. providing the user with messages in the user's native language.

Concerning *internationalization*: Java code is internationalized by design, and DRLVM needs some fixes to achieve it, at least

* accept non-ascii class names in locale-specific encoding

For more information about i18n vs l10n, see http://www.w3.org/International/questions/qa-i18n
Re: [drlvm] proposals for VM internationalization
we stand corrected. Both need localization, and we should be able to find commonality in the approaches.

geir

Salikh Zakirov wrote:
> Geir Magnusson Jr wrote:
>> I don't know if this is being considered, but
>>
>> 1) Classlib has lots of java internationalization needs, and some native internationalization needs
>> 2) DRLVM has lots of native internationalization needs, and some java needs (kernel classes).
>
> FWIW, as far as I can figure from both the [drlvm] and [classlib] discussions, the topic is *localization*, i.e. providing the user with messages in the user's native language.
>
> Concerning *internationalization*: Java code is internationalized by design, and DRLVM needs some fixes to achieve it, at least
>
> * accept non-ascii class names in locale-specific encoding
>
> For more information about i18n vs l10n, see http://www.w3.org/International/questions/qa-i18n