Re: [drlvm] proposals for VM internationalization

2006-07-20 Thread Dmitry Yershov

Hello all.

Salikh Zakirov wrote:

> Far below are the results of my experiments with Log4cxx's ResourceBundle.
> (I managed to find it in the Log4cxx documentation after carefully
> rereading your original post.)
>
> The good news is that it does localization (albeit severely limited).
> The prototype has the following good properties:
> * The unlocalized message is used as the message key.

The message key should be the message pattern (not the formatted message),
because the message may contain parameters. E.g.:

Message pattern with integer parameter: %d
or
Message pattern with one parameter: {0}

> * No extra entities are introduced (such as non-printable message keys).

What about very long message patterns (e.g. the VM's help message)? For these
cases a messageId key should be used.

> * The localizable messages are marked with the _() notation and can
>   be extracted from the source code automatically.


To my mind, this is a graceful solution for extracting localizable messages.
But should we care about this? The messages only need to be gathered and put
into the properties file once.

I propose the following solution:

Modify the VM's LoggerString class so that the first element of a composite
message is the message key. If the key is an empty string, the message is not
localized. E.g.:
WARN("" << "Not localizable message with two parameters: " << 1 << " and " << 10);
WARN("localizable message with two parameters: %d and %d" << 1 << 10);
or
WARN("localizable message with two parameters: {0} and {1}" << 1 << 10);
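A minimal sketch of the key rule proposed above, assuming a plain std::map catalog instead of log4cxx (the names `Catalog` and `localized_pattern` are hypothetical, not existing DRLVM or log4cxx API): an empty key means "do not localize", and any other key is looked up, falling back to the key itself, which doubles as the English pattern.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical in-memory catalog: English message pattern -> localized pattern.
using Catalog = std::map<std::string, std::string>;

// Sketch of the proposed rule (needs C++17 for std::optional):
//  - an empty key marks a message that must not be localized -> std::nullopt;
//  - otherwise return the localized pattern, or the key itself when the
//    catalog has no entry, since the key doubles as the English pattern.
std::optional<std::string> localized_pattern(const std::string& key,
                                             const Catalog& catalog) {
    if (key.empty()) return std::nullopt;
    auto it = catalog.find(key);
    if (it != catalog.end()) return it->second;
    return key;
}
```

The LoggerString side would then either print the remaining `<<` arguments verbatim (std::nullopt) or substitute them into the returned pattern.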



> The things that I have not implemented yet (to save time and make at least
> something available):
> * loading the system locale value
> * reading the locale-specific localization file
> * converting the localized messages to the locale-specific encoding

What do you mean here?

> * converting the unlocalized messages from the source encoding (US-ASCII) to
>   UTF-16 (wchar_t[])

That is a big question. We can use char[] strings there; Log4cxx
automatically converts char* to wchar_t*.
We can also use UTF-8 encoding for wide characters.



> The issues that I have encountered but haven't yet worked out a solution for:
>
> * PropertyResourceBundle.getString().c_str() returns a pointer to a stack
>   location. To make it work, I had to use wcsdup(), thus introducing
>   an unacceptable memory leak.
>   I think there must be some way to get a pointer to the original bundle
>   contents, but I haven't figured out how to achieve it.

Maybe this is the way:

LOG4CXX_DECODE_WCHAR(chstr, wchrstr);
LOG4CXX_ENCODE_CHAR(charstr, chstr);
charstr.c_str()

> * PropertyResourceBundle expects well-formed property syntax, so the
>   unlocalized messages need to be mangled into a property-compatible form
>   (in the patch below, the only transformation replaces spaces ' ' with
>   underscores '_', but it needs to be generalized).

I agree with you.



> Given the number of issues PropertyResourceBundle introduces, and the services
> it actually provides (parsing the property format and constructing an
> in-memory hashmap), I think it would be easier to reimplement the
> functionality without using PropertyResourceBundle, and to change the on-disk
> storage format so that unmangled messages can be the keys.

In conclusion, here are my suggestions for the VM's internationalization:

1. Extend the log4cxx::helpers::PropertyResourceBundle class to allow lazy
(on-demand) loading of properties.
2. Extend the log4cxx::helpers::Properties class to allow strings with
spaces as keys.
3. Choose a model:
   a. _("message key") – localizable; "message" – not localizable
   b. "message key" – localizable; "" – not localizable
4. Decide whether printf format specifications or {number} placeholders
should be used for parameters inside message patterns.
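Regarding item 2: the java.util.Properties file format already allows spaces in keys when they are escaped with a backslash, so one option is to escape messages rather than mangle them into underscores. The sketch below assumes that format; whether log4cxx's Properties parser honors these escapes is not verified here, and `escape_property_key` / `parse_property_line` are hypothetical helpers covering only a simplified subset (no line continuations, no \uXXXX escapes, no whitespace-only separators).

```cpp
#include <string>
#include <utility>

// Escape a raw message so it can serve as a key in a Properties-style file:
// spaces, the separators '=' and ':', the comment starters '#' and '!', and
// the backslash itself each get a backslash prefix.
std::string escape_property_key(const std::string& raw) {
    std::string out;
    for (char c : raw) {
        if (c == ' ' || c == '=' || c == ':' || c == '#' || c == '!' || c == '\\')
            out += '\\';
        out += c;
    }
    return out;
}

// Split "key=value" (or "key:value") at the first unescaped separator and
// undo the backslash escaping in the key. The value is returned as-is.
std::pair<std::string, std::string> parse_property_line(const std::string& line) {
    std::string key;
    std::size_t i = 0;
    for (; i < line.size(); ++i) {
        char c = line[i];
        if (c == '\\' && i + 1 < line.size()) {  // escaped char: keep it literally
            key += line[++i];
            continue;
        }
        if (c == '=' || c == ':') break;         // first unescaped separator
        key += c;
    }
    std::string value = (i < line.size()) ? line.substr(i + 1) : std::string();
    return {key, value};
}
```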

Thanks,
Dmitry.



Re: [drlvm] proposals for VM internationalization

2006-07-18 Thread Tim Ellison
Salikh Zakirov wrote:
> Geir Magnusson Jr wrote:
>> I'll state the obvious... there is another thread going on about how to
>> do similar things with Classlib.  Maybe you can find common ground for
>> message bundles and such...
>>
>> geir
>
> 1. The launcher already packages some translations in property format,
> which makes me believe that launcher localization was once completed at IBM,
> though I wasn't able to find anything about localization in the launcher sources.
>
> Tim, Mark, could you provide more information about the localization already
> implemented in classlib natives?

There is support for getting localized messages from resource files in
the Harmony port library functions.

See:
http://svn.apache.org/viewvc/incubator/harmony/enhanced/classlib/trunk/doc/vm_doc/html/hynls_8c.html?view=co


Regards,
Tim

-- 

Tim Ellison ([EMAIL PROTECTED])
IBM Java technology centre, UK.

-
Terms of use : http://incubator.apache.org/harmony/mailing.html
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [drlvm] proposals for VM internationalization

2006-07-17 Thread Salikh Zakirov
Vladimir Gorr wrote:
> Continue this discussion?

Vladimir,

Far below are the results of my experiments with Log4cxx's ResourceBundle.
(I managed to find it in the Log4cxx documentation after carefully
rereading your original post.)

The good news is that it does localization (albeit severely limited).
The prototype has the following good properties:
* The unlocalized message is used as the message key.
* No extra entities are introduced (such as non-printable message keys).
* The localizable messages are marked with the _() notation and can
  be extracted from the source code automatically.

The things that I have not implemented yet (to save time and make at least
something available):
* loading the system locale value
* reading the locale-specific localization file
* converting the localized messages to the locale-specific encoding
* converting the unlocalized messages from the source encoding (US-ASCII) to
  UTF-16 (wchar_t[])

The issues that I have encountered but haven't yet worked out a solution for:

* PropertyResourceBundle.getString() returns a string object by value, so
  c_str() yields a pointer into a temporary (stack) copy that does not outlive
  _(). To make it work, I had to use wcsdup(), thus introducing
  an unacceptable memory leak.
  I think there must be some way to get a pointer to the original bundle
  contents, but I haven't figured out how to achieve it.

* PropertyResourceBundle expects well-formed property syntax, so the unlocalized
  messages need to be mangled into a property-compatible form
  (in the patch below, the only transformation replaces spaces ' ' with
  underscores '_', but it needs to be generalized).
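One possible way around the wcsdup() leak, sketched under the assumption that the bundle lookup can be wrapped in a function returning std::wstring (an empty result meaning "not found"); `LookupFn` and `translate_cached` are hypothetical names. Every translation is stored once in a process-wide std::map and the caller gets c_str() of the stored value. std::map never relocates its nodes, so the pointer stays valid; memory grows with the number of distinct messages, not with every call.

```cpp
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for the resource bundle lookup; returns an empty
// string when the message has no translation.
using LookupFn = std::wstring (*)(const std::wstring&);

// Returns a pointer that stays valid for the life of the process: the
// translation is stored once in a std::map, whose nodes never move, and the
// stored string is never modified afterwards.
const wchar_t* translate_cached(const wchar_t* message, LookupFn lookup) {
    static std::map<std::wstring, std::wstring> cache;
    static std::mutex cache_mutex;  // _() may be called from several threads
    std::lock_guard<std::mutex> guard(cache_mutex);

    auto it = cache.find(message);
    if (it == cache.end()) {
        std::wstring localized = lookup(message);
        if (localized.empty()) localized = message;  // fall back to unlocalized text
        it = cache.emplace(message, std::move(localized)).first;
    }
    return it->second.c_str();
}
```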

Given the number of issues PropertyResourceBundle introduces, and the services
it actually provides (parsing the property format and constructing an in-memory
hashmap), I think it would be easier to reimplement the functionality without
using PropertyResourceBundle, and to change the on-disk storage format so that
unmangled messages can be the keys.

===
From: Salikh Zakirov [EMAIL PROTECTED]
Date: Thu, 13 Jul 2006 12:06:05 +0400
Subject: [PATCH] Dummy l10n implementation based on Log4cxx
---
 vm/include/l10n.h  |   31 +++
 vm/port/include/loggerstring.h |9 +
 vm/vmcore/src/init/l10n.cpp|   66 
 vm/vmcore/src/init/vm_main.cpp |2 +
 4 files changed, 108 insertions(+), 0 deletions(-)

diff --git a/vm/include/l10n.h b/vm/include/l10n.h
new file mode 100755
index 000..bb3edfe
--- /dev/null
+++ b/vm/include/l10n.h
@@ -0,0 +1,31 @@
+#ifndef _L10N_H
+#define _L10N_H
+
+#include <string>
+#include <log4cxx/helpers/propertyresourcebundle.h>
+#include <log4cxx/helpers/exception.h>
+#include <wchar.h>
+#include "cxxlog.h"
+
+extern log4cxx::helpers::ResourceBundlePtr l10n_resource_bundle;
+
+inline const wchar_t* _(const wchar_t* message)
+{
+    if (!l10n_resource_bundle) return message;
+    try {
+        wchar_t* mangled = wcsdup(message);
+        wchar_t* c = mangled;
+        while (*c) {  // property keys cannot contain raw spaces
+            if (*c == L' ') *c = L'_';
+            c++;
+        }
+        std::wstring key(mangled); free(mangled);  // free before getString() may throw
+        std::wstring localized = l10n_resource_bundle->getString(key);
+        return wcsdup(localized.c_str()); // FIXME: leak
+    } catch (log4cxx::helpers::MissingResourceException&) {}
+    return message;
+}
+
+void init_l10n();
+
+#endif // _L10N_H
diff --git a/vm/port/include/loggerstring.h b/vm/port/include/loggerstring.h
old mode 100644
new mode 100755
index 1efe5d2..1eae5c1
--- a/vm/port/include/loggerstring.h
+++ b/vm/port/include/loggerstring.h
@@ -41,6 +41,15 @@ public:
         return (const char*)logger_string.c_str();
     }
 
+    LoggerString& operator<<(const wchar_t* message) {
+        const wchar_t* c = message;
+        while (*c) {
+            logger_string += (char)*c;  // narrowing: only correct for ASCII text
+            c++;
+        }
+        return *this;
+    }
+
     LoggerString& operator<<(const char* message) {
         logger_string += message;
         return *this;
diff --git a/vm/vmcore/src/init/l10n.cpp b/vm/vmcore/src/init/l10n.cpp
new file mode 100755
index 000..c8fd746
--- /dev/null
+++ b/vm/vmcore/src/init/l10n.cpp
@@ -0,0 +1,66 @@
+#include <apr_env.h>
+#include <assert.h>
+#include <fstream>
+#include <string.h>
+
+#include "cxxlog.h"
+#include "l10n.h"
+#include "platform_lowlevel.h"
+
+#include <log4cxx/helpers/locale.h>
+
+using namespace log4cxx;
+using namespace log4cxx::helpers;
+
+ResourceBundlePtr l10n_resource_bundle;
+
+void init_l10n()
+{
+    INFO2("info", "starting l10n initialization");
+
+/*
+    apr_pool_t *pool;
+    apr_pool_create(&pool, 0); assert(pool);
+    char *lang = NULL;
+
+    apr_env_get(&lang, "LANG", pool);
+    if (!lang) lang = "C";
+
+    char *encoding = strchr(lang, '.');
+    if (encoding != NULL) {
+        *encoding = '\0';
+        encoding += 1;
+    }
+    char *region = strchr(lang, '_');
+    if (region != NULL) {
+        *region = '\0';
+        region += 1;
+    }
+    INFO2("info", "lang = " << lang << ", region = " << region
+        << ", encoding = " << encoding);

Re: [drlvm] proposals for VM internationalization

2006-07-17 Thread Salikh Zakirov
Geir Magnusson Jr wrote:
> I'll state the obvious... there is another thread going on about how to
> do similar things with Classlib.  Maybe you can find common ground for
> message bundles and such...
>
> geir

1. The launcher already packages some translations in property format,
which makes me believe that launcher localization was once completed at IBM,
though I wasn't able to find anything about localization in the launcher sources.

Tim, Mark, could you provide more information about the localization already
implemented in classlib natives?

2. As far as I can see, the only thing that native l10n can have in common with
Java l10n is the translation files. Generally, this is a good goal, as it would
make the translators' job more straightforward, keeping the number of formats
and message systems to a minimum.

3. I personally consider the property-based design of l10n in Java inferior,
because it requires the keys to be property-name-compatible (e.g. no spaces),
and it often results in developers introducing short localization key names
that bear no meaning. For example, see the harmony_*.properties in classlib:
   EXEL051=...
Should the localization system fail, the only thing the user will get is
EXEL051.
Developers reading the code that prints a localizable message have no clue
either: to find out the value of the message, one needs to consult the default
localization file.
Furthermore, when introducing a new localizable message, one needs to edit 3(!)
different places: add the message code, add the key, and add the printable
message to the default localization file.
This particular design choice is an inefficient use of developers' time, is less
robust, and is less maintainable.

And if the key names are constructed from the unlocalized messages, it
introduces the runtime cost of mangling each unlocalized message into some
property-name-compatible form.






Re: [drlvm] proposals for VM internationalization

2006-07-17 Thread Geir Magnusson Jr


Salikh Zakirov wrote:
> Geir Magnusson Jr wrote:
>> I'll state the obvious... there is another thread going on about how to
>> do similar things with Classlib.  Maybe you can find common ground for
>> message bundles and such...
>>
>> geir
>
> 1. The launcher already packages some translations in property format,
> which makes me believe that launcher localization was once completed at IBM,
> though I wasn't able to find anything about localization in the launcher sources.

Who cares what was once completed at IBM?  They had their reasons, their
uses... This is Apache Harmony :)  We can do what we feel is best
(including keeping what was donated...)

> Tim, Mark, could you provide more information about the localization already
> implemented in classlib natives?
>
> 2. As far as I can see, the only thing that native l10n can have in common with
> Java l10n is the translation files. Generally, this is a good goal, as it would
> make the translators' job more straightforward, keeping the number of formats
> and message systems to a minimum.

+1

 
> 3. I personally consider the property-based design of l10n in Java inferior,
> because it requires the keys to be property-name-compatible (e.g. no spaces),
> and it often results in developers introducing short localization key names
> that bear no meaning. For example, see the harmony_*.properties in classlib:
>    EXEL051=...
> Should the localization system fail, the only thing the user will get is
> EXEL051.

Don't we have far bigger problems if the localization system in the JVM
fails?

> Developers reading the code that prints a localizable message have no clue
> either: to find out the value of the message, one needs to consult the default
> localization file.
> Furthermore, when introducing a new localizable message, one needs to edit 3(!)
> different places: add the message code, add the key, and add the printable
> message to the default localization file.
> This particular design choice is an inefficient use of developers' time, is less
> robust, and is less maintainable.
>
> And if the key names are constructed from the unlocalized messages, it
> introduces the runtime cost of mangling each unlocalized message into some
> property-name-compatible form.

I understand what you are saying, and certainly agree that if we can
find some way to use meaningful keys, so much the better.  I guess the
question is what that costs us, versus the likelihood that the
localization system will fail...

geir





[drlvm] proposals for VM internationalization

2006-07-13 Thread Vladimir Gorr

Hi Harmony community.

I'd like to discuss with you a design for internationalizing the VM native
code (attached below).

We'd like to consider this approach for DRLVM first of all, though I suppose
it could be suitable for other parts of the Harmony project as well.

Please let me know your opinions/objections.

Thanks,

Vladimir.



---

Internationalization design

*1. Introduction*

The VM's output needs to be internationalized in order to provide localized
versions of our product.

The key idea is to use the ResourceBundle class from Apache log4cxx, which
allows bundles of localized messages to be stored and used efficiently.

The document describes:

· ResourceBundle naming conventions for bundles with localized messages.

· The structure of the ResourceBundle file, and development guidelines for
MessageIds (the keys of localized messages in a ResourceBundle).

· Requirements.

· How it works inside the VM.



*Definitions:*

I18n – internationalization

L10n – localization

L7d – localized

*2. ResourceBundle naming conventions for bundles with localized messages.*



We propose to use the ResourceBundle class from Apache log4cxx as the storage
for localized messages. Initially, all ResourceBundles are files.

When the VM starts, at the initialization stage of the VM's logging subsystem,
the logging system chooses the appropriate set of ResourceBundles according to
the values of the environment variables LC_ALL, LC_MESSAGES, and LANG.

The chosen ResourceBundles are used for printing localized messages from the
VM.
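A sketch of the locale selection step, assuming POSIX semantics for message catalogs (the first non-empty value among LC_ALL, LC_MESSAGES, LANG wins, "C" otherwise) and the usual language_COUNTRY.encoding@modifier syntax; `pick_locale`, `current_messages_locale` and `parse_locale` are hypothetical helpers, not log4cxx API.

```cpp
#include <cstdlib>
#include <initializer_list>
#include <string>

struct LocaleParts {
    std::string language;  // e.g. "ru"
    std::string country;   // e.g. "RU"; empty if absent
    std::string encoding;  // e.g. "UTF-8"; empty if absent
    std::string modifier;  // text after '@'; empty if absent
};

// POSIX precedence for LC_MESSAGES: LC_ALL overrides LC_MESSAGES, which
// overrides LANG; empty values are ignored. "C" means "no translation".
std::string pick_locale(const char* lc_all, const char* lc_messages, const char* lang) {
    for (const char* v : {lc_all, lc_messages, lang})
        if (v != nullptr && *v != '\0') return v;
    return "C";
}

std::string current_messages_locale() {
    return pick_locale(std::getenv("LC_ALL"), std::getenv("LC_MESSAGES"),
                       std::getenv("LANG"));
}

// Split "language[_COUNTRY][.encoding][@modifier]", peeling from the right.
LocaleParts parse_locale(const std::string& name) {
    LocaleParts p;
    std::string rest = name;
    std::size_t at = rest.find('@');
    if (at != std::string::npos) { p.modifier = rest.substr(at + 1); rest.erase(at); }
    std::size_t dot = rest.find('.');
    if (dot != std::string::npos) { p.encoding = rest.substr(dot + 1); rest.erase(dot); }
    std::size_t us = rest.find('_');
    if (us != std::string::npos) { p.country = rest.substr(us + 1); rest.erase(us); }
    p.language = rest;
    return p;
}
```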



E.g. if the environment variable LANG is equal to ru_RU, the following
set of ResourceBundles should be used (see the naming conventions below):

· java_ru_RU.properties

· java_ru.properties

· java.properties



Each file that represents a ResourceBundle should have the following
name:

*java_language_country_variant.properties*, where:

_language is a language, e.g. _ru (Russian). It may be empty.

_country is a country, e.g. _RU (Russian Federation). It may be empty.

_variant is a variant. It may be empty.

The main ResourceBundle file (with the English messages) should be
java.properties.


*3. Structure of the ResourceBundle file. MessageId development guidelines.*

The structure of a ResourceBundle file should be the following:

MessageId1=localized message1

MessageId2=localized message2

…

Where:

MessageId{i} – an ASCII string in English. It should consist of the VM's
subcomponent name (e.g. init, port, gc) and a short description of the message.

E.g. init.help is the localized help message from the init subcomponent of the
VM.

Localized message{i} – the localized message.
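A sketch of reading this format and resolving a MessageId through the fallback chain (for instance java_ru.properties, then java.properties). `Bundle`, `load_bundle` and `lookup` are hypothetical names, and the parser is deliberately simplified: no escapes, no line continuations, no trimming around '='. When a MessageId is missing from every bundle, the only available fallback is the MessageId itself.

```cpp
#include <istream>
#include <map>
#include <sstream>  // std::istringstream, handy for in-memory bundles
#include <string>
#include <vector>

using Bundle = std::map<std::string, std::string>;

// Read "MessageId=localized message" lines; blank lines and lines starting
// with '#' or '!' are treated as comments.
Bundle load_bundle(std::istream& in) {
    Bundle b;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#' || line[0] == '!') continue;
        std::size_t eq = line.find('=');
        if (eq == std::string::npos) continue;  // not a key=value line
        b[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return b;
}

// chain is ordered most specific first, e.g. {java_ru, java}.
std::string lookup(const std::vector<const Bundle*>& chain, const std::string& id) {
    for (const Bundle* b : chain) {
        auto it = b->find(id);
        if (it != b->end()) return it->second;
    }
    return id;  // last resort: the user sees the MessageId itself
}
```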



A localized message can contain parameters. Take, for example, the localized
message pattern:

This is a message in English with two parameters: parameter number one – {0},
and parameter number two – {1}. We can print them again in reverse order:
{1}, {0}.

If the first parameter is equal to the integer value 1 and the second is equal
to the string "two", the message for the pattern above should be:

This is a message in English with two parameters: parameter number one – 1,
and parameter number two – two. We can print them again in reverse order:
two, 1.
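The {N} substitution described above can be sketched as below. It covers only a small subset of java.text.MessageFormat (no quoting, no {0,number}-style formats), and `format_message` is a hypothetical name. A placeholder may appear several times and in any order; malformed or out-of-range placeholders are copied through unchanged.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Replace each "{N}" in pattern with args[N].
std::string format_message(const std::string& pattern,
                           const std::vector<std::string>& args) {
    std::string out;
    std::size_t i = 0;
    while (i < pattern.size()) {
        if (pattern[i] == '{') {
            std::size_t j = i + 1;
            std::size_t index = 0;
            while (j < pattern.size() &&
                   std::isdigit(static_cast<unsigned char>(pattern[j]))) {
                index = index * 10 + static_cast<std::size_t>(pattern[j] - '0');
                ++j;
            }
            // Well-formed "{digits}" with an argument available: substitute it.
            if (j > i + 1 && j < pattern.size() && pattern[j] == '}' && index < args.size()) {
                out += args[index];
                i = j + 1;
                continue;
            }
        }
        out += pattern[i];  // ordinary character or unusable placeholder
        ++i;
    }
    return out;
}
```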


*4. Requirements.*

  - All localized messages may be printed through the Apache log4cxx logger.
  - Parameters may be present in localized messages.
  - The VM-I18N subsystem should automatically detect the user's locale
  according to the values of environment variables.
  - Minimize performance impact.


Re: [drlvm] proposals for VM internationalization

2006-07-13 Thread Paulex Yang

Hi, Vladimir

Log4c and log4cpp are both good tools, but if our requirements are just
message internationalization, maybe log4cxx is overkill? After all, as a
complete logging framework, it provides support for i18n, categories,
layouts, etc.

And if we are talking about ResourceBundle only, I'd suggest considering ICU4C
as a candidate: it provides many i18n features, including ResourceBundle[1]
support for C as well as C++, and, more importantly, it is already included
among the Harmony dependencies.

Of course, if you think the VM needs a more complicated logging mechanism,
that is another story.


[1]http://icu.sourceforge.net/apiref/icu4c/

--
Paulex Yang
China Software Development Lab
IBM






Re: [drlvm] proposals for VM internationalization

2006-07-13 Thread Salikh Zakirov
Vladimir Gorr wrote:
> Internationalization design *1. Introduction*
> ...
> The key idea is to use ResourceBundle class from apache log4cxx which allow
> to store and effective use bundles with localized messages.

Why not use GNU gettext, the de facto standard i18n system on GNU/Linux?
I think the developers' API can be designed to allow a wide range of
i18n implementations, just as we did with logging.

(* The DRLVM logging system was designed in such a way that its implementation
   could be rewritten completely from scratch. It was in fact rewritten once
   to use log4cxx, and no client code modifications were required. *)

I think we could devise a simple localization API, which could even be a dummy
to get us started, like

--->8--- vm/include/l10n.h
#define _(x) (x)
inline void init_l10n() {}
---

Then we could scan over the DRLVM code, mark the translatable strings with _(),
and evolve the l10n system independently of the development efforts.

> MessageId1=localized message1
>
> MessageId2=localized message2
>
> Where:
>
> MessageId{i} – ASCII string on English language. It should consist of vm's
> subcomponent name ( e.g. init, port, gc.) and short description of message.
>
> E.g. init.help is localized help message from init subcomponent of VM.

gettext has the advantage that the unlocalized messages are used as the
keys for the translation; thus, developers do not need to care about
l10n at all.

On the other hand, in the system you propose, to create a message,
one will need to
1) come up with a message identifier
2) add the message identifier and its unlocalized text to the resource file

and, most annoyingly,

3) consult the resource file each time s/he wants to know what message is
printed, because in most cases the message key will bear no meaning.

(* Compare with the issue we came across recently: SecurityException: K00Ec *)

4) Add to this that most developers will not know where the localized messages
are kept, and you'll end up in a situation where most of the messages are not
localized in any way.

With gettext, localizing for developers is as easy as putting _() around the
string message and leaving *everything* else up to the translators. Even the
source code scanning to extract the messages that need to be translated is done
automatically with 'xgettext'.

> Localized message can contain parameters. E.g. localized message pattern:
> This is message on English with two parameters: parameter number one –
> {0}, ...

With gettext, parameters in localized messages are a non-issue. You can use
printf or cout with gettext without any restrictions. You can even teach your
program to use correct plurals.

(* In Slavic languages there are two kinds of plural: 2-4 takes the dual plural
and 5-9 takes the multiple plural; see the concrete example below *)
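On reordering parameters with printf-style messages: a translation can still swap arguments by using the positional form %n$, a POSIX extension to ISO C printf that glibc supports. All conversions in one format string must then be numbered. A minimal sketch; `format_positional` is a hypothetical wrapper, and the format string would normally come from gettext().

```cpp
#include <cstdio>
#include <string>

// Formats one int and one string through a (possibly translated) format that
// may use POSIX positional conversions such as "%2$s, %1$d".
std::string format_positional(const char* fmt, int number, const char* word) {
    char buf[256];
    std::snprintf(buf, sizeof buf, fmt, number, word);
    return buf;
}
```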

>   - All localized messages may be printed through apache log4cxx logger.

gettext's job is to translate strings; it's then up to the developer to choose
how to print the message, so this requirement is satisfied by gettext.

>   - Minimize performance impact.


Below is a simple example of using gettext in a toy application that counts
apples:

--->8--- apples.c
#include <locale.h>
#include <libintl.h>
#include <stdio.h>

#define _(String) gettext(String)

int main() {
    /* "" selects the locale from LC_ALL, LC_MESSAGES and LANG;
       NULL would only query the current ("C") locale. */
    setlocale(LC_ALL, "");
    bindtextdomain("apples", ".");
    textdomain("apples");

    printf(_("internationalized message\n"));
    {
        int i;
        for (i = 0; i < 27; i++) {
            printf(ngettext("%d apple\n", "%d apples\n", i), i);
        }
    }
    return 0;
}
---8<---

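The translators' job would then be to fill in a template with translated
messages (compiled with msgfmt into apples.mo), like
--->8--- ru/LC_MESSAGES/apples.po
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"
"Plural-Forms: nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);\n"

msgid "internationalized message\n"
msgstr "русское сообщение\n"

msgid "%d apple\n"
msgid_plural "%d apples\n"
msgstr[0] "%d яблоко\n"
msgstr[1] "%d яблока\n"
msgstr[2] "%d яблок\n"
---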
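For the three msgstr entries to be chosen correctly, the catalog has to know the Russian plural rule, which .po files usually declare as a C expression in their Plural-Forms header. A sketch of the same rule as a plain function (`russian_plural_index` is a hypothetical name) shows that it depends on the last digits rather than on a plain 2-4 / 5-9 split: 11-14 take the "many" form, while 21 behaves like 1 and 22 like 2.

```cpp
// Russian plural-form index, equivalent to the usual .po header expression
//   n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2
// 0 -> msgstr[0] ("яблоко"), 1 -> msgstr[1] ("яблока"), 2 -> msgstr[2] ("яблок").
int russian_plural_index(unsigned long n) {
    if (n % 10 == 1 && n % 100 != 11) return 0;
    if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20)) return 1;
    return 2;
}
```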





Re: [drlvm] proposals for VM internationalization

2006-07-13 Thread Vladimir Gorr

On 7/13/06, Paulex Yang [EMAIL PROTECTED] wrote:


> Hi, Vladimir
>
> Log4c and log4cpp are both good tools, but if our requirements are just
> message internationalization, maybe log4cxx is overkill? After all, as a
> complete logging framework, it provides support for i18n, categories,
> layouts, etc.
>
> And if we are talking about ResourceBundle only, I'd suggest considering ICU4C
> as a candidate: it provides many i18n features, including ResourceBundle[1]
> support for C as well as C++, and, more importantly, it is already included
> among the Harmony dependencies.
>
> Of course, if you think the VM needs a more complicated logging mechanism,
> that is another story.



Yes, this is such a case. I suppose some of us will want to read the debug
output from the JIT in Chinese. Why not? However, you're right that ICU4C can
also be used in some cases.
Maybe it makes sense to combine these two mechanisms (one for user messages,
the other for internal needs).
It's just that log4cxx is already used as the logging system for DRLVM, and we
think little effort will be needed to internationalize the native code in this
case.

Thanks,
Vladimir.



[1]http://icu.sourceforge.net/apiref/icu4c/

Vladimir Gorr wrote:
 Hi Harmony community.



 I'd like to discuss with you a design for the VM native code
 internationalization (attached below).

 We'd like to consider this approach for the DRLVM first of all.
 However it
 can be suitable for other parts of Harmony project I suppose.

 Please let me know your opinions/objections.



 Thanks,

 Vladimir .




---


 Internationalization design *1. Introduction*



 The VM's output needs to be internationalized in order to provide
 localized
 versions of our product.

 The key idea is to use ResourceBundle class from apache log4cxx which
 allow
 to store and effective use bundles with localized messages.

 The document describes:

 · ResourceBundle naming conventions for bundles with localized
 messages.

 · Structure of* *ResourceBundle file. MessageId (keys for
 localized message in ResourceBundle) development guidelines.

 · Requirements.

 · How it works inside VM.



 *Definitions: *



 I18n – internationalization

 L10n – localization

 L7d – localized



 *2. ResourceBundle naming conventions for bundles with localized
 messages. *



 We offer to use ResourceBundle class from apache log4cxx as storage of
 localized messages. At first time all Resourcebundles are files.

 After VM starts, on VM's logging subsystem initialization stage, logging
 system chooses appropriate set of ResourceBundles

 according to values of environment variables: LC_ALL, LC_MESSAGES, and
 LANG.

 Chosen ResourceBundles should be used for printing localized messages
 from
 VM.



 E.g. If the environment variable LANG is equal to ru_RU then the
 following
 set of ResourceBundles should be used (see naming conventions below):

 · java_ru_RU.properties

 · java_ru.properties

 · java.properties



 Each file that represents a ResourceBundle should have a name of the
 form:

 *java_language_country_variant.properties* where:



 _language is the language, e.g. _ru (Russian). It may be empty.

 _country is the country, e.g. _RU (Russian Federation). It may be
 empty.

 _variant is the variant. It may be empty.



 The main ResourceBundle file (with messages in English) should be
 java.properties.
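The locale-selection rule above can be sketched as follows. This is a hedged illustration, not the log4cxx or DRLVM API: detect_message_locale() and candidate_bundles() are hypothetical helpers, the precedence LC_ALL, then LC_MESSAGES, then LANG is assumed to follow the usual POSIX order for the message category, and codeset or modifier suffixes such as .UTF-8 are assumed to be ignored when building file names.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper (not log4cxx/DRLVM API): pick the locale name for
// messages, checking LC_ALL, LC_MESSAGES and LANG in POSIX precedence order.
// An empty result means "use only the base bundle".
std::string detect_message_locale() {
    const char* vars[] = {"LC_ALL", "LC_MESSAGES", "LANG"};
    for (const char* var : vars) {
        const char* value = std::getenv(var);
        if (value != nullptr && *value != '\0') {
            return value;
        }
    }
    return "";
}

// Hypothetical helper: list the bundle files to search for a locale such as
// "ru_RU" or "ru_RU.UTF-8", most specific first, ending with java.properties.
std::vector<std::string> candidate_bundles(const std::string& locale) {
    // Drop a codeset (".UTF-8") or modifier ("@euro") suffix, if present.
    const std::string name = locale.substr(0, locale.find_first_of(".@"));

    // Split "language_country_variant" into its parts.
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    while (!name.empty() && start <= name.size()) {
        std::string::size_type end = name.find('_', start);
        if (end == std::string::npos) end = name.size();
        parts.push_back(name.substr(start, end - start));
        start = end + 1;
    }

    std::vector<std::string> bundles;
    if (name != "C" && name != "POSIX") {
        // java_ru_RU.properties, then java_ru.properties, ...
        for (std::size_t n = parts.size(); n > 0; --n) {
            std::string file = "java";
            for (std::size_t i = 0; i < n; ++i) file += "_" + parts[i];
            bundles.push_back(file + ".properties");
        }
    }
    bundles.push_back("java.properties");  // base bundle, always last
    return bundles;
}
```

At startup the VM would then try the files from candidate_bundles(detect_message_locale()) in order, falling back to the next, less specific file for any key that is missing.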



 *3. Structure of ResourceBundle file. MessageId development
 guidelines.*



 A ResourceBundle file should have the following structure:



 MessageId1=localized message1

 MessageId2=localized message2

 ….

 Where:

 MessageId{i} – an ASCII string in English. It should consist of the
 VM subcomponent name (e.g. init, port, gc) and a short description of
 the message.

 E.g. init.help is the localized help message from the init
 subcomponent of the VM.

 Localized message{i} – the localized message.



 A localized message can contain parameters. E.g. the localized message
 pattern:

 This is a message in English with two parameters: parameter number one
 – {0}, and parameter number two – {1}. We can print them again in
 reverse order: {1}, {0}.

 If the first parameter is the integer value 1 and the second is the
 string "two", the message for the pattern above should be:

 This is a message in English with two parameters: parameter number one
 – 1, and parameter number two – two. We can print them again in
 reverse order: two, 1.
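The {N} substitution described above, including reuse and reordering of parameters, can be sketched as follows. format_message() is a hypothetical helper, not the log4cxx or ICU4C API. It assumes the arguments have already been converted to strings and ignores MessageFormat features such as quoting and {0,number} format types.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: replaces each "{N}" in the pattern with args[N].
// An index may appear several times and in any order; anything that is not
// a valid "{N}" reference (bad digits, index out of range) is copied as-is.
std::string format_message(const std::string& pattern,
                           const std::vector<std::string>& args) {
    std::string out;
    std::size_t i = 0;
    while (i < pattern.size()) {
        if (pattern[i] == '{') {
            const std::size_t close = pattern.find('}', i + 1);
            if (close != std::string::npos && close > i + 1 && close - i - 1 <= 9) {
                const std::string digits = pattern.substr(i + 1, close - i - 1);
                if (digits.find_first_not_of("0123456789") == std::string::npos) {
                    const std::size_t index = std::stoul(digits);
                    if (index < args.size()) {
                        out += args[index];
                        i = close + 1;
                        continue;
                    }
                }
            }
        }
        out += pattern[i];
        ++i;
    }
    return out;
}
```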


 *4. Requirements.*



   - All localized messages can be printed through the Apache log4cxx logger.

   - Parameters may be present in localized messages.
   - The VM I18N subsystem should automatically detect the user's locale
   according to the values of environment variables.
   - Minimize performance impact.



--
Paulex Yang
China Software Development Lab
IBM



-
Terms of use : 

Re: [drlvm] proposals for VM internationalization

2006-07-13 Thread Paulex Yang

Vladimir Gorr wrote:

On 7/13/06, Paulex Yang [EMAIL PROTECTED] wrote:


Hi, Vladimir

Log4c and log4cpp are both good tools, but if our requirements are just
message internationalization, maybe log4cxx is overkill? After all, as a
complete log framework, it provides support for i18n, category,
layout

And if we are talking about ResourceBundle only, I'd suggest considering ICU4C
as a candidate. It provides many i18n features, including ResourceBundle[1]
support for C as well as C++, and, more importantly, it is already one of
Harmony's dependencies.

Of course, if you think VM needs more complicated log mechanism support,
this will be another story.



Yes, this is such a case. I suppose some of us will want to read the debug
output from the JIT in Chinese. Why not?

Cool! You are right, I do want to ;-)!

However, you're right that ICU4C can also be
used in some cases.
Maybe it makes sense to combine these two mechanisms (one for user
messages, the other for internal needs).
Maybe, if using ICU4C has a smaller footprint/performance
impact on DRLVM in non-debug mode. (I actually have no idea about this;
I'm just guessing, since ICU4C only cares about ResourceBundle while
log4cxx has many more things on its mind...)
It's just that log4cxx is already used as the logging system for DRLVM,
and we think only a little effort will be needed to internationalize
the native code in this case.

I see, this is what I expected; thank you for clarifying this. :)


Thanks,
Vladimir.



[1]http://icu.sourceforge.net/apiref/icu4c/

Vladimir Gorr wrote:
 Hi Harmony community.



 I'd like to discuss with you a design for the VM native code
 internationalization (attached below).

 We'd like to consider this approach for the DRLVM first of all.
 However, I suppose it could also be suitable for other parts of the
 Harmony project.

 Please let me know your opinions/objections.



 Thanks,

 Vladimir.




--- 




 Internationalization design

 *1. Introduction*



 The VM's output needs to be internationalized in order to provide
 localized versions of our product.

 The key idea is to use the ResourceBundle class from Apache log4cxx,
 which allows bundles of localized messages to be stored and used
 efficiently.

 The document describes:

 · ResourceBundle naming conventions for bundles with localized
 messages.

 · Structure of the ResourceBundle file. MessageId (keys for
 localized messages in a ResourceBundle) development guidelines.

 · Requirements.

 · How it works inside VM.



 *Definitions:*



 I18n – internationalization

 L10n – localization

 L7d – localized



 *2. ResourceBundle naming conventions for bundles with localized
 messages.*



 We propose to use the ResourceBundle class from Apache log4cxx as the
 storage for localized messages. Initially, all ResourceBundles are files.

 After the VM starts, during the initialization of the VM's logging
 subsystem, the logging system chooses the appropriate set of
 ResourceBundles according to the values of the environment variables
 LC_ALL, LC_MESSAGES, and LANG.

 The chosen ResourceBundles are then used for printing localized
 messages from the VM.



 For example, if the environment variable LANG is set to ru_RU, the
 following set of ResourceBundles should be used (see the naming
 conventions below):

 · java_ru_RU.properties

 · java_ru.properties

 · java.properties



 Each file that represents a ResourceBundle should have a name of the
 form:

 *java_language_country_variant.properties* where:



 _language is the language, e.g. _ru (Russian). It may be empty.

 _country is the country, e.g. _RU (Russian Federation). It may be
 empty.

 _variant is the variant. It may be empty.



 The main ResourceBundle file (with messages in English) should be
 java.properties.



 *3. Structure of ResourceBundle file. MessageId development
 guidelines.*



 A ResourceBundle file should have the following structure:



 MessageId1=localized message1

 MessageId2=localized message2

 ….

 Where:

 MessageId{i} – an ASCII string in English. It should consist of the
 VM subcomponent name (e.g. init, port, gc) and a short description of
 the message.

 E.g. init.help is the localized help message from the init
 subcomponent of the VM.

 Localized message{i} – the localized message.



 A localized message can contain parameters. E.g. the localized message
 pattern:

 This is a message in English with two parameters: parameter number one
 – {0}, and parameter number two – {1}. We can print them again in
 reverse order: {1}, {0}.

 If the first parameter is the integer value 1 and the second is the
 string "two", the message for the pattern above should be:

 This is a message in English with two parameters: parameter number one
 – 1, and parameter number two – two. We can print them again in
 reverse order: two, 1.


 *4. Requirements.*



   - All localized messages may 

Re: [drlvm] proposals for VM internationalization

2006-07-13 Thread Vladimir Gorr

In my opinion, using gettext() for the i18n goals would involve too much
refactoring of the source code.
I also disagree with the 'no meaning' argument about message keys. All we
need is to create sensible IDs for these messages.

*4) Add to this that most developers will not know where the localized
messages are kept, and you'll end up in a situation where most of the
messages are not localized at all.*

I'm not sure gettext() will eliminate this issue.

Thanks,
Vladimir.

On 7/13/06, Salikh Zakirov [EMAIL PROTECTED] wrote:


Vladimir Gorr wrote:
 Internationalization design *1. Introduction*
 ...
 The key idea is to use the ResourceBundle class from Apache log4cxx,
 which allows bundles of localized messages to be stored and used
 efficiently.

Why not use GNU gettext, the de facto standard i18n system on GNU/Linux
systems?
I think the developers' API can be designed to allow a wide range of
i18n implementations, just like we did with logging.

(* The DRLVM logging system was designed in such a way that its
implementation could be rewritten completely from scratch. It was in fact
rewritten once to use log4cxx, and no client code modifications were
required. *)

I think we could devise a simple localization API, which could even be a
dummy to get us started, like

---8<--- vm/include/l10n.h
#define _(x) (x)
inline void init_l10n() {}
---

Scan over the DRLVM code, mark the translatable strings with _(),
and then evolve the l10n system independently of the development efforts.

 MessageId1=localized message1

 MessageId2=localized message2

 Where:

 MessageId{i} – an ASCII string in English. It should consist of the
 VM subcomponent name (e.g. init, port, gc) and a short description of
 the message.

 E.g. init.help is the localized help message from the init
 subcomponent of the VM.

gettext has the advantage that the unlocalized messages are used as the
keys for the translation; thus, developers do not need to care about
l10n at all.

On the other hand, in the system you propose, to create a message,
one will need to
1) come up with the message identifier
2) add the message identifier and its unlocalized text to the resource
file

and, most annoyingly,

3) consult the resource file each time s/he wants to know what message
is printed, because in most cases the message key will bear no meaning.

(* Compare with the issue we've come across recently: SecurityException:
K00Ec *)

4) Add to this that most developers will not know where the localized
messages are kept, and you'll end up in a situation where most of the
messages are not localized at all.

With gettext, localization for developers is as easy as putting _()
around the message string and leaving *everything* else up to the
translators. Even scanning the source code to extract the messages that
need to be translated is done automatically, by 'xgettext'.

 A localized message can contain parameters. E.g. the localized message
 pattern:
 This is a message in English with two parameters: parameter number one –
 {0}, ...

With gettext, parameters in localized messages are a non-issue. You can
use printf or cout with gettext without any restrictions. You can even
teach your program to use the correct plural forms.

(* In Slavic languages, there are two kinds of plural: 2-4 is the dual
plural and 5-9 is the multiple plural; see the concrete example below *)

   - All localized messages can be printed through the Apache log4cxx logger.

gettext's job is to translate strings, and then it's up to the developer
to choose how to print the message, so this requirement is satisfied by
gettext.

   - Minimize performance impact.


Below is a simple example of using gettext in a toy application that
counts apples:

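---8<--- apples.c
#include <stdio.h>
#include <locale.h>
#include <libintl.h>

#define _(String) gettext(String)

int main() {

   /* Use the locale from LC_ALL/LC_MESSAGES/LANG; passing NULL instead
      of "" would only query the current locale, not set it. */
   setlocale(LC_ALL, "");
   bindtextdomain("apples", ".");
   textdomain("apples");

   printf(_("internationalized message\n"));
   {
   int i;
   for (i = 0; i < 27; i++) {
   printf(ngettext("%d apple\n", "%d apples\n", i), i);
   }
   }
   return 0;
}
---8<---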

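The translator's job would then be to fill in a template with translated
messages (compiled with msgfmt into ru/LC_MESSAGES/apples.mo, which is the
file gettext actually loads), like
---8<--- ru/LC_MESSAGES/apples.po
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"
"Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"

msgid "internationalized message\n"
msgstr "русское сообщение\n"

msgid "%d apple\n"
msgid_plural "%d apples\n"
msgstr[0] "%d яблоко\n"
msgstr[1] "%d яблока\n"
msgstr[2] "%d яблок\n"
---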
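ngettext() chooses among msgstr[0], msgstr[1] and msgstr[2] by evaluating the catalog's Plural-Forms expression. The Russian rule actually depends on the last one or two digits rather than only on the ranges 2-4 and 5-9 (for example, 21 takes the singular form and 12 takes the "multiple" one). The sketch below restates, as a plain C++ function, the expression that the GNU gettext manual gives for Russian; russian_plural_index() is a hypothetical name used only for illustration.

```cpp
// Hypothetical helper mirroring the gettext Plural-Forms expression for Russian:
//   n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2
// The result is the index into msgstr[] of the .po entry.
unsigned russian_plural_index(unsigned long n) {
    if (n % 10 == 1 && n % 100 != 11) {
        return 0;  // 1, 21, 31, 101, ...: "%d яблоко"
    }
    if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20)) {
        return 1;  // 2-4, 22-24, 32-34, ...: "%d яблока"
    }
    return 2;      // 0, 5-20, 25-30, ...: "%d яблок"
}
```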


-
Terms of use : http://incubator.apache.org/harmony/mailing.html
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




Re: [drlvm] proposals for VM internationalization

2006-07-13 Thread Salikh Zakirov
Vladimir Gorr wrote:
 In my opinion, using gettext() for the i18n goals would involve too much
 refactoring of the source code.
 I also disagree with the 'no meaning' argument about message keys. All we
 need is to create sensible IDs for these messages.

I think this is a case of good intentions, which pave the well-known road.
As soon as the message key is *not* the thing that is printed, it is
inevitable that a key sensible to one engineer will have no meaning to
another.

Do you think K00Ec is sensible? I think not.
Do you think the developer had no chance to choose a better key?
IMHO, this kind of thing must be *enforced*, for example, by making sure
that the message key is what gets printed in the C locale (the default
case for most developers).

 *4) Add to this that most developers will not know where the localized
 messages are kept, and you'll end up in a situation where most of the
 messages are not localized at all.*

 I'm not sure gettext() will eliminate this issue.

gettext effectively splits this issue into two *independent* tasks:
- the task of the developer is to code
- the task of the translator is to find translatable messages and translate them

The greatest advantage of this is that developers do not need to care
about translations, besides occasionally putting _() around strings, and
translators do not need to care about coding.




Re: [drlvm] proposals for VM internationalization

2006-07-13 Thread Salikh Zakirov
Vladimir Gorr wrote:
 I'd like to discuss with you a design for the VM native code
 internationalization (attached below).
 ...
 Please let me know your opinions/objections.

To make my point clearer, I would repeat my suggestion.

0) Agree on a design decision that the message key
   is the *unlocalized message itself*, rather than some intermediary constant.

1) Start l10n with the below patch (untested)

2) Start marking localizable strings with _() in the DRLVM source code.
   The interface is very simple and does not impose any restrictions.

3) Implement the localization in any way we like, be it icu4c, log4cxx
   or gettext. Or maybe even leave it configurable at compile time.

--- /dev/null
+++ b/vm/include/l10n.h
@@ -0,0 +1,8 @@
+#ifndef _L10N_H
+#define _L10N_H
+
+#define _(message) (message)
+
+void init_l10n();
+
+#endif // _L10N_H
diff --git a/vm/vmcore/src/init/vm_main.cpp b/vm/vmcore/src/init/vm_main.cpp
index 9db56e5..96e9a8c 100644
--- a/vm/vmcore/src/init/vm_main.cpp
+++ b/vm/vmcore/src/init/vm_main.cpp
@@ -42,6 +42,7 @@ #include "dll_jit_intf.h"
 #include "dll_gc.h"
 #include "em_intf.h"
 #include "port_filepath.h"
+#include "l10n.h"
 
 union Scalar_Arg {
 int i;
@@ -283,6 +284,7 @@ static int run_java_shutdown()
 
 void create_vm(Global_Env *p_env, JavaVMInitArgs* vm_arguments) 
 {
+init_l10n();
 #ifdef PLATFORM_POSIX
 init_linux_thread_system();
 #elif defined(PLATFORM_NT)
diff --git a/vm/vmcore/src/l10n.cpp b/vm/vmcore/src/l10n.cpp
new file mode 100644
index 000..d9d380a
--- /dev/null
+++ b/vm/vmcore/src/l10n.cpp
@@ -0,0 +1,4 @@
+#include "l10n.h"
+
+void init_l10n() {
+}
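Once real translations exist, the dummy macro could be backed by a lookup keyed by the unlocalized message itself, as in design decision 0) of the message above. The sketch below is hypothetical: l10n_lookup(), l10n_add_translation() and the in-memory catalog are illustrative names, not DRLVM code, and whichever backend is chosen (ICU4C, log4cxx or gettext) would be responsible for filling the catalog from init_l10n().

```cpp
#include <map>
#include <string>

// Hypothetical in-memory catalog: unlocalized (English) message -> translation.
// It should be filled once at startup, because the c_str() pointers returned
// by l10n_lookup() are invalidated if an entry is later overwritten.
static std::map<std::string, std::string> g_catalog;

// Hypothetical registration hook that a backend would call from init_l10n().
void l10n_add_translation(const std::string& original, const std::string& translated) {
    g_catalog[original] = translated;
}

// Returns the translation if one is known, otherwise the original text, so
// untranslated messages are still printed in English.
const char* l10n_lookup(const char* message) {
    std::map<std::string, std::string>::const_iterator it = g_catalog.find(message);
    return it == g_catalog.end() ? message : it->second.c_str();
}

// Possible replacement for the dummy "#define _(message) (message)" in l10n.h.
#define _(message) l10n_lookup(message)
```

The fallback to the original text keeps the key itself readable in the C locale, which is the property the _() proposal relies on.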




Re: [drlvm] proposals for VM internationalization

2006-07-13 Thread Vladimir Gorr

On 7/13/06, Salikh Zakirov [EMAIL PROTECTED] wrote:


Vladimir Gorr wrote:
 I'd like to discuss with you a design for the VM native code
 internationalization (attached below).
 ...
 Please let me know your opinions/objections.

To make my point clearer, I would repeat my suggestion.

0) Agree on a design decision that the message key
  is the *unlocalized message itself*, rather than some intermediary constant.

1) Start l10n with the below patch (untested)

2) Start marking localizable strings with _() in the DRLVM source code.
  The interface is very simple and does not impose any restrictions.

3) Implement the localization in any way we like, be it icu4c, log4cxx
  or gettext. Or maybe even leave it configurable at compile time.




There is an essential obstacle to using the *gettext* approach.

It's impossible to use it for the VM on the Windows platform, except in the
Cygwin environment.

I'm also not clear on how we will merge the previous .po catalogs (already
translated) with new ones (when new strings are added).

In any case, manual work is needed for this. IMO gettext is very
convenient for generating the initial template of message catalogs.

However, _() would have to be inserted around all strings (and then
deleted?) to achieve this. That involves too much effort.

Therefore, my preference is to use a more universal approach, namely
ICU4C or LOG4CXX or a combination of them.

Any comments?

Thanks,
Vladimir.


--- /dev/null
+++ b/vm/include/l10n.h
@@ -0,0 +1,8 @@
+#ifndef _L10N_H
+#define _L10N_H
+
+#define _(message) (message)
+
+void init_l10n();
+
+#endif // _L10N_H
diff --git a/vm/vmcore/src/init/vm_main.cpp b/vm/vmcore/src/init/vm_main.cpp
index 9db56e5..96e9a8c 100644
--- a/vm/vmcore/src/init/vm_main.cpp
+++ b/vm/vmcore/src/init/vm_main.cpp
@@ -42,6 +42,7 @@ #include "dll_jit_intf.h"
 #include "dll_gc.h"
 #include "em_intf.h"
 #include "port_filepath.h"
+#include "l10n.h"
 
 union Scalar_Arg {
 int i;
@@ -283,6 +284,7 @@ static int run_java_shutdown()
 
 void create_vm(Global_Env *p_env, JavaVMInitArgs* vm_arguments)
 {
+init_l10n();
 #ifdef PLATFORM_POSIX
 init_linux_thread_system();
 #elif defined(PLATFORM_NT)
diff --git a/vm/vmcore/src/l10n.cpp b/vm/vmcore/src/l10n.cpp
new file mode 100644
index 000..d9d380a
--- /dev/null
+++ b/vm/vmcore/src/l10n.cpp
@@ -0,0 +1,4 @@
+#include "l10n.h"
+
+void init_l10n() {
+}





Re: [drlvm] proposals for VM internationalization

2006-07-13 Thread Geir Magnusson Jr


Salikh Zakirov wrote:
 Vladimir Gorr wrote:
 Internationalization design *1. Introduction*
 ...
 The key idea is to use the ResourceBundle class from Apache log4cxx, which allows
 bundles of localized messages to be stored and used efficiently.
 
 Why not use GNU gettext, the de facto standard i18n system on GNU/Linux systems?
 I think the developers' API can be designed to allow a wide range of
 i18n implementations, just like we did with logging.

Isn't it under the GPL?






Re: [drlvm] proposals for VM internationalization

2006-07-13 Thread Geir Magnusson Jr
I'll state the obvious... there is another thread going on about how to
do similar things with Classlib.  Maybe you can find common ground for
message bundles and such...

geir


Vladimir Gorr wrote:
 Hi Harmony community.
 
 
 
 I'd like to discuss with you a design for the VM native code
 internationalization (attached below).
 
 We'd like to consider this approach for the DRLVM first of all. However, I
 suppose it could also be suitable for other parts of the Harmony project.
 
 Please let me know your opinions/objections.
 
 
 
 Thanks,
 
 Vladimir.
 
 
 
 ---
 
 
 Internationalization design

 *1. Introduction*
 
 
 
 The VM's output needs to be internationalized in order to provide localized
 versions of our product.

 The key idea is to use the ResourceBundle class from Apache log4cxx, which
 allows bundles of localized messages to be stored and used efficiently.
 
 The document describes:
 
 · ResourceBundle naming conventions for bundles with localized
 messages.
 
 · Structure of the ResourceBundle file. MessageId (keys for
 localized messages in a ResourceBundle) development guidelines.
 
 · Requirements.
 
 · How it works inside VM.
 
 
 
 *Definitions:*
 
 
 
 I18n – internationalization
 
 L10n – localization
 
 L7d – localized
 
 
 
 *2. ResourceBundle naming conventions for bundles with localized
 messages.*
 
 
 
 We propose to use the ResourceBundle class from Apache log4cxx as the
 storage for localized messages. Initially, all ResourceBundles are files.

 After the VM starts, during the initialization of the VM's logging
 subsystem, the logging system chooses the appropriate set of
 ResourceBundles according to the values of the environment variables
 LC_ALL, LC_MESSAGES, and LANG.

 The chosen ResourceBundles are then used for printing localized messages
 from the VM.
 
 
 
 For example, if the environment variable LANG is set to ru_RU, the
 following set of ResourceBundles should be used (see the naming
 conventions below):
 
 · java_ru_RU.properties
 
 · java_ru.properties
 
 · java.properties
 
 
 
 Each file that represents a ResourceBundle should have a name of the
 form:

 *java_language_country_variant.properties* where:



 _language is the language, e.g. _ru (Russian). It may be empty.

 _country is the country, e.g. _RU (Russian Federation). It may be empty.

 _variant is the variant. It may be empty.
 
 
 
 The main ResourceBundle file (with messages in English) should
 be java.properties.
 
 
 
 *3. Structure of ResourceBundle file. MessageId development guidelines.*
 
 
 
 A ResourceBundle file should have the following structure:
 
 
 
 MessageId1=localized message1
 
 MessageId2=localized message2
 
 ….
 
 Where:
 
 MessageId{i} – an ASCII string in English. It should consist of the VM
 subcomponent name (e.g. init, port, gc) and a short description of the message.

 E.g. init.help is the localized help message from the init subcomponent of the VM.

 Localized message{i} – the localized message.
 
 
 
 A localized message can contain parameters. E.g. the localized message pattern:

 This is a message in English with two parameters: parameter number one –
 {0}, and parameter number two – {1}. We can print them again in reverse
 order: {1}, {0}.

 If the first parameter is the integer value 1 and the second is the
 string "two", the message for the pattern above should be:

 This is a message in English with two parameters: parameter number one – 1,
 and parameter number two – two. We can print them again in reverse order:
 two, 1.
 
 
 *4. Requirements.*
 
 
 
   - All localized messages can be printed through the Apache log4cxx logger.
 
   - Parameters may be present in localized messages.
   - The VM I18N subsystem should automatically detect the user's locale
   according to the values of environment variables.
   - Minimize performance impact.
 




Re: [drlvm] proposals for VM internationalization

2006-07-13 Thread Andrew Zhang

On 7/13/06, Vladimir Gorr [EMAIL PROTECTED] wrote:


On 7/13/06, Salikh Zakirov [EMAIL PROTECTED] wrote:

 Vladimir Gorr wrote:
  I'd like to discuss with you a design for the VM native code
  internationalization (attached below).
  ...
  Please let me know your opinions/objections.

 To make my point clearer, I would repeat my suggestion.

 0) Agree on a design decision that the message key
   is the *unlocalized message itself*, rather than some intermediary
 constant.

 1) Start l10n with the below patch (untested)

 2) Start marking localizable strings with _() in the DRLVM source code.
   The interface is very simple and does not impose any restrictions.

 3) Implement the localization in any way we like, be it icu4c, log4cxx
   or gettext. Or maybe even leave it configurable at compile time.



There is an essential obstacle to using the *gettext* approach.

It's impossible to use it for the VM on the Windows platform, except in the
Cygwin environment.

I'm also not clear on how we will merge the previous .po catalogs (already
translated) with new ones (when new strings are added).

In any case, manual work is needed for this. IMO gettext is very
convenient for generating the initial template of message catalogs.

However, _() would have to be inserted around all strings (and then
deleted?) to achieve this. That involves too much effort.

Therefore, my preference is to use a more universal approach,



Agree.

namely, ICU4C or

LOG4CXX or a combination of them.



As Paulex mentioned, it depends on the requirements.

If it is only for i18n, icu4c is preferred.

Otherwise, if fine-grained control of logging is required, log4cxx may be
the choice.

Thanks!


Any comments?

Thanks,
Vladimir.


--- /dev/null
+++ b/vm/include/l10n.h
@@ -0,0 +1,8 @@
+#ifndef _L10N_H
+#define _L10N_H
+
+#define _(message) (message)
+
+void init_l10n();
+
+#endif // _L10N_H
diff --git a/vm/vmcore/src/init/vm_main.cpp b/vm/vmcore/src/init/vm_main.cpp
index 9db56e5..96e9a8c 100644
--- a/vm/vmcore/src/init/vm_main.cpp
+++ b/vm/vmcore/src/init/vm_main.cpp
@@ -42,6 +42,7 @@ #include "dll_jit_intf.h"
 #include "dll_gc.h"
 #include "em_intf.h"
 #include "port_filepath.h"
+#include "l10n.h"
 
 union Scalar_Arg {
 int i;
@@ -283,6 +284,7 @@ static int run_java_shutdown()
 
 void create_vm(Global_Env *p_env, JavaVMInitArgs* vm_arguments)
 {
+init_l10n();
 #ifdef PLATFORM_POSIX
 init_linux_thread_system();
 #elif defined(PLATFORM_NT)
diff --git a/vm/vmcore/src/l10n.cpp b/vm/vmcore/src/l10n.cpp
new file mode 100644
index 000..d9d380a
--- /dev/null
+++ b/vm/vmcore/src/l10n.cpp
@@ -0,0 +1,4 @@
+#include "l10n.h"
+
+void init_l10n() {
+}









--
Andrew Zhang
China Software Development Lab, IBM


Re: [drlvm] proposals for VM internationalization

2006-07-13 Thread Salikh Zakirov
Geir Magnusson Jr wrote:
 Salikh Zakirov wrote:
 Why not use GNU gettext, the de facto standard i18n system on GNU/Linux
 systems?
 
 Isn't it under the GPL?

The runtime part (libintl) is LGPL, so it allows linking to non-GPL programs.

The tools are indeed GPL, but the Harmony project is not going to link with
them either.




RE: [drlvm] proposals for VM internationalization

2006-07-13 Thread Magnusson, Geir

 -Original Message-
 From: Salikh Zakirov [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, July 13, 2006 10:32 AM
 To: harmony-dev@incubator.apache.org
 Subject: Re: [drlvm] proposals for VM internationalization
 
 Geir Magnusson Jr wrote:
  Salikh Zakirov wrote:
  Why not use GNU gettext, the de facto standard i18n system
 on GNU/Linux systems?
  
  Isn't it under the GPL?
 
 The runtime part (libintl) is LGPL, so it allows linking to 
 non-GPL programs.
 
 The tools are indeed GPL, but the Harmony project is not going
 to link with them either.
 

Do you mean there won't be any runtime dependencies?  We can't
distribute LGPL-ed binaries.

Geir




Re: [drlvm] proposals for VM internationalization

2006-07-13 Thread Salikh Zakirov
Vladimir Gorr wrote:

Have you looked at the patch I've sent? It doesn't use gettext.
It just proposes a way to move forward towards a properly localized DRLVM.
I think we will be able to use ICU4C's Java-like localization in the following
way:
* extract localizable strings from .cpp files using xgettext
* process resulting .po files and autogenerate resource bundles from them
* use resource bundles for translation and distribution

 There is an essential obstacle to using the *gettext* approach.
 It's impossible to use it for the VM on the Windows platform, except in the
 Cygwin environment.

I do not insist on using gettext, but will answer just for the record:
there is a project that ports libintl to native Windows using MinGW [1].

 I'm also not clear on how we will merge the previous .po catalogs (already
 translated) with new ones (when new strings are added).

Again, I do not insist on using gettext;
however, gettext has a tool for exactly this task: msgmerge [2].

 However, _() would have to be inserted around all strings (and then deleted?) to
 achieve this. That involves too much effort.

The effort could be big, but it is needed for any localization system we use.
The task of classifying the messages as translatable (visible to the user on a
day-to-day basis) or non-translatable (internal errors and debug logging) is
needed anyway, because we do not want to burden translators with the useless
work of translating every string in the project.

IMHO, the _() marker is visually the prettiest way to mark localizable strings
(compared to // NON-NLS comments and resource bundle constants).

 Therefore, my preference is to use a more universal approach, namely ICU4C or
 LOG4CXX or a combination of them.

I've looked through the Log4cxx manual and haven't found anything concerning
either localization or internationalization. By the way, DRLVM already uses
Log4cxx.

ICU4C provides both internationalization and localization services [3].
Its native system uses ResourceBundles and looks similar to the Java
localization system, and it suffers from the same drawback: the message keys
are constants, which are never printed but have to be defined and referenced
in multiple places.

The developer overhead of making a message localizable is as high as:
* define a new constant in some file
* add a message to the default resource bundle
and it involves editing multiple files. I have no doubt that this overhead is
significantly higher than typing three characters to mark the string the _() way.

--
Salikh.

[1] http://gnuwin32.sourceforge.net/packages/libintl.htm
[2] http://www.gnu.org/software/gettext/manual/html_mono/gettext.html#SEC36
[3] http://icu.sourceforge.net/userguide/localizing.html




Re: [drlvm] proposals for VM internationalization

2006-07-13 Thread Salikh Zakirov
Andrew Zhang wrote:
 Vladimir Gorr wrote:
 namely, ICU4C or
 LOG4CXX or combination of them.

log4cxx is already used in DRLVM. It does not provide localization services.

 If it is only for i18n, icu4c is preferred.

So, would the following solution be acceptable to all?

1) mark the localizable strings with _() in .cpp files
2) write a tool to extract localizable messages from .cpp files
   and autogenerate ICU4C .txt resource bundles.
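Step 2 could start from a small scanner like the one below. It is a hypothetical sketch, not an existing Harmony tool: extract_localizable() only finds a string literal placed directly inside _(...), keeps escape sequences as written, and ignores comments, adjacent-literal concatenation and macros (cases that xgettext handles properly). How the extracted English text is then mapped onto ICU4C resource keys is left open here.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical extractor: collects the string literals passed to _() in
// C++ source text, in order of appearance.
std::vector<std::string> extract_localizable(const std::string& source) {
    std::vector<std::string> found;
    std::string::size_type pos = 0;
    while ((pos = source.find("_(", pos)) != std::string::npos) {
        // Skip identifiers that merely end in '_', e.g. my_(...) or __(...).
        const bool inside_identifier = pos > 0 &&
            (std::isalnum(static_cast<unsigned char>(source[pos - 1])) ||
             source[pos - 1] == '_');
        std::string::size_type i = source.find_first_not_of(" \t\r\n", pos + 2);
        pos += 2;
        if (inside_identifier || i == std::string::npos || source[i] != '"') {
            continue;
        }
        // Copy the literal up to the closing quote, keeping escapes such as \n.
        std::string literal;
        for (++i; i < source.size() && source[i] != '"'; ++i) {
            literal += source[i];
            if (source[i] == '\\' && i + 1 < source.size()) {
                literal += source[++i];
            }
        }
        if (i < source.size()) {
            found.push_back(literal);  // only complete literals are reported
        }
        pos = i;
    }
    return found;
}
```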




Re: [drlvm] proposals for VM internationalization

2006-07-13 Thread Salikh Zakirov
Magnusson, Geir wrote:
 Do you mean there won't be any runtime dependencies?  We can't
 distribute LGPL-ed binaries.

In this case, libintl is definitely out of the question.

However, I like the simplicity of the _() interface.
I think we can use it with ICU4C too.




Re: [drlvm] proposals for VM internationalization

2006-07-13 Thread Geir Magnusson Jr
I don't know if this is being considered, but

1) Classlib has lots of java internationalization needs, and some native
internationalization needs

2) DRLVM has lots of native internationalization needs, and some java
needs (kernel classes).

So it seems clear to me we need to at least try for a common approach.

geir


Salikh Zakirov wrote:
 Andrew Zhang wrote:
 Vladimir Gorr wrote:
 namely, ICU4C or
 LOG4CXX or combination of them.
 
 log4cxx is already used in DRLVM. It does not provide localization services.
 
 If it is only for i18n, icu4c is preferred.
 
 So, would the following solution be acceptable to all?
 
 1) mark the localizable strings with _() in .cpp files
 2) write a tool to extract localizable messages from .cpp files
   and autogenerate ICU4C .txt resource bundles.
 
 
 
 




Re: [drlvm] proposals for VM internationalization

2006-07-13 Thread Salikh Zakirov
Geir Magnusson Jr wrote:
 I don't know if this is being considered, but
 
 1) Classlib has lots of java internationalization needs, and some native
 internationalization needs
 
 2) DRLVM has lots of native internationalization needs, and some java
 needs (kernel classes).

FWIW, as far as I can figure from both the [drlvm] and [classlib] discussions,
the topic is *localization*, i.e. providing users with messages
in their native language.

Concerning *internationalization*, Java code is internationalized by design,
and DRLVM needs some fixes to achieve it, at least:
* accept non-ASCII class names in the locale-specific encoding
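The item above could start from the C library's multibyte conversion. This is a hedged sketch: decode_locale_string() is a hypothetical helper, it assumes the VM has called setlocale(LC_ALL, "") (or at least LC_CTYPE) at startup so that mbstowcs() decodes using the user's encoding, and converting the resulting wide string into the UTF-8 form the VM uses internally is a further step that is not shown.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: decodes a class name given in the current locale's
// multibyte encoding into wide characters. Returns false (leaving out
// unchanged) if the bytes are not valid in that encoding.
bool decode_locale_string(const char* bytes, std::wstring& out) {
    const std::size_t length = std::mbstowcs(nullptr, bytes, 0);
    if (length == static_cast<std::size_t>(-1)) {
        return false;  // invalid multibyte sequence for the current locale
    }
    std::vector<wchar_t> buffer(length + 1);
    std::mbstowcs(buffer.data(), bytes, length + 1);
    out.assign(buffer.data(), length);
    return true;
}
```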

For more information about i18n vs l10n, see
http://www.w3.org/International/questions/qa-i18n




Re: [drlvm] proposals for VM internationalization

2006-07-13 Thread Geir Magnusson Jr
We stand corrected.  Both need localization, and we should be able to
find commonality in the approaches.

geir


Salikh Zakirov wrote:
 Geir Magnusson Jr wrote:
 I don't know if this is being considered, but

 1) Classlib has lots of java internationalization needs, and some native
 internationalization needs

 2) DRLVM has lots of native internationalization needs, and some java
 needs (kernel classes).
 
 FWIW, as far as I can figure from both the [drlvm] and [classlib] discussions,
 the topic is *localization*, i.e. providing users with messages
 in their native language.
 
 Concerning *internationalization*, Java code is internationalized by design,
 and DRLVM needs some fixes to achieve it, at least:
 * accept non-ASCII class names in the locale-specific encoding
 
 For more information about i18n vs l10n, see
 http://www.w3.org/International/questions/qa-i18n
 
 
 
 
