Re: Internationalization and cvs

1999-12-19 Thread Asger K. Alstrup Nielsen

> > Concerning the multi-byte encoding: We should have a discussion
>
>  May I know what this "last year's argument" is?  I joined this list
> for less than one month.

Last year's argument was that we should use Unicode as the internal
document encoding in LyX.  Also, there is code in the old development
branch that converts between different encodings.

So, the argument was (and remains IMO) that we should use Unicode
wherever possible, because this is the unifying encoding which we
will surely want to support.

However, if the toolkit (or any other external component) cannot
handle Unicode, we will have the code in LyX to convert to whatever is
appropriate.
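
A minimal sketch of such a boundary conversion, assuming the document is
held as one Unicode code point per character and the external component
only accepts ISO 8859-1.  The name `to_latin1` and the fallback character
are hypothetical and not taken from the LyX sources.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: convert internal Unicode text to ISO 8859-1 for a
// component that only understands 8-bit text.  Code points that have no
// Latin-1 equivalent are replaced by `fallback`.
std::string to_latin1(const std::u32string& text, char fallback = '?') {
    // The replacement itself must be plain ASCII.
    assert(static_cast<unsigned char>(fallback) < 0x80);
    std::string out;
    out.reserve(text.size());
    for (char32_t cp : text) {
        // The first 256 Unicode code points coincide with ISO 8859-1.
        out.push_back(cp <= 0xFF
                          ? static_cast<char>(static_cast<unsigned char>(cp))
                          : fallback);
    }
    return out;
}
```

Other target encodings would need a lookup table instead of the simple
range check, but the shape stays the same: Unicode inside, conversion
only at the edge.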

I wrote a long document regarding the encoding and representation
issue in LyX a year or two ago.  This document used to reside on the
old ftp area, but that hard disk crashed.  Maybe someone on this list
has a copy and can upload it to ftp again?

Greets,

Asger



Re: Internationalization and cvs

1999-12-19 Thread Seak, Teng-Fong

"Asger K. Alstrup Nielsen" wrote:

> However, if the toolkit (or any other external component) can not
> handle Unicode, we will have the code in LyX to convert to whatever is
> appropriate.

 Is it only Unicode that XForms cannot support, or multi-byte encodings
in general?  Oh, by the way, the latest version (1.5) of yudit works quite
satisfactorily.  I think we could count on it when we want to do translations.

> I wrote a long document regarding the encoding and representation
> issue in LyX a year or two ago.  This document used to reside on the
> old ftp-area, but that harddisk crashed.  Maybe someone has a copy
> on this list, and can upload it to ftp again?

 I still have the file.  I'll send it to you in another mail.

 Regards

 Seak





Re: Internationalization and cvs

1999-12-16 Thread Seak, Teng-Fong

Andre' Poenitz wrote:

> Concerning the multi-byte encoding: We should have a discussion
> whether it is sensible to use Unicode internally. The world is
> changing and last years' arguments won't fit anymore.

 May I know what this "last year's argument" is?  I joined this list
less than a month ago.

> We could get rid of almost all of the encoding stuff at the price of
> double sized buffers plus quite a bit of work. But in the end thing
> would be much simpler

 It might have other advantages.  E.g., the latest versions of Windows
use Unicode internally, and LyX has been ported to Windows.  Perhaps
using Unicode inside LyX could help i18n/l10n work more "natively" for
the Windows port as well (if this is possible with Cygnus)?  That is,
Windows would not have to translate a menu string from a particular
encoding to Unicode before displaying it.






Internationalization and cvs

1999-12-15 Thread Asger K. Alstrup Nielsen

Hi fellow LyXers from all over the world,

These days, we are overwhelmed with patches that extend the scope
of LyX to Hebrew, Chinese, Korean, Japanese, and what have you.
This is great, and very much appreciated.

Now, the question is how to handle this.  It seems to me that the
Hebrew patch is "backwards" compatible in the sense that it does
not affect the left-to-right 8-bit languages, except maybe for a
small performance penalty.  However, several voices say that
paragraph-level language support is important, so it seems that
there might be a fair amount of development needed to get that.
Therefore, the best solution might be to create a Hebrew-cvs
branch for this, which for starters will use the document-wide
solution.

Similarly for the multi-byte encoding stuff:  The authors ask
for a separate cvs-branch, and since I have not seen the patch 
yet, this seems like a good idea.

So it's time to create a few cvs-branches.

As I understand the issue, we need two new branches:

One for Hebrew, and one for the multi-byte encoding languages.

Is this the correct understanding?

What would you like the branches to be called?

Lgb, how long will it take to create those branches and give
the relevant people write access to the repository?

Greets,

Asger



Re: Internationalization and cvs

1999-12-15 Thread Jules Bean

On Wed, 15 Dec 1999, Andre' Poenitz wrote:

> 
> Concerning the multi-byte encoding: We should have a discussion whether
> it is sensible to use Unicode internally. The world is changing and last
> years' arguments won't fit anymore. We could get rid of almost all of
> the encoding stuff at the price of double sized buffers plus quite a bit
> of work. But in the end thing would be much simpler 

My instinct says that most people aren't going to care in the slightest
about a doubling of internal buffer sizes.  The combination of cheap
memory, cheap hard disks, and sensible VM systems makes this not much of
an issue, IMO.

Of course, if we're space-paranoid we could use UTF, or whatever it's
called.  That's more work, though.
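
A rough sketch of that trade-off, assuming "UTF" here means UTF-8 and
the "double sized buffers" are fixed two-byte units.  The function names
are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Bytes needed to store one Unicode code point in UTF-8.
std::size_t utf8_length(char32_t cp) {
    assert(cp <= 0x10FFFF);  // outside this range is not a code point
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

// Total size of a text when stored as UTF-8.
std::size_t utf8_size(const std::u32string& text) {
    std::size_t bytes = 0;
    for (char32_t cp : text) bytes += utf8_length(cp);
    return bytes;
}

// Total size with fixed two-byte units (only covers code points
// below 0x10000).
std::size_t fixed16_size(const std::u32string& text) {
    return 2 * text.size();
}
```

ASCII text stays at one byte per character in UTF-8, Hebrew comes out
the same as with two-byte units, and Chinese grows to three bytes per
character.  The extra work is that characters no longer have a fixed
width, so indexing into a buffer is not a simple multiplication.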

Jules




Re: Internationalization and cvs

1999-12-15 Thread Jean-Marc Lasgouttes

> "Jules" == Jules Bean <[EMAIL PROTECTED]> writes:

Jules> On Wed, 15 Dec 1999, Andre' Poenitz wrote:
>>  Concerning the multi-byte encoding: We should have a discussion
>> whether it is sensible to use Unicode internally. The world is
>> changing and last years' arguments won't fit anymore. We could get
>> rid of almost all of the encoding stuff at the price of double
>> sized buffers plus quite a bit of work. But in the end thing would
>> be much simpler

Jules> My instinct says that most people aren't going to care in the
Jules> slightest about a doubling of internal buffer sizes. The
Jules> combination of cheap memory, cheap hard disks, and sensible VM
Jules> systems makes this not much of an issue, IMO.

I think it should not be very difficult to choose the size of
characters at compile time. We define LChar to be the unit character,
and operate on that.
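
One way this could look, as a sketch only: `LYX_WIDE_CHARS` is a
hypothetical build flag, and `LString`, `append_ascii` and
`buffer_bytes` are illustrative names rather than existing LyX code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical build flag: compile with -DLYX_WIDE_CHARS to get
// 16-bit characters, otherwise the buffer uses 8-bit characters.
#ifdef LYX_WIDE_CHARS
typedef char16_t LChar;
#else
typedef char LChar;
#endif

// All buffer code is written against LChar and never against char.
typedef std::basic_string<LChar> LString;

// Append an ASCII character regardless of the unit width.
void append_ascii(LString& s, char c) {
    assert(static_cast<unsigned char>(c) < 0x80);
    s.push_back(static_cast<LChar>(c));
}

// Memory taken by the characters of a buffer.
std::size_t buffer_bytes(const LString& s) {
    return s.size() * sizeof(LChar);
}
```

Code that only goes through `LChar` and `LString` compiles unchanged in
both configurations; anything that assumes `sizeof(LChar) == 1` would
have to be found and fixed first.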

JMarc



Re: Internationalization and cvs

1999-12-15 Thread Andre' Poenitz

> My instinct says that most people aren't going to care in the slightest
> about a doubling of internal buffer sizes.  The combination of cheap
> memory, cheap hard disks, and sensible VM systems makes this not much of
> an issue, IMO.

*grin* Well, I did not want to write obvious things ;-)

Andre'

PS: External 8bit is another matter... Interoperation...

--
Andre' Poenitz .. [EMAIL PROTECTED]



Re: Internationalization and cvs

1999-12-15 Thread Greg Lee


On Wed, 15 Dec 1999, Asger K. Alstrup Nielsen wrote:
...
> not affect the left-to-right 8-bit languages, except maybe for a
> small performance penalty.  However, several voices say that
> paragraph-level language support is important, so it seems that

Surely those voices didn't mean paragraph-level support without
also sentence- and word-level support.  When you cite from a
different language, the citation is obviously not always
going to be a separate paragraph.
...

Greg Lee <[EMAIL PROTECTED]>



Re: Internationalization and cvs

1999-12-15 Thread Asger K. Alstrup Nielsen

> Surely those voices didn't mean paragraph-level support without
> also sentence- and word-level support.  When you cite from a
> different language, the citation is obviously not always
> going to be a separate paragraph.

As it happens, sub-paragraph-level support is markedly more
difficult than paragraph-level support, so don't hold your
breath.

It's similar to the old discussion (i.e. flame war) of why the
ERT mode is not collapsible: The reason is that character-level
(technically equivalent to sentence-level) support is much more
difficult than paragraph-level support.
One reason is that paragraph-level requires only 1D formatting
(because a paragraph spans the entire width of the screen), while
character-level requires 2D formatting.

Having said that, the old development branch actually had some
support for character-level collapsible insets, implemented at
a developers' meeting in Mexico by Juergen and Alejandro, so it
*is* possible to achieve.

This extrapolates to character-level RTL support: Of course it is
possible, but it's very complicated.  How should lines wrap?
What about nested quotes: an English document with a quote in Hebrew,
which itself contains an English quote?
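
For the nested-quote case, the Unicode bidirectional algorithm tracks an
embedding level per run of text: even levels are left-to-right, odd
levels are right-to-left, and each nested quote moves to the next higher
level with the wanted direction.  A small sketch of just that rule (the
function names are illustrative, and the standard's limit on nesting
depth is ignored here):

```cpp
#include <cassert>

enum class Direction { LeftToRight, RightToLeft };

// Even embedding levels run left-to-right, odd levels right-to-left.
Direction direction_of(unsigned level) {
    return level % 2 == 0 ? Direction::LeftToRight : Direction::RightToLeft;
}

// Entering a nested run: take the smallest level above `current` whose
// direction matches `wanted`.
unsigned push_embedding(unsigned current, Direction wanted) {
    unsigned next = current + 1;
    if (direction_of(next) != wanted) ++next;
    assert(direction_of(next) == wanted);
    return next;
}
```

With the English document at level 0, the Hebrew quote lands on level 1
and the English quote inside it on level 2.  Line wrapping, search and
cut/paste then all have to respect these levels, which is where most of
the work lies.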

What about search/replace?  What about cut/paste?

There are many issues to consider.

Fortunately, most of them have been addressed by the Unicode standard,
but it is still a lot of work to understand and implement.  I have the
Unicode book with part of the recipe, so if anybody has technical
questions, please let me know and I'll see what I can find out.

Greets,

Asger



Re: Internationalization and cvs

1999-12-15 Thread Greg Lee


On Wed, 15 Dec 1999, Asger K. Alstrup Nielsen wrote:

> > Surely those voices didn't mean paragraph-level support without
> > also sentence- and word-level support.  When you cite from a
> > different language, the citation is obviously not always
> > going to be a separate paragraph.
> 
> As it happens, sub-paragraph-level support is markedly more
> difficult than paragraph-level support, so don't hold your
> breath.

Ok, I'm not.  I'm just wondering whether paragraph-level-only
support is going to be useful enough to potential users to make
it worthwhile embarking on.  Worth your while, of course, is
what I mean, since I'm not a LyX developer.  But it would be
sort of irritating for you to get it all worked out and then
have those MSDOS users you were trying to seduce say, Oh,
we didn't think to mention that we need to cite words too ...

Greg Lee <[EMAIL PROTECTED]>