Re: Internationalization and cvs
> > Concerning the multi-byte encoding: We should have a discussion
>
> May I know what this "last year's argument" is? I joined this list
> for less than one month.

The argument of last year was to use Unicode as the internal document encoding in LyX. Also, the old development branch contains code that converts between different encodings.

So the argument was (and remains, IMO) that we should use Unicode wherever possible, because it is the unifying encoding that we will surely want to support. However, if the toolkit (or any other external component) cannot handle Unicode, we will have code in LyX to convert to whatever encoding is appropriate.

I wrote a long document on the encoding and representation issues in LyX a year or two ago. It used to reside in the old ftp area, but that hard disk crashed. Maybe someone on this list has a copy and can upload it to the ftp site again?

Greets,
Asger
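To make the "convert at the boundary" idea concrete, here is a minimal C++ sketch. It assumes text is held internally as Unicode code points (`std::u32string`) and that the external component only understands ISO 8859-1. The name `toLatin1` and the `'?'` replacement policy are hypothetical choices for illustration, not existing LyX code.

```cpp
#include <cassert>
#include <string>

// Hypothetical boundary helper: internal text stays Unicode, and we only
// convert when handing it to a component assumed to speak ISO 8859-1.
std::string toLatin1(const std::u32string& text, char replacement = '?')
{
    std::string out;
    out.reserve(text.size());  // Latin-1 uses one byte per code point
    for (char32_t cp : text) {
        // Latin-1 covers exactly U+0000..U+00FF; anything else is lossy
        // and is replaced rather than silently truncated.
        out.push_back(cp <= 0xFF ? static_cast<char>(cp) : replacement);
    }
    assert(out.size() == text.size());
    return out;
}
```

A real converter would need one table per target encoding; Latin-1 is used here only because its mapping is the identity on U+0000..U+00FF, which keeps the sketch short.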
Re: Internationalization and cvs
"Asger K. Alstrup Nielsen" wrote:

> However, if the toolkit (or any other external component) can not
> handle Unicode, we will have the code in LyX to convert to whatever is
> appropriate.

Is it just that XForms cannot support Unicode, or can it not support multi-byte encodings in general?

Oh, by the way, the latest version of yudit, 1.5, works quite satisfactorily. I think we could count on it when we want to do translations.

> I wrote a long document regarding the encoding and representation
> issue in LyX a year or two ago. This document used to reside on the
> old ftp-area, but that harddisk crashed. Maybe someone has a copy
> on this list, and can upload it to ftp again?

I still have the file. I'll send it to you in another mail.

Regards
Seak
Re: Internationalization and cvs
Andre' Poenitz wrote:

> Concerning the multi-byte encoding: We should have a discussion whether
> it is sensible to use Unicode internally. The world is changing and last
> years' arguments won't fit anymore.

May I know what this "last year's argument" is? I joined this list less than a month ago.

> We could get rid of almost all of the encoding stuff at the price of
> double sized buffers plus quite a bit of work. But in the end thing
> would be much simpler

It might have other advantages. For example, the latest versions of Windows use Unicode internally, and LyX has been ported to Windows. Using Unicode inside LyX could perhaps help i18n/l10n (if this is possible with Cygnus) work more "natively" in the Windows port as well. That is, Windows would not have to translate a menu string from a particular encoding to Unicode before displaying it.
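The "double sized buffers" trade-off and the Windows point can both be seen in a small C++ sketch. It assumes the 8-bit text is ISO 8859-1 and that the internal form would be 16-bit Unicode units (`std::u16string`); `latin1ToUtf16` and `bufferBytes` are hypothetical helpers. The Windows NT family's wide-character API takes UTF-16 strings, so text already stored this way would not need converting before display.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: widen 8-bit ISO 8859-1 text into 16-bit Unicode units.
std::u16string latin1ToUtf16(const std::string& bytes)
{
    std::u16string out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes) {
        // ISO 8859-1 byte 0xNN is exactly code point U+00NN.
        out.push_back(static_cast<char16_t>(b));
    }
    assert(out.size() == bytes.size());  // one 16-bit unit per input byte
    return out;
}

// Storage cost of the same text in each representation.
std::size_t bufferBytes(const std::string& s) { return s.size() * sizeof(char); }
std::size_t bufferBytes(const std::u16string& s) { return s.size() * sizeof(char16_t); }
```

For the string `"menu"`, the 16-bit copy occupies 8 bytes against 4 for the 8-bit one (assuming the usual 2-byte `char16_t`), which is the doubling Andre' mentions.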
Internationalization and cvs
Hi fellow LyXers from all over the world,

These days, we are overwhelmed with patches that extend the scope of LyX to Hebrew, Chinese, Korean, Japanese, and what have you. This is great, and very much appreciated. Now, the question is how to handle this.

It seems to me that the Hebrew patch is "backwards" compatible in the sense that it does not affect the left-to-right 8-bit languages, except maybe for a small performance penalty. However, several voices say that paragraph-level language support is important, so it seems that there might be a fair amount of development needed to get that. Therefore, the best solution might be to create a Hebrew cvs branch for this, which for starters will use the document-wide solution.

Similarly for the multi-byte encoding stuff: The authors ask for a separate cvs branch, and since I have not seen the patch yet, this seems like a good idea.

So it's time to create a few cvs branches. As I understand the issue, we need two new branches: one for Hebrew, and one for the multi-byte encoding languages. Is this the correct understanding? What would you like the branches to be called?

Lgb, how long will it take to create those branches and give the relevant people write access to the repository?

Greets,
Asger
Re: Internationalization and cvs
On Wed, 15 Dec 1999, Andre' Poenitz wrote:

> Concerning the multi-byte encoding: We should have a discussion whether
> it is sensible to use Unicode internally. The world is changing and last
> years' arguments won't fit anymore. We could get rid of almost all of
> the encoding stuff at the price of double sized buffers plus quite a bit
> of work. But in the end thing would be much simpler

My instinct says that most people aren't going to care in the slightest about a doubling of internal buffer sizes. The combination of cheap memory, cheap hard disks, and sensible VM systems makes this not much of an issue, IMO.

Of course, if we're space-paranoid we could use UTF, or whatever it's called. That's more work, though.

Jules

-- 
Julian Bean (Jelibean) | [EMAIL PROTECTED] | Richmond, Surrey, UK
War doesn't demonstrate who's right... just who's left.
When privacy is outlawed... only the outlaws have privacy.
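Jules's "UTF" is presumably UTF-8, which keeps ASCII at one byte per character and spends extra bytes only on other scripts. Here is a minimal C++ encoder, assuming the input is a valid Unicode scalar value; the `encodeUtf8` name is illustrative, not an existing LyX function.

```cpp
#include <cassert>
#include <string>

// Encode one Unicode scalar value as UTF-8 (1 to 4 bytes).
std::string encodeUtf8(char32_t cp)
{
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));  // no surrogates
    std::string out;
    if (cp < 0x80) {                       // ASCII: 1 byte, unchanged
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Encode a whole string of code points.
std::string encodeUtf8(const std::u32string& text)
{
    std::string out;
    for (char32_t cp : text) out += encodeUtf8(cp);
    return out;
}
```

For example, `A` encodes to 1 byte, Hebrew alef (U+05D0) to 2, and a CJK ideograph such as U+4E2D to 3. The extra work Jules mentions comes from the variable width: finding the n-th character in a UTF-8 buffer is no longer a constant-time array access.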
Re: Internationalization and cvs
"Jules" == Jules Bean [EMAIL PROTECTED] writes:

Jules> On Wed, 15 Dec 1999, Andre' Poenitz wrote:
>> Concerning the multi-byte encoding: We should have a discussion
>> whether it is sensible to use Unicode internally. The world is
>> changing and last years' arguments won't fit anymore. We could get
>> rid of almost all of the encoding stuff at the price of double
>> sized buffers plus quite a bit of work. But in the end thing would
>> be much simpler

Jules> My instinct says that most people aren't going to care in the
Jules> slightest about a doubling of internal buffer sizes. The
Jules> combination of cheap memory, cheap hard disks, and sensible VM
Jules> systems makes this not much of an issue, IMO.

I think it should not be very difficult to choose the size of characters at compile time. We would define LChar as the unit character and operate on that.

JMarc
Re: Internationalization and cvs
> My instinct says that most people aren't going to care in the slightest
> about a doubling of internal buffer sizes. The combination of cheap
> memory, cheap hard disks, and sensible VM systems makes this not much of
> an issue, IMO.

*grin* Well, I did not want to write obvious things ;-)

Andre'

PS: External 8bit is another matter... Interoperation...

-- 
Andre' Poenitz .. [EMAIL PROTECTED]
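On the "external 8bit" side, text read from files or 8-bit toolkits has to be decoded into the internal Unicode form. The C++ sketch below assumes the legacy encoding is ISO 8859-8 (Hebrew) and implements only part of its table: ASCII plus the 27 Hebrew letters at 0xE0..0xFA. `decodeIso8859_8` is a hypothetical name, and a full decoder would also map the remaining defined code points (mostly symbols in 0xA0..0xBF).

```cpp
#include <cassert>
#include <string>

// Partial ISO 8859-8 decoder: 8-bit bytes in, Unicode code points out.
std::u32string decodeIso8859_8(const std::string& bytes)
{
    std::u32string out;
    for (unsigned char b : bytes) {
        if (b < 0x80)
            out.push_back(b);                                       // ASCII is unchanged
        else if (b >= 0xE0 && b <= 0xFA)
            out.push_back(static_cast<char32_t>(0x05D0 + (b - 0xE0)));  // alef..tav
        else
            out.push_back(U'\uFFFD');                               // not handled in this sketch
    }
    assert(out.size() == bytes.size());  // single-byte encoding: one code point per byte
    return out;
}
```

The reverse direction (Unicode back to 8-bit on save) is where interoperation gets lossy, since any character outside the target code page has no byte to map to.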
Re: Internationalization and cvs
On Wed, 15 Dec 1999, Asger K. Alstrup Nielsen wrote:

> ...
> not affect the left-to-right 8-bit languages, except maybe for a
> small performance penalty. However, several voices say that
> paragraph-level language support is important, so it seems that

Surely those voices didn't mean paragraph-level support without also sentence- and word-level support. When you cite from a different language, the citation is obviously not always going to be a separate paragraph.

...
Greg Lee [EMAIL PROTECTED]
Re: Internationalization and cvs
> Surely those voices didn't mean paragraph-level support without
> also sentence- and word-level support. When you cite from a
> different language, the citation is obviously not always
> going to be a separate paragraph.

As it happens, sub-paragraph-level support is markedly more difficult than paragraph-level support, so don't hold your breath.

It's similar to the old discussion (i.e. flame war) about why ERT mode is not collapsable: character-level support (technically equivalent to sentence-level) is much more difficult than paragraph-level support. One reason is that paragraph-level support requires only 1D formatting (because a paragraph spans the entire width of the screen), while character-level support requires 2D formatting. Having said that, the old development branch actually had some support for character-level collapsable insets, implemented by Juergen and Alejandro at a developers' meeting in Mexico, so it *is* possible to achieve.

The same applies to character-level RTL support: of course it is possible, but it's very complicated. How should lines wrap? What about nested quotes, such as an English document with a quote in Hebrew which itself contains an English quote? What about search/replace? What about cut/paste? There are many issues to consider. Fortunately, most of them have been addressed by the Unicode standard, but it is still a lot of work to understand and implement.

I have the Unicode book with part of the recipe, so if anybody has technical questions, please let me know and I'll see what I can find out.

Greets,
Asger
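To give a feel for why character-level RTL is harder than paragraph-level, here is a deliberately naive C++ sketch that turns a mixed English/Hebrew string in an LTR paragraph from logical (typed) order into display order. It assumes that only the Hebrew block U+0590..U+05FF is right-to-left and treats everything else, including spaces and punctuation, as left-to-right. This is not the Unicode Bidirectional Algorithm (UAX #9): it has no embedding levels, so the nested-quote case above is not handled, and it reorders a whole string rather than each wrapped line. `splitRuns` and `visualOrder` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Dir { LTR, RTL };

struct Run {
    Dir dir;
    std::u32string text;
};

// Assumption: only the Hebrew block is right-to-left.
bool isRtl(char32_t c) { return c >= 0x0590 && c <= 0x05FF; }

// Split logical text into maximal runs of one direction.
std::vector<Run> splitRuns(const std::u32string& logical)
{
    std::vector<Run> runs;
    for (char32_t c : logical) {
        Dir d = isRtl(c) ? Dir::RTL : Dir::LTR;
        if (runs.empty() || runs.back().dir != d)
            runs.push_back(Run{d, std::u32string()});
        runs.back().text.push_back(c);
    }
    return runs;
}

// Display order for an LTR paragraph: runs stay left to right, but the
// characters inside each RTL run are shown reversed.
std::u32string visualOrder(const std::u32string& logical)
{
    std::u32string out;
    for (const Run& r : splitRuns(logical)) {
        if (r.dir == Dir::RTL)
            out.append(r.text.rbegin(), r.text.rend());
        else
            out += r.text;
    }
    assert(out.size() == logical.size());  // reordering never adds or drops characters
    return out;
}
```

Even this toy shows the 2D problem: once a line wraps inside an RTL run, the reversal has to be recomputed per display line, and cursor movement, selection and search all have to map between logical and visual positions.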
Re: Internationalization and cvs
On Wed, 15 Dec 1999, Asger K. Alstrup Nielsen wrote:

> > Surely those voices didn't mean paragraph-level support without
> > also sentence- and word-level support. When you cite from a
> > different language, the citation is obviously not always
> > going to be a separate paragraph.
>
> As it happens, sub-paragraph-level support is markedly more
> difficult than paragraph-level support, so don't hold your
> breath.

Ok, I'm not. I'm just wondering whether paragraph-level support alone is going to be useful enough to potential users to make it worth embarking on. Worth your while, of course, is what I mean, since I'm not a LyX developer. But it would be rather irritating for you to get it all worked out and then have those MSDOS users you were trying to seduce say, "Oh, we didn't think to mention that we need to cite words too ..."

Greg Lee [EMAIL PROTECTED]