Re: [updated PATCH] bugs 3958, 3313 and 3976 lyx2lyx and unicode characters

2007-07-17 Thread José Matos
On Friday 13 July 2007 13:16:46 Anders Ekberg wrote:
> I have tried to address Georg's and Juergen's comments.
> To avoid data-loss, the function is only run if the encoding is auto
> or default and there are no language changes (overly conservative,
> but possible to work around, as commented in the code). Now, I
> *assume* this should prevent any cases that can cause data-loss (i.e.
> assuming unicode encoding when it is not), but I don't know the
> format well enough to be sure, so please correct the code if I'm wrong.

  We can extend the test to see if the included languages have the same 
encoding...
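
A minimal sketch of that extended test, not code from the patch. It assumes language switches appear in the body as lines starting with `\lang`, and the LANG_ENCODING table is a hypothetical stand-in for whatever language-to-encoding mapping lyx2lyx would actually use. It is written for Python 3.

```python
# Hypothetical language -> encoding table, for illustration only.
LANG_ENCODING = {
    "english": "latin1",
    "french": "latin1",
    "german": "latin1",
    "polish": "latin2",
    "russian": "koi8-r",
}


def languages_share_encoding(document_language, body_lines, table=LANG_ENCODING):
    """Return True if every language switched to in the body maps to the
    same encoding as the document language. Unknown languages are treated
    as unsafe, so the answer is False for them."""
    base = table.get(document_language)
    if base is None:
        return False
    for line in body_lines:
        # Assumption: a language change is a line of the form "\lang <name>".
        if line.startswith("\\lang "):
            parts = line.split()
            if len(parts) < 2 or table.get(parts[1]) != base:
                return False
    return True
```

With this table, a document in english that switches to french would pass, while one that switches to polish would not.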

> To avoid the need for two unicodesymbols files, I have removed that
> requirement by assuming any command that includes { and } and not any
> of the exception tokens, to be an accented character. I also check
> for combining characters and don't translate these (which allowed the
> removal of an if-test in the end of the code). It is not pretty, but
> works...
>
> Details and an idea for a (hopefully) better solution are in bugzilla
> http://bugzilla.lyx.org/show_bug.cgi?id=3958

  As I said there, the format change needs to be in lyx_1_5.py and not in 
LyX.py.

> Anders

-- 
José Abílio


Re: [updated PATCH] bugs 3958, 3313 and 3976 lyx2lyx and unicode characters

2007-07-14 Thread Anders Ekberg

A hopefully better solution is posted to bugzilla. Details in

http://bugzilla.lyx.org/show_bug.cgi?id=3958

/Anders


On Fri, 13 Jul 2007 05:18:45 -0700, Anders Ekberg wrote:
> I have tried to address Georg's and Juergen's comments.
> To avoid data-loss, the function is only run if the encoding is
> auto or default and there are no language changes (overly
> conservative, but possible to work around, as commented in the
> code). Now, I *assume* this should prevent any cases that can cause
> data-loss (i.e. assuming unicode encoding when it is not), but I
> don't know the format well enough to be sure, so please correct the
> code if I'm wrong.
>
> To avoid the need for two unicodesymbols files, I have removed that
> requirement by assuming any command that includes { and } and not
> any of the exception tokens, to be an accented character. I also
> check for combining characters and don't translate these (which
> allowed the removal of an if-test in the end of the code). It is
> not pretty, but works...
>
> Details and an idea for a (hopefully) better solution are in bugzilla
> http://bugzilla.lyx.org/show_bug.cgi?id=3958


[updated PATCH] bugs 3958, 3313 and 3976 lyx2lyx and unicode characters

2007-07-13 Thread Anders Ekberg

I have tried to address Georg's and Juergen's comments.
To avoid data-loss, the function is only run if the encoding is auto  
or default and there are no language changes (overly conservative,  
but possible to work around, as commented in the code). Now, I  
*assume* this should prevent any cases that can cause data-loss (i.e.  
assuming unicode encoding when it is not), but I don't know the  
format well enough to be sure, so please correct the code if I'm wrong.
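
The guard described above could look roughly like the following. This is a sketch under stated assumptions, not the code in the attached patch: it assumes the header carries a line `\inputencoding <value>` and that a language change in the body is a line starting with `\lang`. The function name safe_to_revert is made up for the example.

```python
def safe_to_revert(header_lines, body_lines):
    """Conservative check: proceed only when the input encoding is
    'auto' or 'default' and the body contains no language change."""
    encoding = None
    for line in header_lines:
        # Assumption: the header line looks like "\inputencoding auto".
        if line.startswith("\\inputencoding"):
            parts = line.split()
            if len(parts) > 1:
                encoding = parts[1]
            break
    if encoding not in ("auto", "default"):
        return False
    # Assumption: a language change is a body line starting with "\lang ".
    return not any(line.startswith("\\lang ") for line in body_lines)
```

A missing `\inputencoding` line is treated as unsafe here, which errs on the same conservative side as the description.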


To avoid the need for two unicodesymbols files, I have removed that  
requirement by assuming any command that includes { and } and not any  
of the exception tokens, to be an accented character. I also check  
for combining characters and don't translate these (which allowed the  
removal of an if-test in the end of the code). It is not pretty, but  
works...
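
For illustration, the heuristic can be sketched as below. The EXCEPTION_TOKENS tuple is hypothetical (the real exception list is in the patch and is not reproduced here), and the function names are invented for the example. Combining characters are detected with the standard unicodedata module.

```python
import unicodedata

# Hypothetical exception tokens: commands that contain braces but are
# not accents. The actual list in the patch may differ.
EXCEPTION_TOKENS = ("\\ensuremath", "\\textsuperscript")


def is_combining(char):
    """True for combining characters (non-zero canonical combining class)."""
    return unicodedata.combining(char) != 0


def looks_like_accent(command):
    """Heuristic: a command with both { and } and no exception token
    is assumed to be an accented character."""
    if "{" not in command or "}" not in command:
        return False
    return not any(token in command for token in EXCEPTION_TOKENS)


def should_translate(char, command):
    """True if the character should be replaced by its command here.
    Accented characters and combining characters are left alone."""
    if is_combining(char):
        return False
    return not looks_like_accent(command)
```

The weak point is visible in the code: any brace-carrying command that is missing from the exception list is silently treated as an accent.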


Details and an idea for a (hopefully) better solution are in bugzilla
http://bugzilla.lyx.org/show_bug.cgi?id=3958

Anders



patch
Description: Binary data


Re: [PATCH] bugs 3958, 3313 and 3976 lyx2lyx and unicode characters

2007-07-09 Thread Anders Ekberg

On 9 jul 2007, at 14.15, Jean-Marc Lasgouttes wrote:

> > "Anders" == Anders Ekberg <[EMAIL PROTECTED]> writes:
>
> Anders> The patch addresses this by excluding all accented characters
> Anders> from the list that revert_unicode processes. In principle this
> Anders> results in that accented characters get "properly" translated
> Anders> and remaining unicode characters are replaced by ERT or math
> Anders> commands (if they are in the list of characters).
>
> What is the difference between revert_unicode and revert_accent? Is it
> just generating ERT versus InsetLaTexAccent?

Basically yes (ERT or math inset).

> In this case, lyx2lyx
> could look at the generated string and decide what to do?

You mean merge the two?
That would of course be the best solution, but that requires someone
with better knowledge of revert_accent and the LyX format (and
time ;-) than I have.
Otherwise I think it would be quite straightforward to merge
revert_unicode. Just read in the unicodesymbols file, keep track of
whether you're in an inset (and which) and whether you include a math
inset or an ERT. Then get the corresponding replacement string.
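
A rough sketch of those steps, with several assumptions: each unicodesymbols entry is taken to start with a hex code point followed by a double-quoted command in which backslashes are doubled, `#` lines are comments, and the inset names "ERT" and "Formula" stand for ERT and math. The result is reported as a (kind, text) pair rather than as real inset markup, which is not reproduced here. The sample entries are made up.

```python
import re

# Assumed entry layout: 0xXXXX "command" ...  (backslashes doubled).
ENTRY = re.compile(r'^(0x[0-9a-fA-F]+)\s+"((?:[^"\\]|\\.)*)"')

# Made-up sample lines in the assumed layout.
SAMPLE = [
    '# comment line',
    r'0x03b1 "\\alpha" "" ""',
    r'0x00e0 "\\`{a}" "" ""',
]


def parse_symbols(lines):
    """Map each character to its LaTeX command."""
    table = {}
    for line in lines:
        match = ENTRY.match(line.strip())
        if match:
            # Undo the assumed backslash escaping inside the quotes.
            command = re.sub(r'\\(.)', r'\1', match.group(2))
            table[chr(int(match.group(1), 16))] = command
    return table


def revert_char(char, table, inset=None):
    """Return (kind, text): 'keep' for unknown characters, 'raw' when
    already inside ERT or math, 'ert' when an ERT must be created."""
    command = table.get(char)
    if command is None:
        return ("keep", char)
    if inset in ("ERT", "Formula"):
        return ("raw", command)
    return ("ert", command)
```

For example, `revert_char("\u03b1", parse_symbols(SAMPLE))` asks for an ERT around `\alpha`, while the same call with `inset="Formula"` returns the bare command.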


/Anders


Re: [PATCH] bugs 3958, 3313 and 3976 lyx2lyx and unicode characters

2007-07-09 Thread Jean-Marc Lasgouttes
> "Anders" == Anders Ekberg <[EMAIL PROTECTED]> writes:

Anders> The patch addresses this by excluding all accented characters
Anders> from the list that revert_unicode processes. In principle this
Anders> results in that accented characters get "properly" translated
Anders> and remaining unicode characters are replaced by ERT or math
Anders> commands (if they are in the list of characters).

What is the difference between revert_unicode and revert_accent? Is it
just generating ERT versus InsetLaTexAccent? In this case, lyx2lyx
could look at the generated string and decide what to do?

JMarc


Re: [PATCH] bugs 3958, 3313 and 3976 lyx2lyx and unicode characters

2007-07-09 Thread Anders Ekberg


On 9 jul 2007, at 11.36, Jürgen Spitzmüller wrote:

> Anders Ekberg wrote:
> > > And then we have to maintain two unicodesymbols lists? I do not
> > > think this is a good idea.
> >
> > I know, I tried some different options (like searching for {...}),
> > but didn't come up with anything I thought was better. So as a
> > compromise, I kept the lists identical and just commented out the
> > accented characters. If there is little maintenance of the
> > unicodesymbols list, I think it is acceptable (you can do a diff to
> > fairly easily spot errors).
>
> I think lots of symbols will be added soon after 1.5.0. The need to
> always synchronize the two lists is inefficient and error-prone.
>
> > > How about adding a new flag "revert" to the unicodesymbols list
> > > instead?
> >
> > I thought about that too, but was afraid this would mess things up in
> > other places.
>
> Where?

I assumed in the conversion to TeX. But I don't know anything about
that, so I took what I thought was the safest route.


Anders

Re: [PATCH] bugs 3958, 3313 and 3976 lyx2lyx and unicode characters

2007-07-09 Thread Jürgen Spitzmüller
Anders Ekberg wrote:
> > And then we have to maintain two unicodesymbols lists? I do not  
> > think this is
> > a good idea.
>
> I know, I tried some different options (like searching for {...}),  
> but didn't come up with anything I thought was better. So as a  
> compromise, I kept the lists identical and just commented out the  
> accented characters. If there is little maintenance of the  
> unicodesymbols list, I think it is acceptable (you can do a diff to  
> fairly easily spot errors).

I think lots of symbols will be added soon after 1.5.0. The need to always 
synchronize the two lists is inefficient and error-prone.

> > How about adding a new flag "revert" to the unicodesymbols list  
> > instead?
>
> I thought about that too, but was afraid this would mess things up in  
> other places.

Where?

Jürgen


Re: [PATCH] bugs 3958, 3313 and 3976 lyx2lyx and unicode characters

2007-07-09 Thread Anders Ekberg

On Mon, 09 Jul 2007 02:05:27 -0700, Jürgen Spitzmüller wrote:

> Anders Ekberg wrote:
> > The patch addresses this by excluding all accented characters from
> > the list that revert_unicode processes. In principle this results in
> > that accented characters get "properly" translated and remaining
> > unicode characters are replaced by ERT or math commands (if they are
> > in the list of characters).
>
> And then we have to maintain two unicodesymbols lists? I do not
> think this is a good idea.

I know, I tried some different options (like searching for {...}),
but didn't come up with anything I thought was better. So as a
compromise, I kept the lists identical and just commented out the
accented characters. If there is little maintenance of the
unicodesymbols list, I think it is acceptable (you can do a diff to
fairly easily spot errors).

> How about adding a new flag "revert" to the unicodesymbols list
> instead?

I thought about that too, but was afraid this would mess things up in
other places.


/Anders



Re: [PATCH] bugs 3958, 3313 and 3976 lyx2lyx and unicode characters

2007-07-09 Thread Jürgen Spitzmüller
Anders Ekberg wrote:
> The patch addresses this by excluding all accented characters from  
> the list that revert_unicode processes. In principle this results in  
> that accented characters get "properly" translated and remaining  
> unicode characters are replaced by ERT or math commands (if they are  
> in the list of characters).

And then we have to maintain two unicodesymbols lists? I do not think this is 
a good idea.

How about adding a new flag "revert" to the unicodesymbols list instead?
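
To make the idea concrete, here is a sketch of how lyx2lyx could pick entries by such a flag from a single list. It assumes each entry has a hex code point followed by three double-quoted fields, the third being a comma-separated flag list, and that the hypothetical "revert" flag marks the characters revert_unicode should handle. Whether the flag should mean that or the opposite is open.

```python
import re

# One double-quoted field, allowing backslash escapes inside it.
FIELD = re.compile(r'"((?:[^"\\]|\\.)*)"')


def entries_with_flag(lines, flag="revert"):
    """Return the characters whose flag field contains the given flag."""
    selected = []
    for line in lines:
        line = line.strip()
        # Skip blanks, comments and anything not starting with a code point.
        if not line.startswith("0x"):
            continue
        code = line.split(None, 1)[0]
        fields = FIELD.findall(line)
        if len(fields) < 3:
            continue
        # Assumption: the third quoted field is a comma-separated flag list.
        flags = [item.strip() for item in fields[2].split(",")]
        if flag in flags:
            selected.append(chr(int(code, 16)))
    return selected
```

Since other readers of the file would see one more entry in the flag field, they would need to tolerate an unknown flag for this to be safe.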

Jürgen