On Sun, 13 Feb 2022 at 09:04, Jürgen Spitzmüller <sp...@lyx.org> wrote:
> On Sunday, 13.02.2022 at 04:19 +0100, Thibaut Cuvelier wrote:
> > You mean, with code like
> > https://github.com/cburschka/lyx/blob/d3c335a5d524e2edeb73ae1a891fcc58ba5bfd1a/src/BiblioInfo.cpp#L421-L428
> > for the search? I thought it would be good to have a file to store
> > this information, but I wasn't aware of unicodesymbols. I believe
> > that the file shouldn't even be modified at all, thanks to the
> > presence of the Unicode character number at the beginning of the line
> > (0x00c0 "\\`{A}", with 0xC0 corresponding to 192,
> > https://github.com/cburschka/lyx/blob/master/src/insets/InsetERT.cpp#L131
> > ).
> >
> > Based on the contents of unicodesymbols, how could I match "\`{A}",
> > "\`A", and "\` A" at once? Should I just use tricks like
> > https://github.com/cburschka/lyx/blob/d3c335a5d524e2edeb73ae1a891fcc58ba5bfd1a/src/BiblioInfo.cpp#L414-L418
> > (which I'm already doing, in a sense, in
> > https://github.com/cburschka/lyx/blob/master/src/insets/InsetERT.cpp#L452-L463
> > )?
>
> I don't know how to do it exactly, but yes, I mean that the information
> you need here should all be in unicodesymbols, or added if not, and
> could be retrieved by the methods defined in Encoding.cpp.
>
> There should be no need to store LaTeX<>Unicode mappings anywhere else.

Thanks, I just did that (with a small test file): a460097823. However, this test revealed a limitation of the current unicodesymbols file: there can be only one LaTeX command per symbol. This matters in only a few cases, such as \textexclamdown and !`: both are mapped to ¡ (U+00A1), but the file allows only one mapping. I would have no problem saying that this is a corner case that can easily be ignored, but, after all, I dived into Unicode mapping within ERTs for DocBook to handle corner cases… (Albeit not in Spanish.)
From a memory-consumption point of view, supporting several commands for one symbol would require storing more than one string in CharInfo, potentially even a vector of strings for all entries (even those that have only one command). That is a 24-byte overhead per entry (https://stackoverflow.com/a/34035291/1066843) for roughly 4000 entries, i.e. about 96 kB in total, which is not that large.

If we decide to solve this problem, there are several possible solutions (all modifying Encodings::read); I can think of two:

- either use a separator symbol in the latexcommand part of each unicodesymbols line, but it would be hard to find a single character that is never used in LaTeX commands;
- or have multiple lines for a single character, with duplicate information on the second one or a simpler line format for these entries. For instance, for the inverted exclamation mark:

      0x00a1 "\\textexclamdown" "" "force=cp862;cp1255;euc-jp;euc-jp-platex;euc-kr;utf8-platex" # INVERTED EXCLAMATION MARK
      0x00a1 "!`" # Implicitly, all the other parameters still apply

What do you think of this? Should this be done? If so, which solution would you prefer? (Of course, I offer to do this refactoring :).)
-- 
lyx-devel mailing list
lyx-devel@lists.lyx.org
http://lists.lyx.org/mailman/listinfo/lyx-devel