Re: [LyX master] CJK support for tex2lyx

Vincent van Ravesteijn Mon, 25 Jun 2012 00:28:20 -0700

On 24-6-2012 6:38, Uwe Stöhr wrote:

commit a3f4f2d1e3c2c9cb6907859a72e9e7e0592fbdc8
Author: Uwe Stöhr<uwesto...@lyx.org>
Date:   Sun Jun 24 06:38:33 2012 +0200


     CJK support for tex2lyx

     - support as best as possible; setting a document language is however not possible

Could you please indicate what "as good as possible" means and why we can't set the document language? I know I can figure that out myself, but it would be much easier if you described it in the commit log while you are on the subject.

I don't know whether we settled on using metadata to enable LyX-LaTeX roundtrips, but if we do, would it be an idea to set some LyX metadata to allow a successful roundtrip for CJK languages?

+/**
+ * supported CJK encodings
+ */
+const char * const supported_CJK_encodings[] = {
+"EUC-JP", "KS", "GB", "UTF8", 0};
+
+/**
+ * the same as supported_CJK_encodings with .lyx names
+ * please keep this in sync with supported_CJK_encodings line by line!
+ */
+const char * const coded_supported_CJK_encodings[] = {
+"japanese-cjk", "korean", "chinese-simplified", "chinese-traditional", 0};


What does "coded_supported_CJK_encodings" mean? What is 'coded' about it?

These relations aren't bijective. "chinese-traditional" might imply "UTF8", but "UTF8" does not imply "chinese-traditional", right? If so, the CJK2lyx function is wrong.

What about all the other encodings supported by CJK? Below you hardcode that if SJIS, JIS or Big5 is defined we use utf8 instead. Why not collect all this information in a struct here? In a sense tex2lyx also supports these other encodings?
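
As a rough sketch of such a table (C++11; the struct, its field names and the helper functions are hypothetical and not existing LyX code, while the encoding and language strings are the ones from the commit):

```cpp
#include <string>

// Hypothetical record type; the field names are illustrative only.
struct CJKEncoding {
	char const * latex_name;      // first argument of \begin{CJK}
	char const * lyx_language;    // .lyx language name, "" if the content goes to ERT
	char const * parser_encoding; // encoding the parser is switched to
};

CJKEncoding const cjk_encodings[] = {
	{"EUC-JP", "japanese-cjk",        "EUC-JP"},
	{"KS",     "korean",              "KS"},
	{"GB",     "chinese-simplified",  "GB"},
	// mapping taken from the commit; UTF8 alone does not imply a language
	{"UTF8",   "chinese-traditional", "UTF8"},
	// the commit reads these as utf8 and emits the environment as ERT
	{"SJIS",   "",                    "utf8"},
	{"BIG5",   "",                    "utf8"},
	{"JIS",    "",                    "utf8"},
};

// Returns the table entry for a CJK encoding name, or nullptr if unknown.
CJKEncoding const * find_cjk_encoding(std::string const & latex_name)
{
	for (CJKEncoding const & e : cjk_encodings)
		if (latex_name == e.latex_name)
			return &e;
	return nullptr;
}

// Encoding to hand to the parser; unknown names are passed through
// unchanged, which is what the first branch in the commit does.
std::string parser_encoding_for(std::string const & latex_name)
{
	CJKEncoding const * e = find_cjk_encoding(latex_name);
	return e ? e->parser_encoding : latex_name;
}
```

With one table, the two parallel arrays cannot drift out of sync, and the SJIS/BIG5/JIS special case becomes data instead of an if-condition.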


Why are the following entries in lib/encodings not supported ?

# For japanese
Encoding jis JIS "Japanese (CJK) (JIS)" ISO-2022-JP variable CJK
End
# For traditional chinese
Encoding euc-tw EUC-TW "Chinese (traditional) (EUC-TW)" EUC-TW variable CJK
End

Isn't it possible to reuse the information from lib/encodings ?

diff --git a/src/tex2lyx/Parser.h b/src/tex2lyx/Parser.h
index c0c5685..3cf3dbd 100644
--- a/src/tex2lyx/Parser.h
+++ b/src/tex2lyx/Parser.h
@@ -251,6 +251,8 @@ public:
        void setCatCode(char c, CatCode cat);
        ///
        CatCode getCatCode(char c) const;
+       /// latex name of the current encoding
+       std::string encoding_latex_;

  private:
        ///
@@ -265,8 +267,6 @@ private:
        idocstringstream * iss_;
        ///
        idocstream&  is_;
-       /// latex name of the current encoding
-       std::string encoding_latex_;
  };

As Pavel already said, please don't make member variables public just because it is easy. Not only did Pavel suggest using get** and set** methods, these methods are already there, and you even already use them in your code.
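
To illustrate, here is a minimal sketch that keeps the member private and still lets the CJK branch save and restore the encoding. The Parser below is a reduced stand-in, not the real tex2lyx class; the getter name getEncoding() and the EncodingRestorer helper are assumptions made for this example, and the real setEncoding() does more than store a string.

```cpp
#include <string>

// Reduced stand-in for tex2lyx's Parser, limited to the encoding state.
class Parser {
public:
	void setEncoding(std::string const & e) { encoding_latex_ = e; }
	// assumed getter name, used here only for illustration
	std::string const & getEncoding() const { return encoding_latex_; }
private:
	/// latex name of the current encoding
	std::string encoding_latex_;
};

// Hypothetical helper: remembers the current encoding and restores it
// when the scope is left, so no caller needs access to the member itself.
class EncodingRestorer {
public:
	explicit EncodingRestorer(Parser & p) : p_(p), old_(p.getEncoding()) {}
	~EncodingRestorer() { p_.setEncoding(old_); }
	EncodingRestorer(EncodingRestorer const &) = delete;
	EncodingRestorer & operator=(EncodingRestorer const &) = delete;
private:
	Parser & p_;
	std::string old_;
};

// Mimics the shape of the CJK branch: switch the encoding, parse, restore.
// Returns the encoding that was active while the environment was "parsed".
std::string parse_cjk_block(Parser & p, std::string const & encoding)
{
	EncodingRestorer restore(p);
	p.setEncoding(encoding);
	return p.getEncoding();
}
```

The same effect is possible with a plain getEncoding()/setEncoding() pair at the start and end of the branch; the guard only makes sure the reset is not forgotten on an early return.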


@@ -734,6 +734,16 @@ void Preamble::handle_package(Parser&p, string const&  name,
                        p.setEncoding("utf8");
        }

+       else if (name == "CJK") {
+               // It is impossible to determine the document language if CJK is used.
+               // All we can do is to notify the user that he has to set this by hisself.

hisself -> himself

@@ -1433,6 +1454,57 @@ void parse_environment(Parser&  p, ostream&  os, bool outer,
                os<<  "\n\\begin_layout Standard\n";
        }

+       else if (name == "CJK") {
+               // the scheme is \begin{CJK}{encoding}{mapping}{text}
+               // It is impossible to decide if a CJK environment was in its own paragraph or within
+               // a line. We therefore always assume a paragraph since the latter is a rare case.


Why is this impossible ?

+               eat_whitespace(p, os, parent_context, false);
+               parent_context.check_end_layout(os);
+               // store the encoding to be able to reset it
+               string const encoding_old = p.encoding_latex_;
+               string const encoding = p.getArg('{', '}');
+               // SJIS and BIG5 don't work with LaTeX according to the comment in unicode.cpp

The comment in unicode.cpp just points to the comments in lib/encodings.

+               // JIS does not work with LyX's encoding conversion
+               if (encoding != "SJIS"&&  encoding != "BIG5"&&  encoding != "JIS")
+                       p.setEncoding(encoding);
+               else
+                       p.setEncoding("utf8");


Ugh.. hardcoding.

+               // LyX doesn't support the second argument so if
+               // this is used we need to output everything as ERT
+               string const mapping = p.getArg('{', '}');
+               if ( (!mapping.empty()&&  mapping != " ")


one space too much ..

+                       || (!is_known(encoding, supported_CJK_encodings))) {
+                       parent_context.check_layout(os);
+                       handle_ert(os, "\\begin{" + name + "}{" + encoding + "}{" + mapping + "}",
+                                      parent_context);
+                       // we must parse the content as verbatim because e.g. SJIS can contain
+                       // normally invalid characters
+                       string const s = p.plainEnvironment("CJK");
+                       string::const_iterator it2 = s.begin();


it2 is unused ??

+                       for (string::const_iterator it = s.begin(), et = s.end(); it != et; ++it) {
+                               if (*it == '\\')
+                                       handle_ert(os, "\\", parent_context);
+                               else if (*it == '$')
+                                       handle_ert(os, "$", parent_context);
+                               else
+                                       os<<  *it;
+                       }
+                       p.skip_spaces();
+                       handle_ert(os, "\\end{" + name + "}",
+                                      parent_context);
+               } else {
+                       string const lang = CJK2lyx(encoding);
+                       // store the language because we must reset it at the end
+                       string const lang_old = parent_context.font.language;
+                       parent_context.font.language = lang;
+                       parse_text_in_inset(p, os, FLAG_END, outer, parent_context);
+                       parent_context.font.language = lang_old;
+                       parent_context.new_paragraph(os);
+               }
+               p.encoding_latex_ = encoding_old;
+               p.skip_spaces();
+       }
+
        else if (name == "lyxgreyedout") {
                eat_whitespace(p, os, parent_context, false);
                parent_context.check_layout(os);
@@ -2029,6 +2101,24 @@ void parse_text(Parser&  p, ostream&  os, unsigned flags, bool outer,
        while (p.good()) {
                Token const&  t = p.get_token();

+       // it is impossible to determine the correct document language if CJK is used.
+       // Therefore write a note at the beginning of the document
+       if (have_CJK) {
+               context.check_layout(os);
+               begin_inset(os, "Note Note\n");
+               os<<  "status open\n\\begin_layout Plain Layout\n"
+               <<  "\\series bold\n"
+               <<  "Important information:\n"
+               <<  "\\end_layout\n\n"
+               <<  "\\begin_layout Plain Layout\n"
+               <<  "This document contains text in Chinese, Japanese or Korean.\n"
+               <<  " It was therefore impossible for tex2lyx to set the correct document langue for your document."

langue -> language
Shouldn't this text be translated somehow ?

I guess many users won't get the logic of 'therefore'? An English document with some Chinese text would be set to English correctly. A Japanese document not using CJK will also be converted correctly.

+               <<  " Please set in the document settings by yourself!\n"

I don't like the 'by yourself' -> " Please set the language manually in the document settings.\n"


Vincent
