On 24-6-2012 6:38, Uwe Stöhr wrote:
commit a3f4f2d1e3c2c9cb6907859a72e9e7e0592fbdc8
Author: Uwe Stöhr<uwesto...@lyx.org>
Date: Sun Jun 24 06:38:33 2012 +0200
CJK support for tex2lyx
- support as best as possible; setting a document language is however not
possible
Could you please explain what "as best as possible" means and why we
can't set the document language? I know I can figure that out myself,
but it would be much easier if you described it in the commit log while
you are on the subject.
I don't know whether we settled on using metadata to enable LyX-LaTeX
roundtrips, but if we do, would it be an idea to set some LyX
metadata to allow a successful roundtrip for CJK languages?
+/**
+ * supported CJK encodings
+ */
+const char * const supported_CJK_encodings[] = {
+"EUC-JP", "KS", "GB", "UTF8", 0};
+
+/**
+ * the same as supported_CJK_encodings with .lyx names
+ * please keep this in sync with supported_CJK_encodings line by line!
+ */
+const char * const coded_supported_CJK_encodings[] = {
+"japanese-cjk", "korean", "chinese-simplified", "chinese-traditional", 0};
What does "coded_supported_CJK_encodings" mean? What is 'coded' about it?
These relations aren't bijective. "chinese-traditional" might imply
"UTF8", but "UTF8" does not imply "chinese-traditional", right? If so,
the CJK2lyx function is wrong.
What about all the other encodings supported by CJK? Below you hardcode
that if SJIS, JIS or Big5 is given we use utf8 instead. Why not
collect all this information in a struct here? In a sense tex2lyx
also supports these other encodings.
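To illustrate what I mean by a struct, here is a minimal sketch. All names in it (CJKEncodingInfo, cjk_encodings, findCJKEncoding) are hypothetical, and the rows only restate what the patch hardcodes; I have not checked them against lib/encodings. The empty language for "UTF8" is my own assumption, following the point above that UTF8 alone does not identify a language.

```cpp
#include <string>

// Hypothetical table: one row per encoding name as it appears in
// \begin{CJK}{encoding}{mapping}.
struct CJKEncodingInfo {
	char const * cjk_name;     // name used in the LaTeX CJK environment
	char const * lyx_language; // LyX language to assume, "" if it cannot be inferred
	char const * read_as;      // encoding name passed on when reading the content
};

// The rows restate the patch; they are not verified against lib/encodings.
CJKEncodingInfo const cjk_encodings[] = {
	{"EUC-JP", "japanese-cjk",       "EUC-JP"},
	{"KS",     "korean",             "KS"},
	{"GB",     "chinese-simplified", "GB"},
	{"UTF8",   "",                   "UTF8"}, // assumption: language unknown
	{"SJIS",   "",                   "utf8"}, // patch falls back to utf8
	{"BIG5",   "",                   "utf8"}, // patch falls back to utf8
	{"JIS",    "",                   "utf8"}, // patch falls back to utf8
};

// Returns the row for the given CJK encoding name, or nullptr if unknown.
CJKEncodingInfo const * findCJKEncoding(std::string const & name)
{
	for (CJKEncodingInfo const & info : cjk_encodings)
		if (name == info.cjk_name)
			return &info;
	return nullptr;
}
```

With something like this, the two parallel arrays and the SJIS/BIG5/JIS condition further down would collapse into one lookup, and an unknown name such as "EUC-TW" would simply return nullptr and be output as ERT.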
Why are the following entries in lib/encodings not supported?
# For japanese
Encoding jis JIS "Japanese (CJK) (JIS)" ISO-2022-JP variable CJK
End
# For traditional chinese
Encoding euc-tw EUC-TW "Chinese (traditional) (EUC-TW)" EUC-TW variable CJK
End
Isn't it possible to reuse the information from lib/encodings?
diff --git a/src/tex2lyx/Parser.h b/src/tex2lyx/Parser.h
index c0c5685..3cf3dbd 100644
--- a/src/tex2lyx/Parser.h
+++ b/src/tex2lyx/Parser.h
@@ -251,6 +251,8 @@ public:
void setCatCode(char c, CatCode cat);
///
CatCode getCatCode(char c) const;
+ /// latex name of the current encoding
+ std::string encoding_latex_;
private:
///
@@ -265,8 +267,6 @@ private:
idocstringstream * iss_;
///
idocstream& is_;
- /// latex name of the current encoding
- std::string encoding_latex_;
};
As Pavel already said, please don't make member variables public just
because it is easy. Pavel suggested using get** and set** methods, and
these methods are already there; you even use them already
in your code.
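A minimal sketch of how the save/restore of the encoding could look with the member kept private. MiniParser is a stand-in for the real Parser, the accessor names are assumptions on my part, and the small restore guard is my own addition, not something in the patch.

```cpp
#include <string>

// Stand-in for tex2lyx's Parser; only the parts needed for the illustration.
class MiniParser {
public:
	// Accessor names are assumed for this sketch.
	std::string const & getEncoding() const { return encoding_latex_; }
	void setEncoding(std::string const & e) { encoding_latex_ = e; }
private:
	/// latex name of the current encoding
	std::string encoding_latex_;
};

// Remembers the encoding on construction and restores it on destruction,
// so the reset cannot be forgotten on any exit path.
class EncodingRestorer {
public:
	explicit EncodingRestorer(MiniParser & p) : p_(p), old_(p.getEncoding()) {}
	~EncodingRestorer() { p_.setEncoding(old_); }
private:
	MiniParser & p_;
	std::string const old_;
};
```

In the CJK branch, one would create the guard before calling setEncoding() for the environment; the old encoding then comes back when the branch ends, without touching the member directly.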
@@ -734,6 +734,16 @@ void Preamble::handle_package(Parser&p, string const& name,
p.setEncoding("utf8");
}
+ else if (name == "CJK") {
+ // It is impossible to determine the document language if CJK is used.
+ // All we can do is to notify the user that he has to set this by hisself.
hisself -> himself
@@ -1433,6 +1454,57 @@ void parse_environment(Parser& p, ostream& os, bool outer,
os<< "\n\\begin_layout Standard\n";
}
+ else if (name == "CJK") {
+ // the scheme is \begin{CJK}{encoding}{mapping}{text}
+ // It is impossible to decide if a CJK environment was in its own paragraph or within
+ // a line. We therefore always assume a paragraph since the latter is a rare case.
Why is this impossible?
+ eat_whitespace(p, os, parent_context, false);
+ parent_context.check_end_layout(os);
+ // store the encoding to be able to reset it
+ string const encoding_old = p.encoding_latex_;
+ string const encoding = p.getArg('{', '}');
+ // SJIS and BIG5 don't work with LaTeX according to the comment in unicode.cpp
The comment in unicode.cpp just points to the comments in lib/encodings.
+ // JIS does not work with LyX's encoding conversion
+ if (encoding != "SJIS"&& encoding != "BIG5"&& encoding != "JIS")
+ p.setEncoding(encoding);
+ else
+ p.setEncoding("utf8");
Ugh.. hardcoding.
+ // LyX doesn't support the second argument so if
+ // this is used we need to output everything as ERT
+ string const mapping = p.getArg('{', '}');
+ if ( (!mapping.empty()&& mapping != " ")
one space too many ..
+ || (!is_known(encoding, supported_CJK_encodings))) {
+ parent_context.check_layout(os);
+ handle_ert(os, "\\begin{" + name + "}{" + encoding + "}{" + mapping + "}",
+ parent_context);
+ // we must parse the content as verbatim because e.g. SJIS can contain
+ // normally invalid characters
+ string const s = p.plainEnvironment("CJK");
+ string::const_iterator it2 = s.begin();
it2 is unused?
+ for (string::const_iterator it = s.begin(), et = s.end(); it != et; ++it) {
+ if (*it == '\\')
+ handle_ert(os, "\\", parent_context);
+ else if (*it == '$')
+ handle_ert(os, "$", parent_context);
+ else
+ os<< *it;
+ }
+ p.skip_spaces();
+ handle_ert(os, "\\end{" + name + "}",
+ parent_context);
+ } else {
+ string const lang = CJK2lyx(encoding);
+ // store the language because we must reset it at the end
+ string const lang_old = parent_context.font.language;
+ parent_context.font.language = lang;
+ parse_text_in_inset(p, os, FLAG_END, outer, parent_context);
+ parent_context.font.language = lang_old;
+ parent_context.new_paragraph(os);
+ }
+ p.encoding_latex_ = encoding_old;
+ p.skip_spaces();
+ }
+
else if (name == "lyxgreyedout") {
eat_whitespace(p, os, parent_context, false);
parent_context.check_layout(os);
@@ -2029,6 +2101,24 @@ void parse_text(Parser& p, ostream& os, unsigned flags, bool outer,
while (p.good()) {
Token const& t = p.get_token();
+ // it is impossible to determine the correct document language if CJK is used.
+ // Therefore write a note at the beginning of the document
+ if (have_CJK) {
+ context.check_layout(os);
+ begin_inset(os, "Note Note\n");
+ os<< "status open\n\\begin_layout Plain Layout\n"
+ << "\\series bold\n"
+ << "Important information:\n"
+ << "\\end_layout\n\n"
+ << "\\begin_layout Plain Layout\n"
+ << "This document contains text in Chinese, Japanese or Korean.\n"
+ << " It was therefore impossible for tex2lyx to set the correct document langue for your document."
langue -> language
Shouldn't this text be translated somehow?
I guess many users won't get the logic of 'therefore'. An English
document with some Chinese text would be set to English correctly. A
Japanese document not using CJK will also be converted correctly.
+ << " Please set in the document settings by yourself!\n"
I don't like the 'by yourself' -> " Please set the language manually in
the document settings.\n"
Vincent