Re: [O] Embedded LaTeX does not work with Unicode quotes
Hello,

Florian Beck f...@miszellen.de writes:

> Nick Dokos ndo...@gmail.com writes:
>
>> punctuation in the syntax tables. Look for org-latex-regexps in org.el
>
> The line in question is
>
> #+BEGIN_SRC emacs-lisp
> ("$" "\\([^$]\\|^\\)\\(\\(\\$\\([^ \r\n,;.$][^$\n\r]*?\\(\n[^$\n\r]*?\\)\\{0,2\\}[^ \r\n,.$]\\)\\$\\)\\)\\([- .,?;:'\")\000]\\|$\\)" 2 nil)
> #+END_SRC
>
> It's probably not too hard to see that the culprit is the bunch of
> punctuation characters towards the end. Indeed if you change .,?;:'\"
> to .,?;:'\"” -- that solves the OP's problem.
>
> However, it might be even better to use a more general syntax,
> [:punct:], which matches all punctuation (as we want). So:
>
> #+BEGIN_SRC emacs-lisp
> ("$" "\\([^$]\\|^\\)\\(\\(\\$\\([^ \r\n,;.$][^$\n\r]*?\\(\n[^$\n\r]*?\\)\\{0,2\\}[^ \r\n,.$]\\)\\$\\)\\)\\([- [:punct:]\000]\\|$\\)" 2 nil)
> #+END_SRC

Actually this variable is hardly used throughout the Org code base. See org-element-latex-fragment-parser instead (which has the same problem anyway).

Also, according to the Elisp manual, [:punct:] is not ideal either:

  `[:punct:]'
       This matches any punctuation character. (At present, for
       multibyte characters, it matches anything that has non-word
       syntax.)

There is also \s..

Anyway, it might be better to know exactly what kind of false positives we want to avoid.

Regards,

--
Nicolas Goaziou
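The two alternatives mentioned above, [:punct:] and \s., can be compared on individual characters. The following is only a sketch for experimenting, not something taken from Org: the helper name my/punct-report is hypothetical, and the standard syntax table is assumed (an Org buffer uses its own table, so results there may differ).

#+BEGIN_SRC emacs-lisp
;; Hypothetical helper, not part of Org or Emacs.
(defun my/punct-report (string)
  "Return (STRING PUNCT-CLASS-P SYNTAX-PUNCT-P) for one-character STRING.
PUNCT-CLASS-P is non-nil if STRING matches the `[:punct:]' class.
SYNTAX-PUNCT-P is non-nil if STRING matches `\\s.' (punctuation syntax).
Both are evaluated under the standard syntax table (an assumption)."
  (with-syntax-table (standard-syntax-table)
    (list string
          (and (string-match-p "[[:punct:]]" string) t)
          (and (string-match-p "\\s." string) t))))

;; Compare a Unicode quote, a period, an underscore and a letter.
(mapcar #'my/punct-report '("”" "." "_" "a"))
#+END_SRC

Under the standard syntax table, the underscore should match [:punct:] but not \s., because it has symbol rather than punctuation syntax there. That is one concrete case where the two notations disagree, which is relevant when deciding which false positives to avoid.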
Re: [O] Embedded LaTeX does not work with Unicode quotes
On 2014-11-12, at 07:05, Nick Dokos wrote:

> Marcin Borkowski mb...@wmi.amu.edu.pl writes:
>
>> Hi list,
>>
>> I have this: „$n\eps\le b$”, and it seems not to be recognized as a
>> LaTeX fragment. The manual says:
>>
>>   To avoid conflicts with currency specifications, single `$'
>>   characters are only recognized as math delimiters if the enclosed
>>   text contains at most two line breaks, is directly attached to the
>>   `$' characters with no whitespace in between, and if the closing
>>   `$' is followed by whitespace, punctuation or a dash.
>>
>> When I C-u C-x = on the closing quote, I get
>>
>> ...
>> syntax: . which means: punctuation
>> ...
>>
>> so I don't know why it is not recognized as punctuation.
>> Consequently, it is exported verbatim (with `\$') into LaTeX, and
>> also (obviously) C-c C-x C-l does not fontify it. When I change ”
>> into " (the ASCII #x22 quote), everything is ok.
>
> The $...$ construct is recognized by a regexp which, while
> complicated, is not complicated enough to recognize everything that's
> marked punctuation in the syntax tables. Look for org-latex-regexps
> in org.el (and note that the regexp for $ is about twice as long as
> the next longest regexp - the one for begin). The others (for
> \(...\), \[...\] and $$..$$) are fairly trivial.
>
>> My questions:
>>
>> 1. Isn't it a bug?
>
> Yes, probably - but looking at the regexp, I cringe: I don't want to
> even try deciphering it, let alone change it - life's too short...

Ah, regex. I have no more questions...

>> 2. If not, what can I do in my config so that it is recognized
>> properly?
>>
>> PS. I just recalled that using \(...\) should help, and indeed it
>> does. Still, I'm curious about the answer to my questions (now that
>> I remembered a workaround, especially #1).
>
> That is indeed the best solution.

Yep. Thanks!

--
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Adam Mickiewicz University
Re: [O] Embedded LaTeX does not work with Unicode quotes
Nick Dokos ndo...@gmail.com writes:

> punctuation in the syntax tables. Look for org-latex-regexps in org.el

The line in question is

#+BEGIN_SRC emacs-lisp
("$" "\\([^$]\\|^\\)\\(\\(\\$\\([^ \r\n,;.$][^$\n\r]*?\\(\n[^$\n\r]*?\\)\\{0,2\\}[^ \r\n,.$]\\)\\$\\)\\)\\([- .,?;:'\")\000]\\|$\\)" 2 nil)
#+END_SRC

It's probably not too hard to see that the culprit is the bunch of punctuation characters towards the end. Indeed if you change .,?;:'\" to .,?;:'\"” -- that solves the OP's problem.

However, it might be even better to use a more general syntax, [:punct:], which matches all punctuation (as we want). So:

#+BEGIN_SRC emacs-lisp
("$" "\\([^$]\\|^\\)\\(\\(\\$\\([^ \r\n,;.$][^$\n\r]*?\\(\n[^$\n\r]*?\\)\\{0,2\\}[^ \r\n,.$]\\)\\$\\)\\)\\([- [:punct:]\000]\\|$\\)" 2 nil)
#+END_SRC

--
Florian Beck
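The effect of the change can be checked in isolation by testing only the trailing bracket expression, i.e. the part that decides which character may follow the closing `$'. This is a minimal sketch under that assumption: it does not exercise the full regexp, and the names my/dollar-tail-old, my/dollar-tail-new and my/dollar-tail-matches-p are hypothetical, not part of Org.

#+BEGIN_SRC emacs-lisp
;; Hypothetical names; only the final bracket expression is tested here,
;; not the whole `$' entry of org-latex-regexps.
(defconst my/dollar-tail-old "[- .,?;:'\")\000]"
  "Explicit list of characters accepted after the closing dollar sign.")

(defconst my/dollar-tail-new "[- [:punct:]\000]"
  "Same position, using the `[:punct:]' character class instead.")

(defun my/dollar-tail-matches-p (tail char-string)
  "Return t if regexp TAIL matches the one-character CHAR-STRING."
  (and (string-match-p tail char-string) t))

;; The Unicode closing quote is absent from the explicit list:
(my/dollar-tail-matches-p my/dollar-tail-old "”") ; => nil

;; With `[:punct:]' the result depends on how Emacs classifies ”
;; (for multibyte characters: its syntax in the current syntax table).
(my/dollar-tail-matches-p my/dollar-tail-new "”")
#+END_SRC

ASCII punctuation such as the period is accepted by both variants, so the change should only widen the set of accepted characters.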
[O] Embedded LaTeX does not work with Unicode quotes
Hi list,

I have this: „$n\eps\le b$”, and it seems not to be recognized as a LaTeX fragment. The manual says:

  To avoid conflicts with currency specifications, single `$'
  characters are only recognized as math delimiters if the enclosed
  text contains at most two line breaks, is directly attached to the
  `$' characters with no whitespace in between, and if the closing
  `$' is followed by whitespace, punctuation or a dash.

When I C-u C-x = on the closing quote, I get

             position: 54465 of 108125 (50%), restriction: 52496-56766, column: 152
            character: ” (displayed as ”) (codepoint 8221, #o20035, #x201d)
    preferred charset: unicode (Unicode (ISO10646))
code point in charset: 0x201D
               syntax: . which means: punctuation
             category: .:Base, c:Chinese, h:Korean, j:Japanese
             to input: type C-x 8 RET HEX-CODEPOINT or C-x 8 RET NAME
          buffer code: #xE2 #x80 #x9D
            file code: #xE2 #x80 #x9D (encoded by coding system utf-8-unix)
              display: by this font (glyph code)
    xft:-unknown-Ubuntu Mono-normal-normal-normal-*-17-*-*-*-m-0-iso10646-1 (#x71)

  Character code properties: customize what to show
    name: RIGHT DOUBLE QUOTATION MARK
    old-name: DOUBLE COMMA QUOTATION MARK
    general-category: Pf (Punctuation, Final quote)
    decomposition: (8221) ('”')

  There are text properties here:
    fontified            t

so I don't know why it is not recognized as punctuation. Consequently, it is exported verbatim (with `\$') into LaTeX, and also (obviously) C-c C-x C-l does not fontify it. When I change ” into " (the ASCII #x22 quote), everything is ok.

My questions:

1. Isn't it a bug?

2. If not, what can I do in my config so that it is recognized properly?

PS. I just recalled that using \(...\) should help, and indeed it does. Still, I'm curious about the answer to my questions (now that I remembered a workaround, especially #1).

TIA,

--
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Adam Mickiewicz University