Re: [O] Tweaking the export

2012-02-17 Thread Nicolas Goaziou
Hello,

Christian Wittern  writes:

>>3. If all went well, you now have an impressive Org to Org converter.
>>   You can even test it with:
>>
>>   #+begin_src emacs-lisp
>>   (switch-to-buffer (org-export-to-buffer 'translator "*Translation*"))
>>   #+end_src
>>
>>   Obviously, there is not much to see.

> It worked wonderfully up to here.

>> Now, we're going to redefine `org-translator-paragraph' to properly
>> ignore one language or the other, depending on `:translator-side' value.
>>
>> #+begin_src emacs-lisp
>> (defun org-translator-paragraph (paragraph contents info)
>>   "Convert PARAGRAPH to Org, ignoring one language.
>> Language kept is determined by `:translator-side' value."
>>   (let ((leftp (eq (plist-get info :translator-side) 'left)))
>>     (replace-regexp-in-string
>>      (if leftp "\t+.*$" "^.*\t+") "" contents)))
>> #+end_src
>
> With a little tweaking, I got rid of the errors when running this code.
> However, no changes in the output were observable.  Finally, I looked
> at the output from step 3 above and realized that the parser
> normalizes my TAB characters away.  Only a bunch of spaces in the
> output!  Ouch!!
> So I guess I would need an option on the parser to switch tab expansion off.
>
> I also intended to implement my transformer by first defining the
> general org-e-org transformer and then deriving a specialized
> transformer that inherits from the general one and only overrides
> the paragraph transformation.  It seems that this is not possible at
> the moment, but I think it would be worth considering, as it would
> make defining new exporters or even org-file tweakers a breeze.

In fact the problem is subtle.  For example, you don't want include
keywords to be expanded or Babel blocks to be executed when exporting
from Org to Org.  I've added a noexpand keyword for that.  Hence, you
will need to call your converter with:

#+begin_src emacs-lisp
(switch-to-buffer
 (org-export-to-buffer 'translator "*Translation*" nil nil nil nil 'noexpand))
#+end_src

The TAB problem is different.  I expand tab early because the machine
creating the parse-tree and the machine exporting it may not be the
same.  Tab widths may differ, and it could lead to subtle bugs.  I may
add a :tab-width property in the initial environment.  I'm not sure
about it yet.
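
To make the width issue concrete, here is a small illustration.  It is
only a sketch, not the parser's actual code, and `my/expand-tabs' is
a hypothetical helper.  It shows how the same line expands differently
under two tab widths:

#+begin_src emacs-lisp
(defun my/expand-tabs (string width)
  "Return STRING with each tab replaced by WIDTH spaces.
Hypothetical helper, for illustration only."
  (replace-regexp-in-string "\t" (make-string width ?\s) string))

;; The same source line yields different strings on two machines
;; whose tab widths differ.
(my/expand-tabs "text A\ttext A'" 4)
(my/expand-tabs "text A\ttext A'" 8)
#+end_src

A regexp written against one width could then fail to match text
expanded with the other.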

Anyway, for now, each of your tabs has been replaced with `tab-width'
spaces.  Your paragraph translator may then become something like:

#+begin_src emacs-lisp
(defun org-translator-paragraph (paragraph contents info)
  "Convert PARAGRAPH to Org, ignoring one language.
Language kept is determined by `:translator-side' value."
  (let ((leftp (eq (plist-get info :translator-side) 'left)))
    (replace-regexp-in-string
     (format (if leftp " \\{%d,\\}.*$" "^.* \\{%d,\\}") tab-width)
     "" contents)))
#+end_src
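
The regexps can also be checked outside the exporter.  The following is
only a sketch: `my/translator-keep-side' is a hypothetical helper, and
it assumes that every tab was expanded to exactly `tab-width' (here 8)
spaces and that neither language contains such a run of spaces itself.

#+begin_src emacs-lisp
(defun my/translator-keep-side (text side &optional width)
  "Return TEXT with only SIDE (`left' or `right') of each line kept.
The two sides are assumed to be separated by at least WIDTH spaces,
WIDTH defaulting to `tab-width'.  Hypothetical helper, for
illustration only."
  (let ((width (or width tab-width)))
    (replace-regexp-in-string
     (format (if (eq side 'left) " \\{%d,\\}.*$" "^.* \\{%d,\\}") width)
     "" text)))

;; Build a two-line sample whose columns are separated by 8 spaces,
;; then keep each side in turn.
(let ((sample (concat "text A" (make-string 8 ?\s) "text A'\n"
                      "line 2" (make-string 8 ?\s) "line 2 bis\n")))
  (list (my/translator-keep-side sample 'left 8)
        (my/translator-keep-side sample 'right 8)))
#+end_src

Lines that contain no such run of spaces are returned unchanged.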

Is it better?


Regards,

-- 
Nicolas Goaziou



Re: [O] Tweaking the export

2012-02-03 Thread Christian Wittern

Hi Nicolas,

Thank you very much for taking the time for such a detailed recipe.  Today I 
finally found time to go over it and try to implement my transformer.  It 
turned out to be really easy to get going, but in the end, I hit a roadblock.



On 2012-01-29 18:07, Nicolas Goaziou wrote:


> 3. If all went well, you now have an impressive Org to Org converter.
>    You can even test it with:
>
>    #+begin_src emacs-lisp
>    (switch-to-buffer (org-export-to-buffer 'translator "*Translation*"))
>    #+end_src
>
>    Obviously, there is not much to see.


It worked wonderfully up to here.


> Now, we're going to redefine `org-translator-paragraph' to properly
> ignore one language or the other, depending on `:translator-side' value.
>
> #+begin_src emacs-lisp
> (defun org-translator-paragraph (paragraph contents info)
>   "Convert PARAGRAPH to Org, ignoring one language.
> Language kept is determined by `:translator-side' value."
>   (let ((leftp (eq (plist-get info :translator-side) 'left)))
>     (replace-regexp-in-string
>      (if leftp "\t+.*$" "^.*\t+") "" contents)))
> #+end_src


With a little tweaking, I got rid of the errors when running this code.
However, no changes in the output were observable.  Finally, I looked at
the output from step 3 above and realized that the parser normalizes my
TAB characters away.  Only a bunch of spaces in the output!  Ouch!!

So I guess I would need an option on the parser to switch tab expansion off.

I also intended to implement my transformer by first defining the general
org-e-org transformer and then deriving a specialized transformer that
inherits from the general one and only overrides the paragraph
transformation.  It seems that this is not possible at the moment, but I
think it would be worth considering, as it would make defining new
exporters or even org-file tweakers a breeze.


Anyhow, again thanks for writing the new parser /  exporter and for your 
help with my problem!


All the best,

Christian


--
Christian Wittern, Kyoto




Re: [O] Tweaking the export

2012-01-29 Thread Nicolas Goaziou
Hello,

Christian Wittern  writes:

> Exactly.  The reason for wanting to do this is that the above is my
> setup for translating, but in some cases the publication will have
> only the translation; for such cases, I want to extract just the
> translation.  This should then produce a new org file that simply has
> either everything before the tab (the original) or everything after
> the tab (the translation), while leaving all lines that do not contain
> a TAB character as they are.
>
> I assume this would be an easy task with the new exporter -- but I am
> still a bit at a loss on where to start...

From here, I'll assume that:

  1. you only split paragraphs (not tables, or lists, and so on);
  2. your back-end is called `translator';
  3. you never use tabs in objects (links, latex-fragments).

The first step would be to initialize a property that lets you control
which side of the paragraph is exported:

#+begin_src emacs-lisp
(defconst org-translator-option-alist
   '((:translator-side nil nil left)))
#+end_src

The next step is to create the basis of `translator', that is, an Org
to Org back-end.

  1. For each ELEMENT in `org-element-all-elements', you need to create
     an appropriate transcoder in the following shape:

     #+begin_src emacs-lisp
     (defun org-translator-ELEMENT (element contents info)
       "Convert ELEMENT from Org to Org syntax."
       (org-element-ELEMENT-interpreter element contents))
     #+end_src

     This can be done quickly with a macro or some elisp.

  2. You should do the same with each OBJECT in
     `org-element-all-successors':

     #+begin_src emacs-lisp
     (defun org-translator-OBJECT (object contents info)
       "Convert OBJECT from Org to Org syntax."
       (org-element-OBJECT-interpreter object contents))
     #+end_src

     However, you will need to duplicate and rename some of the
     functions created, as some objects share the same successor. Thus:

     - `org-translator-sub/superscript' will be split into
       `org-translator-subscript' and `org-translator-superscript';

     - `org-translator-text-markup' will be split into
       `org-translator-emphasis' and `org-translator-verbatim';

     - `org-translator-latex-or-entity' will be split into
       `org-translator-entity' and `org-translator-latex-fragment'.

  3. If all went well, you now have an impressive Org to Org converter.
     You can even test it with:

     #+begin_src emacs-lisp
     (switch-to-buffer (org-export-to-buffer 'translator "*Translation*"))
     #+end_src

     Obviously, there is not much to see.
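
The transcoders of step 1 could be generated rather than typed by hand.
This is a minimal sketch, assuming that each type in
`org-element-all-elements' has a matching `org-element-TYPE-interpreter'
function accepting (ELEMENT CONTENTS); types without one are simply
skipped.  Objects (step 2) would need a similar loop plus the renaming
described above.

#+begin_src emacs-lisp
(require 'org-element)

;; Define `org-translator-TYPE' for every element type that has an
;; interpreter.  Each transcoder ignores INFO and hands the element
;; back to its Org interpreter.
(dolist (type org-element-all-elements)
  (let ((interpreter (intern (format "org-element-%s-interpreter" type))))
    (when (fboundp interpreter)
      (defalias (intern (format "org-translator-%s" type))
        `(lambda (element contents info)
           ,(format "Convert %s ELEMENT from Org to Org syntax." type)
           (,interpreter element contents))))))
#+end_src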

Now, we're going to redefine `org-translator-paragraph' to properly
ignore one language or the other, depending on `:translator-side' value.

#+begin_src emacs-lisp
(defun org-translator-paragraph (paragraph contents info)
  "Convert PARAGRAPH to Org, ignoring one language.
Language kept is determined by `:translator-side' value."
  (let ((leftp (eq (plist-get info :translator-side) 'left)))
    (replace-regexp-in-string
     (if leftp "\t+.*$" "^.*\t+") "" contents)))
#+end_src

Finally, you need to define two commands that keep the left and right
parts respectively and save the output in an appropriate file.

#+begin_src emacs-lisp
(defun org-translator-left (file)
  "Save buffer in FILE, with only left language in paragraphs."
  (interactive "FFile (left language): ")
  (org-export-to-file 'translator file))

(defun org-translator-right (file)
  "Save buffer in FILE, with only right language in paragraphs."
  (interactive "FFile (right language): ")
  (org-export-to-file
   'translator file nil nil nil '(:translator-side right)))
#+end_src

This is completely untested.


Regards,

-- 
Nicolas Goaziou



Re: [O] Tweaking the export

2012-01-27 Thread Eric Abrahamsen
On Sat, Jan 28 2012, Christian Wittern wrote:

> Hi, Jambunathan and Nicolas,
>
> On 2012-01-27 22:47, Jambunathan K wrote:
>> Nicolas
>>
>> I will let Christian answer for himself.
> Thanks Jambunathan, you are not only an excellent coder, but also an
> expert mind reader:-)
> What you describe is exactly what I want to achieve.
>
>> text A text A'
>> line 2 line 2
>>
>> My name is Jambunathan. I live   Mon nom est Jambunathan. Je vis 
>> in India.    en India...
>>
>> He wants the "English column" to be collected in to an English file and
>> the "French column" to be collected in to a French file.
>
>> In some sense, he wants to tangle the "English column", let's say as
>> verse_en.org and "French column" to verse_fr.org
>
> Exactly.  The reason for wanting to do this is that the above is my
> setup for translating, but in some cases the publication will have
> only the translation; for such cases, I want to extract just the
> translation.  This should then produce a new org file that simply has
> either everything before the tab (the original) or everything after
> the tab (the translation), while leaving all lines that do not contain
> a TAB character as they are.

I also use org mode for translating (from modern Chinese,
coincidentally), and as Sebastien mentioned, I find it easiest to split
a single file into two subtrees, source and target, then split the
window so that I've got the two subtrees side-by-side. You could use
follow-mode at this point, though I don't. Selective export then becomes
trivial, though you'd have a harder time getting it into a two-column
table.

It's always annoying to ask how to do something and then be told to do
something else, so I'm not going to do that, but I do think you might
encounter fewer difficulties making the above setup do what you want,
rather than the TAB arrangement.

Of course, classical Chinese (particularly poetry) lends itself better
to doing discrete chunks one at a time… modern prose would be a
nightmare with TABs, though.

I've toyed with a home-made follow-type setup, where the two subtrees
are displayed in split windows as above, and the sub-headings of the two
subtrees have properties pointing to the IDs of their corresponding
sub-heading (ie, source chapters are linked to target chapters and vice
versa). I got about halfway to implementing something where
corresponding paragraphs are highlighted in the non-active window,
before getting distracted by an actual translation deadline.

(The pie-in-the-sky next step would be to use org-mode to maintain a
TMX-formatted translation database
(http://en.wikipedia.org/wiki/Translation_Memory_eXchange), and allow
for automatic insertion of translations of known terms, a library I
expect to have written some time before the obsoletion of Emacs itself.)

Anyway, I'm not sure I had much of a point, but if there are any other
translators using org-mode, it might be interesting to discuss how we
could make it more useful, perhaps in a separate thread.

Eric

-- 
Gnu Emacs 24.0.92.1 (i686-pc-linux-gnu, GTK+ Version 2.24.9)
 of 2012-01-26 on pellet
Org-mode version 7.8.03 (release_7.8.03.249.g742c4e9)




Re: [O] Tweaking the export

2012-01-27 Thread Christian Wittern

Hi Sebastien,

On 2012-01-27 23:03, Sebastien Vauban wrote:
> Just a side comment: isn't it easier to work in 2 different files or
> buffers (eventually, within the same file) and use some sort of
> "parallel" follow-mode?  I thought such a thing existed, but can't
> find it again right now.
>
> Anyway, it would be quite easy to implement: it's more or less
> implementing C-v/M-v so that it's done in two parallel buffers at the
> same time, instead of just in one!?
>
> Best regards,
>   Seb

What you describe is Two-Column mode, which Jambunathan suggested
before.  I did try this route, but for me org-mode works better.  One
reason is that some structural aspects are common to both files.
Another is that I want to be able to grep through the files and see
matching lines in both languages -- this helps me ensure a consistent
translation.  So the current setup is really nice for doing the work,
but now I need to construct the pipeline for publication.  As
Jambunathan put it, this is really a problem of tangling the output.


BTW, I think the general exporter should also be able to do an org-mode to
org-mode conversion.  This would provide a general framework for
systematically correcting little problems in files.  I guess it shows here
that I come from the XML world, where converting one XML file into another
with slight alterations of some aspects is a very common pattern.


All the best,

Christian

--
Christian Wittern, Kyoto




Re: [O] Tweaking the export

2012-01-27 Thread Christian Wittern

Hi, Jambunathan and Nicolas,

On 2012-01-27 22:47, Jambunathan K wrote:

> Nicolas
>
> I will let Christian answer for himself.

Thanks Jambunathan, you are not only an excellent coder, but also an expert
mind reader:-)

What you describe is exactly what I want to achieve.


> text A text A'
> line 2 line 2
>
> My name is Jambunathan. I live  Mon nom est Jambunathan. Je vis
> in India.   en India...
>
> He wants the "English column" to be collected in to an English file and
> the "French column" to be collected in to a French file.
>
> In some sense, he wants to tangle the "English column", let's say as
> verse_en.org and "French column" to verse_fr.org


Exactly.  The reason for wanting to do this is that the above is my setup
for translating, but in some cases the publication will have only the
translation; for such cases, I want to extract just the translation.  This
should then produce a new org file that simply has either everything before
the tab (the original) or everything after the tab (the translation), while
leaving all lines that do not contain a TAB character as they are.


I assume this would be an easy task with the new exporter -- but I am still
a bit at a loss on where to start...


All the best,

Christian




--
Christian Wittern, Kyoto




Re: [O] Tweaking the export

2012-01-27 Thread Sebastien Vauban
Hi all,

Jambunathan K wrote:
> Nicolas
>
> I will let Christian answer for himself.
>
>> [Nicolas]
>> While I understand the shape of your input, I fail to see what your
>> output should look like. For example, given the following paragraph,
>>
>> text A   text A'
>> line 2   line 2 bis
>> A line with *emphasis*   A traduced line with *emphasis*
>>
>
>>> [Christian]
>>> I need to separate these two parts in separate texts; the stuff to the
>>> left of the TAB has to go into one file, the stuff to the right to
>>> some other file,
>
>>> while at the same time merging the chunks of texts
>>> into paragraphs.
>
> If I interpret the above lines, I imagine his request more along the
> following lines:
>
> text A text A'
> line 2 line 2
>
> My name is Jambunathan. I live  Mon nom est Jambunathan. Je vis
> in India.   en India...
>
> He wants the "English column" to be collected in to an English file and
> the "French column" to be collected in to a French file.
>
> It is possible that "English column" constitutes a poem and the "French
> column" is a line-by-line translation of the column to the left.
>
> In some sense, he wants to tangle the "English column", let's say as
> verse_en.org and "French column" to verse_fr.org and later include them
> as a table cell or a column of a 2-C section. 
>
> Notionally something like:
> |-------------------------+-------------------------|
> | #+INCLUDE: verse_en.org | #+INCLUDE: verse_fr.org |
> |-------------------------+-------------------------|
>
> Put another way, collect Column-X in to Paragraph-X and do whatever.
>
> ps: French translation is courtesy google.

Just a side comment: isn't it easier to work in 2 different files or buffers
(eventually, within the same file) and use some sort of "parallel"
follow-mode?  I thought such a thing existed, but can't find it again right
now.

Anyway, it would be quite easy to implement: it's more or less implementing
C-v/M-v so that it's done in two parallel buffers at the same time, instead of
just in one!?

Best regards,
  Seb

-- 
Sebastien Vauban




Re: [O] Tweaking the export

2012-01-27 Thread Jambunathan K
Nicolas

I will let Christian answer for himself.

> [Nicolas]
> While I understand the shape of your input, I fail to see what your
> output should look like. For example, given the following paragraph,
>
> text A   text A'
> line 2   line 2 bis
> A line with *emphasis*   A traduced line with *emphasis*
>

>> [Christian]
>> I need to separate these two parts in separate texts; the stuff to the
>> left of the TAB has to go into one file, the stuff to the right to
>> some other file,

>> while at the same time merging the chunks of texts
>> into paragraphs.

If I interpret the above lines, I imagine his request more along the
following lines:

text A text A'
line 2 line 2

My name is Jambunathan. I live  Mon nom est Jambunathan. Je vis 
in India.   en India...

He wants the "English column" to be collected in to an English file and
the "French column" to be collected in to a French file.

It is possible that "English column" constitutes a poem and the "French
column" is a line-by-line translation of the column to the left.

In some sense, he wants to tangle the "English column", let's say as
verse_en.org and "French column" to verse_fr.org and later include them
as a table cell or a column of a 2-C section. 

Notionally something like:
|-------------------------+-------------------------|
| #+INCLUDE: verse_en.org | #+INCLUDE: verse_fr.org |
|-------------------------+-------------------------|

Put another way, collect Column-X in to Paragraph-X and do whatever.

ps: French translation is courtesy google.
-- 



Re: [O] Tweaking the export

2012-01-27 Thread Nicolas Goaziou
Hello,

Christian Wittern  writes:

> For the last couple of years, I have used org-mode more and more for
> working with and translating texts from classical Chinese.  Over time,
> some special conventions have crept in, like the fact that I like (for
> the draft translation) to work in a way that has a short chunk of
> Chinese text on the left and, separated by a TAB character, the
> translation of that piece following on the same line (there are other
> special conventions like specialized drawers etc., but I don't need to
> discuss these here now.)
>
> While this setup is extremely pleasant to work with, at some point
> I need to separate these two parts into separate texts; the stuff to the
> left of the TAB has to go into one file, the stuff to the right to
> some other file, while at the same time merging the chunks of text
> into paragraphs.  For quite a while I have thought about how
> to automate that, but until now I have usually done it by hand with
> a couple of regex search-and-replace operations.
>
> Now, with the new export engine, it looks like all I would need to do
> would be to tweak the way paragraphs are handled, while leaving the
> rest intact, some kind of org to org transform that simply tweaks one
> single aspect of the text.  However, I am a bit baffled on where to
> start with this.  I would be glad if you or somebody else could give
> me some pointers at how to tackle this problem.  (And please be kind,
> since my elisp fu is pretty insignificant:-(  )

While I understand the shape of your input, I fail to see what your
output should look like. For example, given the following paragraph,

--8<---cut here---start->8---
text A  text A'
line 2  line 2 bis
A line with *emphasis*  A traduced line with *emphasis*
--8<---cut here---end--->8---

what exactly do you want to obtain?


Regards,

-- 
Nicolas Goaziou