Re: [NTG-context] Ugly hack for multiple MSWord docs.

2006-06-19 Thread luigi scarso
It's also true that on 8 May 2006 the International Organization for
Standardization (ISO) and the International Electrotechnical
Commission (IEC) approved the OpenDocument Format (ODF) for release
as ISO/IEC 26300.
ODF could become an important xml format in the coming years.

luigi
___
ntg-context mailing list
ntg-context@ntg.nl
http://www.ntg.nl/mailman/listinfo/ntg-context


[NTG-context] Ugly hack for multiple MSWord docs.

2006-06-15 Thread John R. Culleton
Frequently I find myself in the position of needing to combine
several MSWord and/or rtf documents into a single file for either
pdftex or Context. I have settled on this strategy. 

1. If necessary I convert the documents to rtf with Open Office
Writer.
2. I convert the resulting rtf documents to LaTeX using rtf2latex2e.
3. I need to rename some of the LaTeX commands to their plain 
TeX or Context equivalents, and simply ignore others. Instead of
editing each and every occurrence, I add the following to my
macros.tex file which heads up the document:

\def\documentclass{}
\def\newcommand{}
\def\usepackage{}
\def\tab{}
\def\hspace{}
\def\begin{}
\def\end{}
% the inner braces keep each font switch local to its argument
\def\textbf#1{{\bf #1}}
\def\nobreakspace{~}
\def\underline{}
\def\newpage{}
\def\textmd#1{{\rm #1}}
\def\textit#1{{\it #1}}
\def\large{\tfb}
\def\reg{\rm\char174\ }
\def\textregistered{\reg}
--

I create a master file that calls in each of the .tex files
and compile the whole goulash. If I missed a LaTeX tag then I add
it to the \defs shown above and recompile until I get a
clean run. Now I have a readable pdf file and can start correcting
the format.
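
The recompile-until-clean loop can be shortened by listing the control
words up front. What follows is only a rough sketch in Python, not part
of the workflow described above: the function names are hypothetical,
and it assumes the macros file and the converted files are available as
plain strings. It reports every control word used in a converted file
that has no \def in the macros file. Commands that the format itself
already provides will also show up, so the result is a list of
candidates to look at, not a list of errors.

```python
import re

def defined_macros(macros_source):
    # Names introduced with \def in the macros file, e.g. "textbf".
    return set(re.findall(r"\\def\\([A-Za-z]+)", macros_source))

def control_words(tex_source):
    # Every backslash-plus-letters control word used in the converted text.
    return set(re.findall(r"\\([A-Za-z]+)", tex_source))

def missing_macros(tex_source, macros_source):
    # Control words that appear in the text but have no \def yet.
    return sorted(control_words(tex_source) - defined_macros(macros_source))
```

Running missing_macros over each converted file before the first
compile gives the tags that still need a decision: define, nullify, or
leave to the format.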

The scattered LaTeX tags give me hints where centering etc. might
be needed even though the tags are inoperative in Context, thanks
to the nullifying \def statements shown above.

Someday there will be an elegant solution to the MSWord to
Context problem. For now there is my ugly hack as described here.

-- 
John Culleton






Re: [NTG-context] Ugly hack for multiple MSWord docs.

2006-06-15 Thread Hans Hagen
John R. Culleton wrote:
> Someday there will be an elegant solution to the MSWord to
> Context problem. For now there is my ugly hack as described here.
   
maybe the word xml output, since that can be parsed

Hans 

-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
 tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
 | www.pragma-pod.nl
-



Re: [NTG-context] Ugly hack for multiple MSWord docs.

2006-06-15 Thread Bob Kerstetter

On Jun 13, 2006, at 5:29 PM, John R. Culleton wrote:

> Frequently I find myself in the position of needing to combine
> several MSWord and/or rtf documents into a single file for either
> pdftex or Context. I have settled on this strategy.

> <snip>

> Someday there will be an elegant solution to the MSWord to
> Context problem. For now there is my ugly hack as described here.


MEMORY DISCLAIMER: None of the function names in these examples are
the real ones in Word or VB for Word. The functions are available in
VB for Word, but it's been some time since I've done this, I don't
have the macros these days and I no longer know the real names. So
the names here are just representative of the functions available.

STYLE COMMENT: These methods should work even if styles are not being
used. For example, the primary heading may be Arial, 18pt, bold rather
than the Heading 1 style. That's okay because you can search for font
attributes in Word. If the document is not consistent, well, convert
to text and mark up manually. :)



MORE OR LESS CURRENT EXAMPLE

It's not particularly elegant, but I used to convert from MSWord to  
whatever by writing VB find/replace macros based on styles and  
formatting. In newer versions of Word (at least on OS X), Replace has  
a function that includes what you found, plus you can add other text.

Example:

Find: Heading 1    % find stuff formatted with Heading 1 style

Replace: \subject{WhatItFound}    % replaces what it found and
                                  % wraps \subject{} around it


Because Word stores paragraph formatting in the paragraph mark (the
carriage return), a paragraph-style match includes that mark, so you
end up with something like this:

\subject{Some TeX
}

So my last VB find/replace removes the carriage return in front of
each closing brace, globally:

Find: ^p}
Replace: }
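
Outside Word, the same cleanup can be done on the saved text file. This
is a small sketch in Python, assuming the file has been read into a
string. The function name is made up for illustration. It removes one
line ending (CR, LF or CRLF) when it sits directly in front of a closing
brace, which is what the ^p} search above does.

```python
import re

def close_brace_on_same_line(text):
    # Drop a single line ending that sits directly before "}".
    return re.sub(r"(?:\r\n|\r|\n)\}", "}", text)
```

Line endings that are not followed by a closing brace are left alone,
so ordinary paragraph breaks survive.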


When done with all find/replace functions, save as text.

That's it.


Not being much of a script writer, I record the first find/replace,  
then edit the macro and duplicate the find/replace as needed.

The VB find/replace function has options for starting at the top of  
the file, replacing globally, continuing if nothing is found and that  
sort of thing.

The macro looks something like this:

Find: Heading 1    % find stuff formatted with Heading 1 style
Replace: \subject{WhatItFound}    % replaces what it found and
                                  % wraps \subject{} around it

Find: Heading 2    % find stuff formatted with Heading 2 style
Replace: \subsubject{WhatItFound}    % replaces what it found and
                                     % wraps \subsubject{} around it

Find: Heading 3    % find stuff formatted with Heading 3 style
Replace: \subsubsubject{WhatItFound}    % replaces what it found and
                                        % wraps \subsubsubject{} around it


The above method uses global replacement and it's pretty zippy, for  
Word.



ANOTHER OLDER METHOD

Another method I used before Find/Replace had the WhatItFound  
function was to put the found string into a variable, then use that  
variable for the replacement text, plus any TeX control sequences  
wrapped around it.

In summary:

1. Put your finds and replaces in an array:
ArrayFind(0) Heading 1; ArrayReplace(0) \subject{
ArrayFind(1) Heading 2; ArrayReplace(1) \subsubject{
ArrayFind(2) Heading 3; ArrayReplace(2) \subsubsubject{
Note the closing } is missing. It is hardcoded in the replacement code.

2. Find the first array item starting from the top of the document.  
This highlights the text in Word:
Find = $ArrayFind(n)

3. Put the highlighted text into a variable. Maybe you can even strip
the CRs from formatted paragraphs:
$FoundThisStuff = stripCarriageReturns(CurrentSelection)


4. Put the variable and the first replace item in the Word Replace
function. Note the hard-coded closing brace, and the CR that is added
back on the assumption that you stripped it in step 3:
Replace = $ArrayReplace(n)+$FoundThisStuff+}+CR

5. Repeatedly use Replace and Find Next until nothing else is found.
Replace and Find Next
.
.
.

6. Repeatedly find the next array item to the end of the array.
n = n + 1
Find = $ArrayFind(n)
.
.
.

7. Save the file as text.
FilesSaveAs using the text option
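
For comparison, here is the same table-driven idea as a short Python
sketch. It is not VB and does not talk to Word: it assumes the document
has already been exported as a list of (style name, text) pairs, which
is a made-up representation for illustration, and it makes one pass over
the paragraphs instead of repeating Find Next for each array item. The
table mirrors ArrayFind/ArrayReplace above, with the closing brace added
in the code.

```python
# Style name -> opening ConTeXt command, mirroring ArrayFind/ArrayReplace.
STYLE_TO_COMMAND = [
    ("Heading 1", r"\subject{"),
    ("Heading 2", r"\subsubject{"),
    ("Heading 3", r"\subsubsubject{"),
]

def mark_up(paragraphs):
    # paragraphs: list of (style_name, text) pairs; returns plain text.
    lines = []
    for style, text in paragraphs:
        body = text.rstrip("\r\n")            # step 3: strip the CR
        for find, replace in STYLE_TO_COMMAND:
            if style == find:
                body = replace + body + "}"   # step 4: hard-coded brace
                break
        lines.append(body)                    # unmatched styles pass through
    return "\n".join(lines) + "\n"            # step 7: one paragraph per line
```

Adding a style is then a one-line change to the table.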


Hum. After thinking about this and typing it in, maybe I should still  
use the OLD method. It appears to be a little easier to manage. Maybe  
a lot easier.
Oh well, not a real programmer.


Re: [NTG-context] Ugly hack for multiple MSWord docs.

2006-06-15 Thread John R. Culleton
On Thursday 15 June 2006 08:50, Hans Hagen wrote:
> John R. Culleton wrote:
> > Someday there will be an elegant solution to the MSWord to
> > Context problem. For now there is my ugly hack as described here.
>
> maybe the word xml output, since that can be parsed
>
> Hans

Interesting suggestion. I don't have a copy of MSWord. And my
clients are naive so that asking them to save in exotic formats
is likely to be unproductive. 

Open Office does not save as xml. Abiword, however, does. In a
simplistic test case (Now is the time for all good men.)
Abiword saved the document as xml with a little coaxing and
texexec compiled it clean. So at least there is something there
to experiment with. 

Next I will try a real MSWord document, save it as xml from
Abiword, and see what Context does with it.

One question: How do I mix in the necessary Context commands such
as papersize, font selection etc.? What are the rules and no-nos
for blending Context commands into an xml document?

-- 
John Culleton
Books with answers to marketing and publishing questions:
http://wexfordpress.com/tex/shortlist.pdf

Book coaches, consultants and packagers:
http://wexfordpress.com/tex/packagers.pdf



Re: [NTG-context] Ugly hack for multiple MSWord docs.

2006-06-15 Thread Hans Hagen
John R. Culleton wrote:
> On Thursday 15 June 2006 08:50, Hans Hagen wrote:
> > John R. Culleton wrote:
> > > Someday there will be an elegant solution to the MSWord to
> > > Context problem. For now there is my ugly hack as described here.
> >
> > maybe the word xml output, since that can be parsed
> >
> > Hans
>
> Interesting suggestion. I don't have a copy of MSWord. And my
> clients are naive so that asking them to save in exotic formats
> is likely to be unproductive.
>
> Open Office does not save as xml. Abiword, however does. In a

hm, open office uses xml as its storage format, just save in oo format and
unzip the file and you will end up with xml files

(however, the xml is typical office xml, complete with tab elements that 
spoil the idea)

> One question: How do I mix in the necessary Context commands such
> as papersize, font selection etc.? What are the rules and no-nos
> for blending Context commands into an xml document?
   
just set up a style 

Hans 




Re: [NTG-context] Ugly hack for multiple MSWord docs.

2006-06-15 Thread John R. Culleton
On Thursday 15 June 2006 13:55, Hans Hagen wrote:
> John R. Culleton wrote:
> > On Thursday 15 June 2006 08:50, Hans Hagen wrote:
> > > John R. Culleton wrote:
> > > > Someday there will be an elegant solution to the MSWord to
> > > > Context problem. For now there is my ugly hack as described here.
> > >
> > > maybe the word xml output, since that can be parsed
> > >
> > > Hans
> >
> > Interesting suggestion. I don't have a copy of MSWord. And my
> > clients are naive so that asking them to save in exotic formats
> > is likely to be unproductive.
> >
> > Open Office does not save as xml. Abiword, however does. In a
>
> hm, open offices uses xml as storage format, just save in oo format and
> unzip the file and you will end up with xml files
>
> (however, the xml is typical office xml, complete with tab elements that
> spoil the idea)

The abiword xml is neat and parsimonious thus:

--

<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">

<book>
<!-- This DocBook file was created by AbiWord. -->
<!-- AbiWord is a free, Open Source word processor. -->
<!-- You may obtain more information about AbiWord at www.abisource.com -->

<chapter>
<title></title>
<section role="unnumbered">
<title></title>
<para>Now is the time for all good men.</para>
</section>
</chapter>
</book>
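
To show how little structure there is to deal with in a file like this,
here is a rough Python sketch that walks a DocBook document with the
standard xml.etree.ElementTree module and prints Context-style markup.
This is not the style setup Hans suggests and not anything Context does
itself. The mapping used (chapter title to \chapter, section title to
\subject, para to a plain paragraph) is only an assumption for
illustration, and TeX special characters in the text are not escaped.

```python
import xml.etree.ElementTree as ET

# Sample input: the element structure of the AbiWord test file.
SAMPLE = """<book>
<chapter>
<title></title>
<section role="unnumbered">
<title></title>
<para>Now is the time for all good men.</para>
</section>
</chapter>
</book>"""

def docbook_to_context(xml_text):
    root = ET.fromstring(xml_text)
    out = []
    for chapter in root.iter("chapter"):
        title = (chapter.findtext("title") or "").strip()
        if title:                      # empty titles are skipped
            out.append(r"\chapter{%s}" % title)
        for section in chapter.iter("section"):
            heading = (section.findtext("title") or "").strip()
            if heading:
                out.append(r"\subject{%s}" % heading)
            for para in section.findall("para"):
                out.append("".join(para.itertext()).strip())
                out.append("")         # blank line ends the paragraph
    return "\n".join(out)

print(docbook_to_context(SAMPLE))
```

With the test file both titles are empty, so only the paragraph text
comes out.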


The Open Office file unzipped is a lot more verbose and a lot
less readable. There are in fact five files. The file content.xml
does compile correctly via texexec and yields the expected
result. The character count in that file alone is three times
that of the corresponding Abiword xml output shown above.
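
The unzip step can also be scripted. A minimal sketch in Python with the
standard zipfile module, assuming the document was saved in an Open
Office format that keeps its text in a member called content.xml, as
described above. The function name and the file name in the comment are
made up.

```python
import zipfile

def read_content_xml(odf_file):
    # odf_file: a path or file-like object for the saved document.
    # Returns the list of member names and content.xml as text.
    with zipfile.ZipFile(odf_file) as archive:
        names = archive.namelist()
        content = archive.read("content.xml").decode("utf-8")
    return names, content

# Example call (hypothetical file name):
# names, content = read_content_xml("letter.odt")
```

The list of names shows the other files in the package, and
len(content) gives the character count for comparing against the
Abiword output.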

The experiments continue...
-- 
John Culleton
