On Sat, 19 Jul 2008, Steve Litt wrote:
Believe me, it's not easy at all. RTF is much too weird. The style
identifier occurs in the middle of a complex string. The text to which
to apply it occurs at the end, but there's no reasonable, consistent way
to identify where the markup ends and the content begins.
You could identify manually where and what to replace, but instead of
making each change by hand, you create a command in a script that makes
the change. Maybe you wouldn't use a regexp replace in this case.
To be specific, I'm suggesting separate commands/lines in the script that
each do only one simple, concrete replacement. Perhaps even one command
for each replacement...
The advantage is that you can re-run the script, or run it partially, if
you run into problems later.
And more importantly, you can start over if, later in the process, you
discover a problem with the sequence of regexp replacements you've
used.
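As a rough sketch of the kind of script meant here (in Python, purely as
an illustration): each entry is one literal replacement that was
identified by hand, and the script can be run completely or only up to a
given step. The function name, the stop_after parameter and the sample
RTF strings are all made up for the example; they are not taken from the
actual document.

```python
# Hypothetical list: one entry per hand-identified, literal replacement.
REPLACEMENTS = [
    (r"{\s1 Title}", r"{\s10 Title}"),
    (r"{\s2 Body text}", r"{\s20 Body text}"),
]


def apply_replacements(text, replacements, stop_after=None):
    """Apply literal replacements in order, always starting from the original text.

    stop_after=N runs only the first N steps, so a partial run is possible.
    A step whose search string is missing raises an error instead of
    silently doing nothing.
    """
    steps = replacements if stop_after is None else replacements[:stop_after]
    for number, (old, new) in enumerate(steps, start=1):
        if old not in text:
            raise ValueError(f"step {number}: {old!r} not found")
        # Replace only the first occurrence: one concrete change per step.
        text = text.replace(old, new, 1)
    return text


if __name__ == "__main__":
    original = r"{\rtf1{\s1 Title}{\s2 Body text}}"
    print(apply_replacements(original, REPLACEMENTS))                # full run
    print(apply_replacements(original, REPLACEMENTS, stop_after=1))  # partial run
```

Since the script always starts from the untouched original, a mistake in
step 7 only means fixing that entry and running everything again.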
I used a series of files, each of which contains one tweak type, so that
shouldn't happen. At the end of each tweak type I verify that it still
loads and looks right in MS Word.
You could also use version control and commit after each tweak type.
Writing a program would have been an excellent idea, but RTF is MUCH too
weird to write that program in anything resembling a reasonable
timeframe.
I don't think I'm suggesting a program in that sense (at the most the
script would be a hack..:-). However, if you really wanted to write a
program, I'd start with something that's capable of parsing RTF.
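To give an idea of what "capable of parsing RTF" means at the lowest
level, here is a minimal tokenizer sketch in Python. It only separates
groups ({ and }), control words such as \s1 (with an optional numeric
parameter), control symbols, and plain text. It is an illustration under
simplifying assumptions, not a real RTF parser: it does not decode \'hh
hex escapes, does not handle \bin data, and keeps line breaks as text.
The names TOKEN_RE and tokenize are invented for this example.

```python
import re

# One alternative per token kind: group open/close, control word with an
# optional (possibly negative) numeric parameter and one optional delimiting
# space, control symbol, or a run of plain text.
TOKEN_RE = re.compile(
    r"(?P<open>\{)"
    r"|(?P<close>\})"
    r"|\\(?P<word>[a-zA-Z]+)(?P<param>-?\d+)? ?"
    r"|\\(?P<symbol>[^a-zA-Z])"
    r"|(?P<text>[^\\{}]+)"
)


def tokenize(rtf):
    """Yield (kind, value) pairs for a string of RTF markup."""
    for match in TOKEN_RE.finditer(rtf):
        if match.group("open"):
            yield ("open", None)
        elif match.group("close"):
            yield ("close", None)
        elif match.group("word"):
            param = match.group("param")
            yield ("control", (match.group("word"), int(param) if param else None))
        elif match.group("symbol"):
            yield ("symbol", match.group("symbol"))
        else:
            yield ("text", match.group("text"))


if __name__ == "__main__":
    for token in tokenize(r"{\rtf1{\s1 Title}}"):
        print(token)
```

With tokens like these, the style identifier shows up as its own
("control", ("s", N)) item instead of being buried in a long string,
which is the part that is hard to get right with regexps alone.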
Btw... googling for 'converting rtf to html css'
http://www.google.com/search?hl=en&client=opera&rls=en&hs=1CL&q=converting+rtf+to+html+css&btnG=Search
gives some results that look useful, but I guess they might all remove
your styles :-(
Actually, you probably want 'convert rtf to xml'. Googling gave this link:
http://www.rtf-to-xml.com/
where they claim:
RTF TO XML is a handy solution to convert your RTF documents
(created, for example, in Microsoft® Word) into custom XML
formats, preserving their appearance and internal structure.
/Christian
--
Christian Ridderström, +46-8-768 39 44 http://www.md.kth.se/~chr