HTML Output for LyX

Richard Heck Thu, 30 Apr 2009 07:14:13 -0700

Hi, Alex. This is going to seem critical, but it is going to end up being constructive. See below.

What's more problematic, to my mind, is that the framework isn't extensible.
Yes, I know that the programmer can add support for new layout types, etc,
but as things now stand even simple layouts like Theorem aren't supported.
Yes, those could be added. But LyX itself is extensible, as LaTeX is, and
that's a crucial thing about it. If I define some new character styles, or
new layouts of some other sort, then they're not handled and *can't* be
handled without making eLyXer aware of what the LaTeX commands that define
these styles mean. But then we're back to writing a LaTeX to HTML converter.
Similarly, I think it will be a challenge to provide proper support for math
macros, let alone for things people do using ERT.


The framework is extensible in several respects. The first one is of
course with code. But a much better way only requires CSS. Unknown
layouts do not generate errors, but new HTML <div> classes. Adding
support in the CSS is trivial.

This isn't quite right. You're assuming that whatever needs to be done to render the new environment can be done in CSS. Even for theorem environments, this is non-trivial. I suppose you can use advanced stuff, like the content property and counters, but browser support for these is still in its infancy.


And that is a very simple case. Just consider the Endnotes module.

Similarly, unknown insets generate
<span> tags with a characteristic span. As to new macros, LaTeX
commands and ERT, they are not supported at this point.

Right. Which means that people who want to use eLyXer for output are restricted to a subset of LyX's functionality. And they are going to continue to be, since there is no way of solving this problem short of writing or borrowing a LaTeX to HTML converter. That is why I don't support including it within LyX. Anything that is part of LyX ought to support LyX's full functionality, or at least something close to it.

If you would be so kind as to send me a sample I would be delighted to
help make theorems work.

See below again. But you can easily create a LyX document with some theorem environments, if you want to continue with this approach.

Similarly, at present, so far as I can tell, there is no support for BibTeX.
Is that right? In my conversions, I get little raised numbers, but they
don't link to anything.

BibTeX is not working at the moment. Again, a sample would be appreciated.

You can easily create a LyX document with some BibTeX. And if you want to work on this, then you can probably use the python-bibtex package to parse the files. Figuring out how the bibliography is supposed to be rendered will be more difficult, though maybe there's not so much of a need to render it precisely as BibTeX would. Or maybe you could (optionally?) use the bbl file. But see below again.

Also, crossrefs appear as little arrows, which is nice, except that you
don't get the corresponding text, which makes things like "In [arrow], we
will discuss..." hard to read.


A single-pass converter cannot possibly output the actual number of
the linked section, or the text -- it will first find the reference
and a few kb later (once that part of the document is parsed) the
label. A second pass would be required to make labels work properly,
and the second pass is still in the works. But this should work,
eventually.

There are other problems, too. We have to keep track of which counters are "linked" and which ones are supposed to be reset when. You could maybe try (optionally?) using the aux file, which of course has already dealt with all of that. But see below again.
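
To make the two-pass idea and the counter bookkeeping concrete, here is a minimal Python sketch. It is not eLyXer's or LyX's code: the document model (a list of tuples) and the names `number_and_collect` and `resolve_refs` are hypothetical. The first pass numbers sections, resets the subsection counter whenever a new section starts, and records what each label points to; the second pass replaces each reference with the recorded number.

```python
def number_and_collect(items):
    """First pass: assign numbers and remember what each label points to."""
    section = 0
    subsection = 0
    current = ""  # number of the most recent heading
    labels = {}
    numbered = []
    for kind, text in items:
        if kind == "section":
            section += 1
            subsection = 0  # a new section resets the subsection counter
            current = str(section)
            numbered.append((kind, f"{current} {text}"))
        elif kind == "subsection":
            subsection += 1
            current = f"{section}.{subsection}"
            numbered.append((kind, f"{current} {text}"))
        elif kind == "label":
            labels[text] = current  # label attaches to the latest heading
        else:
            numbered.append((kind, text))
    return numbered, labels


def resolve_refs(numbered, labels):
    """Second pass: replace each reference with the number of its target."""
    out = []
    for kind, text in numbered:
        if kind == "ref":
            out.append(labels.get(text, "??"))  # unknown label stays visible
        else:
            out.append(text)
    return out


# Hypothetical document: the reference appears before its label is seen.
doc = [
    ("section", "Introduction"),
    ("text", "In"),
    ("ref", "sec:math"),
    ("text", "we will discuss math."),
    ("section", "Background"),
    ("subsection", "Math"),
    ("label", "sec:math"),
]
numbered, labels = number_and_collect(doc)
result = resolve_refs(numbered, labels)
```

The forward reference in the first section can only be filled in once the whole document has been scanned, which is why a single pass is not enough. Reading the numbers from the aux file would replace the first pass with a lookup.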

Let me just be clear about the nature of these criticisms. It may well be
that eLyXer will be a good program for use by people whose LyX files are
very basic, don't use much math, don't define custom styles, and the like.
If so, then so; there are plenty of people for whom that is plenty good
enough. But if this program is going to be included in LyX, then, in my
opinion, it needs to handle the LyX format, pretty much without exception.
Yes, there may be some special cases it doesn't quite handle, but, mostly,
it should do what LyX does, and it absolutely needs to handle math cleanly.
Right now, and with all due respect, it doesn't come close.


Fair enough. My views are much more simplistic: LyX currently doesn't
do what 99.9% of users need, which is output to different formats
including HTML and/or something importable from within Word.

LyX will always output to plain text, and that's readily importable in Word. If one wants to output to a format that preserves a good bit of the formatting, then latex2rtf does a fine job, so long as you don't have too much math, etc. (I've used that for collaboration myself, so I know.) Properly configured, which is apparently a challenge on some operating systems, htlatex will do excellent conversion both to HTML and to ODT, and plastex does a very good job converting to HTML, though with some limits, including the fact that all the math is little pictures (though it does handle cross-references and BibTeX nicely). So there are lots of options. None of that means the world can't use a better mousetrap. See below.

Thus many, many people don't know about an otherwise wonderful editor. The
needs of these users (and of the majority of current users IMHO) would
be adequately served by an HTML export tool which generates anything
that does not look like garbage. Nobody is volunteering to write a
tool that does what you want, so LyX could very well use what eLyXer
offers. But perhaps as you say integration is not such a good idea.
Separate packaging and distribution might be an enabler for people,
especially if the most popular versions (Windows, Debian) do a joint
distribution.

See below.

My own view, for what it's worth, is that there is no stable way to go here
except to have LyX output the LaTeX, which it does well, and then convert
that. Otherwise, I don't see how you will ever get proper handling of math
macros, character styles, new layouts, and the like. And if I were going to
work on that, then I'd work on plastex, which seems to me generally to do a
pretty good job. At the very least, one can use the plastex parser and write
a new output routine.

Good luck with that. The problem is orders of magnitude harder than a
LyX to HTML converter.

That depends how much of LyX's source you care to convert. If you want to handle custom styles, then the problem is the same. Which was my point.

So the question is: What do we have to do if we're going to get really good HTML output for more than fairly simple LyX files, let alone for LyX's full functionality? I think there is now fairly widespread agreement that the answer is: You have to do it within LyX itself, i.e., in the C++ source, where we actually have access to the information we need. And once you start to think in those terms, then I think it becomes completely obvious that this is the way to go. If HTML is an output format, then the layout files themselves can contain appropriate information about how custom styles should be output as HTML, and indeed even about how standard insets should be output. If Footnote gets output as a span, then that can be configured in the InsetLayout for Footnote and even overridden by the user. E.g.:

   InsetLayout Footnote
   ...
      HTMLType       BeginEnd
      HTMLBegin      <span class="footnote">
      HTMLEnd         </span>
      HTMLPreamble
           .footnote { ... }
      EndPreamble
   ...
   EndLayout
or something along those lines. Similarly:
   Style Section
   ...
      HTMLType       BeginEnd
      HTMLBegin      <h2>
      HTMLEnd         </h2>
   ...
   End

Which can of course vary for different classes. E.g., you might want it that way in a book, but <h1> in an article.
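
As a rough illustration of the layout-driven idea, here is a small sketch in Python rather than LyX's C++. Everything in it is hypothetical: the tables `BASE_STYLES` and `CLASS_OVERRIDES` stand in for what layout files might declare, and `render` stands in for an output routine. The point is only that the routine looks up begin and end tags for a style and lets the document class override the defaults. The fallback to a `<div>` for unknown styles is an assumption borrowed from the eLyXer behavior described earlier in the thread.

```python
from html import escape

# Default begin/end tags per style, as a layout file might declare them.
BASE_STYLES = {
    "Section": ("<h2>", "</h2>"),
    "Standard": ("<p>", "</p>"),
}

# Per-class overrides: here an article promotes Section to <h1>.
CLASS_OVERRIDES = {
    "article": {"Section": ("<h1>", "</h1>")},
    "book": {},
}


def render(style, text, doc_class):
    """Wrap text in the tags configured for this style and document class."""
    overrides = CLASS_OVERRIDES.get(doc_class, {})
    if style in overrides:
        begin, end = overrides[style]
    elif style in BASE_STYLES:
        begin, end = BASE_STYLES[style]
    else:
        # Unknown style: fall back to a div with the style name as its class.
        begin, end = f'<div class="{escape(style, quote=True)}">', "</div>"
    return f"{begin}{escape(text)}{end}"
```

With these tables, `render("Section", "Intro", "book")` uses `<h2>` while the same call with `"article"` uses `<h1>`, and a user-defined style needs only a new table entry, not new code.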

Getting something workable that does as much as eLyXer now does would be pretty easy, because we already have access to the complete structure of the document. Lots of the output code could almost be cut and paste from the other output routines. The challenge will be to get good rendering of the math. Addressing other issues, like file splitting, would take some work, but not too much. Note that we can even get a good TOC this way. Dealing with cross-references and BibTeX becomes easy, too, because we have all the information we need ready to hand. (Of course, there will be issues, but you get my point.)


Alex, do you know C++? I'd be happy to help with this, once exams are over.

rh
