Re: Migrating documentation from HTML files

Vincent Siveton Tue, 04 Mar 2008 16:01:10 -0800

2008/3/4, Lukas Theussl <[EMAIL PROTECTED]>:
> Ehm, yes, sorry, I talked quicker than I thought. Of course, the parser
>  is an XML parser, so it will choke on any tags that are not properly
>  closed. So the input has to be XHTML. You can use tools like htmltidy [1]
>  to convert HTML to XHTML.
>
>  Btw, Vincent just added a simple tool to do document translations with
>  doxia: http://svn.apache.org/viewvc?view=rev&revision=633328
>  Feel free to test and comment! :)


You need to use the entire trunk for this.

I guess it will be easy to patch the converter with jtidy to support
html as an input format. Patches are welcome :)
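
To illustrate the point quoted above, that an XML parser rejects tags which
are not properly closed, here is a minimal sketch. It uses the JDK's built-in
XML parser rather than Doxia's XhtmlParser, so the exact error messages will
differ from the ones quoted below. The class name WellFormedCheck and the
method firstError are made up for this example, and the snippets are
shortened versions of the latex2html output from the thread.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Doxia: checks XML well-formedness only.
class WellFormedCheck {

    /** Returns null if the markup is well-formed XML, otherwise the parser's error message. */
    static String firstError(String markup) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(markup)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        String[] samples = {
            // HTML style: <BR> is never closed
            "<H2><A NAME=\"74\"></A><BR>Grupos de usuarios</H2>",
            // HTML style: attribute value without quotes
            "<TABLE CELLPADDING=3></TABLE>",
            // XHTML style: closed tags, quoted attributes
            "<h2><a name=\"74\"></a><br/>Grupos de usuarios</h2>"
        };
        for (String sample : samples) {
            String error = firstError(sample);
            System.out.println((error == null ? "OK:     " : "FAILED: ") + sample);
            if (error != null) {
                System.out.println("        " + error);
            }
        }
    }
}
```

Only the third sample passes, which is why the latex2html output has to go
through something like htmltidy or jtidy before it reaches the parser.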

Cheers,

Vincent

>  Cheers,
>  -Lukas
>
>  [1] http://tidy.sourceforge.net/
>
>
>
>  Cristóbal Fandiño wrote:
>  > latex2html does not produce XHTML output. For example:
>  >
>  > HTML
>  > ==========
>  > <LINK REL="STYLESHEET" HREF="embebidos.css">
>  >
>  > XhtmlParser
>  > ==========
>  > org.apache.maven.doxia.parser.ParseException: Error parsing the model: end
>  > tag name </HEAD> must be the same as start tag <LINK> from line 19
>  > (position: TEXT seen ...<LINK REL="STYLESHEET"
>  > HREF="embebidos.css">\n\n</HEAD>...
>  > @21:8)
>  >     at org.apache.maven.doxia.parser.AbstractXmlParser.parse(
>  > AbstractXmlParser.java:57)
>  >
>  >
>  > HTML
>  > ==========
>  > <H2><A NAME="SECTION00221000000000000000"></A>
>  > <A NAME="74"></A>
>  > <BR>
>  > Grupos de usuarios
>  > </H2>
>  >
>  > XhtmlParser
>  > ==========
>  > org.apache.maven.doxia.parser.ParseException: Error parsing the model: end
>  > tag name </H2> must be the same as start tag <BR> from line 119 (position:
>  > TEXT seen ...<BR>\nGrupos de usuarios\n</H2>... @121:6)
>  >     at org.apache.maven.doxia.parser.AbstractXmlParser.parse(
>  > AbstractXmlParser.java:57)
>  >
>  >
>  > XhtmlParser
>  > ==========
>  > org.apache.maven.doxia.parser.ParseException: Error parsing the model:
>  > attribute value must start with quotation or apostrophe not 3 (position:
>  > TEXT seen ...<A NAME="91"></A>\n<TABLE CELLPADDING=3... @171:21)
>  >     at org.apache.maven.doxia.parser.AbstractXmlParser.parse(
>  > AbstractXmlParser.java:57)
>  >
>  > ... and far more
>  >
>  >
>  > 2008/3/3, Lukas Theussl <[EMAIL PROTECTED]>:
>  >
>  >>doxia doesn't have a latex parser (I'd like to have one too!), so
>  >>latex2html is the only solution I can think of (other latex
>  >>translators exist, but that's the only one I know). I am not sure what
>  >>kind of output latex2html produces; however, the difference between HTML
>  >>and XHTML shouldn't matter here. What kind of exceptions do you get? Maybe
>  >>you could attach an example file at jira [1] with a snippet of your code
>  >>so we can try to reproduce the problem?
>  >>
>  >>-Lukas
>  >>
>  >>[1] http://jira.codehaus.org/browse/DOXIA
>  >>
>  >>
>  >>krycho fandino wrote:
>  >>
>  >>>Thanks for your help. However, my HTML files aren't XHTML, and XhtmlParser
>  >>>throws a lot of exceptions. Perhaps I should convert these HTML files to
>  >>>XHTML format, but I have a lot of pages and that would be a hard task.
>  >>>
>  >>>Actually, I generated these HTML files using the latex2html conversion
>  >>>tool. I don't know how I could transform latex files to one of the markup
>  >>>languages supported by doxia (apt or xdoc). Could you give me some advice?
>  >>>
>  >>>
>  >>>2008/3/2, Lukas Theussl <[EMAIL PROTECTED]>:
>  >>>
>  >>>
>  >>>>If you use the current development branch of doxia (beta-1-SNAPSHOT)
>  >>>>then this should work rather well for simple html files. However, you
>  >>>>will probably lose a lot of information if you have anything fancy (e.g.
>  >>>>special layout, tables, and figures are not well supported), so don't
>  >>>>expect it to be perfect. In particular, if you have figures you might try
>  >>>>to translate to xdoc instead of apt (use XdocSink); that should work
>  >>>>better.
>  >>>>
>  >>>>Cheers,
>  >>>>
>  >>>>-Lukas
>  >>>>
>  >>>>
>  >>>>
>  >>>>Vincent Siveton wrote:
>  >>>>
>  >>>>
>  >>>>>Hi,
>  >>>>>
>  >>>>>Frankly, I have never tested your use case.
>  >>>>>
>  >>>>>But I guess that you need to have an XHTML file as input with no
>  >>>>>header, footer or navbar, something like the div bodyColumn in [1].
>  >>>>>
>  >>>>>The snippet should be something like the following:
>  >>>>>
>  >>>>>File f = new File( "blabla.html" );
>  >>>>>XhtmlParser parser = new XhtmlParser();
>  >>>>>StringWriter output = new StringWriter();
>  >>>>>Sink sink = new AptSink( output );
>  >>>>>// pass the sink (not the writer) so the parser emits APT into output
>  >>>>>parser.parse( new FileReader( f ), sink );
>  >>>>>
>  >>>>>The output writer will then contain the APT markup.
>  >>>>>
>  >>>>>HTH,
>  >>>>>
>  >>>>>Vincent
>  >>>>>
>  >>>>>[1] http://maven.apache.org/doxia/
>  >>>>>
>  >>>>>2008/3/1, krycho fandino <[EMAIL PROTECTED]>:
>  >>>>>
>  >>>>>
>  >>>>>
>  >>>>>>I'm a newbie using doxia. I have a lot of documentation in HTML format
>  >>>>>>and I'd like to convert these files to apt format. Is there an easy way
>  >>>>>>to transform them? I want to create a maven site for my project and,
>  >>>>>>right now, I only have this documentation in HTML format, without css
>  >>>>>>styles or a menu.
>  >>>>>>
>  >>>>>>Could you help me? Many thanks,
>  >>>>>>Cristóbal
>  >>>>>
>  >>
>  >
>
