Re: Straghtforward XML export?

2012-05-17 Thread Nico Williams
FWIW, I've put up a github repo with my LyX->xml2rfc tool, though it's
still a work in progress: https://github.com/nicowilliams/lyx2rfc

BTW, I can't get "lyx -e lyxhtml ..." to work.  lyx -e xhtml does
work, but then there are some differences from the LyXHTML option in
the File->Export menu.  The differences appear to be confined to the
magicparlabel numbering, so I don't think I care; I mention this only
because it seems odd.

Thanks for all the help!

Nico
--


Re: Straghtforward XML export?

2012-05-12 Thread Nico Williams
Well, thanks lots for your help.   I have something that's very close.
 Close enough that I can now author I-Ds in LyX.  I've found one more
bug in the LyX XHTML output, and I filed a bug for it (bibitem anchor
generation is not working properly), and I can work around it.
Cheers!

Nico
--


Re: Straghtforward XML export?

2012-05-11 Thread Nico Williams
On Thu, May 10, 2012 at 10:09 AM, Richard Heck  wrote:
>> I don't know how to create a custom inset that does.. [...]
>
> Try putting this into Local Layout, under Document>Settings:

Excellent, that worked great.

> I guess if you want these as metadata, you should also add:
>    InTitle 1
> to each of them.

I want them as metadata in my XSLT stylesheet's output, so it sufficed
without the InTitle bit.

Thanks!

Nico
--


Re: Straghtforward XML export?

2012-05-10 Thread Nico Williams
On Thu, May 10, 2012 at 8:27 PM, Richard Heck  wrote:
> On 05/10/2012 04:52 PM, Nico Williams wrote:
>> On Thu, May 10, 2012 at 3:31 PM, Richard Heck  wrote:
>> Here's a LyX snippet:
>
> OK, I see the problem. The vertical space gets moved, for reasons
> that probably aren't very interesting. Can you file a bug about this on
> trac? I can fix it, but it will take a little thought about how best to do
> it.

Filed http://www.lyx.org/trac/ticket/8154

Thanks.

>> FYI, right now I'm struggling with how to transform h2, h3, h4
>> elements into nested section elements; [...]
>>
> It could be done in LyX, but I guess I'd suggest pre-processing the
> whole thing with some kind of script. It shouldn't be too hard to do.
> Find h1, write a start tag; when you see another h1, write the end tag
> for the first one; etc.

I've figured out how to handle this with XSLT 2.0.  Here's a snippet:

[XSLT 2.0 snippet not preserved in the archive]

The key is the XSLT 2.0 << (node precedes) operator, which has to be
escaped as &lt;&lt; inside the stylesheet.  The right operand had to be
stored in a variable because there's no other way (that I could find!)
to refer to the node I wanted there.

That took a lot of effort to work out.  Much more than I'd wanted to.
And it requires XSLT 2.0.  But it works and it's not terribly
inelegant -- more elegant than any robust script to do the same, most
likely.
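For readers working outside XSLT, here is a rough Python analogue of the same idea, using only the standard library's xml.etree.ElementTree. It is a sketch, not Nico's stylesheet: `heading_level`, `doc_order`, `precedes` and `section_body` are hypothetical helper names, and it assumes the headings are flat siblings under one parent, as in the LyXHTML output discussed here. `precedes(a, b, order)` stands in for XPath 2.0 `a << b`, and `section_body` mimics selecting a heading's following siblings that come before the next heading of the same or a higher level.

```python
import xml.etree.ElementTree as ET


def heading_level(el):
    """Return 1-6 for an h1..h6 element (namespace ignored), else None."""
    tag = el.tag.rsplit("}", 1)[-1]
    if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
        return int(tag[1])
    return None


def doc_order(root):
    """Map every element under root to its position in document order."""
    return {el: i for i, el in enumerate(root.iter())}


def precedes(a, b, order):
    """Rough analogue of XPath 2.0 `a << b`: does a come before b?"""
    return order[a] < order[b]


def section_body(parent, heading, order):
    """Siblings after `heading` up to (not including) the next heading of the
    same or higher level, roughly following-sibling::*[. << $next]."""
    level = heading_level(heading)
    siblings = list(parent)
    following = siblings[siblings.index(heading) + 1:]
    # The "$next" variable: the first later heading that closes this section.
    nxt = next((el for el in following
                if heading_level(el) is not None and heading_level(el) <= level),
               None)
    if nxt is None:
        return following
    return [el for el in following if precedes(el, nxt, order)]
```

Slicing the sibling list at the index of `nxt` would give the same result; the explicit document-order comparison is kept only to mirror the `<<` idiom.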

>> [I'm guessing that LyX's XHTML output is not stable, but I can cope,
>> provided I find a way to transform those h elements into nested
>> sections.]
>>
> It's generally stable, but of course under development. Mostly, I want
> it to be as modular and customizable as possible, in which case we can
> all make it do what we want.

Great.  Thanks so much for your work and your help!

Nico
--


Re: Straghtforward XML export?

2012-05-10 Thread Richard Heck

On 05/10/2012 04:52 PM, Nico Williams wrote:

> On Thu, May 10, 2012 at 3:31 PM, Richard Heck wrote:
>
>> Actually, it looks like this got fixed a while ago. In a simple text
>> document I get:
>
> I'm running LyX 2.0.0.  The vspace I had was in an author inset, FWIW.
> The output you show is certainly fine.
>
>> If you want to post a simple example file that does the wrong thing, please
>> do.
>
> Here's a LyX snippet:
>
> \begin_layout Standard
> A paragraph.
> \begin_inset VSpace defskip
> \end_inset
>
>   Text after a vspace.
> \end_layout

OK, I see the problem. The vertical space gets moved, for reasons
that probably aren't very interesting. Can you file a bug about this on
trac? I can fix it, but it will take a little thought about how best to 
do it.



> FYI, right now I'm struggling with how to transform h2, h3, h4
> elements into nested section elements; this seems very difficult to do
> in XSLT 1.0, but I'm still exploring ideas, including XSLT 2.0.  (This
> actually seems like a common problem, some recipes for which I do find
> online and in books, but no solutions general enough.)  Of course, the
> way LyX represents sections/subsections/subsubsections internally is
> exactly the same as in its XHTML output, and it'd be asking a lot to
> ask for LyX to wrap section contents in a div -- if I can do this with
> XSLT you might be able to incorporate that solution as an option in
> LyX, say.


It could be done in LyX, but I guess I'd suggest pre-processing the
whole thing with some kind of script. It shouldn't be too hard to do.
Find h1, write a start tag; when you see another h1, write the end tag
for the first one; etc.
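Richard's one-pass suggestion can be sketched in Python with the standard library's xml.etree.ElementTree. This is a minimal illustration under stated assumptions, not anything LyX ships: `heading_level` and `nest_sections` are hypothetical names, the h1..h6 elements are assumed to be direct children of a single container, and the new `section` elements are created without a namespace (if the LyXHTML output is in the XHTML namespace, you would probably want `{http://www.w3.org/1999/xhtml}section` instead). Text sitting directly in the container before its first child is dropped.

```python
import xml.etree.ElementTree as ET


def heading_level(el):
    """Return 1-6 for an h1..h6 element (namespace ignored), else None."""
    tag = el.tag.rsplit("}", 1)[-1]
    if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
        return int(tag[1])
    return None


def nest_sections(body):
    """Return a copy of `body` whose flat h1..h6 runs become nested <section>s.

    Walks the children once with a stack of open sections: a heading first
    closes every open section at the same or a deeper level, then opens a
    new section and goes inside it.
    """
    out = ET.Element(body.tag, dict(body.attrib))
    stack = [(0, out)]  # (level, element); level 0 is the container itself
    for child in list(body):
        level = heading_level(child)
        if level is not None:
            while stack[-1][0] >= level:
                stack.pop()  # "when you see another h1, write the end tag"
            section = ET.SubElement(stack[-1][1], "section")
            stack.append((level, section))
        stack[-1][1].append(child)
    return out
```

Closing on a heading of the same or a shallower level is the "when you see another h1, write the end tag" rule generalized to every level; the stack keeps track of which end tags are still owed.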


> [I'm guessing that LyX's XHTML output is not stable, but I can cope,
> provided I find a way to transform those h elements into nested
> sections.]


It's generally stable, but of course under development. Mostly, I want
it to be as modular and customizable as possible, in which case we can
all make it do what we want.

Richard



Re: Straghtforward XML export?

2012-05-10 Thread Nico Williams
On Thu, May 10, 2012 at 3:31 PM, Richard Heck  wrote:
> Actually, it looks like this got fixed a while ago. In a simple text
> document I get:

I'm running LyX 2.0.0.  The vspace I had was in an author inset, FWIW.
 The output you show is certainly fine.

> If you want to post a simple example file that does the wrong thing, please
> do.

Here's a LyX snippet:

\begin_layout Standard
A paragraph.
\begin_inset VSpace defskip
\end_inset

 Text after a vspace.
\end_layout

FYI, right now I'm struggling with how to transform h2, h3, h4
elements into nested section elements; this seems very difficult to do
in XSLT 1.0, but I'm still exploring ideas, including XSLT 2.0.  (This
actually seems like a common problem, some recipes for which I do find
online and in books, but no solutions general enough.)  Of course, the
way LyX represents sections/subsections/subsubsections internally is
exactly the same as in its XHTML output, and it'd be asking a lot to
ask for LyX to wrap section contents in a div -- if I can do this with
XSLT you might be able to incorporate that solution as an option in
LyX, say.

[I'm guessing that LyX's XHTML output is not stable, but I can cope,
provided I find a way to transform those h elements into nested
sections.]

Nico
--


Re: Straghtforward XML export?

2012-05-10 Thread Richard Heck

On 05/10/2012 11:52 AM, Nico Williams wrote:

> On Thu, May 10, 2012 at 10:02 AM, Richard Heck wrote:
>
>> On 05/09/2012 02:29 AM, Nico Williams wrote:
>>
>>> [Actually, I'm noticing one problem with LyXHTML: it doesn't preserve
>>> vertical spacing in any way, not even as horizontal spacing!  I'm
>>> talking about Insert->Formatting->Vertical Space.  I suspect that
>>> there are other such things that aren't preserved.  For now I'll live.
>>> Vertical space is useful for multi-paragraph list items, which are
>>> very common in RFCs and Internet-Drafts.  If need be I suspect I can
>>> write a patch and submit it.]
>>
>> I basically didn't know what to do with the vspace stuff, the issue being
>> that HTML in a way just doesn't have that kind of concept. But if you have
>> an idea, please let me know, and I'll be happy to put it in.
>
> Ah, good point.  Hmmm, could you use?  Or
> maybe an XML entity that expands to a newline, but that a
> processor could replace with an element?

Actually, it looks like this got fixed a while ago. In a simple text 
document I get:

this

that.

If you want to post a simple example file that does the wrong thing, 
please do.



Richard




Re: Straghtforward XML export?

2012-05-10 Thread Nico Williams
On Thu, May 10, 2012 at 10:02 AM, Richard Heck  wrote:
> On 05/09/2012 02:29 AM, Nico Williams wrote:
>>> [Actually, I'm noticing one problem with LyXHTML: it doesn't preserve
>>> vertical spacing in any way, not even as horizontal spacing!  I'm
>>> talking about Insert->Formatting->Vertical Space.  I suspect that
>>> there are other such things that aren't preserved.  For now I'll live.
>>>  Vertical space is useful for multi-paragraph list items, which are
>>> very common in RFCs and Internet-Drafts.  If need be I suspect I can
>>> write a patch and submit it.]
>
> I basically didn't know what to do with the vspace stuff, the issue being
> that HTML in a way just doesn't have that kind of concept. But if you have
> an idea, please let me know, and I'll be happy to put it in.

Ah, good point.  Hmmm, could you use ?  Or
maybe an XML entity that expands to a newline, but that a
processor could replace with an element?

Nico
--


Re: Straghtforward XML export?

2012-05-10 Thread Richard Heck

On 05/09/2012 02:14 AM, Nico Williams wrote:

> On Tue, May 8, 2012 at 10:58 PM, Richard Heck wrote:
>
>> On 05/08/2012 07:30 PM, Nico Williams wrote:
>>
>>> LyXHTML looks very promising.  It certainly preserves everything I
>>> have in my [admittedly small] test file.  If it preserves custom inset
>>> names then I could probably use custom insets to provide the
>>> additional metadata I need (I still haven't quite figured out how to
>>> create custom insets, but give me time).  XSLT can do the rest.
>>
>> It will do with custom insets whatever you ask it to do. If I remember
>> correctly, it defaults to something like:
>>
>> or an equivalent span, depending upon whether it's a charstyle or a
>> flex inset.
>
> Excellent.  I've got an XSLT stylesheet in the works that does what I want.
>
> I don't know how to create a custom inset that does.. nothing much
> except have a custom inset name.  Specifically I need variants of the
> Author inset to represent the metadata I need (author organization,
> e-mail address, and postal address).  With that I'd be set.


Try putting this into Local Layout, under Document>Settings:

Format 31

InsetLayout Flex:MyInset
LyXType Custom
End

InsetLayout Flex:MyInsets
LyXType Custom
HTMLTag mytag
End

You can specify more if you wish, but that gets you started. (As LaTeX, 
these export as normal text.)


I guess if you want these as metadata, you should also add:
InTitle 1
to each of them.

Richard



Re: Straghtforward XML export?

2012-05-10 Thread Richard Heck

On 05/09/2012 02:29 AM, Nico Williams wrote:

>> [Actually, I'm noticing one problem with LyXHTML: it doesn't preserve
>> vertical spacing in any way, not even as horizontal spacing!  I'm
>> talking about Insert->Formatting->Vertical Space.  I suspect that
>> there are other such things that aren't preserved.  For now I'll live.
>> Vertical space is useful for multi-paragraph list items, which are
>> very common in RFCs and Internet-Drafts.  If need be I suspect I can
>> write a patch and submit it.]
>
> Found a solution to that: a nested list with no bulleting/numbering is
> rendered as a single  with s for the nested list elements,
> which works out perfectly.  No doubt the vspace loss will come up
> elsewhere, but for now it's fine.

I basically didn't know what to do with the vspace stuff, the issue being
that HTML in a way just doesn't have that kind of concept. But if you have
an idea, please let me know, and I'll be happy to put it in.

Richard



Re: Straghtforward XML export?

2012-05-08 Thread Nico Williams
> [Actually, I'm noticing one problem with LyXHTML: it doesn't preserve
> vertical spacing in any way, not even as horizontal spacing!  I'm
> talking about Insert->Formatting->Vertical Space.  I suspect that
> there are other such things that aren't preserved.  For now I'll live.
>  Vertical space is useful for multi-paragraph list items, which are
> very common in RFCs and Internet-Drafts.  If need be I suspect I can
> write a patch and submit it.]

Found a solution to that: a nested list with no bulleting/numbering is
rendered as a single  with s for the nested list elements,
which works out perfectly.  No doubt the vspace loss will come up
elsewhere, but for now it's fine.


Re: Straghtforward XML export?

2012-05-08 Thread Nico Williams
On Tue, May 8, 2012 at 10:58 PM, Richard Heck  wrote:
> On 05/08/2012 07:30 PM, Nico Williams wrote:
>> LyXHTML looks very promising.  It certainly preserves everything I
>> have in my [admittedly small] test file.  If it preserves custom inset
>> names then I could probably use custom insets to provide the
>> additional metadata I need (I still haven't quite figured out how to
>> create custom insets, but give me time).  XSLT can do the rest.
>>
> It will do with custom insets whatever you ask it to do. If I remember
> correctly, it defaults to something like:
> 
> or an equivalent span, depending upon whether it's a charstyle or a
> flex inset.

Excellent.  I've got an XSLT stylesheet in the works that does what I want.

I don't know how to create a custom inset that does.. nothing much
except have a custom inset name.  Specifically I need variants of the
Author inset to represent the metadata I need (author organization,
e-mail address, and postal address).  With that I'd be set.

> In principle, you can also tell the LyXHTML output to use some other
> tag than div or span. This is all customized in the layout files, as is
> explained in the bits on XHTML in the customization manual. So I'm
> guessing that you could get quite a long way towards XML simply in
> that sort of way.

The divs are fine.  I can address them just fine with XPath, so I'm
quite happy.  If the LyXHTML schema changes radically I'll just have
to re-write the XSLT stylesheet I'm writing now, but as long as no
metadata is lost I'll be fine.
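As a rough illustration of picking those divs out of LyXHTML programmatically, here is a Python sketch using xml.etree.ElementTree. The standard library supports only a subset of XPath, and a class attribute can hold several space-separated names, so the sketch walks the tree and compares class tokens by hand. The class names `flex_organization` and `flex_email` in the usage example are hypothetical placeholders; check your own output for the names LyX actually generates. The sketch also assumes the output is in the XHTML namespace.

```python
import xml.etree.ElementTree as ET

# Assumption: LyXHTML elements live in the XHTML namespace.
XHTML = "{http://www.w3.org/1999/xhtml}"


def inset_texts(root, css_class):
    """Return the text of every div or span whose class list contains css_class."""
    found = []
    for el in root.iter():
        if el.tag not in (XHTML + "div", XHTML + "span"):
            continue
        # class="a b" can hold several names, so compare whole tokens.
        if css_class in el.get("class", "").split():
            found.append("".join(el.itertext()).strip())
    return found
```

In an XSLT stylesheet the same selection is a one-line XPath predicate; the manual loop is only needed because ElementTree's path language cannot test individual class tokens.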

Eventually I'll probably want to develop a layout and class for
actually dealing with RFCs directly in LyX.  The typesetting rules for
RFCs are... trivial in comparison to most other layouts.  But I
confess to knowing nothing about LaTeX, so it will be some time before
I get there.  For now I'm just happy (ecstatic, even) to consume
LyXHTML with XSLT.

[Actually, I'm noticing one problem with LyXHTML: it doesn't preserve
vertical spacing in any way, not even as horizontal spacing!  I'm
talking about Insert->Formatting->Vertical Space.  I suspect that
there are other such things that aren't preserved.  For now I'll live.
 Vertical space is useful for multi-paragraph list items, which are
very common in RFCs and Internet-Drafts.  If need be I suspect I can
write a patch and submit it.]

Thanks for your help.  Sorry to need so much hand-holding; I'm out of
my element here.

Nico
--


Re: Straghtforward XML export?

2012-05-08 Thread Richard Heck

On 05/08/2012 07:30 PM, Nico Williams wrote:

> On Tue, May 8, 2012 at 12:40 AM, Guenter Milde wrote:
>
>> So how about XHTML as starting point for your XSLT transformations?
>
> LyXHTML looks very promising.  It certainly preserves everything I
> have in my [admittedly small] test file.  If it preserves custom inset
> names then I could probably use custom insets to provide the
> additional metadata I need (I still haven't quite figured out how to
> create custom insets, but give me time).  XSLT can do the rest.

It will do with custom insets whatever you ask it to do. If I remember
correctly, it defaults to something like:

or an equivalent span, depending upon whether it's a charstyle or a
flex inset.

In principle, you can also tell the LyXHTML output to use some other
tag than div or span. This is all customized in the layout files, as is
explained in the bits on XHTML in the customization manual. So I'm
guessing that you could get quite a long way towards XML simply in
that sort of way.

Richard


Re: Straghtforward XML export?

2012-05-08 Thread Nico Williams
On Tue, May 8, 2012 at 12:40 AM, Guenter Milde  wrote:
> So how about XHTML as starting point for your XSLT transformations?

LyXHTML looks very promising.  It certainly preserves everything I
have in my [admittedly small] test file.  If it preserves custom inset
names then I could probably use custom insets to provide the
additional metadata I need (I still haven't quite figured out how to
create custom insets, but give me time).  XSLT can do the rest.

> Otherwise, you could use the native XHTML formatter as a model for adding
> "native XML" output.

Indeed, I think that would be a good last resort.

Ideally there'd be a terribly straightforward LyXML, but LyXHTML looks
manageable.

> Another starting point would be the external "elyxer" tool: a Python
> package that takes a LyX file and converts it to XHTML.
> http://elyxer.nongnu.org/

That looks pretty good too.  That's a lot of realistic options.  Thanks again,

Nico
--


Re: Straghtforward XML export?

2012-05-07 Thread Nico Williams
On Tue, May 8, 2012 at 12:40 AM, Guenter Milde  wrote:
> So how about XHTML as starting point for your XSLT transformations?
>
> Otherwise, you could use the native XHTML formatter as a model for adding
> "native XML" output.
>
> Another starting point would be the external "elyxer" tool: a Python
> package that takes a LyX file and converts it to XHTML.
> http://elyxer.nongnu.org/

Ah, those are good ideas.  I'll take a look.  Thanks!


Re: Straghtforward XML export?

2012-05-07 Thread Guenter Milde
On 2012-05-07, Nico Williams wrote:

> No, I got that. I don't actually care for docbook. I want a straightforward
> translation to XML that preserves all data and metadata. If I need a
> specific schema I can always use XSLT to get output in that form.

So how about XHTML as starting point for your XSLT transformations?

Otherwise, you could use the native XHTML formatter as a model for adding
"native XML" output.

Another starting point would be the external "elyxer" tool: a Python
package that takes a LyX file and converts it to XHTML. 
http://elyxer.nongnu.org/

Günter



Re: Straghtforward XML export?

2012-05-07 Thread Nico Williams
Is there canonical documentation of the LyX file format?  I can't find
it...  I did find this: http://wiki.lyx.org/Devel/LyXFileFormat , but
that's just a changelog.  There's nothing else obvious in
http://wiki.lyx.org/Devel/ ...  The development/FORMAT file in the
source tree is also a changelog.

Nico
--


Re: Straghtforward XML export?

2012-05-07 Thread Nico Williams
On Mon, May 7, 2012 at 12:56 PM, Nico Williams  wrote:
> Ah, that works.  Thanks!  I'll take a look and see if the native
> DocBook export works for me.

Nope, it still doesn't allow more than one author in docbook, though
it does merge all the authors listed in the LyX document source.


Re: Straghtforward XML export?

2012-05-07 Thread Nico Williams
No, I got that. I don't actually care for docbook. I want a straightforward
translation to XML that preserves all data and metadata. If I need a
specific schema I can always use XSLT to get output in that form.

Nico
--


Re: Straghtforward XML export?

2012-05-07 Thread Pavel Sanda
Nico Williams wrote:
> On Mon, May 7, 2012 at 12:07 PM, Pavel Sanda  wrote:
> > Nico Williams wrote:
> >> How does LyX represent documents internally?  If it does it in an
> >> objectified form then it should be fairly straightforward to walk the
> >> document tree and emit XML, no?  Or, looking at .lyx files, maybe it
> >> should be possible to script a simple LyX->XML conversion.  Has
> >> anyone tried this before?
> >
> > we're missing someone who knows docbook/sgml/xml well and would like to
> > help bring lyx output up to date, or at least state clearly what needs to
> > be done.
> > http://article.gmane.org/gmane.editors.lyx.devel/119220
> 
> Looking at LyX's format, it seems like translating to XML using a
> LyX-specific schema should be utterly straightforward.  For example,
> something like this:

heh, you didn't get the point ;) to summarize:

- lyx already produces docbook xml, but in an older format.

- people spend a lot of time writing quite complex web guides on how to set
  up things and fix issues for the new docbook format, but never share their
  wisdom with lyx developers, either by contributing to the lyx documentation
  or by stating what needs to be changed in lyx output.

- no lyx dev seems motivated to study docbook xml, so although we think the
  upgrade would be simple, things will stay as they are now until we know
  exactly what should change :)

pavel


Re: Straghtforward XML export?

2012-05-07 Thread Nico Williams
On Mon, May 7, 2012 at 12:41 PM, Pavel Sanda  wrote:
> Nico Williams wrote:
>> This I hadn't seen.  One thing to note is that the LyX I'm running (on
>> Ubuntu) has no option to save as or export to SGML or DocBook.  I
>> gather from the link you gave me that SGML and Docbook are natively
>> supported export formats, so I guess Ubuntu's build must be lacking
>> that feature.  Is that correct?
>
> export items depend on the software you have installed; in the case of
> docbook, sgml-tools are needed. not using it, I can't say much more, but it
> seems that your questions are answered in the older link.

Ah, that works.  Thanks!  I'll take a look and see if the native
DocBook export works for me.


Re: Straghtforward XML export?

2012-05-07 Thread Nico Williams
On Mon, May 7, 2012 at 12:07 PM, Pavel Sanda  wrote:
> Nico Williams wrote:
>> How does LyX represent documents internally?  If it does it in an
>> objectified form then it should be fairly straightforward to walk the
>> document tree and emit XML, no?  Or, looking at .lyx files, maybe it
>> should be possible to script a simple LyX->XML conversion.  Has
>> anyone tried this before?
>
> we're missing someone who knows docbook/sgml/xml well and would like to help
> bring lyx output up to date, or at least state clearly what needs to be
> done.
> http://article.gmane.org/gmane.editors.lyx.devel/119220

Looking at LyX's format, it seems like translating to XML using a
LyX-specific schema should be utterly straightforward.  For example,
something like this:


\lyxformat 413
\begin_document
\begin_header
\textclass article
...
\end_header

\begin_body

\begin_layout Title
Some Doc
\end_layout

\begin_layout Author
Joe Sixpack
\begin_inset VSpace defskip
\end_inset

Sixpack Corp.
\end_layout

\begin_layout Abstract
Foo bar baz blah blah.
\end_layout

\begin_layout Abstract
Two-paragraph abstract, eh?
\end_layout

...

should translate into:


Some Doc
Joe SixpackSixpack Corp.
Foo bar baz blah blah.
Two-paragraph abstract, eh?
...


Translating insets and layouts into XML elements and attributes seems
relatively straightforward.  Translating directives seems
straightforward also.  Now, note that the two paragraph abstract would
be translated into two  elements, but an XSLT stylesheet
could easily translate that into:

..
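The merge step does not have to be done in XSLT. Below is a minimal Python sketch that replaces each run of adjacent same-type elements with one wrapper holding one item per original paragraph. The element and attribute names (`layout`, `type`, `abstract`, `p`) are hypothetical, standing in for whatever a LyX-specific schema would actually use; `itertools.groupby` does the work of finding the adjacent runs.

```python
import itertools
import xml.etree.ElementTree as ET


def merge_runs(parent, layout_type, wrapper_tag, item_tag="p"):
    """Replace each run of adjacent <layout type="layout_type"> children of
    parent with one <wrapper_tag> holding one <item_tag> per original element."""

    def is_target(el):
        return el.tag == "layout" and el.get("type") == layout_type

    children = list(parent)
    for child in children:
        parent.remove(child)
    # groupby only groups *adjacent* elements, which is exactly what we want.
    for matched, run in itertools.groupby(children, key=is_target):
        if not matched:
            parent.extend(list(run))
            continue
        wrapper = ET.SubElement(parent, wrapper_tag)
        for el in run:
            item = ET.SubElement(wrapper, item_tag)
            item.text = el.text
            item.extend(list(el))  # keep any nested insets
    return parent
```

Two Abstract paragraphs separated by some other layout would end up in two separate wrappers, which seems like the safer default.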

A straightforward LyX->XML translation seems like the best approach,
because translation to any other schema can then be done via XSLT.
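To make the idea concrete, here is a deliberately naive Python sketch of such a translation. The XML vocabulary (`lyx`, `layout`, `inset`, `directive`, and the `type`/`args`/`name`/`value` attributes) is invented for illustration and is not an existing LyX schema. The sketch only handles the `\begin_X ... \end_X` nesting, one-line `\directive value` lines, and plain text (joined with single spaces, which is an approximation). It does not check that begin/end names match and ignores the many special cases of the real file format.

```python
import xml.etree.ElementTree as ET


def append_text(el, text):
    """Add text after el's last child, or to el.text if it has no children."""
    if len(el):
        last = el[-1]
        last.tail = (last.tail + " " if last.tail else "") + text
    else:
        el.text = (el.text + " " if el.text else "") + text


def lyx_to_xml(source):
    r"""Naively translate the \begin_X ... \end_X nesting of a .lyx file to XML.

    Hypothetical mapping:
      \begin_layout Title          -> <layout type="Title">
      \begin_inset VSpace defskip  -> <inset type="VSpace" args="defskip">
      \textclass article           -> <directive name="textclass" value="article"/>
      plain text lines             -> element text, joined with single spaces
    """
    root = ET.Element("lyx")
    stack = [root]
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("\\begin_"):
            kind, _, rest = line[len("\\begin_"):].partition(" ")
            el = ET.SubElement(stack[-1], kind)
            if rest:
                typ, _, args = rest.partition(" ")
                el.set("type", typ)
                if args:
                    el.set("args", args)
            stack.append(el)
        elif line.startswith("\\end_"):
            if len(stack) > 1:  # no check that the names actually match
                stack.pop()
        elif line.startswith("\\"):
            name, _, value = line[1:].partition(" ")
            ET.SubElement(stack[-1], "directive", name=name, value=value)
        else:
            append_text(stack[-1], line)
    return root
```

Note how the VSpace inset becomes an element of its own instead of a bare newline, which is the property that makes later XSLT processing easier. Serializing the result with `ET.tostring(root, encoding="unicode")` gives XML that a stylesheet could then reshape.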

Nico
--


Re: Straghtforward XML export?

2012-05-07 Thread Pavel Sanda
Nico Williams wrote:
> This I hadn't seen.  One thing to note is that the LyX I'm running (on
> Ubuntu) has no option to save as or export to SGML or DocBook.  I
> gather from the link you gave me that SGML and Docbook are natively
> supported export formats, so I guess Ubuntu's build must be lacking
> that feature.  Is that correct?

export items depend on the software you have installed; in the case of docbook,
sgml-tools are needed. not using it, I can't say much more, but it seems
that your questions are answered in the older link.
p


Re: Straghtforward XML export?

2012-05-07 Thread Nico Williams
On Mon, May 7, 2012 at 12:07 PM, Pavel Sanda  wrote:
> Nico Williams wrote:
>> The LaTeX->XML tools I've tried leave me... sad.  They tend to drop
>> some things.  For example: vertical space, which becomes a simple
>> newline in a paragraph's text.  It would be better to translate
>> vertical space into  elements -- that'd be much, much more
>> useful in XSLT than embedded newlines!
>>
>> So I'm wondering: why couldn't LyX export to XML using a native schema
>> that preserves as much LyX markup as possible, indeed, if not all of
>> it?
>
> google says:
> http://bgu.perso.libertysurf.fr/doc/db4lyx/

I did see that link when I was researching this.  It's very out of date.

> http://www.neomantic.com/tutorials/lyx-and-docbookXML/

This I hadn't seen.  One thing to note is that the LyX I'm running (on
Ubuntu) has no option to save as or export to SGML or DocBook.  I
gather from the link you gave me that SGML and Docbook are natively
supported export formats, so I guess Ubuntu's build must be lacking
that feature.  Is that correct?

Nico
--


Re: Straghtforward XML export?

2012-05-07 Thread Pavel Sanda
Nico Williams wrote:
> The LaTeX->XML tools I've tried leave me... sad.  They tend to drop
> some things.  For example: vertical space, which becomes a simple
> newline in a paragraph's text.  It would be better to translate
> vertical space into  elements -- that'd be much, much more
> useful in XSLT than embedded newlines!
> 
> So I'm wondering: why couldn't LyX export to XML using a native schema
> that preserves as much LyX markup as possible, indeed, if not all of
> it?

google says:
http://bgu.perso.libertysurf.fr/doc/db4lyx/
http://www.neomantic.com/tutorials/lyx-and-docbookXML/

> How does LyX represent documents internally?  If it does it in an
> objectified form then it should be fairly straightforward to walk the
> document tree and emit XML, no?  Or, looking at .lyx files, maybe it
> should be possible to script a simple LyX->XML conversion.  Has
> anyone tried this before?

we're missing someone who knows docbook/sgml/xml well and would like to help
bring lyx output up to date, or at least state clearly what needs to be
done.
http://article.gmane.org/gmane.editors.lyx.devel/119220
p