[midgard] Document parsing, was Re: Midgard Question: creating links

Henri Bergius Wed, 12 Jan 2000 11:39:50 -0800
On 10 Jan, Alexander Bokovoy wrote:
> It seems that this should be done at a low level of Midgard, in
> midgard-lib. Then we will have an open system which works with
> parsers just like PHP works with external modules. The main
> difference will be the language in which a parser's functions are
> accessible: it will be the C source of the parser. Thus, we have no
> headaches with syntactic and lexical analyzers. Instead, at the high
> level (for example, in a PHP script) we will operate on queues of
> parsers but not on their functions. At this level you may see a
> parser as a single-function "black box" that accepts information
> and transforms it into another format. The last parser in each queue
> will thus be the one that outputs a Net-wide format (HTML, XML,
> etc., or, for example, PDF).

This is a very good idea. Also, it has something in 
common with the SusiSGML system Jukka and I worked 
on before Midgard.

SusiSGML was a hack on top of Apache that handled
translating SGML files (of a specific DTD) into HTML
and applying a visual layout to them. The actual
SGML files were stored in the htdocs directory.
When Apache got a request for a particular URI, it
would go to that path in the filesystem, look
for the .html and .sgml files there, and check whether
the SGML file was newer. If it was, Apache
would translate it to HTML and save the result as a
.html file.

The initial translation was a bit slow, but after
that Apache would just use the HTML document, and
so normal browsing of the site wasn't affected.
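
A rough sketch of that freshness check, in Python purely
for illustration. This is not SusiSGML's actual code: the
names needs_rebuild, serve and translate are made up here,
and translate stands in for whatever did the SGML-to-HTML
conversion.

```python
import os


def needs_rebuild(source_path, target_path):
    """Return True when the target is missing or older than the source."""
    if not os.path.exists(target_path):
        return True
    return os.path.getmtime(source_path) > os.path.getmtime(target_path)


def serve(base_path, translate):
    """Return the HTML for base_path, regenerating it from .sgml if stale.

    `translate` is a hypothetical callable taking SGML text and
    returning HTML text.
    """
    sgml_path = base_path + ".sgml"
    html_path = base_path + ".html"
    if needs_rebuild(sgml_path, html_path):
        with open(sgml_path, encoding="utf-8") as src:
            html = translate(src.read())
        with open(html_path, "w", encoding="utf-8") as dst:
            dst.write(html)
    # From here on the stored .html file is all that is read.
    with open(html_path, encoding="utf-8") as cached:
        return cached.read()
```

Only the first request after a change pays for the
translation; every later one is a plain file read.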


Maybe a similar method could be applied with these
parsers? The first thing would of course be that Midgard
would have to know how to use all the parsers and
converters needed in the process. If this could be done
via a generic interface, it should be easy to write
Midgard support for different parsers, and also for
third parties to provide these interfaces with their
parser software. I think this is the area that needs
the most work here.
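
To make the "generic interface" idea a bit more concrete,
here is a minimal sketch, assuming each parser declares
the format it accepts and the format it produces. The real
thing would presumably live in C inside midgard-lib; Python
is used here only to keep the example short, and Parser and
run_queue are hypothetical names, not an existing Midgard
API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Parser:
    """A "black box": accepts one format, emits another (hypothetical)."""
    source_format: str
    target_format: str
    convert: Callable[[str], str]


def run_queue(parsers: List[Parser], document: str) -> str:
    """Pass the document through each parser in order."""
    # Check that neighbouring parsers agree on the format between them.
    for current, following in zip(parsers, parsers[1:]):
        if current.target_format != following.source_format:
            raise ValueError(
                "cannot feed %s output into a %s parser"
                % (current.target_format, following.source_format)
            )
    for parser in parsers:
        document = parser.convert(document)
    return document
```

The caller only ever sees the queue, never the functions
inside each parser, which is what would let a third party
plug in their own parser by filling in the same three
fields.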

Then Midgard would need to know in what format the
document was saved in Midgard's database: SGML, XML,
LaTeX, or something else. Based on this, and on the
output format specified at the PHP end (HTML, WML,
whatever), Midgard would then run the document through
the needed parsers and save the output to a different
field in the article table. After that Midgard could
just use that stored output to serve further queries
for the same document. Midgard 2 has a nice modification
log, so it would be easy to check whether the document
has changed since it was last converted.
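
One way this could look, again only as a sketch in Python:
the converters are looked up by (stored format, output
format), chained if no direct converter exists, and the
result is kept next to the article together with the
modification stamp it was made from. The article layout
(a dict with "content", "format", "modified" and
"rendered" keys) and the names find_chain and render are
assumptions for the example, not Midgard's schema or API.

```python
from collections import deque


def find_chain(converters, source_format, output_format):
    """Return the shortest list of converter functions, or None.

    `converters` maps (from_format, to_format) to a function.
    """
    queue = deque([(source_format, [])])
    seen = {source_format}
    while queue:
        fmt, chain = queue.popleft()
        if fmt == output_format:
            return chain
        for (src, dst), convert in converters.items():
            if src == fmt and dst not in seen:
                seen.add(dst)
                queue.append((dst, chain + [convert]))
    return None


def render(article, output_format, converters):
    """Return the article in output_format, converting only if stale."""
    cached = article["rendered"].get(output_format)
    if cached is not None and cached["converted_at"] >= article["modified"]:
        return cached["text"]
    chain = find_chain(converters, article["format"], output_format)
    if chain is None:
        raise LookupError(
            "no converters from %s to %s" % (article["format"], output_format)
        )
    text = article["content"]
    for convert in chain:
        text = convert(text)
    # Remember which revision this output belongs to.
    article["rendered"][output_format] = {
        "converted_at": article["modified"],
        "text": text,
    }
    return text
```

The stored "converted_at" value plays the role of the
modification log check: as soon as the article's
"modified" stamp moves past it, the next request
converts again.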

This way we would finally get a really powerful
and flexible publishing system that could work
with multiple formats for Midgard.

> Alexander Bokovoy 

/Bergie

-- 
-- Henri Bergius -- +358 40 525 1334 -- [EMAIL PROTECTED] --
               http://www.iki.fi/Henri.Bergius


--
This is The Midgard Project's mailing list. For more information,
please visit the project's web site at http://www.midgard-project.org

To unsubscribe the list, send an empty email message to address
[EMAIL PROTECTED]