Alan wrote:

* Daniel Fagerstrom <[EMAIL PROTECTED]> [2004-02-25 15:49]:


Why Cocoon rocks for publishing
-------------------------------

Cocoon is based on three great ideas: XML-adaptors, XML-pipelines and the sitemap. Here we will discuss the first two.

If you have N different input formats and M output formats you need N*M converters for converting from every input format to every output format. This complexity can be reduced to N+M by finding a standard format...


Having a common format (XML) also makes it worthwhile to write tools that use that format both as input and output (e.g. XSLT), and we can use the pipes-and-filters pattern to build complex transformations in terms of smaller, specialized, reusable filters.
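A minimal sketch of the pipes-and-filters idea in flowscript-style JavaScript (the filter functions are invented for illustration, they are not Cocoon components):

    // Each filter maps an XML document to an XML document, so filters compose
    // freely: N generators plus M serializers replace N*M one-step converters.
    function compose(filters) {
        return function(doc) {
            for (var i = 0; i < filters.length; i++)
                doc = filters[i](doc);   // pipe the document through each filter
            return doc;
        };
    }

    // e.g. var publish = compose([cleanup, addNavigation, toXHTML]);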


Dataflow in (web)apps
---------------------


and for (web)apps:


[Input format (user) -> Output format (storage)] -> webapp -> [Input format (storage) -> Output format]


    This is how I've built my LAMP applications. The first thing is to develop a
    database. Then everything goes into the database before it comes back out.
    Even if the application only keeps session data, I build a database. It is
    a matter of course.

    What other ways are there of handling data? Does anyone keep things in
    memory, or simply regurgitate the input in their applications?

    If so, then there are two pipeline designs, an input/output pipeline pair
    and a pass-through pipeline.


As we can see, publishing has one conversion step and (web)apps have two. In [1] I talked about input and output pipelines for the two conversion steps.


    I'd like to expand on this: currently Cocoon treats storage as a filter.
    Things like the SQLTransformer filter the stream to store data, then pass
    it onward to a serializer.

    What you are proposing is a pipeline that terminates not at a
    serializer, but at something else that somehow stores the XML. Then it
    kicks off a new pipeline that terminates at a serializer.

Yes, that can either be done in flowscript by something like processPipelineTo[...], e.g. processPipelineToModifyableSource(pipe, source, args). Another possibility is a new store sitemap component, but as you know by now, new sitemap components are a quite controversial subject ;)
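For instance, a rough flowscript sketch using the existing processPipelineTo, writing the pipeline output into a modifiable source by hand (the pipeline name, the store URI and the resolver wiring are assumptions):

    // Run the internal pipeline "session2xml" and direct its output into a
    // modifiable source instead of the HTTP response. Sketch only: assumes a
    // SourceResolver is available as 'resolver', and omits error handling.
    function storeSessionDocument(sessionDoc) {
        var source = resolver.resolveURI("store/sessions/user42.xml");
        var out = source.getOutputStream();  // works when the source is modifiable
        cocoon.processPipelineTo("session2xml", { doc: sessionDoc }, out);
        out.close();
    }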



    To my mind, rather than have these parameter things in the sitemap, I'd
    much rather have everything kept as XML.

    Session information, for example. I am going to use Momento to keep a
    session document, and then skip that name/value pair nonsense.

That's the idea, session document sounds like a good name.


    Somewhere in my CForms pipelines, I transform input into an XUpdate
    statement and build a sub document in a Momento document. Then I can
    aggregate or cinclude that session document.
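For example, a hypothetical flowscript helper that wraps one widget value in an XUpdate modification (the session document layout is invented; the namespace is the standard xmldb.org one):

    // Build an XUpdate statement that replaces one value in the session
    // document. Sketch only; the '/session/' + widgetId layout is invented.
    function toXUpdate(widgetId, value) {
        return '<xupdate:modifications version="1.0"'
             + ' xmlns:xupdate="http://www.xmldb.org/xupdate">'
             + '<xupdate:update select="/session/' + widgetId + '">'
             + value
             + '</xupdate:update></xupdate:modifications>';
    }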


Comparing input and output pipelines, input handling has one main source of extra complexity: we cannot trust user input. We need to check that the input is correct and take different actions depending on that; as a consequence, the control structure becomes more complicated when we have user input.


A further reason for detailed control of user input is that while output
tends to go from strongly typed data (DBs, Java etc.) to loosely typed data,
as in presentation most things are strings, input tends to have the opposite
requirement: from strings to typed data.


    Okay, here is the strongly typed part of it; my apologies, I understand now.

    Strongly typed data, but first...

    Your solution is nice, except that your N+M is missing something now.
    There are N different input formats, M different output formats, and of
    course, S different storage formats.

The general idea is that sources and modifiable sources describe places. A generator reads a certain format from a source (a place) and converts it to XML. A serializer converts from XML to a specified format that in turn can be fed into a modifiable source (a place). So the output and the storage format are the same, but we have I+M+N+S components, where I and S are the number of input sources and output sources respectively, instead of I*M*N*S if we were to write components that go from input source to output source in one step. (With, say, I=2, N=3, M=4 and S=2, that is 11 components instead of 48.)


    Consider an e-mail account user registration form: first page they tell us
    who they are and choose a password. Second page, we ask them to choose
    which junk newsletters they want to receive.

    When the information arrives it becomes XML. Now maybe I want to put the
    XML in three different storage areas. Say I want to store the username and
    password in an LDAP directory, the user's profile and such in a
    relational database, and the fact that the user is now on the second page
    of the registration wizard as session information.

    I think it is easy enough to validate and construct strongly typed data once
    the input is an XML format. You can use XML Schema, Relax NG, and such to
    validate information in the pipeline, then transform it to XUpdate or
    ebXML, or SQL statements, to feed to an XML consumer.
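A sketch of that flow in flowscript (all three helpers are hypothetical; validate() stands in for an MSV/Relax NG check):

    // Validate the incoming XML; feed it onward if it is good, otherwise
    // answer with an error document.
    var doc = buildDocumentFromRequest(cocoon.request);   // hypothetical
    if (validate(doc, "registration.rng")) {              // hypothetical
        consume(transform(doc, "input2xupdate.xsl"));     // hypothetical
    } else {
        cocoon.sendPage("registration-errors", { doc: doc });
    }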

    For form input, CForms provides validation of form entries in a way that is
    interactive and associates mistakes with the source widget in the
    interface. If you were to offer a web service, however, you would have to
    have a way to validate XML that would return an error document of a
    different nature, thus Relax NG, Schematron, etc.


    (You go on to say this yourself. Good. I'll snip it but I agree.)



Is Cocoon that great for input handling?
----------------------------------------

We see that the situation for input handling has become quite similar to that for output: many input formats and many output formats. But in contrast to the output scenario, we have no common design patterns for handling the complexity.


    And this makes it very difficult for new users like myself. New users seem to
    get the pipeline concepts quickly, and then stumble on the various input
    concepts. Such has been the case for me. I've been very creative in the
    page generation part of my web site (http://engrm.com), using FOP, multiple
    transforms, cinclude, and aggregation. I still have only written one example
    CForms application, however.

    If anything, pipelines in will be easy to teach once people understand
    pipelines out.

Absolutely, we need a common design pattern for how to build Cocoon applications. That will make it easier for new Cocoon users and it will lead to more reusable components for webapps.



In some cases we have components that convert directly from input format to storage format. In other cases we use a format between input and storage, but this format can be a hashmap, Java beans, the Woody widget hierarchy or XML in the form of DOM or SAX. In some of these cases we also have validation mechanisms for the middle format.

This lack of a commonly accepted pattern for input handling leads to less reuse, multiple components that do similar things, and a lack of a common focal point. An example of this is the discussion about Cocoon/relational database coupling: we have multiple ways to go from RDBs to XML, but no components for the opposite direction...


    I better jump into this discussion then. I've considered a language that would
    express a database document as XML, and a tool that would compare that
    document to the database, only updating what is necessary.

    <xd:document xmlns="http://engrm.com/schema/2004/02/rosetta">
      <xd:record table="employee">
        <xd:column name="employee-id" key="primary">007</xd:column>
        <xd:column name="first-name">James</xd:column>
        <xd:column name="last-name">Bond</xd:column>
        <xd:record table="employee-department">
          <xd:record table="department">
            <xd:column name="department-id">mi6</xd:column>
            <xd:column name="department-name"
                       >Secret Intelligence Service</xd:column>
          </xd:record>
        </xd:record>
      </xd:record>
    </xd:document>

    The above document would have all the information necessary to update a
    database where:

    employee                employee-department             department
    -----------             -------------------             ---------------
    employee-id    <--->    employee-id
    first-name              department-id        <--->      department-id
    last-name                                               department-name

    If it isn't enough, I suppose you add some form of functional programming or
    direct execution of SQL statements.

I suggest something like that in http://marc.theaimsgroup.com/?t=107762798300006&r=1&w=2; it is based on XML-DBMS, http://www.rpbourret.com/xmldbms/index.htm.




The solution ;)
---------------

IMO we have an obvious solution to this situation right before our eyes: adapt the patterns that we already use for output handling, i.e. adaptors and pipelines, to input handling as well. To do this we must decide on a common format. The candidates are: hashmaps, Java beans, the Woody widget hierarchy and XML.


    I vote for XML. At this point, a person can use Cocoon as a publishing
    platform without adding Java. Please keep it that way.

    Cocoon output is like a delta. Cocoon input should be like a funnel.

    Rather than running towards a serializer from a generator, input should run
    from a deserializer to a consumer.

    For my purposes, I'd like to have all input filter into an XML Cocoon
    pipeline that funnels everything into a transform that produces XUpdate
    fit for consumption by Momento.
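In flowscript terms the funnel might look like this (every name here is invented):

    // Output is a delta: one generator fans out to many serializers.
    // Input is a funnel: many deserializers converge on one consumer.
    var doc = deserialize(cocoon.request);      // form post, SOAP, upload, ...
    doc = transform(doc, "any2xupdate.xsl");    // the neck of the funnel
    momento.consume(doc);                       // single terminal consumer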


I think that using XML has _huge_ advantages:

* Cocoon is an XML-based framework and uses XML as its internal format almost everywhere. When one uses the Woody widget hierarchy one has to translate back and forth between XML and Woody all the time, which at least IMO is a waste of time.

* XML is standardized, and there is an enormous number of tools that use it. For Woody widgets, we have to do everything ourselves.

* There are well-designed schema languages for XML: XML Schema, and if you don't like that, Relax NG. As the rest of the XML world uses XML data types, we get an impedance mismatch between the Woody data types and XML.


Yes. Yes. Yes.


What does this mean in practice?
--------------------------------

Thus far I have, (fairly strongly I suppose ;) ), suggested that we should use XML as the standardized internal format for all input handling in Cocoon...


Untyped XML is not enough, so we also need typed XML. Here I consider a DOM with a schema attached to it, so that one can [re]validate the DOM, ask the nodes and the leaves whether they are valid and what datatype they have, and also access valid leaves in terms of the corresponding Java data type. I think something like this should be possible to build by combining a DOM implementation, e.g. Xerces, with Sun's Multi Schema Validator (MSV) and XSDLIB [2].
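A sketch of how such a typed DOM might look from a script (every method here is invented; MSV and XSDLIB would do the real work underneath):

    // A DOM with a schema attached: nodes can be asked about validity and type.
    var doc = TypedDOM.parse("session.xml", "session.rng");    // hypothetical
    var age = doc.selectSingleNode("/session/user/age");
    if (age.isValid())                       // a leaf knows whether it is valid
        var years = age.getTypedValue();     // a java.lang.Integer, not a string
    doc.revalidate();                        // re-check the whole tree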


To make DOM easy to use within flowscripts it would be nice to write Rhino binding code (scriptable objects) so that one can use the ECMAScript API for DOM. It is also a good idea to use a DOM implementation that implements DOM events, so that one can write flowscript code in the same style as client-side JS.
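For example, with DOM events one could write (sketch; assumes the session document's DOM implements EventTarget):

    // React to changes in the session document the way client-side JS reacts
    // to changes in a page. DOMSubtreeModified is a DOM Level 2 mutation event.
    sessionDoc.documentElement.addEventListener("DOMSubtreeModified",
        function(ev) {
            // e.g. mark the session dirty so it gets written back to the store
        }, false);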


    I would like to understand what this DOM versus SAX stuff is about. Do you
    want to input data by editing the DOM? I'd much rather express an update
    statement in XUpdate.

No, I don't want to edit a DB through the DOM API. For webapps you typically have to allow the "session document" to be partly invalid and incomplete. For more fine-grained access to the session store during a user interaction I think DOM is good, and with an ECMAScript interface to the DOM tree it would also fit quite well in flowscripts. For more transactional stuff, writing through DOM is of course bad.


/Daniel


