[PHP-DOC] Whoa! News..Time for a little bit planning of architecture?

techtonik Thu, 08 Feb 2007 02:19:07 -0800

Hello everybody,

--begin FUD
I've been away for quite a long time, after it became obvious to me
that my 256 MB machine was unable to handle the DocBook compilation
process for XCHM, and after some dissatisfaction with how systems@
functions overall.
--end FUD


Please CC me on the discussion (if I succeed in sparking one), as I
could not keep up with the flood of content-related commits to the manual.

I think it is great progress to see the PHPDOC announcement on the
front page of PHP.NET, which was pointed out to me by people asking
where to get the latest "alpha" of XCHM. =) I am still somewhat
distant from PHP development for now, but I would like to share some
ideas about what should be done with PHPDOC to make the documentation
process more transparent and easier.


1. DocBook --> PHPBook

While DocBook offers a great ready-to-use doc building system with
plenty of room for customization, I would prefer to start from a small,
limited set of tags and a minimum of templates, and build up template
functionality later with the help of some kind of CookBook. The process
should include not only selecting the set, but also documenting its
tags from the point of view of a PHPDOC developer. Like the reference
part of the DocBook Definitive Guide [1], but aimed specifically at the
PHP community, with tags easily understood by people with no publishing
experience at all. Moreover, this documentation should contain basic
design principles for the output elements (how tags should be
rendered), so that it also serves as a reference for people who design
templates and will maintain them in the future.

[1]: http://www.docbook.org/tdg5/en/html/pt02.html    "DocBook: The Definitive Guide"

A CookBook of XSL techniques is essential to this process. It must
contain recipes for achieving the features most projects expect from
DocBook, namely:
- cross-linking, with validation and additional information taken from the link target (linkend)
- pagination
- table of contents
- footnotes
- additional pages based on document structure (examples, title page, credits)
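
The logic behind the "table of contents" recipe is small enough to sketch. In the CookBook it would be an XSL template; the minimal Python version below only illustrates the same walk over the document structure, assuming un-namespaced DocBook 4 style element names with `id` attributes. The function names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Sectioning elements that get a TOC entry (assumed DocBook 4 names, no namespace).
SECTIONING = ("part", "chapter", "section", "sect1", "sect2")


def build_toc(elem, depth=0):
    """Return (depth, id, title) rows for every sectioning element, in document order."""
    rows = []
    for child in elem:
        if child.tag in SECTIONING:
            title = child.findtext("title", default="(untitled)")
            rows.append((depth, child.get("id"), title))
            rows.extend(build_toc(child, depth + 1))  # nested sections go one level deeper
    return rows


def render_toc(rows):
    """Plain-text rendering: two spaces of indentation per nesting level."""
    return "\n".join("  " * depth + f"{title} [{ident}]" for depth, ident, title in rows)


book = ET.fromstring(
    '<book><chapter id="intro"><title>Introduction</title>'
    '<section id="intro.whatis"><title>What is PHP?</title></section></chapter>'
    '<chapter id="install"><title>Installation</title></chapter></book>'
)
print(render_toc(build_toc(book)))
```

The ids come along with the titles so that the same rows can later feed cross-link validation as well as the TOC itself.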

A clear definition of these features and recipes will show how we
could map semantic DocBook or PHPBook tags to presentation, and how we
could provide the backward transition. The latter is even more
important, as most of us would like to edit the PHP Manual through a
convenient web interface with familiar wiki markup (though with a
different meaning).

DocBook developers may recall something similar to the desired CookBook [2].

[2]: http://www.sagehill.net/docbookxsl/    "DocBook XSL: The Complete Guide"


2. Web-Interface for PHPBook

I would like to say that I know exactly how to make it work.
Unfortunately I don't, but I don't see any obstacles that would make
it unfeasible either, unless nobody is interested or no one has enough
time (or ...) to contribute. So, the interface concept..

2.1 Version control

As I said above, the web interface is a mapping from the rather
presentational nature of a wiki system to the semantic XML source of
PHPBook. How could that be possible? The first problem is CVS. To keep
it short: CVS does not integrate with Apache's authentication, which
makes it useless here. To make it work through CVS we would need to
implement a CVS client in PHP that connects to the CVS server and
commits using explicitly entered user credentials. Sounds bad? It
is.

--begin FUD
CVS is so buggy and complicated that I can easily imagine why it is
not developed anymore.
--end FUD

What are the alternatives? We still need our XML documentation to live
in a version control system. We can't use an SQL database, because
users without access to the database would not be able to grab the
sources and build offline versions of the manual, which would make
testing and development difficult. Generating the different formats
online is not a solution either: it could consume a lot of resources,
it is hard to debug, and we would eventually end up with a very
complicated piece of code. There is no need to break what we already
have. So the ideal version control backend for the XML is simply a CVS
alternative, and that is SVN.

Why SVN? It integrates perfectly with Apache's security model (via
mod_dav_svn), it is easier to use (especially for beginners), it works
over port 80 or 443 (more accessible), it is simpler, and it supports
many other Good Things (tm), besides being actively maintained itself.
The only reason to stick with CVS is familiar branching/tagging, but we
do not use that anyway. Even if CVS is convenient for PHP core hackers,
it is absolutely awful for plain PHP users (or, to make it sound more
weighty, potential contributors).

To conclude: we need XML as the backend to maintain the integrity and
transparency of the manual build process (compatibility, or rather a
smooth transition). We need to keep the documents in SVN for revision
control and make modifications from the web using already
authenticated developer accounts. Now we need to decide how to reverse
the process and save changes made on the generated PHP web site back
to the XML in SVN.

2.2 Backend Duality

While SVN is a good system in every respect, it is not as fast and
efficient as Apache when it comes to serving content to end users.
Using SVN as a direct backend would have a big performance impact,
given that we need to apply additional transformation rules and check
cross-links at request time (if we aim to edit the manual online, we
need to check links online). So we need a cache, but on second thought
a cache alone is not enough: we still need the transformation and the
integrity checks with all their complexity. A simple cache is the
approach of Google Code, where wiki pages are stored in SVN and there
is no evidence of any integrity checking tools except for a
"deprecated pages" filter in the wiki interface. The Google Code wiki
uses presentational markup, while we need structural markup only.
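
The cache half of this can be sketched in a few lines. This is a minimal Python sketch, assuming the current SVN revision of each source file is known (for example the "Last Changed Rev" reported by `svn info`), and using `render` as a hypothetical stand-in for the transformation plus link checking. A page is re-rendered only when its revision changes. As said above, this does not remove the need for transformation and integrity checks; it only keeps them off the path of every request.

```python
class RevisionCache:
    """Keep rendered pages in memory, re-rendering a page only when the
    SVN revision of its source file changes."""

    def __init__(self, render):
        # render(path, revision) -> str: a stand-in for the (hypothetical)
        # XSLT transformation plus cross-link check of one source file.
        self.render = render
        self.entries = {}  # path -> (revision, rendered page)

    def get(self, path, revision):
        cached = self.entries.get(path)
        if cached is not None and cached[0] == revision:
            return cached[1]  # source unchanged since the last render
        page = self.render(path, revision)
        self.entries[path] = (revision, page)
        return page


calls = []


def fake_render(path, revision):
    calls.append((path, revision))
    return f"<html>{path}@r{revision}</html>"


cache = RevisionCache(fake_render)
cache.get("strlen.xml", 42)
cache.get("strlen.xml", 42)  # served from the cache
cache.get("strlen.xml", 43)  # new revision, rendered again
print(len(calls))  # 2
```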

Here is where it becomes complicated.
We have a forward documentation processing toolchain that produces a
convenient web presentation from the XML source. Now we need to build
a backward documentation processing toolchain that transforms
modifications of this web presentation back into XML form, i.e.:

Forward toolchain:
XML ----(xslt)----> HTML ----(php)-----> PHPWEB
Backward toolchain:
XML <---(php)---- DIFF <----(php)---- PHPWEB

At first sight the problem, restoring the original XML from generated
HTML, might seem impossible. Chances are good that essential
information about the XML structure is lost during the forward
transformation, and HTML parsing is itself obscure and error-prone.
Even if we generated perfect XHTML, XSLT transformation is slow and
resource-hungry. So the task must be simplified to the point where
changes to both chains can be made incrementally, transparently and
simultaneously. Let me show what a simple paragraph-to-paragraph
mapping would look like.

To match an output paragraph back to the XML source, we need to find
what information is missing to unambiguously identify the part of the
XML it originated from. Each HTML paragraph belongs to an XML
paragraph, and each XML paragraph has a clear XPath. We could either
embed this path in a comment next to the paragraph in the generated
manual, or use a common convention to derive the XML ID from output
elements without an explicit comment.
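
The first option (embedding the path in a comment) can be sketched as follows. This is a minimal Python illustration under loud assumptions: plain-text paragraphs only (inline markup inside a <para> is ignored), no XML namespaces, and a made-up `<!-- src: ... -->` comment format. The positional steps such as `section[2]` count same-named siblings, which is also how ElementTree's `find()` interprets them, so the same string works in both directions.

```python
import html
import re
import xml.etree.ElementTree as ET


def para_paths(elem, path="."):
    """Yield (xpath, element) for each <para>, using positional steps like section[2]."""
    seen = {}
    for child in elem:
        seen[child.tag] = seen.get(child.tag, 0) + 1
        step = f"{path}/{child.tag}[{seen[child.tag]}]"
        if child.tag == "para":
            yield step, child
        else:
            yield from para_paths(child, step)


def to_html(root):
    """Forward step: each <para> becomes a <p> preceded by a comment holding its XPath."""
    parts = []
    for xpath, para in para_paths(root):
        parts.append(f"<!-- src: {xpath} -->\n<p>{html.escape(para.text or '')}</p>")
    return "\n".join(parts)


# Hypothetical marker format: the comment immediately followed by its paragraph.
SRC = re.compile(r"<!-- src: (\S+) -->\s*<p>(.*?)</p>", re.S)


def apply_edits(root, page):
    """Backward step: copy each (possibly edited) <p> into the <para> it came from."""
    for xpath, text in SRC.findall(page):
        node = root.find(xpath)
        if node is not None:
            node.text = html.unescape(text)
```

For a chapter with two sections, the third paragraph comes out as `<!-- src: ./section[2]/para[1] -->` followed by its `<p>`, and editing that `<p>` and calling `apply_edits()` writes the new text back into exactly that node.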

Another consideration is that we do not need to generate HTML code at
all: we could use an intermediate format that can easily be parsed on
the server side to produce HTML (just like a wiki) and, at the same
time, transformed back into the XML source, even after modification.
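
Here is a rough sketch of such an intermediate format for inline tags. It is a minimal Python example with hypothetical delimiters: each PHPBook inline tag gets its own unique pair of markers, so one wiki string maps back to exactly one XML string, and the same markup is rendered to HTML on the server. Nested tags and HTML escaping are deliberately left out, and the `function.<name>.php` link pattern is only an assumption for illustration.

```python
import re

# Hypothetical intermediate markup: one unique delimiter pair per inline tag.
MARKUP = {
    "function": ("[[", "]]"),
    "emphasis": ("''", "''"),
    "literal": ("``", "``"),
}

# Hypothetical HTML rendering of each tag (the presentation side).
HTML = {
    "function": '<a href="function.{slug}.php">{text}()</a>',
    "emphasis": "<em>{text}</em>",
    "literal": "<code>{text}</code>",
}


def _wiki_pattern(tag):
    opening, closing = MARKUP[tag]
    return re.escape(opening) + "(.*?)" + re.escape(closing)


def xml_to_wiki(s):
    """Forward: <function>strlen</function> -> [[strlen]], and so on."""
    for tag, (opening, closing) in MARKUP.items():
        s = re.sub(f"<{tag}>(.*?)</{tag}>", lambda m: opening + m.group(1) + closing, s)
    return s


def wiki_to_xml(s):
    """Backward: [[strlen]] -> <function>strlen</function>, and so on."""
    for tag in MARKUP:
        s = re.sub(_wiki_pattern(tag), lambda m: f"<{tag}>{m.group(1)}</{tag}>", s)
    return s


def wiki_to_html(s):
    """Server-side rendering of the intermediate format to HTML."""
    for tag in MARKUP:
        s = re.sub(
            _wiki_pattern(tag),
            lambda m: HTML[tag].format(text=m.group(1), slug=m.group(1).replace("_", "-")),
            s,
        )
    return s
```

The design choice that matters is uniqueness of the delimiters: as long as no two tags share a marker, `wiki_to_xml(xml_to_wiki(s))` gives back the original string for flat inline markup.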

To be able to patch the XML we will need some kind of diff engine. That
engine will match the structure of the source document against the
structure of the modified document in the intermediate format, find
modified, new and deleted nodes, and check the validity of those nodes.
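
A very reduced version of that diff engine could look like this. It is a minimal Python sketch, assuming both documents have already been split into `{node_key: xml_fragment}` maps with stable keys (for example the XPath from a paragraph mapping), and assuming that new nodes arrive with fresh, unused keys. Reordering of nodes is not handled, and "validity" here only means well-formed XML; a real engine would also validate against the PHPBook schema once one exists.

```python
import xml.etree.ElementTree as ET


def diff_nodes(source, modified):
    """Classify differences between two {node_key: xml_fragment} maps."""
    common = source.keys() & modified.keys()
    changed = sorted(k for k in common if source[k] != modified[k])
    added = sorted(modified.keys() - source.keys())
    deleted = sorted(source.keys() - modified.keys())

    # Minimal validity check: every touched fragment must be well-formed XML.
    invalid = []
    for key in changed + added:
        try:
            ET.fromstring(modified[key])
        except ET.ParseError:
            invalid.append(key)

    return {"modified": changed, "new": added, "deleted": deleted, "invalid": invalid}
```

Only the nodes listed under "modified", "new" and "deleted" would then need to be patched into the XML in SVN, and anything under "invalid" would be rejected before the commit.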

For rapid integrity checking (whether an added link or function is
actually available in the manual), we need some kind of database that
simply contains all available links and their linkends. The same
technique may be applied to other parts of the manual that are not
easy to map between PHPWEB and XML.
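
A minimal sketch of such a link database, using Python's built-in sqlite3 with an in-memory table. Assumptions: the (id, title) pairs have already been harvested from the XML sources, links look like `<link linkend="...">`, and function references map to ids such as `function.str-replace` (an assumed convention for this example, not a checked rule). The checker returns every reference in an edited fragment that points nowhere.

```python
import re
import sqlite3

# Hypothetical patterns for the two kinds of references we want to check.
LINK = re.compile(r'<link linkend="([^"]+)"')
FUNC = re.compile(r"<function>([^<]+)</function>")


def build_link_db(targets):
    """targets: iterable of (id, title) pairs harvested from the XML sources."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE linkends (id TEXT PRIMARY KEY, title TEXT)")
    db.executemany("INSERT INTO linkends VALUES (?, ?)", targets)
    return db


def missing_targets(db, fragment):
    """Return references in an edited fragment that do not exist in the manual."""
    wanted = LINK.findall(fragment)
    # Assumed id convention: <function>str_replace</function> -> "function.str-replace".
    wanted += ["function." + name.lower().replace("_", "-") for name in FUNC.findall(fragment)]
    missing = []
    for target in wanted:
        if db.execute("SELECT 1 FROM linkends WHERE id = ?", (target,)).fetchone() is None:
            missing.append(target)
    return missing
```

Because the lookup is a single indexed query per reference, it is cheap enough to run on every save from the web interface, which is the point of keeping it separate from the full XSLT build.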

Once paragraph-to-paragraph mapping is complete, the system could be
extended with additional transformation mappings, but to keep it
extensible and maintainable it is very desirable to keep the system
architecture clear and documented in sync with every obstacle or
workaround. To survive its complexity, the project's architecture
should be suitable for restructuring, rewriting and iterative
development.


3. Final KISS

Without proper organization of the work, the whole project is
impossible, or at least very hard, to maintain. If we want to make the
documentation editing process more convenient and easy, and if we want
to attract contributors, we need to increase SIMPLICITY and reduce
CONFUSION.

SIMPLICITY

Finally, I propose to use SCons for build automation. It is written in Python.
Ahhh.. I can only imagine the feelings some of us may experience after
these words. =) But then ask yourself: why do we use Perl for our
build system? You may say that we use Autotools, not Perl, but does
that really matter? Are Autotools simple? Are they cross-platform?
Are they easy to customize?

I am not defending SCons in any way. I just want to say that if we
have a choice, it is better to use tools that are easier for everyone,
not just for Linux/Unix hackers. Although it was a chance for me to
learn the quirks of "autotools" and "make", I want to stress that our
audience is manual editors with little or no experience in that stuff.
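
To make the SIMPLICITY argument concrete, here is roughly what a minimal SCons build file for one HTML page of the manual could look like. It is a sketch under stated assumptions: the file names (`manual.xml`, `xsl/html.xsl`, `build/html/index.html`) are hypothetical, `xsltproc` is assumed to be installed, and this is a configuration file read by the `scons` command, not a standalone script.

```python
# SConstruct: read by the `scons` command, which provides Environment,
# Default and friends; it is not meant to be run with plain python.
env = Environment()

# Hypothetical layout: manual.xml is the source, xsl/html.xsl the stylesheet.
# Listing the stylesheet as a source makes SCons rebuild when either changes.
html = env.Command(
    "build/html/index.html",
    ["manual.xml", "xsl/html.xsl"],
    "xsltproc --output $TARGET ${SOURCES[1]} ${SOURCES[0]}",
)

env.Alias("html", html)  # `scons html` builds just this target
Default(html)            # plain `scons` builds it too
```

The whole build description stays ordinary Python, readable by someone who has never touched autoconf or make.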

CONFUSION

Look here: http://news.php.net/group.php?group=php.doc.chm
It is our beautiful list dedicated to XCHM development. Before Gabor
left, we had already agreed that it should be closed and a new list,
PHPDOC-TOOLS, created to discuss everything concerning DOC.PHP.NET,
Livedocs, manual building tools and all additional instrumentation:
http://beeblex.com/lists/index.php/php.doc.chm/1260?s=
After that post we asked systems@ to make the switch, but
unfortunately no action was taken.

Such dead or spam-ridden lists confuse users and contributors and make
them think the whole project is dead. I do not feel like putting up
with that disgusting stuff, but there is nothing I can do except annoy
systems@ to the point where they add me to their blacklist of very
poisonous people.

The proposal is the same. In summary:
- kill the long-dead PHP-DOC-CHM spam list
http://news.php.net/group.php?group=php.doc.chm

- create a new PHPDOC-TOOLS mailing list for development of PHPDOC
tooling (leaving PHPDOC to manual writers and editors)

- route CVS commits to the tooling parts of the CVS tree to the new list

- fight spam on the new list using the existing filters from the PHPDOC
list, or work out an alternative scheme, such as using a Google Group
for discussions


In the end, the phrase "preserve the environment", which made the
TikiWiki monster possible, would in the case of PHPDOC read as "clean
up the environment".


4. Conclusion

At first the idea was simple: to develop a simplified DocBook-like XML
format specialized for the PHP Manual, with as few tags as possible for
pagination and cross-referencing. But it turned out that we need a
little bit more. I thought about making an ordinary PHP+SQL web app to
edit the manual in a custom, simple, wiki-like but structural format,
but a complicated PHP and SQL system would lack the visibility needed
to gain an appropriate level of support from the specialized open
source folks we've got here. With a lot of PHP code, it would
eventually become hard to distinguish between the various parts or
components of the application. Using a combination of technologies, we
could draw clear borders that keep the various parts of the system
separate from each other without sacrificing flexibility and control
over the whole.


That's all.
Please CC me if you have any comments, as I am not directly subscribed
to the list.

--
--t.
