Hello everybody,

--begin FUD
I've been away for quite a long time, after it became obvious to me that my 256 MB machine is unable to handle the DocBook compilation process for XCHM, and after some dissatisfaction with how systems@ functions overall.
--end FUD
Please CC me on the discussion (if I manage to spark any), as I could not keep up with the content-related commits for the manual. I think it is great progress to see the PHPDOC announcement on the front page of PHP.NET; people asking me where to get the latest "alpha" of XCHM pointed me to it. =) I am still staying away from PHP development for now, but I would like to share some ideas of mine about what should be done with PHPDOC to make the documentation process more transparent and easy.

1. DocBook --> PHPBook

While DocBook offers a great, ready-to-use doc-building system for customization, I would prefer to start from a small, limited set of tags and a minimum of templates, and to build up template functionality later using some kind of CookBook. The process should include not only selecting the set, but also documenting its tags from the point of view of a PHPDOC developer: something like the DocBook Definitive Guide reference [1], but aimed specifically at the PHP community, with tags that are easy to understand for people with no publishing experience at all. Moreover, this documentation should describe the basic design principles of the output elements (how tags should be rendered), so that it also serves as a reference for the people who design the templates and will maintain them in the future.

[1]: http://www.docbook.org/tdg5/en/html/pt02.html "DocBook: The Definitive Guide"

A CookBook of XSL techniques is essential in this process. It must contain recipes for the features most projects expect from DocBook.
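To make the "small and limited set of tags" idea a little more concrete, here is a minimal Python sketch of a checker that flags any element outside an agreed PHPBook subset. The tag names in `ALLOWED_TAGS` are hypothetical placeholders (choosing the real set is exactly the process described above), and the sketch assumes sources without XML namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical PHPBook subset; the real set is still to be chosen and documented.
ALLOWED_TAGS = {"book", "chapter", "section", "title", "para",
                "link", "example", "programlisting"}


def disallowed_tags(xml_text):
    """Return the sorted names of elements in xml_text that are not in ALLOWED_TAGS.

    Assumes un-namespaced tags; namespaced tags would appear as '{uri}name'.
    """
    root = ET.fromstring(xml_text)
    return sorted({elem.tag for elem in root.iter() if elem.tag not in ALLOWED_TAGS})


if __name__ == "__main__":
    sample = """<chapter>
      <title>Strings</title>
      <para>See <link linkend="function.strlen">strlen</link>.</para>
      <para><emphasis>Note</emphasis> the encoding.</para>
    </chapter>"""
    print(disallowed_tags(sample))  # ['emphasis']
```

A check like this could run as part of the build, so that manual editors and template designers work against the same documented subset.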
The CookBook should cover at least:
- cross-linking with validation and additional information taken from the link end
- pagination
- table of contents
- footnotes
- additional pages generated from the document structure (examples, title page, credits)

A clear definition of these features and recipes will show how we could map semantic DocBook or PHPBook tags to presentation, and how we could provide the backward transition. The latter is even more important, as most of us would like to edit the PHP Manual through a convenient web interface with familiar wiki markup (though with a different meaning). DocBook developers may recall something similar to the desired CookBook [2].

[2]: http://www.sagehill.net/docbookxsl/ "DocBook XSL: The Complete Guide"

2. Web-Interface for PHPBook

I would like to say that I know exactly how to make it work. Unfortunately I don't, but I don't see any obstacles that would make it unfeasible either, unless nobody is interested or nobody has enough time (or ...) to contribute. So, here is the interface concept.

2.1 Version control

As I said above, the web interface is a mapping from the rather presentational nature of a wiki system to the semantic XML source of PHPBook. How could that be possible?

The first problem is CVS. To keep it short: CVS does not integrate with Apache's security model, which makes it useless here. To make it work through CVS we would need to implement a CVS client in PHP that connects to the CVS server and commits using credentials entered explicitly by the user. Sounds bad? It is.

--begin FUD
CVS is so buggy and complicated that I can easily imagine why it is not developed anymore.
--end FUD

What are the alternatives? We still need our XML documentation to live in a version control system. We can't use an SQL database, because with the sources stored in a database users won't be able to grab them and build offline versions of the manual, which would make testing and development difficult.
Generating the different output formats online is not a solution either: it could consume a lot of resources, it is not easy to debug, and we would eventually end up with a very complicated piece of code. There is no need to break what we already have.

So for version control, the ideal backend for the XML is simply a CVS alternative, and that is SVN. Why SVN? It integrates perfectly with Apache's security model, it is easier to use (especially for beginners), it works over ports 80 and 443 (so it is more accessible), it is simpler, and it has many other Good Things (tm), besides being actively supported itself. The only reason to stick with CVS is familiar branching/tagging, but we do not use it anyway. Even if CVS is convenient for PHP core hackers, it is absolutely awful for plain PHP users (or for potential contributors, to put it more strongly).

To conclude: we need XML as the backend to keep the manual building process intact and transparent (compatibility, or rather, a smooth transition). We need to keep the documents in SVN for revision control, and allow modifications from the web with already authenticated developer accounts. Now we need to decide how to reverse the process and save changes made on the generated PHP web site back to the XML in SVN.

2.2 Backend Duality

While SVN is a good system in every respect, it is not as fast and efficient as Apache when it comes to serving content to end users. Using SVN directly as the backend would have a big performance impact, considering that we also need to apply additional transformation rules and check cross-links at that time (if we aim at editing the manual online, we need to check links online). So we need a cache, but on second thought a cache is not enough: we still have the complexity of transformation and integrity checking. A simple cache is the approach of Google Code, where wiki pages are stored in SVN and there is no evidence of any integrity checking tools except for a "deprecated pages" filter in the wiki interface.
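The "simple cache" approach can be sketched as a render cache keyed by document path and SVN revision, so that asking for a newer revision re-renders the page. In this Python sketch, the `render` callable stands in for the whole XML-to-PHPWEB transformation and the revision numbers are supplied by the caller; both are assumptions, not existing PHPDOC code. Note that nothing here checks cross-links, which is exactly why a plain cache is not enough.

```python
from typing import Callable, Dict, Tuple


class RevisionCache:
    """Cache rendered pages keyed by (path, SVN revision).

    Asking for a newer revision misses the cache and re-renders, so a commit
    effectively invalidates older entries without explicit bookkeeping.
    """

    def __init__(self, render: Callable[[str, int], str]):
        self._render = render  # hypothetical XML -> PHPWEB transformation
        self._store: Dict[Tuple[str, int], str] = {}
        self.misses = 0

    def get(self, path: str, revision: int) -> str:
        key = (path, revision)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._render(path, revision)
        return self._store[key]

    def drop_older_than(self, revision: int) -> None:
        """Forget entries rendered from revisions older than `revision`."""
        self._store = {k: v for k, v in self._store.items() if k[1] >= revision}


if __name__ == "__main__":
    cache = RevisionCache(lambda path, rev: f"<p>{path} at r{rev}</p>")
    cache.get("ref/strlen.xml", 10)
    cache.get("ref/strlen.xml", 10)  # served from the cache
    cache.get("ref/strlen.xml", 11)  # new revision, rendered again
    print(cache.misses)  # 2
```

Keying by the global repository revision re-renders unchanged pages after every commit; keying by each file's last-changed revision would avoid that at the cost of one extra lookup.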
The Google Code wiki also uses presentational markup, while we need structural markup only.

Here is where it gets complicated. We have a forward documentation processing toolchain that produces a convenient web presentation from the XML source. Now we need to build a backward toolchain that transforms modifications of this web presentation back into XML. In other words:

  Forward toolchain:  XML ----(xslt)----> HTML ----(php)----> PHPWEB
  Backward toolchain: XML <----(php)---- DIFF <----(php)---- PHPWEB

At first sight the problem might seem impossible: restoring the original XML from generated HTML. There is a good chance that essential information about the XML structure is lost during the forward transformation, and HTML parsing is itself obscure and error-prone. Even if we generated perfect XHTML that could be transformed back with XSLT, XSLT transformations are slow and resource-hungry. So the task must be simplified to the point where changes to both chains can be made incrementally, transparently and simultaneously.

Let me show what a simple paragraph-to-paragraph mapping would look like. To match an output paragraph back to the XML source, we need to find out what information is missing to unambiguously identify the part of the XML it originated from. Each HTML paragraph comes from an XML paragraph, and each XML paragraph has a well-defined XPath. We could either embed this path in a comment next to the paragraph in the generated manual, or use a common convention to derive the XML ID from output elements without an explicit comment.

Another consideration is that we do not need to generate HTML at all: we could use an intermediate format that is easy to parse on the server side to produce HTML (just like wiki markup) and, at the same time, easy to transform back into the XML source even after modification. To be able to patch the XML we will need some kind of diff engine.
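Here is a minimal Python sketch of the paragraph-to-paragraph mapping, using the first option above: every rendered paragraph is preceded by an HTML comment carrying the XPath of its source `<para>`, and the (possibly edited) page is parsed back into an XPath-to-text map that a diff step compares with the source. The `<!-- src: ... -->` convention and all function names are hypothetical, and the sketch only handles plain-text paragraphs; a real intermediate format would also need inline markup.

```python
import html
import re
import xml.etree.ElementTree as ET


def para_paths(root):
    """Yield (xpath, element) for every <para>, using 1-based positional steps."""
    def walk(elem, path):
        counts = {}
        for child in elem:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            child_path = f"{path}/{child.tag}[{counts[child.tag]}]"
            if child.tag == "para":
                yield child_path, child
            yield from walk(child, child_path)
    yield from walk(root, f"/{root.tag}")


def source_map(xml_text):
    """{xpath: plain text} for every paragraph in the XML source."""
    root = ET.fromstring(xml_text)
    return {path: "".join(p.itertext()) for path, p in para_paths(root)}


def render(xml_text):
    """Forward step: emit each paragraph preceded by a comment with its XPath."""
    return "\n".join(
        f"<!-- src: {path} -->\n<p>{html.escape(text)}</p>"
        for path, text in source_map(xml_text).items()
    )


# Hypothetical comment convention shared by render() and parse_back().
COMMENT_PARA = re.compile(r"<!-- src: (\S+) -->\s*<p>(.*?)</p>", re.S)


def parse_back(page):
    """Backward step: recover {xpath: text} from the (possibly edited) page."""
    return {m.group(1): html.unescape(m.group(2)) for m in COMMENT_PARA.finditer(page)}


def diff(old, new):
    """Classify paragraphs by XPath key as (modified, new, deleted).

    A paragraph only counts as new if it carries a comment with an unknown path.
    """
    modified = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    added = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    return modified, added, deleted


if __name__ == "__main__":
    source = ("<chapter><title>T</title><para>One.</para>"
              "<section><para>Two.</para></section></chapter>")
    edited = render(source).replace("<p>Two.</p>", "<p>Two, edited.</p>")
    print(diff(source_map(source), parse_back(edited)))
    # (['/chapter/section[1]/para[1]'], [], [])
```

Positional XPaths like `para[2]` shift when a paragraph is inserted before them, which is one argument for the second option above: deriving stable XML IDs from the output elements instead.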
The diff engine will match the structure of the source document against the structure of the modified document in the intermediate format, find modified, new and deleted nodes, and check the validity of those nodes. For rapid integrity checking (whether an added link or function actually exists in the manual), we need some kind of database that simply contains all available links and their link ends. The same technique may be applied to other parts of the manual that are not easy to map between PHPWEB and XML.

Once the paragraph-to-paragraph mapping is complete, the system could be extended with additional transformation mappings, but to keep it extensible and maintainable it is highly desirable to keep the system architecture clear and in sync with every obstacle or workaround. To survive the complexity, the project architecture should lend itself to restructuring, rewriting and iterative development.

3. Final KISS

Without proper organization of the work, the whole project is impossible, or at least very hard, to maintain. If we want to make the documentation editing process more convenient and easy, if we want to attract contributors, we need to increase SIMPLICITY and reduce CONFUSION.

SIMPLICITY

Finally, I propose to use SCONS for build automation. It is written in Python. Ahhh.. I can only imagine the feelings some of us might experience after these words. =) But then ask yourself: why do we use Perl for our build system? You may say that we use Autotools, not Perl, but does that really matter? Are autotools simple? Are they cross-platform? Are they easy to customize? I am not advocating SCONS in particular; I just want to say that if we have a choice, it is better to use tools that are easier for everyone, not just for Linux/Unix hackers. Although it was a chance for me to learn "autotools" and "make" quirks, I want to stress that our audience is manual editors with little or no experience in that stuff.
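Returning briefly to the integrity checks from section 2.2: the link database there could start as something as small as an in-memory SQLite table. The Python sketch below indexes every `id` in the XML sources and reports `linkend` values that point nowhere. The table layout and function names are assumptions, and the sketch expects plain `id`/`linkend` attributes (DocBook 5 sources would use the namespaced `xml:id` instead).

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_link_db(documents):
    """Index every element id in `documents` ({filename: xml_text}).

    The first definition of an id wins; a real checker should report duplicates.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE links (id TEXT PRIMARY KEY, file TEXT, tag TEXT)")
    for filename, xml_text in documents.items():
        for elem in ET.fromstring(xml_text).iter():
            elem_id = elem.get("id")
            if elem_id is not None:
                db.execute("INSERT OR IGNORE INTO links VALUES (?, ?, ?)",
                           (elem_id, filename, elem.tag))
    db.commit()
    return db


def broken_links(db, xml_text):
    """Return linkend values in xml_text that have no matching id in the db."""
    missing = []
    for elem in ET.fromstring(xml_text).iter():
        target = elem.get("linkend")
        if target is None:
            continue
        row = db.execute("SELECT 1 FROM links WHERE id = ?", (target,)).fetchone()
        if row is None:
            missing.append(target)
    return missing


if __name__ == "__main__":
    db = build_link_db({"strings.xml": '<chapter id="ref.strings">'
                        '<para id="function.strlen">strlen</para></chapter>'})
    edited = ('<para>See <link linkend="function.strlen">strlen</link> and '
              '<link linkend="function.strlenn">typo</link>.</para>')
    print(broken_links(db, edited))  # ['function.strlenn']
```

Because the edited fragment is checked against the index rather than against the whole manual, such a check stays cheap enough to run on every web edit.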
CONFUSION

Look here: http://news.php.net/group.php?group=php.doc.chm
It is our beautiful list dedicated to xCHM development. Before Gabor left, we had already agreed that it should be closed and a new list, PHPDOC-TOOLS, created to discuss everything concerning DOC.PHP.NET, Livedocs, the manual building tools and all additional instrumentation:
http://beeblex.com/lists/index.php/php.doc.chm/1260?s=

After that post we asked systems@ to do the switch, but unfortunately no action was taken. Dead or spammed lists like this confuse users and contributors and make them think that the whole project is dead. I do not feel like sticking with that disgusting stuff, but there is nothing I can do except annoy systems@ to the point where they add me to their blacklist of very poisonous people.

The proposal is still the same. In summary:
- kill the long-dead, spam-filled PHP-DOC-CHM list http://news.php.net/group.php?group=php.doc.chm
- create a new PHPDOC-TOOLS mailing list for development of PHPDOC tooling (leaving PHPDOC to manual writers and editors)
- route CVS commits to the tooling parts of the CVS tree to the new list
- fight spam on the new list by reusing the existing filters from the PHPDOC list, or work out an alternative scheme such as using a Google Group for discussions

In the end, the phrase "preserve the environment", which made the TikiWiki monster possible, would in the case of PHPDOC become "clean up the environment".

4. Conclusion

At first the idea was simple: develop a simplified DocBook-like XML format for the PHP Manual, with as few tags as possible for pagination and cross-referencing. It turned out that we need a little bit more. I thought about making an ordinary PHP+SQL web app that works with a custom, simple, wiki-like but structural format for editing the manual, but a complicated PHP and SQL system would lack the visibility needed to get an appropriate level of support from the specialized open source folks we've got here.
With a lot of PHP code, it will eventually become hard to tell the various parts or components of the application apart. By combining technologies, we could draw clear borders that keep the parts of the system separate from each other without sacrificing flexibility and control over the whole.

That's all. Please CC me if you have any comments, as I am not directly subscribed to the list.

--
--t.
