As some of you are aware, the Apache Infrastructure team has mandated that
all projects move to the new svnpubsub process for publishing their websites
by the end of the year. Camel (like all Confluence-based sites) is
affected by this mandate. We currently use a multi-step rsync process
where Confluence exports the space to HTML, which gets rsynced to an area
on people.apache.org once an hour. A cron job in someone's crontab on
people.apache.org then rsyncs it to the appropriate place
(/www/camel.apache.org), also about once an hour. Then another rsync
syncs from there to the live sites. This causes long publishing delays
(it can be a few hours between a change and it going live), and it also
involves a LOT of disk IO to sync things all over the place. The svnpubsub
process is a lot faster: the site changes are committed to svn, and
anything that needs them (the live site) can listen for the changes and
update immediately.
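To make the push-versus-poll difference concrete, here is a toy in-process sketch of the publish/subscribe idea. It is not svnpubsub's actual protocol or code; the `CommitBus` class and the path prefixes below are hypothetical. The point is only that a subscriber (the live site) reacts when a commit happens instead of waiting for the next timed rsync run.

```python
from collections import defaultdict
from typing import Callable

# Callback signature: (revision number, list of changed paths under the prefix)
Listener = Callable[[int, list], None]


class CommitBus:
    """Toy publish/subscribe bus; a hypothetical stand-in for svnpubsub."""

    def __init__(self) -> None:
        # path prefix -> callbacks interested in commits under that prefix
        self._subscribers = defaultdict(list)

    def subscribe(self, prefix: str, callback: Listener) -> None:
        self._subscribers[prefix].append(callback)

    def publish(self, revision: int, changed_paths: list) -> None:
        # Push the commit to every subscriber whose prefix matches a changed
        # path, so nobody has to poll or rsync on a timer.
        for prefix, callbacks in self._subscribers.items():
            matching = [p for p in changed_paths if p.startswith(prefix)]
            if matching:
                for callback in callbacks:
                    callback(revision, matching)
```

For example, a live-site mirror would `subscribe("camel/content/", update_checkout)` (hypothetical names) and only ever touch the files that a given commit changed.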
Anyway, over the last couple of months, a bit of work has been done to
help various projects transition to svnpubsub. Joe Schaefer has been
working with the Maven folks so that "mvn site:deploy" can deploy via
svnpubsub. Obviously the CMS uses it heavily.
We also now have a "solution" for Confluence-based sites, built on work
I've done for CXF's site. Confluence has a SOAP interface for retrieving
information and rendering pages. Well, if I see a SOAP interface (even a
crappy one like Confluence's)... ;-)
Seriously, I now have a program that can render an entire Confluence
space using the SOAP APIs and Velocity (which is what the current
AutoExport stuff uses, so migration is easy). However, it does more than
that: it also records the modified times, checks the RSS feed for changes
first, tracks {children} and {include} tags, etc. Thus, if you change a
page that is "included" in another (think of the "Book in One Page"
page), those pages will also get re-rendered. If you add/delete a page,
any page that uses the {children} tag to generate a table of contents
will automatically re-render.
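A minimal sketch of how the set of pages to re-render could be computed from those dependencies. The function name, the data layout, and the conservative rule that any page add/delete dirties every page using {children} (rather than just the affected parent) are assumptions for illustration, not how the CXF code is actually written.

```python
from collections import defaultdict, deque


def pages_to_rerender(modified, added, deleted, includes, uses_children):
    """Return the pages that need re-rendering (hypothetical helper).

    modified:      pages whose content changed
    added/deleted: pages created or removed since the last run
    includes:      page -> set of pages it pulls in via {include}
    uses_children: pages containing a {children} macro
    """
    # Invert the include graph: included page -> pages that include it.
    included_by = defaultdict(set)
    for page, targets in includes.items():
        for target in targets:
            included_by[target].add(page)

    dirty = set(modified) | set(added) | set(deleted)
    if added or deleted:
        # Conservative: any {children} table of contents may now be stale.
        dirty |= set(uses_children)

    # Propagate through {include}: if A includes B and B is dirty, so is A.
    queue = deque(dirty)
    while queue:
        page = queue.popleft()
        for includer in included_by.get(page, ()):
            if includer not in dirty:
                dirty.add(includer)
                queue.append(includer)

    # Deleted pages no longer exist, so there is nothing to render for them.
    return dirty - set(deleted)
```

With `includes = {"Book in One Page": {"Getting Started", "Routes"}}`, editing "Routes" yields both "Routes" and "Book in One Page", which is the behaviour described above.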
It ALSO cleans up the resulting HTML via tagsoup and some custom cleanup
code. The Confluence-generated HTML is awful, with invalid attributes, bad
links, etc. Those are now "mostly" cleaned up.
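The real cleanup uses tagsoup (a Java library) plus custom code. As a rough illustration of the invalid-attribute part only, here is a sketch using Python's standard `html.parser` with a hypothetical attribute whitelist. It does not fix bad links and is no substitute for tagsoup's recovery from badly nested markup.

```python
import html
from html.parser import HTMLParser

# Hypothetical whitelist; real rules would follow what the W3C validator rejects.
ALLOWED_ATTRS = {"a": {"href", "title", "name"}, "img": {"src", "alt", "width", "height"}}
GLOBAL_ATTRS = {"id", "class"}
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class AttributeCleaner(HTMLParser):
    """Re-emits HTML, dropping attributes that are not whitelisted.

    Note: <script>/<style> contents are not special-cased here.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def _attrs(self, tag, attrs):
        allowed = ALLOWED_ATTRS.get(tag, set()) | GLOBAL_ATTRS
        return "".join(
            f' {name}="{html.escape(value, quote=True)}"'
            for name, value in attrs
            if name in allowed and value is not None
        )

    def handle_starttag(self, tag, attrs):
        self.out.append(f"<{tag}{self._attrs(tag, attrs)}>")

    def handle_startendtag(self, tag, attrs):
        self.out.append(f"<{tag}{self._attrs(tag, attrs)} />")

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:  # void elements never get a closing tag
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Entities were decoded by the parser, so re-escape text on output.
        self.out.append(html.escape(data, quote=False))


def clean_html(source):
    cleaner = AttributeCleaner()
    cleaner.feed(source)
    cleaner.close()
    return "".join(cleaner.out)
```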
I've uploaded a "build" of the site to:
http://people.apache.org/~dkulp/camel/
so you can see that the result is pretty much identical to the live site.
A couple of things are actually better, such as the image links for the
blog entries. Also, the new page actually validates with the W3C validator:
http://validator.w3.org/check?uri=http%3A%2F%2Fpeople.apache.org%2F~dkulp%2Fcamel%2F
To run this, a buildbot build will be set up to run the process once an
hour and generate new HTML if it detects a change (via the RSS feed). Once
it runs, a commit message goes to the commits list and the changes are
live immediately. Thus, changes now take "at most" one hour to reach the
live site. However, any committer can check out the code and run it
manually if they need/want things live immediately.
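A sketch of the "only rebuild if the feed changed" check that the hourly job relies on. It assumes an RSS 2.0 feed whose `<item>` elements carry `<link>` and `<pubDate>`; the actual Confluence feed format, and how the real code remembers what it has already processed, may well differ. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def feed_entries(rss_xml):
    """Return {(link, pubDate)} for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    entries = set()
    for item in root.iterfind("./channel/item"):
        # Fall back to <guid> if an item has no <link> (assumption).
        key = item.findtext("link") or item.findtext("guid") or ""
        entries.add((key.strip(), (item.findtext("pubDate") or "").strip()))
    return entries


def new_entries(rss_xml, previously_seen):
    """Entries not seen on the last run; an empty set means skip the build."""
    return feed_entries(rss_xml) - previously_seen
```

Keying on link plus pubDate means a second edit to the same page still shows up as new, which a check on the link alone would miss.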
For CXF, this new process is now "live" (since Monday). I've filed a
ticket with INFRA to start the process for Camel. (It requires a content
area in the web svn repo for the live content, then a buildbot build, then
some configs to make it all live.) It's definitely still a "work in
progress", but it's a good start. For example, it doesn't track the
blog/news entries, so currently if you add a blog entry (for a release),
you would need to manually trigger an update of the Index page. However,
the code is there, so it's something we can add/enhance. I also want to
update it to render pages in parallel, if possible, to make it a bit
quicker.
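One possible shape for the parallel rendering, using a thread pool on the assumption that most of the time per page is spent waiting on Confluence's SOAP calls rather than on local CPU. `render_page` is a hypothetical callable standing in for the SOAP render plus Velocity step, and the default worker count is a guess that would need tuning so the Confluence server isn't hammered.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence


def render_all(pages: Sequence[str],
               render_page: Callable[[str], str],
               workers: int = 8) -> Dict[str, str]:
    """Render pages concurrently and return {page title: rendered HTML}."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so zip pairs each page with
        # its own output; any exception from render_page is re-raised here.
        results = pool.map(render_page, pages)
        return dict(zip(pages, results))
```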
For Camel, the main "pom" and scripts are in:
http://svn.apache.org/repos/asf/camel/website/
and there is a README there. The code itself is pulled in via an
svn:externals to the area in CXF, so I only need to update the code in one
place:
http://svn.apache.org/repos/asf/cxf/web/
I want to avoid "forking" the code, as I *DO* know that if/when they move
to Confluence 4.x (currently on 3.4.x), it will need some updating since
the SOAP APIs change a bit.
Anyway, I'm hoping to have Camel flipped over by the end of the week or
early next week.
BTW: if you are interested, it has to render 644 pages for the full Camel
website. That takes about 15 minutes right now, but like I said, once
it's all set up, it can do incremental updates, which are MUCH quicker.
644 pages is quite a bit.
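For a rough sense of scale, the full render works out to about 1.4 seconds per page on average:

```latex
\frac{15 \times 60\ \text{s}}{644\ \text{pages}}
  = \frac{900}{644}\ \text{s/page}
  \approx 1.4\ \text{s/page}
```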
--
Daniel Kulp
[email protected] - http://dankulp.com/blog
Talend Community Coder - http://coders.talend.com