Re: [Proposal] add DTDs to Apache website
Thanks to everyone who responded. The discussion seems to have gone quiet, so it is time to summarise. I am not going to call a Vote on this, rather just do it. If anyone thinks otherwise then say so.

These are the original set of issues. I have added comments based on the discussion, and added the new issues that arose. Issues 6, 7, 13, 14 are where the action is.

1) The DTDs will still be managed in the Forrest CVS.

Okay.

2) The Forrest build system is complex. It would be good to automate the publishing of DTD versions, but that may not be possible.

3) The Forrest website is built using the stable version of Forrest (currently v0.5.1). So how will DTDs from the current CVS (v0.6-dev) get into the website CVS [3]? Manual copy? See 4).

4) If some committer changes the DTDs in CVS then they will be out-of-sync. Will committers remember to do the manual copy? See 3).

Issues 2, 3, and 4 are solved in one fell swoop by the use of .htaccess and mod_rewrite magic. See Issue 13 below.

5) Extra impact on the Apache webserver. Is this a big bandwidth consumer? See some estimates in [4].

No comments, so don't worry about it.

6) What will be the URLs for the DTDs? http://xml.apache.org/forrest/dtd/... We presume that Forrest will not be a top-level project soon.

This is the big problem. The XML Project is talking about moving Forrest under the wings of Cocoon. So we should anticipate that and use cocoon URLs: http://cocoon.apache.org/forrest/dtd/... or even http://cocoon.apache.org/dtd/... See also new issues 12 and 13.

7) Does the Apache webserver deliver the supporting files using the appropriate Content-Type? Is that text/plain? Does it matter? We have *.dtd and *.mod and *.pen and *.ent extensions.

Tests proved that there is a problem with some of these with the default webserver config. Never mind, .htaccess to the rescue. See Issue 13 below. Steven posted an RFC which might help us to decide. See the discussion, there are some outstanding questions:
http://marc.theaimsgroup.com/?l=forrest-dev&m=107408183417874

8) We have two separate directories in CVS. The /dtd/ and the /entity/ directories are parallel. We can probably merge everything into /dtd/ and change the Catalogs.

No comments. Merge them, because the entity sets are only used by DTDs anyway.

9) We will never know if the Catalog Entity Resolver gets broken after an upgrade. Forrest will still work but will be slower, doing downloads of the DTD and supporting files on each document parse. We can probably add a test document in the forrest seed site to detect failure.

Do not use the seed site as a test mechanism. Add some proper tests. Not sure yet whether JUnit or Anteater.

10) Cocoon has a copy of the DTDs and stuff in its own CVS so that it can build its own documentation via its webapp and command-line builds. This can still continue, but needs a better solution. Perhaps Forrest can provide later.

Okay.

11) Do we need to ask infrastructure@ about this proposal or just do it?

No comments, so just do it.

The new issues ...

12) With the last Forrest release, the System Identifiers were changed to be URIs instead of local paths, e.g. http://apache.org/forrest/dtd/... This was unfortunate because now we need to handle the fact that people will expect a DTD resource to be there. In the upcoming Forrest and Cocoon release we will change all default System Identifiers to be that adopted at Issue 6.

13) We are working on a .htaccess to rewrite the URLs to deliver the resources via ViewCVS. This still needs some work. See the previous message in this thread.

14) Test that it works. Yes, it does with XXE. Disable the Forrest catalog.xcat then change the System Identifier in one of your xdocs to point to http://cocoon.apache.org/dtd/document-v12.dtd
Does it work with other tools?

15) It was suggested to do the same for XSD and RNG for the Sitemap. None published yet, so address this later.

16) It was suggested that we need a document to assist with configuration of local XML Tools. Here is a start: http://xml.apache.org/forrest/catalog.html

17) Need to make extra effort to ensure that we adhere to the proper naming convention for our DTDs and increment the version numbers for every change.

--David
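Issue 17 is the kind of rule that a small script could check. The sketch below is only an illustration: it assumes the convention is `<name>-v<digits>.<ext>`, which is inferred from file names quoted in this thread (document-v12.dtd, common-charents-v10.mod) and may not match Forrest's actual rule. The function names are hypothetical.

```python
import re

# Assumed pattern, inferred from names seen in this thread such as
# document-v12.dtd. The real Forrest naming convention may differ.
VERSIONED_NAME = re.compile(
    r"^(?P<name>[A-Za-z][A-Za-z0-9-]*?)-v(?P<version>\d+)\.(?P<ext>dtd|mod)$"
)


def parse_versioned_name(filename):
    """Return (name, version, ext) for a versioned file name, or None."""
    match = VERSIONED_NAME.match(filename)
    if match is None:
        return None
    return match.group("name"), int(match.group("version")), match.group("ext")


def unversioned(filenames):
    """List the file names that do not carry a -vNN version suffix."""
    return [f for f in filenames if parse_versioned_name(f) is None]
```

Entity sets such as ISOpub.pen carry no version suffix, so under this assumed pattern they would be reported by `unversioned()` and would need an explicit exemption.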
Re: [Proposal] add DTDs to Apache website
- Original Message -
From: David Crossley [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, January 16, 2004 2:50 AM
Subject: Re: [Proposal] add DTDs to Apache website

Roger I Martin PhD wrote:

There are schema definitions at
C:\apache\cocoon-2.1\src\blocks\databases\samples\xsp\esql.xsd for esql
C:\apache\cocoon-2.1\src\documentation\xdocs\drafts\sitemap-2.1-draft.xsd for *.xmap
Are these to be organised with the DTDs of this proposal?

No, there are no such plans. This proposal is only attempting to deal with the set of document DTDs. Why would those XSDs need to be on a website?

For the same purpose as DTDs? Or moved into Cocoon so they can be utilized by tools? I was thinking about visualization and editing assistants.

The sitemap XSD is draft and did not receive ongoing community support. It was followed by a RELAX NG effort which suffered the same fate. So they remain in draft state.

--David
Re: [Proposal] add DTDs to Apache website
On 15.01.2004 05:05, David Crossley wrote:

9) We will never know if the Catalog Entity Resolver gets broken after an upgrade. Forrest will still work but will be slower, doing downloads of the DTD and supporting files on each document parse. We can probably add a test document in the forrest seed site to detect failure.

A really bad argument against the proposal :) Of course a real test is the way to go here.

I gather that you mean a good argument. That is why i listed that issue. It would be a bad thing if Forrest/Cocoon silently started doing network retrievals like the current Xerces-2.6 web.xml issue.

No, I meant bad argument. Having the whole application as test case is not what the inventor intended. In other words, the application itself should not serve as a test case.

Forrest does now have a build test target which tries to build the forrest seed site. Are you suggesting that we would be better to re-instate the JUnit tests that Cocoon used to have? I am no expert, but i think that we need the test to actually be a part of the Forrest machinery so that when users create a new project then they get the test happening too.

I imagined no special type of a test, only there should be a test. I don't know how you can control whether Xerces retrieves the DTDs from network or not.

Joerg
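One way to make a silent network retrieval visible, independent of any particular parser, is to put the lookup decision in a function that fails loudly. The following is a minimal sketch under stated assumptions: the catalog is modelled as a plain dictionary from public identifier to local path, and `resolve_locally`, `NetworkLookupError` and the local path in the usage example are hypothetical names, not part of Forrest, Cocoon or Xerces.

```python
from urllib.parse import urlparse


class NetworkLookupError(Exception):
    """Raised when an identifier would have to be fetched over the network."""


def resolve_locally(public_id, system_id, catalog):
    """Map an identifier pair to a local path using a catalog dictionary.

    `catalog` maps public identifiers to local file paths. An identifier
    that is missing from the catalog and whose system identifier is an
    http(s) URL is treated as a failure instead of being downloaded.
    """
    if public_id in catalog:
        return catalog[public_id]
    scheme = urlparse(system_id).scheme.lower()
    if scheme in ("http", "https"):
        raise NetworkLookupError(
            f"{public_id!r} is not in the catalog; would fetch {system_id}"
        )
    # A relative or file-based system identifier is already local.
    return system_id
```

A test could call this for every DOCTYPE found in a project's documents, for example `resolve_locally("-//APACHE//DTD Documentation V1.2//EN", "http://apache.org/forrest/dtd/document-v12.dtd", {"-//APACHE//DTD Documentation V1.2//EN": "schema/dtd/document-v12.dtd"})`, and fail the build on `NetworkLookupError`.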
Re: [Proposal] add DTDs to Apache website
On 15 Jan 2004, at 05:05, David Crossley wrote:

Joerg Heinicke wrote:

David Crossley wrote:

3) The Forrest website is built using the stable version of Forrest (currently v0.5.1). So how will DTDs from the current CVS (v0.6-dev) get into the website CVS [3]? Manual copy? See 4).

4) If some committer changes the DTDs in CVS then they will be out-of-sync. Will committers remember to do the manual copy? See 3).

I don't see this problem. On the one hand there are the older files like document 0.10 or 0.11 that won't be touched, on the other hand 0.12 (or is it already old too?) which is developed at the moment. You can't make incompatible changes for one version, otherwise you will break possibly thousands of documents out there. So only extensions are possible.

Absolutely. I think that i got a bit mixed up with whatever i was trying to say in item 4. We need proper version control and we have a naming convention for that. Forrest has been careful not to introduce any incompatible changes. However, i think that we need to be more careful about adding even new optional stuff. Every change should be a totally new DTD version.

In conclusion: the update cycle must not be once per minute, but maybe once per day or only week. Now what about having a cron job running on the website server that checks out recent DTD versions? Forcing manual work that's critical and without much effort automatically doable sounds not that good.

Good idea. Nicola Ken suggested something similar.
Ok, a little more .htaccess magic:

# First, proxy the content straight out of ViewCVS
ProxyPass /forrest/ http://cvs.apache.org/viewcvs.cgi/*checkout*/xml-forrest/src/core/context/resources/schema/dtd/
ProxyPassReverse /forrest/ http://cvs.apache.org/viewcvs.cgi/*checkout*/xml-forrest/src/core/context/resources/schema/dtd/

# Now, since ViewCVS is pretty slow, make sure you cache it
CacheEnable mem /forrest/
# for a day
CacheDefaultExpire 86400
MCacheSize 4096
MCacheMaxObjectCount 100
MCacheMinObjectSize 1
MCacheMaxObjectSize 2048

# and in case your client is a good web citizen, tell the proxies down the road
# to avoid calling us since we guarantee the content is fresh for a day
ExpiresActive On
ExpiresDefault "access plus 1 day"

--
Stefano, who has been waiting for some 18 months for somebody else to come up with the idea of having forrest pregenerating the .htaccess file to do some sort of poor-man multichannel or content negotiation, but has lost hope so it's time to inject notion in the system.
Re: [Proposal] add DTDs to Apache website
Stefano Mazzocchi wrote:

<htaccess stuf/>

Stefano, who has been waiting for some 18 months for somebody else to come up with the idea of having forrest pregenerating the .htaccess file to do some sort of poor-man multichannel or content negotiation, but has lost hope so it's time to inject notion in the system.

Stefano, who forgot to add that request on Jira, bugzilla or something similar, because he realized that people is not able to read his mind :-)
Re: [Proposal] add DTDs to Apache website
There are schema definitions at
C:\apache\cocoon-2.1\src\blocks\databases\samples\xsp\esql.xsd for esql
C:\apache\cocoon-2.1\src\documentation\xdocs\drafts\sitemap-2.1-draft.xsd for *.xmap
Are these to be organised with the DTDs of this proposal?

- Original Message -
From: Stefano Mazzocchi [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Thursday, January 15, 2004 10:01 AM
Subject: Re: [Proposal] add DTDs to Apache website

On 15 Jan 2004, at 05:05, David Crossley wrote:

Joerg Heinicke wrote:

David Crossley wrote:

3) The Forrest website is built using the stable version of Forrest (currently v0.5.1). So how will DTDs from the current CVS (v0.6-dev) get into the website CVS [3]? Manual copy? See 4).

4) If some committer changes the DTDs in CVS then they will be out-of-sync. Will committers remember to do the manual copy? See 3).

I don't see this problem. On the one hand there are the older files like document 0.10 or 0.11 that won't be touched, on the other hand 0.12 (or is it already old too?) which is developed at the moment. You can't make incompatible changes for one version, otherwise you will break possibly thousands of documents out there. So only extensions are possible.

Absolutely. I think that i got a bit mixed up with whatever i was trying to say in item 4. We need proper version control and we have a naming convention for that. Forrest has been careful not to introduce any incompatible changes. However, i think that we need to be more careful about adding even new optional stuff. Every change should be a totally new DTD version.

In conclusion: the update cycle must not be once per minute, but maybe once per day or only week. Now what about having a cron job running on the website server that checks out recent DTD versions? Forcing manual work that's critical and without much effort automatically doable sounds not that good.

Good idea. Nicola Ken suggested something similar.
Ok, a little more .htaccess magic:

# First, proxy the content straight out of ViewCVS
ProxyPass /forrest/ http://cvs.apache.org/viewcvs.cgi/*checkout*/xml-forrest/src/core/context/resources/schema/dtd/
ProxyPassReverse /forrest/ http://cvs.apache.org/viewcvs.cgi/*checkout*/xml-forrest/src/core/context/resources/schema/dtd/

# Now, since ViewCVS is pretty slow, make sure you cache it
CacheEnable mem /forrest/
# for a day
CacheDefaultExpire 86400
MCacheSize 4096
MCacheMaxObjectCount 100
MCacheMinObjectSize 1
MCacheMaxObjectSize 2048

# and in case your client is a good web citizen, tell the proxies down the road
# to avoid calling us since we guarantee the content is fresh for a day
ExpiresActive On
ExpiresDefault "access plus 1 day"

--
Stefano, who has been waiting for some 18 months for somebody else to come up with the idea of having forrest pregenerating the .htaccess file to do some sort of poor-man multichannel or content negotiation, but has lost hope so it's time to inject notion in the system.
Re: [Proposal] add DTDs to Apache website
On 15 Jan 2004, at 16:15, Juan Jose Pablos wrote:

Stefano Mazzocchi wrote:

<htaccess stuf/>

Stefano, who has been waiting for some 18 months for somebody else to come up with the idea of having forrest pregenerating the .htaccess file to do some sort of poor-man multichannel or content negotiation, but has lost hope so it's time to inject notion in the system.

Stefano, who forgot to add that request on Jira, bugzilla or something similar, because he realized that people is not able to read his mind :-)

nono, you guys don't get it: it was a social experiment about cross pollination between the java/xml world and the httpd world.

Forrest, by generating static stuff, is the closest thing to the original HTTPd mindset. All the fancy dynamic stuff didn't catch up over httpd, it was simply too painful to write a web application in C and there are so many modules that can be useful all over the place. The rest was modules that glued other languages, but moved away large chunks of the community. So much so that nowadays, very few web-app power users are also httpd power users, because they isolate themselves.

Since forrest is now slowly taking over all apache.org web sites, this exposes this project to all sorts of different mindsets. I wanted to see how long it would take for stronger httpd interaction to surface, but it didn't happen.

It's not criticism to the forrest community, not at all. I would say it's criticism for those coming from a non-java/non-xml world: they failed to provide the input that might have shaped the project in such a way that would have pleased them more.

Anyway, since David was ready to propose a massive URL change for DTDs I had to say something and the .htaccess magic is the way I would solve many forrest issues that are now solved with hacky client-side javascript.

but, at the very end, I don't really care since i think that static pregeneration of web sites will (very slowly but constantly) die out: all web content will need some form of dynamism.

But you need a bridge over this huge and nasty river. And this is what forrest is all about in my mind.

--
Stefano.
Re: [Proposal] add DTDs to Apache website
Stefano,

Stefano Mazzocchi wrote:

nono, you guys don't get it: it was a social experiment about cross pollination between the java/xml world and the httpd world.

Are you talking about forrest? I did like the idea but I do not know what would be possible to do with .htaccess. To tell you the truth, I thought that it was only about usernames/passwd.

but, at the very end, I don't really care since i think that static pregeneration of web sites will (very slowly but constantly) die out: all web content will need some form of dynamism.

Still, there is a need, and possibly a lot of people with the same needs.

But you need a bridge over this huge and nasty river. And this is what forrest is all about in my mind.

--
Stefano.
Re: [Proposal] add DTDs to Apache website
Stefano Mazzocchi wrote:

David Crossley wrote:

Dave Brondsema wrote:

David Crossley wrote:

6) What will be the URLs for the DTDs? http://xml.apache.org/forrest/dtd/... We presume that Forrest will not be a top-level project soon.

The forrest docs and forrest seed site docs (and likely many other projects' docs) have doctype lines as follows:

<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.2//EN" "http://apache.org/forrest/dtd/document-v12.dtd">

The url is apache.org, not xml.apache.org

The Forrest project only has control of the xml.apache.org/forrest/ space. So it must be there. All material must be in a project's website CVS and we do not have access to the top-level site CVS.

put these two lines in a .htaccess file in site/forrest

<snip/>

Thanks for the suggestion. However there is still the issue that we do not have a site/forrest/ directory and we do not have access to the site CVS to manage the .htaccess file. Can you, or someone else who has such powers, do it for us when the time comes?

If we go ahead with the online DTDs proposal, then we should still put the DTDs at xml.apache.org/forrest/dtd/ and use the .htaccess trick to solve the problem of the incorrect apache.org/forrest/ URLs.

--
Stefano, kind of worried that people don't know basic httpd skills

Er, no need for sarcasm. Many thanks for the contribution.

--David
Re: [Proposal] add DTDs to Apache website
Stefano Mazzocchi wrote:

Juan Jose Pablos wrote:

Stefano Mazzocchi wrote:

<htaccess stuf/>

Stefano, who has been waiting for some 18 months for somebody else to come up with the idea of having forrest pregenerating the .htaccess file to do some sort of poor-man multichannel or content negotiation, but has lost hope so it's time to inject notion in the system.

Stefano, who forgot to add that request on Jira, bugzilla or something similar, because he realized that people is not able to read his mind :-)

nono, you guys don't get it: it was a social experiment about cross pollination between the java/xml world and the httpd world.

<snip/>

Please do not discuss new topics inside Proposal threads. I seem to be the poor sucker that has to pull all this together and i do not have time to wade through other (albeit good) stuff when trying to summarise it.

http://cocoon.apache.org/community/contrib.html#Contribution+Notes+and+Tips
Item 4.

--David
Re: [Proposal] add DTDs to Apache website
David Crossley wrote:

7) Does the Apache webserver deliver the supporting files using the appropriate Content-Type? Is that text/plain? Does it matter? We have *.dtd and *.mod and *.pen and *.ent extensions.

This is what the webserver reports:

*.dtd ... Content-Type: application/xml-dtd
*.mod *.pen ... Content-Type: text/plain; charset=ISO-8859-1

Are those okay?

--David
Re: [Proposal] add DTDs to Apache website
David Crossley wrote:

David Crossley wrote:

Marshall Roch wrote:

<snip/>

You've convinced me that it's not worth the effort to get the DTDs online. See below.

(...about configuration notes.)

It is worth the effort. We get way too many questions from users.

One of the main reasons that we are getting questions is that people try to view an XML doc with their web browser. Lo and behold it breaks. Then they blame us.

I actually wonder if putting the DTDs online will even solve that. Are the browsers able to follow relative links from one part of the DTD to the next? For example:

faq-v12.dtd
... document-v12.mod
... faq-v12.mod
... common-charents-v10.mod
ISO*.pen (5 separate files)

I tried a test by putting some of the DTDs at http://www.apache.org/~crossley/dtd-test/
Please try the test doc in that directory, faq.xml, using your favourite XML editor and your browser.

For me, Mozilla did not get past the &mdash; entity which is present in the ISOpub.pen entity set.

--David
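To illustrate what "following relative links" means for a client, the sketch below resolves relative system identifiers against the URL of the file that references them, which is what a conforming processor is expected to do before fetching each part. It assumes, purely for illustration, that every file in the chain above is referenced by its bare file name and that faq-v12.dtd sits in the test directory; the real references inside the Forrest DTDs may use different relative paths.

```python
from urllib.parse import urljoin


def resolve_chain(base_url, relative_refs):
    """Resolve each relative system identifier against the referencing file's URL."""
    return [urljoin(base_url, ref) for ref in relative_refs]


# Hypothetical example: the DTD URL and the bare file names are assumptions.
dtd_url = "http://www.apache.org/~crossley/dtd-test/faq-v12.dtd"
parts = resolve_chain(
    dtd_url,
    ["document-v12.mod", "faq-v12.mod", "common-charents-v10.mod", "ISOpub.pen"],
)
for url in parts:
    print(url)
```

If any of the resolved URLs is not actually served, a client that does fetch external entities will stop at the first reference it cannot load.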
Re: [Proposal] add DTDs to Apache website
On Jan 14, 2004, at 11:46 AM, David Crossley wrote:

David Crossley wrote:

7) Does the Apache webserver deliver the supporting files using the appropriate Content-Type? Is that text/plain? Does it matter? We have *.dtd and *.mod and *.pen and *.ent extensions.

This is what the webserver reports:

*.dtd ... Content-Type: application/xml-dtd
*.mod *.pen ... Content-Type: text/plain; charset=ISO-8859-1

Dunnow, http://www.faqs.org/rfcs/rfc3023.html indicates otherwise for the .mod and .pen in this case:

The media type application/xml-dtd SHOULD be used for external DTD subsets or external parameter entities.

How do we easily find out whether rfc 3023 has any official status - it appears on the standards track.

/Steven
--
Steven Noels                            http://outerthought.org/
Outerthought - Open Source Java & XML   An Orixo Member
Read my weblog at                       http://blogs.cocoondev.org/stevenn/
stevenn at outerthought.org             stevenn at apache.org
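The comparison between what the server reports and what RFC 3023 recommends can be written down as a small check. This is a sketch only: it takes already-observed Content-Type headers as input rather than making HTTP requests, it covers just the *.dtd, *.mod and *.pen extensions discussed above, and the function names are hypothetical.

```python
import os

# RFC 3023: application/xml-dtd SHOULD be used for external DTD subsets
# and external parameter entities.
EXPECTED_TYPE = "application/xml-dtd"
DTD_EXTENSIONS = {".dtd", ".mod", ".pen"}


def media_type(content_type_header):
    """Strip parameters such as charset and normalise case."""
    return content_type_header.split(";", 1)[0].strip().lower()


def mismatches(observed):
    """Return {filename: served_type} for DTD-related files served as another type."""
    wrong = {}
    for filename, header in observed.items():
        ext = os.path.splitext(filename)[1].lower()
        if ext in DTD_EXTENSIONS and media_type(header) != EXPECTED_TYPE:
            wrong[filename] = media_type(header)
    return wrong
```

Fed with the headers reported in this thread, the *.dtd file passes while the *.mod and *.pen files are flagged as text/plain.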
Re: [Proposal] add DTDs to Apache website
David Crossley wrote:

...

2) The Forrest build system is complex. It would be good to automate the publishing of DTD versions, but that may not be possible.

Could you please explain a bit more? If it's just about placing the schema dir, or some of those dirs, in a predefined place, it can be done quite easily.

3) The Forrest website is built using the stable version of Forrest (currently v0.5.1). So how will DTDs from the current CVS (v0.6-dev) get into the website CVS [3]? Manual copy? See 4).

Aaah, so you want to publish them on the site CVS (I was thinking of pushing them to the site, but you are right)...

4) If some committer changes the DTDs in CVS then they will be out-of-sync. Will committers remember to do the manual copy? See 3).

I can check and make sure that the build system tells the committer to do so, or a script that checks every night for consistency can be done.

--
Nicola Ken Barozzi [EMAIL PROTECTED]
- verba volant, scripta manent -
(discussions get forgotten, just code remains)
Re: [Proposal] add DTDs to Apache website
I'm not CCing Forrest as I'm not subscribed there and they don't moderate my mails through.

On 14.01.2004 05:53, David Crossley wrote:

3) The Forrest website is built using the stable version of Forrest (currently v0.5.1). So how will DTDs from the current CVS (v0.6-dev) get into the website CVS [3]? Manual copy? See 4).

4) If some committer changes the DTDs in CVS then they will be out-of-sync. Will committers remember to do the manual copy? See 3).

I don't see this problem. On the one hand there are the older files like document 0.10 or 0.11 that won't be touched, on the other hand 0.12 (or is it already old too?) which is developed at the moment. You can't make incompatible changes for one version, otherwise you will break possibly thousands of documents out there. So only extensions are possible.

In conclusion: the update cycle must not be once per minute, but maybe once per day or only week. Now what about having a cron job running on the website server that checks out recent DTD versions? Forcing manual work that's critical and without much effort automatically doable sounds not that good.

9) We will never know if the Catalog Entity Resolver gets broken after an upgrade. Forrest will still work but will be slower, doing downloads of the DTD and supporting files on each document parse. We can probably add a test document in the forrest seed site to detect failure.

A really bad argument against the proposal :) Of course a real test is the way to go here.

Joerg
Re: [Proposal] add DTDs to Apache website
Joerg Heinicke wrote:

David Crossley wrote:

3) The Forrest website is built using the stable version of Forrest (currently v0.5.1). So how will DTDs from the current CVS (v0.6-dev) get into the website CVS [3]? Manual copy? See 4).

4) If some committer changes the DTDs in CVS then they will be out-of-sync. Will committers remember to do the manual copy? See 3).

I don't see this problem. On the one hand there are the older files like document 0.10 or 0.11 that won't be touched, on the other hand 0.12 (or is it already old too?) which is developed at the moment. You can't make incompatible changes for one version, otherwise you will break possibly thousands of documents out there. So only extensions are possible.

Absolutely. I think that i got a bit mixed up with whatever i was trying to say in item 4. We need proper version control and we have a naming convention for that. Forrest has been careful not to introduce any incompatible changes. However, i think that we need to be more careful about adding even new optional stuff. Every change should be a totally new DTD version.

In conclusion: the update cycle must not be once per minute, but maybe once per day or only week. Now what about having a cron job running on the website server that checks out recent DTD versions? Forcing manual work that's critical and without much effort automatically doable sounds not that good.

Good idea. Nicola Ken suggested something similar. I think that we need to be careful how far the automation goes. I mean that there are DTD versions in the HEAD CVS that are perhaps not yet ready to go public. Perhaps a deliberate manual process is better, but have a cronjob that reminds us if there are missing files on the website.

9) We will never know if the Catalog Entity Resolver gets broken after an upgrade. Forrest will still work but will be slower, doing downloads of the DTD and supporting files on each document parse. We can probably add a test document in the forrest seed site to detect failure.

A really bad argument against the proposal :) Of course a real test is the way to go here.

I gather that you mean a good argument. That is why i listed that issue. It would be a bad thing if Forrest/Cocoon silently started doing network retrievals like the current Xerces-2.6 web.xml issue.

Forrest does now have a build test target which tries to build the forrest seed site. Are you suggesting that we would be better to re-instate the JUnit tests that Cocoon used to have? I am no expert, but i think that we need the test to actually be a part of the Forrest machinery so that when users create a new project then they get the test happening too.

--David
Re: [Proposal] add DTDs to Apache website
Nicola Ken Barozzi wrote:

David Crossley wrote:

...

2) The Forrest build system is complex. It would be good to automate the publishing of DTD versions, but that may not be possible.

Could you please explain a bit more?

I was mainly hinting that it might need to be a manual task.

If it's just about placing the schema dir, or some of those dirs, in a predefined place, it can be done quite easily.

It is only the /dtd/ and /entity/ dirs. These might also need to be re-arranged in xml-forrest CVS so that dumb clients can find all the bits properly.

3) The Forrest website is built using the stable version of Forrest (currently v0.5.1). So how will DTDs from the current CVS (v0.6-dev) get into the website CVS [3]? Manual copy? See 4).

Aaah, so you want to publish them on the site CVS (I was thinking of pushing them to the site, but you are right)...

I am just trying to meet the requirement that all Apache website content needs to be managed in the project's website CVS. We also need to get any stable DTD stuff that is still in 0.6-dev out to the website.

4) If some committer changes the DTDs in CVS then they will be out-of-sync. Will committers remember to do the manual copy? See 3).

I can check and make sure that the build system tells the committer to do so, or a script that checks every night for consistency can be done.

Either one would be good.

--David
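The nightly consistency check mentioned above could look roughly like the following. It is a sketch under assumptions, not an existing Forrest script: it compares two local checkouts (the source DTD directory and the website directory) by file name and content hash, and only reports differences so that the copy itself stays a deliberate manual step. The function names and the default file patterns are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def compare_dirs(source_dir, site_dir, patterns=("*.dtd", "*.mod", "*.pen")):
    """Report files missing from, or different on, the website checkout.

    Returns (missing, changed) as two sorted lists of file names.
    """
    source_dir, site_dir = Path(source_dir), Path(site_dir)
    missing, changed = [], []
    for pattern in patterns:
        for src in source_dir.glob(pattern):
            published = site_dir / src.name
            if not published.exists():
                missing.append(src.name)
            elif file_digest(src) != file_digest(published):
                changed.append(src.name)
    return sorted(missing), sorted(changed)
```

Run from cron, the two lists could simply be mailed to the dev list when either is non-empty. Files that are deliberately unpublished would need an exclusion list, which this sketch does not include.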