Hi Dennis,

On Mar 15, 2012, at 9:58 AM, Dennis E. Hamilton wrote:

> Here's my understanding of the situation with regard to DOCTYPE and how pages 
> may be assembled from parts prior to being stored (static) or delivered 
> (dynamic) from the server.
> 
> If there are any tools that mechanically generate web page <body> content, 
> partly or in their entirety, you have to ensure that when the final page is 
> served up from the server, the result is compatible with some single DOCTYPE 
> declaration.  I assume that the CMS is the likely determining factor, since 
> it will generally be designed to generate a particular grade of HTML.  That 
> includes the result following server-side includes as well.

Yes. The skeleton is in control and is what needs adjusting, but it has to 
handle many types of content, some of which use UPPER CASE tags.
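To get a rough idea of which pages would break if the skeleton were switched to XHTML, something like the Python sketch below could flag two of the habits involved: UPPER CASE tag names and void elements such as <img> written without the closing "/>" (the point Regina makes further down about the feather logo). This is only an illustration under my own assumptions; the class name is made up, and it ignores other XHTML differences such as unquoted or minimized attributes.

```python
from html.parser import HTMLParser

# HTML 4.01 elements that have no end tag; XHTML requires them to be written as <br />.
VOID_ELEMENTS = {"area", "base", "basefont", "br", "col", "frame", "hr",
                 "img", "input", "isindex", "link", "meta", "param"}


class XhtmlIssueFinder(HTMLParser):
    """Hypothetical checker: flags upper-case tag names and void elements
    not closed with '/>', both allowed in HTML 4.01 but not in XHTML."""

    def __init__(self):
        super().__init__()
        self.issues = []  # list of (start tag as written, problem description)

    def _check_case(self):
        # get_starttag_text() returns the start tag exactly as written,
        # before HTMLParser lower-cases the tag name.
        raw = self.get_starttag_text()
        name = raw[1:].split(None, 1)[0].rstrip("/>")
        if name != name.lower():
            self.issues.append((raw, "upper-case tag"))

    def handle_starttag(self, tag, attrs):
        self._check_case()
        if tag in VOID_ELEMENTS:
            self.issues.append((self.get_starttag_text(),
                                "void element not closed with />"))

    def handle_startendtag(self, tag, attrs):
        # <br/>-style tags are already self-closed; only the case can be wrong.
        self._check_case()
```

Feeding it a page with `finder.feed(text)` and then `finder.close()` leaves the findings in `finder.issues`.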


> 
> There is no reason that national-language choice should be the determining 
> factor.  I wonder if that simply reflects the different authoring 
> communities' own preferences, perhaps related to agreements around 
> authoring tools.

You don't need to wonder. That is what happened. I particularly like the 
Mongolian site: http://www.openoffice.org/mn/

The NLC sites are done in many different styles of HTML; unlike, say, the "DE" 
site, there is no general conformance to a particular DOCTYPE.


> 
> Likewise, the character encoding has to be the same throughout the served web 
> page.  I presume that is UTF-8, since there are NL concerns and it is simply a 
> good choice.  That means the httpd settings must ensure that the proper MIME 
> type, with the specific charset parameter, is also part of the response header.

There are exceptions. Some sites, like Sinhalese 
(http://www.openoffice.org/si/), do not use UTF-8 but instead use "Thai". 
There are BODY tags that are a to-do for insertion.

Of course, you might want Pashto, which *is* UTF-8: 
http://www.openoffice.org/ps/

There really are a lot of NL sites in various stages.
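To find the NL pages that declare something other than UTF-8, a minimal Python sketch like the one below could report the charset a page declares in its own <meta> tags. It is an illustration, not part of the site build; the function and class names are made up. Note that it only reads the page text: when httpd also sends a charset in the Content-Type response header, that header takes precedence over the <meta> declaration.

```python
from email.message import Message
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Hypothetical helper: records the first charset declared in a <meta> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        # HTMLParser lower-cases attribute names; values may be None.
        values = {name: (value or "") for name, value in attrs}
        if "charset" in values:
            # <meta charset="..."> form
            self.charset = values["charset"].strip().lower()
        elif values.get("http-equiv", "").lower() == "content-type":
            # <meta http-equiv="Content-Type" content="text/html; charset=...">
            # Reuse the stdlib Content-Type parameter parser.
            msg = Message()
            msg["Content-Type"] = values.get("content", "")
            self.charset = msg.get_content_charset()  # lower-cased, or None


def declared_charset(html_text):
    """Return the lower-cased charset declared in the page, or None."""
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    finder.close()
    return finder.charset
```

Running this over each directory and listing the pages where the result is not "utf-8" would give a first list of the exceptions.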

> 
> The only way to be able to operate this in a sane way is to have it be the 
> same for all pages as delivered from the server.

Most likely it should always be the same. The question is whether conversion to 
a particular DOCTYPE choice will break parts of the site. If it does, we can 
use the ssi.mdtext trick. NL sites are in the mix because they will be the 
determining factor.
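As a rough model of the ssi.mdtext trick, here is a Python sketch that resolves the settings for a page from the nearest templates/<dir>/ssi.mdtext, falling back to templates/ssi.mdtext, using the "key: value" format and the doctype line proposed further down in this thread. The real build is Perl, so the walk-up rule and the function names here are assumptions for illustration only. An empty "doctype:" value is kept as an empty string, which a template could treat as "emit no DOCTYPE".

```python
from pathlib import PurePosixPath


def parse_ssi(text):
    """Parse 'key: value' lines; only the first ':' splits, so URLs in values survive."""
    settings = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        settings[key.strip()] = value.strip()  # an empty value stays ''
    return settings


def resolve_ssi(page_path, ssi_files):
    """Return the settings from the most specific ssi.mdtext for page_path.

    page_path is relative to the content root (e.g. 'mn/index.html').
    ssi_files maps a template directory ('' for templates/ itself, 'mn', 'api', ...)
    to the text of its ssi.mdtext. Hypothetical rule: walk up from the page's
    directory and use the first directory that has its own ssi.mdtext.
    """
    parts = PurePosixPath(page_path).parent.parts
    for depth in range(len(parts), -1, -1):
        key = "/".join(parts[:depth])
        if key in ssi_files:
            return parse_ssi(ssi_files[key])
    return {}
```

With a root file carrying the HTML 4.01 doctype and an "mn" file with an empty "doctype:" line, `resolve_ssi("mn/index.html", ...)` yields an empty doctype while `resolve_ssi("de/index.html", ...)` falls back to the root one.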

So far the only content area that has a divergent ssi.mdtext is api. I 
mention NL because that is where the divergence exists. Have a look; you will 
see great variation.
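Regina asks further down whether the pages with an XHTML doctype can be counted. A rough Python sketch of such a tally, per top-level content directory, could look like this. It assumes a local checkout with the pages stored as *.html files under one root directory, and the bucket names are made up. extract_doctype() is also the kind of helper Joe describes below for pulling the doctype out of the original document, although the real build would do that in Perl.

```python
import re
from collections import Counter
from pathlib import Path

# First <!DOCTYPE ...> declaration; case-insensitive because older pages vary.
DOCTYPE_RE = re.compile(r"<!DOCTYPE\s[^>]*>", re.IGNORECASE)


def extract_doctype(html_text):
    """Return the DOCTYPE declaration with whitespace collapsed, or None."""
    match = DOCTYPE_RE.search(html_text)
    return " ".join(match.group(0).split()) if match else None


def classify(doctype):
    """Very rough bucket for a DOCTYPE string (bucket names are illustrative)."""
    if doctype is None:
        return "none"
    upper = doctype.upper()
    if "XHTML" in upper:
        return "xhtml"
    if "HTML 4" in upper or "HTML 3" in upper:
        return "html4-or-older"
    if upper == "<!DOCTYPE HTML>":
        return "html5"
    return "other"


def tally(content_root):
    """Count doctype buckets per top-level directory (e.g. 'de', 'mn', 'si')."""
    counts = {}
    root = Path(content_root)
    for path in root.rglob("*.html"):
        rel = path.relative_to(root)
        area = rel.parts[0] if len(rel.parts) > 1 else "(root)"
        # Not every NL page is UTF-8; the DOCTYPE itself is ASCII, so replace bad bytes.
        text = path.read_text(encoding="utf-8", errors="replace")
        counts.setdefault(area, Counter())[classify(extract_doctype(text))] += 1
    return counts
```

Printing `tally("content")` per directory would show at a glance which NL sites are XHTML, HTML 4 or have no DOCTYPE at all.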

Regards,
Dave

> There may be similar considerations for the Community Forums and the 
> MediaWiki as well.  Those choices can be resolved independently, but the 
> DOCTYPE declarations should be accurate at all times, of course.  That is not 
> always the case on many sites.
> 
> - Dennis
> 
> PS: I'm ignoring the HTML 4.01 vs XHTML 1.0 debate.  Going to HTML5 still 
> requires a decision whether it is done using the HTML or XML flavor.  No 
> matter what the direction, the problem is going to be how page assembly is 
> done and which page-generating products have to be accommodated.  Finally, it 
> is important to have valid pages under whatever the DOCTYPE is and also have 
> a successful result with as many browsers and their users as possible.  It 
> might be more valuable to consider what it takes to make the pages adaptable 
> on small-format device browsers (i.e., smartphones and tablets) and pay close 
> attention to accessibility requirements than fuss about not-yet-approved HTML 
> specifications.
> 
> - Dennis
> 
> -----Original Message-----
> From: Dave Fisher [mailto:dave2w...@comcast.net] 
> Sent: Thursday, March 15, 2012 08:33
> To: ooo-dev@incubator.apache.org
> Subject: Re: Doctype of websites
> 
> 
> On Mar 15, 2012, at 7:37 AM, Rob Weir wrote:
> 
>> On Thu, Mar 15, 2012 at 10:33 AM, Dave Fisher <dave2w...@comcast.net> wrote:
>>> 
>>> On Mar 15, 2012, at 12:22 AM, Regina Henschel wrote:
>>> 
>>>> Hi,
>>>> 
>>>> Joe Schaefer wrote:
>>>>>> ________________________________
>>>>>> From: Regina Henschel<rb.hensc...@t-online.de>
>>>>>> To: ooo-dev@incubator.apache.org
>>>>>> Sent: Tuesday, March 13, 2012 5:31 PM
>>>>>> Subject: Re: Doctype of websites
>>>>>> 
>>>>>> Hi Joe,
>>>>>> 
>>>>>> Joe Schaefer wrote:
>>>>>>> Those de.openoffice.org pages should redirect
>>>>>>> to www.openoffice.org/de pages, if not your
>>>>>>> DNS resolver is busted.
>>>>>> 
>>>>>> I had indeed set de.openoffice.org to 192.9.163.104. Removing it makes
>>>>>> redirecting work.
>>>>>> 
>>>>>> That means the pages at de.openoffice.org were the original ones,
>>>>>> but they will be deleted in the near future. They were imported to
>>>>>> ooo-site.apache.org/de, and there they got a different doctype. Right?
>>>>> 
>>>>> 
>>>>> 
>>>>> Well, sort of. If you look at the actual document on the site
>>>>> you will probably find it contains an XHTML doctype even now.
>>>>> The thing is that the CMS build system as Dave has designed it
>>>>> will strip most of the header matter out of the file and replace
>>>>> it with a generic one supplied by a template.
>>>>> 
>>>>> 
>>>>>> 
>>>>>>> If that's not the problem,
>>>>>>> then you need to refresh your pages, as they
>>>>>>> are identical on the server.
>>>>>>> 
>>>>>>> As to why the doctype is different from the original
>>>>>>> document, that's probably due to the way Dave worked
>>>>>>> out the templates for the site.  If we need to scrape
>>>>>>> the doctype out of each individual page that will require
>>>>>>> some perl coding work, some templating work,
>>>>>>> and another sledgehammer-style commit, i.e. not something
>>>>>>> to be taken lightly.
>>>>>> 
>>>>>> Our pages were XHTML, with all the differences from HTML that implies,
>>>>>> and we tried to produce valid pages (including the W3C check button).
>>>>>> It is not impossible to change the pages, and it can be done bit by bit
>>>>>> while reviewing them. But the aim should be clear.
>>>>> 
>>>>> 
>>>>> Well, I can't advise you how to proceed from here, only point out
>>>>> that there is some impedance mismatch between how your site builds
>>>>> work and what's actually in these documents.  The choice seems
>>>>> to be either to standardize all the documents on a common doctype
>>>>> or to have the perl code pull the doctype out of the original document
>>>>> if it exists and pass it along to the template as an argument.
>>>>> 
>>>>> 
>>>>> You might even be better off just not supplying a doctype at all
>>>>> and letting the browser figure it out.  Up to you folks.
>>>>> 
>>>> 
>>>> If we want valid pages, a common doctype is needed, because the inserted 
>>>> part has to be written so that it fits this doctype. For example, the 
>>>> feather logo needs an <img .../> element in XHTML but only <img ...> in 
>>>> HTML. So I think we need to agree on one doctype.
>>>> 
>>>> Is it possible to count how many of all the pages actually have an 
>>>> XHTML doctype? (I'm not familiar with the command line.)
>>>> 
>>>> Kind regards
>>>> Regina
>>>> 
>>>> P.S. The feather img element is missing the alt attribute.
>>> 
>>> I have been looking into this. In general, the skeleton is the non-compliant 
>>> part and is what should be changed. However, many of the NLC sites 
>>> are very much HTML.
>>> 
>>> One more sledgehammer will happen ... but planning needs to be careful.
>>> 
>> 
>> What if we went subdomain by subdomain and ran HTML Tidy on the
>> content to coerce it to a single doctype. Would that butcher things?
> 
> We have a file called content/brand.mdtext that controls the branding 
> language and logo for each page. 
> 
> In templates we have templates/ssi.mdtext and templates/api/ssi.mdtext
> 
> David-Fishers-MacBook-Air:templates dave$ more ssi.mdtext
> brand:  /brand.html
> footer: /footer.html
> topnav: /topnav.html
> home:           home
> 
> I think that ssi.mdtext should add a line like:
> 
> doctype:      <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
> 
> And if "mn" needs a different treatment:
> 
> templates/mn/ssi.mdtext
> brand:  /mn/brand.html
> footer: /footer.html
> topnav: /mn/topnav.html
> home:           home
> doctype: 
> 
> This fits the NL plan. I want to avoid divergent skeleton.html files, and it 
> may be the case that some sections will want an XHTML skeleton while others 
> get an HTML one.
> 
> I still intend to avoid changing every file.
> 
> I've $job to pay attention to until late today ... sorry that I'm dribbling 
> out these plans bit by bit.
> 
> Regards,
> Dave
> 
> 
>> 
>> -Rob
>> 
>>> Regards,
>>> Dave
>>> 
>>> 
> 
