"J.H.M. Dassen" wrote:

> "predecessor" might suggest HTML replaces SGML. It most certainly does not.

This might be a little off, as it has been a couple of years since my
SGML class, but here goes:

SGML was created in the 70s by an IBM lawyer, Charles Goldfarb, who
wanted a format that could be easily read by both people and machines
and would not be tied to one commercial product and thus subject to
superannuation. The resulting system separates document structure and
content from document formatting. It became an ISO standard (ISO 8879)
in 1986 and is used today mostly by very large organizations such as
the IRS and Microsoft.

With SGML, you have a Document Type Definition (DTD) that defines a
document in terms of a hierarchical set of nesting parts, or elements.
These definitions are given in a notation much like regular
expressions, and it is possible to declare the scope of any element in
terms of what other elements it can or must include, and how many. So
you have a large validation capability there.
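
To make the "regular expressions" point concrete, here is a minimal
Python sketch. It is not real DTD syntax or a real SGML parser: the
"menu" and "course" content models and the helper names are made up for
illustration, and only sequences with the +, * and ? occurrence
indicators are handled.

```python
import re

# Hypothetical content models for a made-up "menu" document type, in a
# DTD-like notation: "," means sequence, "+" one or more, "?" optional.
CONTENT_MODELS = {
    "menu": "title, course+",
    "course": "name, entree+, dessert?",
}

def model_to_regex(model: str) -> re.Pattern:
    """Turn 'a, b+, c?' into a regex over space-terminated child names."""
    parts = []
    for token in model.split(","):
        token = token.strip()
        name = token.rstrip("+*?")        # element name without indicator
        occurrence = token[len(name):]    # "", "+", "*" or "?"
        parts.append(f"(?:{re.escape(name)} ){occurrence}")
    return re.compile("".join(parts))

def is_valid(element: str, children: list[str]) -> bool:
    """Check a list of child element names against the parent's model."""
    pattern = model_to_regex(CONTENT_MODELS[element])
    sequence = "".join(child + " " for child in children)
    return pattern.fullmatch(sequence) is not None

print(is_valid("course", ["name", "entree", "entree"]))
print(is_valid("course", ["name"]))  # no entree, so the model rejects it
```

A real DTD also supports alternatives, nested groups, and attribute
declarations, which this sketch leaves out.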

HTML is merely one DTD, and tags such as <P> are merely elements of that
DTD. Many HTML documents will actually refer to the DTD that was current
when they were created -- look for the <!DOCTYPE ... > declaration at
the top line, above the <html> tag. 
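
If you want to see which DTD a page claims, a rough Python sketch along
these lines should work. It uses the standard library's html.parser;
the DoctypeFinder class, the find_doctype helper, and the sample page
are invented for this example.

```python
from html.parser import HTMLParser

class DoctypeFinder(HTMLParser):
    """Remember the first <!DOCTYPE ...> declaration seen in a page."""

    def __init__(self):
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        # html.parser passes the declaration text without "<!" and ">".
        if self.doctype is None and decl.upper().startswith("DOCTYPE"):
            self.doctype = decl

def find_doctype(html_text):
    finder = DoctypeFinder()
    finder.feed(html_text)
    finder.close()
    return finder.doctype

# Sample page (made up) that refers to the HTML 4.01 strict DTD.
SAMPLE = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">\n'
    "<html><head><title>t</title></head><body><p>hi</p></body></html>"
)

print(find_doctype(SAMPLE))
```

The quoted string in the declaration is the public identifier of the
DTD; a page with no declaration at all simply yields None here.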

With HTML, though, validation has rarely been enforced, since Tim
Berners-Lee and those who wrote the first browsers wanted to encourage
everyone to write HTML and not get discouraged by error messages. Other
SGML browsers are usually "validating," in that they return a list of
errors rather than a best guess if you don't get it right.
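
The difference between "best guess" and "list of errors" can be
imitated in Python. This is only an analogy: html.parser stands in for
a forgiving browser, and the strict XML parser in xml.etree stands in
for a validating SGML tool, although it checks well-formedness only and
not conformance to a DTD. The function names and the sloppy sample are
made up.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Unclosed <p> and <b> tags, the kind of thing browsers quietly accept.
SLOPPY = "<p>First paragraph<p>Second, with <b>unclosed bold</p>"

class TagCollector(HTMLParser):
    """Lenient reader: records every start tag and never complains."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

def lenient_tags(text):
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return collector.tags

def strict_error(text):
    """Return the parser's error message, or None if the text parses."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        return str(exc)
    return None

print(lenient_tags(SLOPPY))   # the lenient parser just carries on
print(strict_error(SLOPPY))   # the strict parser reports the problem
```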

Besides HTML, two really important DTDs are CALS, used by the US Defense
Department, and DocBook, used in publishing and elsewhere. There are
some other important ones that I can't remember.

Many vertical industries (e.g., the small electronic parts industry)
will standardize on a DTD for the purpose of electronic data interchange
(a kind of wholesale-level electronic commerce) or other kinds of
communication. The legal industry is also a big SGML user.

One cool thing about SGML is that you can pretty easily <fun
smile=yes>create your own markup system</fun>. Then you can mark up your
documents with your own custom tags that make sense to you for your
purposes. If you are creating a menu, for example, you write <entree>leg
of lamb</entree> and it becomes really easy to mark up. You leave the
formatting to the browser, which knows that entrees are supposed to be
14-pt Garamond, either because it was programmed that way or because a
style sheet tells it so.
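
As a toy illustration of leaving the formatting to a style sheet, here
is a Python sketch. The STYLE_SHEET table, the render function, and the
sample menu are all hypothetical, and the menu is written as
well-formed XML so the standard library can parse it.

```python
import xml.etree.ElementTree as ET

# Hypothetical style sheet: element name -> how to present it.
STYLE_SHEET = {
    "entree": {"font": "Garamond", "size_pt": 14},
    "dessert": {"font": "Garamond", "size_pt": 12},
}
DEFAULT_STYLE = {"font": "Times", "size_pt": 10}

MENU = (
    "<menu>"
    "<entree>leg of lamb</entree>"
    "<dessert>lemon tart</dessert>"
    "<note>cash only</note>"
    "</menu>"
)

def render(markup):
    """Describe how each child element of the menu would be displayed."""
    lines = []
    for element in ET.fromstring(markup):
        style = STYLE_SHEET.get(element.tag, DEFAULT_STYLE)
        lines.append(f"[{style['font']} {style['size_pt']}pt] {element.text}")
    return lines

for line in render(MENU):
    print(line)
```

The markup only says what each piece is; changing the look of every
entree means editing one entry in the table, not the documents.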

People involved in SGML tend to be a little fanatical about thinking it
is the only proper format for data, and so forth. I recall sitting there
listening to Charles Goldfarb lecture to our class. He was saying
something like, "Once your data is in SGML, you can easily convert it to
any other format du jour, such as HTML or the latest Word format. Or you
can just leave it in SGML." At that point he wrote "Leave in SGML" on
the board and it struck me that it was an anagram for "Evangelism," as
long as you write it in a circle so that it repeats infinitely, i.e.
LeaveInSGMLeaveInSGMLeaveInSGMLeaveInSGML...

XML is basically SGML with a sexier name and a few of the more arcane
bits left off. But like SGML, you can define your own DTDs and tags and
stuff, so it's really an order of magnitude beyond HTML. 

Hope that helps.
