>[EMAIL PROTECTED] (ewitness - Ben Fowler) wrote:
>[snip]
>> I don't mind admitting that as an outsider to the XML standard, this
>> looks like a bad, even a really bad, idea.
>>
>> My reading of your commentary is "Whitespace is sometimes respected,
>> and only a language lawyer can tell you when".
>
>Well, in some sense you are right, there are a lot of "really
>bad ideas" hidden in this area. However, you have to see this
>in context.

I most certainly am looking at it in context. I was trying to
do something simple and intuitive and it turned out gnarly and
difficult. XML is meant to build on other things such as SGML,
DSSSL and HTML by avoiding their mistakes.

>A *real* typesetter doesn't care about whitespace and line feeds,
>he thinks in paragraphs and columns and pages of flowing text,
>with various indentations and margins and such.

Exactly so, and he thinks of leading and line height, and
he thinks of paragraphs with 'space before' and 'space after'.
I am prepared to argue that FO is a 'real' typesetter here,
and should 'think' the same way.

>TeX was practically written to support this view, and this is
>the default how FO processors work.

Quoting from the XML-dev list: one gentleman wanted to play space
cadets and we got Unix; another wanted to distribute his phone list
and we got the World Wide Web. Pretty much every worthwhile advance
in the computer field has come from one person with a problem to
solve. TeX came about because Professor Knuth
<URL: http://www-cs-faculty.stanford.edu/~knuth/ > knew that
computers could aid typesetting: it was written with one practical
aim, not to support a view.

I don't see how you can argue that because TeX has \newline and
\par it follows that FO should not have a semantic <br /> or
forced line-break.

>The problem: not everybody is a typesetter, many people don't
>even know about how to set indents and hanging indents and margins
>and this stuff, but they have a space and an enter key sitting
>squarely on their keyboard.

I may have misread you, but I think that you have conflated two,
possibly three, things.

1. Not everybody is a typesetter ...

Exactly; this is why there is a division of skill, or labour.
Authors write, and typesetters mark up and set text. This is
TeX 101; see, exempli gratia,
<URL: http://www.ideography.co.uk/library/seybold/WYS_intro.html >
and <URL: http://www.ecn.wfu.edu/~cottrell/wp.html >:
        The author of a text should, at least in the first instance,
        concentrate entirely on the first of these sets of tasks.
        That is the author's business. Adam Smith famously pointed
        out the great benefits that flow from the division of labor.
        Composition and logical structuring of text is the author's
        specific contribution to the production of a printed text.
        Typesetting is the typesetter's business. This division of
        labour was of course fulfilled in the traditional production
        of books and articles in the pre-computer age. The author
        wrote, and indicated to the publisher the logical structure
        of the text by means of various annotations. The typesetter
        translated the author's text into a printed document,
        implementing the author's logical design in a concrete
        typographical design. One only has to imagine, say, Jane
        Austen wondering in what font to put the chapter headings of
        Pride and Prejudice to see how ridiculous the notion is.
        Jane Austen was a great writer; she was not a typesetter.
        
        You may be thinking this is beside the point. Jane Austen's
        writing was publishable; professional typesetters were
        interested in laying it out and printing it. You and I are
        not so lucky; if we want a printed article we will have to
        do it ourselves (and besides, we want it done much faster
        than via traditional typesetting). Well, yes and no. We will
        in a sense have to do it ourselves (on our own computers),
        but we have a lot of help at our disposal. In particular we
        have a professional-quality typesetting program available.
        This program (or set of programs) will in effect do for us,
        for free and in a few seconds or fractions of a second, the
        job that traditional typesetters did for Shakespeare, Jane
        Austen, Sir Walter Scott and all the rest. We just have to
        supply the program with a suitably marked-up text, as the
        traditional author did.
        
        I am suggesting, therefore, that there should be two distinct
        ``moments'' in the production of a printed text using a
        computer. First one types one's text and gets its logical
        structuration right, indicating this structuration in the
        text via simple annotations. This is accomplished using a
        text editor, a piece of software not to be confused with a
        word processor. (I will explain this distinction more fully
        below.) Then one ``hands over'' one's text to a typesetting
        program, which in a very short time returns beautifully
        typeset copy.

2. If misuse of the tab/space/enter keys is the problem, then
always ignoring (unescaped) whitespace is part of the solution.
Typists can add spaces and tabs as they see fit (see
<URL: http://ricardo.ecn.wfu.edu/~cottrell/emacs-screen.jpg >,
as if you need to) without this getting anywhere near the layout
'engine'. My problem is that whitespace is sometimes significant;
I am amazed that nobody else sees this as a problem too.
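The "always ignore whitespace" half of this can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the behaviour of any particular FO processor: text is collapsed in the manner of XPath's normalize-space(), and only an explicit <br/> element (in a hypothetical <p> vocabulary) ends a line.

```python
import xml.etree.ElementTree as ET

def normalize_space(text):
    """Trim and collapse runs of whitespace to one space, like XPath normalize-space()."""
    return " ".join(text.split())

def rendered_lines(paragraph_xml):
    """Return the lines of a <p> element in which only <br/> ends a line.

    Spaces, tabs and newlines typed into the text are insignificant.
    """
    p = ET.fromstring(paragraph_xml)
    lines, current = [], p.text or ""
    for child in p:
        if child.tag != "br":
            # Keep the sketch small: only <br/> is understood.
            raise ValueError(f"unsupported element <{child.tag}>")
        lines.append(normalize_space(current))
        current = child.tail or ""  # text after the <br/> starts the next line
    lines.append(normalize_space(current))
    return lines

print(rendered_lines("<p>10  Example\tStreet<br/>\n   Some   Town</p>"))
# ['10 Example Street', 'Some Town']
```

The stray tab and the newline before "Some" never reach the output; the <br/> alone decides where the line ends.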

3. Sometimes when typing a document one needs to end a line, and
sometimes a paragraph. The return key is used to end a paragraph.
It follows that some other means is needed to end a line, exempli
gratia [SHIFT][RETURN] (perhaps you remember WordPerfect...),
and this should be stored properly in the file.
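One way to store the distinction "properly in the file" is with Unicode's own separators, U+2029 PARAGRAPH SEPARATOR and U+2028 LINE SEPARATOR. The Python sketch below assumes, purely for illustration, that an editor records [RETURN] as U+2029 and [SHIFT][RETURN] as U+2028, and maps them to <p> and <br /> respectively. The function name to_markup is hypothetical.

```python
from xml.sax.saxutils import escape

# Assumed encodings (hypothetical editor behaviour):
PARA_SEP = "\u2029"  # [RETURN]: end of paragraph (Unicode PARAGRAPH SEPARATOR)
LINE_SEP = "\u2028"  # [SHIFT][RETURN]: forced line break (Unicode LINE SEPARATOR)

def to_markup(typed):
    """Map paragraph ends to <p>...</p> and forced line breaks to <br />."""
    paragraphs = []
    for para in typed.split(PARA_SEP):
        if not para.strip():
            continue  # drop empty paragraphs, e.g. after a trailing [RETURN]
        # Escape &, < and > so the typed text cannot break the markup.
        lines = [escape(line) for line in para.split(LINE_SEP)]
        paragraphs.append("<p>" + "<br />".join(lines) + "</p>")
    return "\n".join(paragraphs)

print(to_markup("First line\u2028second line\u2029Next & last\u2029"))
```

Because the two keystrokes are stored as two different characters, the conversion needs no guesswork about which newline meant what.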

>The correct way to express
>
>procedure foo();
> begin
>   dostuff:=false;
> end
>
>would be something like:
><fo:block>
> <fo:block>foo();</fo:block>
> <fo:block margin-left="1em">
>  <fo:block>begin</fo:block>
>  <fo:block margin-left="2em">
>   <fo:block>dostuff:=false;</fo:block>
>  </fo:block>
>  <fo:block>end</fo:block>
> </fo:block>
></fo:block>
>but chances are you'll get it space- or even (shudder!) tab-indented.
>(Take a postal address block for another, less IT-related example)

(I did).

Please don't take offence, but I think that you have stumbled
across a trap, and possibly into it. XML, whilst written as and
intended for structural mark-up, is presentation-neutral. By this
I mean that incorporating presentational details in FO does not
harm its XMLness, and it can still be handled with normal XML
tools. There is a possible 'error of conflation' in the example
you gave, in the sense that, to use your phrase, the correct way
to express a code procedure is something like

<proc>
        <name>foo</name>
        <body>
                <statement>
                        <expression>dostuff:=false</expression>
                </statement>
        </body>
</proc>

and wouldn't involve fo: at all at any stage where you are thinking
in terms of "getting it space or tab indented".

(Note that, since plain text is XML without the tags, a compiler
capable of generating a parse tree could generate that XML, and do
better still by, for example, marking up the identifiers and
operators.)
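The structural markup above already carries everything needed for indented output. As a hedged illustration, with Python's xml.etree.ElementTree standing in for an XSLT stylesheet and the <proc>/<name>/<body>/<statement> vocabulary being hypothetical, the sketch below maps it onto nested fo:block elements like those in the quoted example. Line breaks and indentation come from structure, not from whitespace. The 1em margin per level is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)  # serialize as fo:block, not ns0:block

SOURCE = """<proc>
  <name>foo</name>
  <body>
    <statement><expression>dostuff:=false</expression></statement>
  </body>
</proc>"""

def fo_block(parent, text=None, indent_em=0):
    """Append an fo:block to parent, optionally indented with margin-left."""
    attrs = {"margin-left": f"{indent_em}em"} if indent_em else {}
    block = ET.SubElement(parent, f"{{{FO_NS}}}block", attrs)
    if text is not None:
        block.text = text
    return block

def proc_to_fo(proc):
    """Map a hypothetical <proc> element onto nested fo:block elements."""
    root = ET.Element(f"{{{FO_NS}}}block")
    fo_block(root, f"procedure {proc.findtext('name')}();")
    body = fo_block(root, indent_em=1)        # the procedure body, one level in
    fo_block(body, "begin")
    stmts = fo_block(body, indent_em=1)       # statements, one further level in
    for stmt in proc.iter("statement"):
        fo_block(stmts, stmt.findtext("expression") + ";")  # one line per statement
    fo_block(body, "end")
    return root

print(ET.tostring(proc_to_fo(ET.fromstring(SOURCE)), encoding="unicode"))
```

The source contains no significant whitespace at all. Every line break in the output comes from an element boundary, which is the division of labour argued for above.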

When we come to generate an FO file, we will definitely need
a lot of line breaks for the short lines (the expressions
and statements), and a lot of paragraph breaks at the
end of procedures and control structures, and particularly
at the end of comments, which flow differently from the
source itself. Your example was far too short to bring
out the points under discussion. There is already a
considerable body of expertise in storing code in XML format;
see, for example, the Ant project
<URL: http://jakarta.apache.org/ant/index.html > and
<URL: http://craigc.com/pg/chap11.html >. There are probably
better examples as well, but I don't have URLs to hand.

I really don't see how code can be marked up for formatting
(which is how I see FO) without using both line breaks and
paragraph breaks, unless you remove all structure and emit
only presentation, that is, glyphs. That is the paper equivalent
of the single-pixel GIF for 'marking up' web pages: what
you see is all you get. This is a PostScript/PDF-type solution,
whereas I am looking for a TeX-type solution. If I am SOL,
I will just have to pipe down.

Is this how you see FO? I can't prove you wrong. I am asking
for structure (lines and paragraphs) to co-exist with
presentation. I have no refutation of a proposal that, like
Dante, we should abandon structure when we enter FO; FO
will certainly work like that, just as galleys do. I merely
think that you are rejecting the XMLness of FO, and I can't see
why, and I certainly can't see any benefit. Cui bono?
You seem to be making it difficult for the human reader
and user (if any there be) of FO for no actual benefit.

>[If i'd get a chance to correct the past, i probably kill the
>inventor of the tab character before he commits his crime :-]

That is a bit extreme. The tab (tabulator) key enables one
to create tables using tab stops instead of the space bar.
When I used a typewriter (or WordPerfect) I found the tab
key (or its successor, tables) extremely useful.

>There is a lot of whitespace formatted data out there,

Which is why I claim that whitespace-formatted data is XML
without the tags. When I need to process that data, I would
like to insert the tags, for the benefit of XSLT processing
and of the human reader. This is why I need a line break tag.
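Once a convention is fixed, inserting the tags into whitespace-formatted data can be done mechanically. The sketch below assumes one such convention: a blank line ends a paragraph, and any other newline (as in a postal address block) is a deliberate line break. It uses Python's xml.etree.ElementTree so that escaping is handled automatically. The element names doc, p and br are illustrative and not taken from any schema.

```python
import re
import xml.etree.ElementTree as ET

def tag_plain_text(text):
    """Insert <p> and <br /> tags into whitespace-formatted text.

    Assumed convention (hypothetical): a blank line ends a paragraph,
    and every other newline is a deliberate line break.
    """
    root = ET.Element("doc")
    for chunk in re.split(r"\n\s*\n", text.strip()):
        p = ET.SubElement(root, "p")
        # Indentation is presentation, so strip it from each line.
        lines = [line.strip() for line in chunk.splitlines()]
        p.text = lines[0]
        for line in lines[1:]:
            br = ET.SubElement(p, "br")
            br.tail = line  # the text that follows the break
    return root

address = "The Occupier\n  1 Example Road\n  Sometown\n\nDear Sir,"
print(ET.tostring(tag_plain_text(address), encoding="unicode"))
```

After this step the line breaks are explicit elements, so an XSLT stylesheet can handle them like any other structure.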

>You might have noted that in HTML+CSS <br> actually *is* redundant,

I am not complaining about redundancy. Given that the
'respect whitespace' mechanism is in the specifications, all
I can ask is that we have two mechanisms (which is also redundancy).

>it is just heavily (ab)used because it produces predictable results

If the results are predictable, how can it be abuse? Roughly the
seventh commandment of usability is never to force one's users
to choose between the easy way and the right way of doing
something.

>without fumbling with gnarly CSS settings.

CSS settings???

>Especially if you have to bring already whitespace formatted data online
>*quickly*.

But we all do this, all the time.

One good method is to ask authors with this background to use
StarOffice to create their pages their way (note the congruence
of the easy way and the right way), save them as XML, and convert
them to HTML (XHTML), PDF or RTF with XSLT. A perfect division of labour.
(This came up on the WYLUG list recently. It works.)
<URL: http://wylug.org.uk/pipermail/wylug-discuss/2002-February/001846.html >

>Typewriter habits are hard to get rid of, regardless how enraged
>professionals are about this.

Which is why I propose [RETURN] for end of paragraph (</p>) and
[SHIFT][RETURN] for end of line (<br />): to make the easy way
the right way.

Ben.
