At what point does this become absurd?

Should there be a <punct char="." name="period" role="full-stop" /> tag and a
<punct char="," name="comma" role="pause" /> tag?

Should there be a word tag? <word role="verb" tense="past">ran</word>

I am sure these things would be great to have, but ultimately, if somebody wants
to make content available with that level of detail, they should work on a
conversion program that generates tagged content in XML. The result would
probably look something like an NLML (Natural Language Markup Language).

If HTML is for marking up content for presentation in browsers or similar user
agents, then div and span are adequate for the job. You could namespace your
divs and spans to get what you want, e.g. <span id="word_verb_past">ran</span>,
and have a reading technology process those span ids to determine how to
present the content, read it aloud, or interact with the user.
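To make the span idea concrete, here is a minimal Python sketch of how a reading tool might pick such spans out of a page. It assumes a hypothetical word_<part-of-speech>_<tense> naming convention for ids, mirroring the example above; RoleSpanParser is an illustrative name, not an existing tool. Note that HTML requires id values to be unique within a document, so a real page repeating this convention for many words would more likely carry it in class or data-* attributes.

```python
from html.parser import HTMLParser


class RoleSpanParser(HTMLParser):
    """Collect spans whose id follows a hypothetical
    "word_<part-of-speech>_<tense>" convention, e.g. id="word_verb_past".
    Nested spans are not handled; this is only a sketch."""

    def __init__(self):
        super().__init__()
        self.words = []       # (text, part_of_speech, tense) tuples
        self._pending = None  # [text, part_of_speech, tense] for the open span

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        span_id = dict(attrs).get("id") or ""
        parts = span_id.split("_")
        if len(parts) == 3 and parts[0] == "word":
            self._pending = ["", parts[1], parts[2]]

    def handle_data(self, data):
        # Only accumulate text while inside a matching span.
        if self._pending is not None:
            self._pending[0] += data

    def handle_endtag(self, tag):
        if tag == "span" and self._pending is not None:
            text, pos, tense = self._pending
            self.words.append((text.strip(), pos, tense))
            self._pending = None


parser = RoleSpanParser()
parser.feed('<p>The dog <span id="word_verb_past">ran</span> home.</p>')
print(parser.words)  # [('ran', 'verb', 'past')]
```

A speech or reading tool could then map a pair like ('verb', 'past') onto its own presentation rules.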

If HTML is supposed to be semantic, then the arguments in favor of sentence,
sentence_fragment, phrase, and word are not unreasonable, because they do after
all explain what you are seeing, at least for English speakers. Then again, I
know it's a matter of semantics, but a div with a specially formatted id, name,
or perhaps a role attribute (if you really needed to add something) would
semantically suggest what you are looking at. So would a span. They tell you
that you are looking at either a block of content (div) or a fragment or subset
of a block (span); the only thing missing is the role of the div or span.
Styles do imply a role, but semantically, style suggests visualization.

I don't think the problem here is one of reading as much as writing. Nobody in
their right mind wants to sit down and mark up their sentences unless they are
working on something to teach someone about sentence structure, in which case
they are better off learning XML, XSLT, and HTML, how to really use them, and
how to work with dedicated/controlled content. Frankly, most content creators
are not interested in teaching anybody how to read; they want to sell a
product, blog about something, tweet their brainfart of the moment, or even
share research, which was the original purpose of the web.

However, if Word or other programs can tell me my grammar is wrong, then they
should be able to export my document in an XML format that marks up my content
with grammatical markup. XSLT could transform that for use in a browser or
translate it for use in other technologies. This request needs to start at the
places where we produce content. Honestly, most of us still don't even use Word
correctly (do you bold or italicize individual words, or do you apply a style?).
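As a rough illustration of the export-then-transform idea, the following Python sketch converts a hypothetical grammatical XML export into HTML spans. The <document>, <sentence>, and <word> elements and their role/tense attributes are invented for this example; Word is not known to produce this format. Python's standard library has no XSLT processor, so xml.etree.ElementTree stands in for the XSLT step described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical grammatical export; these element and attribute names are
# invented for illustration and do not come from Word or any standard.
SOURCE = """<document>
<sentence><word role="noun">Tom</word> <word role="verb" tense="past">ran</word>.</sentence>
</document>"""


def to_html(elem):
    """Recursively map each grammatical element to an HTML span whose
    class lists the element name plus its role/tense attributes."""
    span = ET.Element("span")
    classes = [elem.tag] + [elem.get(a) for a in ("role", "tense") if elem.get(a)]
    span.set("class", " ".join(classes))
    span.text = elem.text
    for child in elem:
        converted = to_html(child)
        converted.tail = child.tail  # keep spaces/punctuation between children
        span.append(converted)
    return span


root = ET.fromstring(SOURCE)
for sentence in root.iter("sentence"):
    print(ET.tostring(to_html(sentence), encoding="unicode"))
# <span class="sentence"><span class="word noun">Tom</span> <span class="word verb past">ran</span>.</span>
```

Mapping each grammatical element onto a plain span keeps the browser-facing output ordinary HTML, with the grammar carried only in class names.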

From how I've seen folks respond here, the HTML standard is based on what
people are actually doing. So, rather than asking for something that might
help some tool possibly do something, I think the key is to ask the right
sector of the industry to actually build something that produces a dedicated
markup language that HTML 6 could incorporate later. While I don't always agree
with the decisions made by folks here, I can understand their view that this
is a fringe use case and not compelling enough to warrant new tags, especially
when you can define such markup yourself with XHTML.


Art C.



On Apr 12, 2012, at 1:48 PM, Thomas A. Fine wrote:

> 
> This is in response to Benjamin Hawkes-Lewis' response to
> Adam Sobieski's proposal for sentence and phrase tags.
> 
> Speaking to the "necessity" of these tags, while I'm not sure any
> tag, or HTML, or the web, or even a good slice of pizza can really be
> described as necessary, these tags can definitely be useful, and
> most likely they can be important.  Sentence and phrase markings can
> be very useful to:
>  People relying on audio conversion to access the web.
>  People relying on automated translation.
>  People who are just learning to read.
>  People who are reading an article not in their native language.
>  People who are interested in inter-sentence spacing or inter-phrase spacing.
>  People with commercial interests, looking to maximize their reach.
> 
> Of course, simply adding tags won't really help any of these people.
> The real point is that such tags can facilitate tools that help
> these people.
> 
> The problem with using span tags is that they won't facilitate tool
> development.  In the absence of a real standard, no one is going
> to develop software to process sentences by searching for spans
> that might be labeled "sentence" or "sent" or "stc" or who knows
> what else.  Only in the presence of a standard tag can developers use
> these tags to improve translation, or emphasize phrasing and sentence
> structure for improved readability.
> 
> Mr. Hawkes-Lewis wrote:
>> The web corpus is not going to get marked up with phrases and
>> sentences in the absence of NLP advances that would make such markup
>> mostly redundant.
> 
> Natural Language Processing is riddled with problems, and there is
> nothing to suggest that this will change in the near future.  On
> the other hand, someone who is authoring content is in the perfect
> situation to accurately identify sentences or phrases.  NLP can be
> an aid to that user, and can provide hints to help them select
> sentence structure.  But as I said above, no such software would
> ever be developed to use NLP to aid users in marking sentence
> structure unless there were already dedicated sentence and phrase
> tags.  So in essence, you are correct, but only because your
> argument is a self-fulfilling prophecy.
> 
> You also suggest simply using a CSS pseudo-element and relying on the
> Unicode sentence-breaking conventions.  However, looking at these
> conventions, they are just another attempt at some sort of automated
> processing, and they acknowledge that this will not work for all cases.
> This is just one more argument in favor of giving content providers
> the ability to accurately mark up sentence structure.
> 
> I'll further note that any form of automated NLP is wholly inadequate
> when it comes to users who simply want control over formatting.
> Giving them a mechanism that does not provide control over where and
> when content will be formatted (other than some outside algorithm they
> don't control) is not providing any real control over formatting.
> 
> If you are saying that you don't think most people will bother, that is
> probably true.  But that doesn't mean that there aren't people with
> a legitimate and important interest.
> 
> So back to the original question, are these tags necessary?  I would
> now say yes, these tags are necessary to the development of software
> tools to aid users in marking sentence structure, and they are
> necessary to the development of tools that allow content providers
> to improve readability of their web pages for several classes of
> web users.
> 
>     tom
> 
> 

