Hey Jukka,

So you're seeing the delineation more as:

 *   metadata = document-level stuff
 *   XHTML = textual representation [which can include finer-grained information
that I would also call "metadata"]

?

If so, interesting. I wonder whether we should rethink the way we capture or
represent the XHTML, because I think our existing Metadata object could be
reused at that level too. Maybe we could have a textual/XHTML metadata object as
well, where the keys are the IDs (or some generated ID) of each XHTML tag, with
nesting expressed as a path (e.g., key=outer tag/inner tag 1/inner tag 2), and
the values are the attribute values themselves.

I wonder if this would work as a representation format. If so, it would be easy
to define "views" on top of the Metadata object, like an hCard view or an
"XHTML" view [with and without attributes]. WDYT?
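A rough sketch of the idea above, in Java. It uses a plain Map rather than the existing Metadata object, and everything in it is hypothetical and made up for illustration: the class name, the "path@attribute" key scheme (one key per attribute, since a tag can carry several), the positional indices like p[1] used to tell same-named siblings apart, and the view methods.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical "textual metadata" object: keys are "tag path@attribute",
// values are the attribute values. Insertion order follows document order.
class TextualMetadataSketch {
    private final Map<String, String> entries = new LinkedHashMap<>();

    // Record one attribute of one element, e.g. set("body/p[1]", "lang", "fr").
    public void set(String elementPath, String attribute, String value) {
        entries.put(elementPath + "@" + attribute, value);
    }

    public String get(String elementPath, String attribute) {
        return entries.get(elementPath + "@" + attribute);
    }

    // "XHTML without attributes" view: just the element paths, in order.
    public Set<String> elementPaths() {
        Set<String> paths = new LinkedHashSet<>();
        for (String key : entries.keySet()) {
            paths.add(key.substring(0, key.lastIndexOf('@')));
        }
        return paths;
    }

    // View restricted to one attribute name, e.g. "lang" or "href".
    public Map<String, String> attributeView(String attribute) {
        Map<String, String> view = new LinkedHashMap<>();
        String suffix = "@" + attribute;
        for (Map.Entry<String, String> e : entries.entrySet()) {
            String key = e.getKey();
            if (key.endsWith(suffix)) {
                view.put(key.substring(0, key.length() - suffix.length()), e.getValue());
            }
        }
        return view;
    }

    // Crude hCard view: elements whose class attribute contains "vcard".
    public Set<String> hcardRoots() {
        Set<String> roots = new LinkedHashSet<>();
        for (Map.Entry<String, String> e : attributeView("class").entrySet()) {
            for (String token : e.getValue().trim().split("\\s+")) {
                if (token.equals("vcard")) {
                    roots.add(e.getKey());
                }
            }
        }
        return roots;
    }

    public static void main(String[] args) {
        TextualMetadataSketch m = new TextualMetadataSketch();
        m.set("body/p[1]", "lang", "fr");
        m.set("body/p[1]/a[1]", "href", "http://example.com/");
        m.set("body/div[1]", "class", "vcard");
        System.out.println(m.elementPaths());       // [body/p[1], body/p[1]/a[1], body/div[1]]
        System.out.println(m.attributeView("lang")); // {body/p[1]=fr}
        System.out.println(m.hcardRoots());          // [body/div[1]]
    }
}
```

One thing the flat map doesn't capture is the text content itself, only tags and attributes, so it would presumably sit alongside the XHTML output rather than replace it.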

Cheers,
Chris



On 5/26/10 8:02 AM, "Jukka Zitting" <jukka.zitt...@gmail.com> wrote:

Hi,

On Wed, May 26, 2010 at 3:49 PM, Mattmann, Chris A (388J)
<chris.a.mattm...@jpl.nasa.gov> wrote:
> I'm worried that we're mixing concerns here. Some of the information that
> you cite above sounds more to me like metadata (and in fact, thinking about
> it, you could argue that the attributes on the XHTML markup that defines the
> textual structure are more like metadata attributes too). Where do you see
> the delineation?

The Metadata object can only represent document-level metadata, so
it's not suitable for things like:

* this paragraph is written in French
* the bounding box of this word is X on PDF page Y
* this phrase is a hyperlink to URL X
* these words denote a physical address

XHTML attributes are a perfect way to represent such annotations. It
would be great if we could leverage some of the applicable microformat
standards like hCard to simplify downstream use of such information.

BR,

Jukka Zitting



++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.mattm...@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
