RE: Request for comments: RDF/XML descriptions for Plucker I/O with other datasources

Robert O'Connor Tue, 19 Feb 2002 15:42:56 -0800


Hi David, thanks for the feedback.


> > Expat was chosen as the base because it is the fastest, validation isn't
> > needed (or planned), and expat is already in the Plucker Desktop code as
> > the XML resource parser is based on expat, so no extra library size is
> > needed in the final executable.
>
>       There's actually more differences than that in your choice of
> parser. As you know, there's two different models, SAX and DOM. With
> something like Plucker, we probably want a SAX model, not DOM, since it's
> not easy to store an in-memory copy of the document tree during parse time
> for something with several hundred/thousand links. expat doesn't do DOM,
> so that's in our favor.

Amen. An entire tree in memory would bring this laptop of mine to its knees.
While some of the DOM parsers certainly have nice features, you can't beat
expat on the performance end.
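
To make the streaming point concrete, here is a minimal sketch using Python's
standard expat binding (Python, the sample document, and the count_links
helper are assumptions for illustration, not Plucker Desktop code). It counts
links as they go past and never holds a document tree in memory:

```python
import xml.parsers.expat


def count_links(xml_text):
    """Count <link> elements in one streaming pass, without building a tree."""
    count = 0

    def on_start(name, attrs):
        # Called by expat for every opening tag as it is encountered.
        nonlocal count
        if name == "link":
            count += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return count


# Hypothetical sample document, for illustration only.
SAMPLE = (
    "<site>"
    "<link>http://www.advogato.org</link>"
    "<link>http://example.org/</link>"
    "</site>"
)

if __name__ == "__main__":
    print(count_links(SAMPLE))
```

Memory use stays roughly flat however many links the document holds, since
only the counter survives between callbacks.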

> > [] Make it RDF-like.
>
>       No need for this, since we probably aren't going to "publish" or
> syndicate it.

I don't really like RDF that much, since until one has looked at a few of
them, it's not really evident what the logic is. However, RDF does deal with
a similar problem (i.e., cataloging a site description for indexing,
location and publishing). We aren't publishing anything, but one of the more
reasonable places for a site description to come from is the site itself
(similar to RSS: they just keep a file on their site to describe how an
offline browser should crawl it. Somewhat similar to a robots.txt file, but
might as well use an XML syntax if starting from scratch). Not much
syndication, but I will probably use the same element terminology for
updates (update period, frequency, base). I originally used
maxage=#ofseconds since that is more flexible, but it is harder for a human
to read.
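
The two styles can coexist, since a period/frequency pair reduces to a
maximum age in seconds. A rough sketch follows, assuming the RSS-style
meaning where frequency is the number of updates per period; the
PERIOD_SECONDS table and max_age_seconds are hypothetical names, and
monthly/yearly are left out because their lengths vary:

```python
# Length of each supported period in seconds (fixed-length periods only).
PERIOD_SECONDS = {
    "hourly": 60 * 60,
    "daily": 24 * 60 * 60,
    "weekly": 7 * 24 * 60 * 60,
}


def max_age_seconds(update_period, update_frequency=1):
    """Translate a period/frequency pair into a maximum age in seconds.

    Assumes update_frequency means "this many updates per period".
    """
    if update_frequency < 1:
        raise ValueError("update_frequency must be at least 1")
    return PERIOD_SECONDS[update_period] // update_frequency


if __name__ == "__main__":
    # Twice a day works out to a maximum age of 12 hours.
    print(max_age_seconds("daily", 2))
```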

> > [] Organized by <some_property>1</some_property> instead of
> > <some_property value="1" />
>
>       Ah, here's where it gets really fun, elements vs. attributes.
>
> > [] For new-user readability, either use namespaces, such as
> > <images:max_compression> or to nest the things inside an <images> tag
> > for clarity, such as
> > <images><max_compression>1</max_compression><bpp>1</bpp></images>
>
>       Here's a perfect case for attributes. Don't get element happy with
> your XML, it complicates things. Remember, XML describes content. Let's
> describe it properly.
>
>       <images bpp="1" maxheight="200" alt_maxwidth="300"></images>
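
For comparison, a small sketch of reading the same image settings from either
layout. Python's standard ElementTree module is used only for illustration,
and the two helper functions are hypothetical. On the reading side, both
layouts reduce to the same name/value mapping:

```python
import xml.etree.ElementTree as ET

ATTRIBUTE_STYLE = '<images bpp="1" maxheight="200" alt_maxwidth="300"></images>'
ELEMENT_STYLE = (
    "<images>"
    "<bpp>1</bpp><maxheight>200</maxheight><alt_maxwidth>300</alt_maxwidth>"
    "</images>"
)


def settings_from_attributes(xml_text):
    """Read settings stored as attributes on the root element."""
    return dict(ET.fromstring(xml_text).attrib)


def settings_from_elements(xml_text):
    """Read settings stored as one child element per property."""
    return {child.tag: child.text for child in ET.fromstring(xml_text)}


if __name__ == "__main__":
    print(settings_from_attributes(ATTRIBUTE_STYLE))
    print(settings_from_elements(ELEMENT_STYLE))
```

The choice is therefore mostly about how the file reads and edits by hand,
not about what a parser can recover from it.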



> > [] Use underscores or case for element names. I like underscores better,
> > but case is usually the way the rest of the world works. Others use
> > updatePeriod instead of update_period. Better to use one style or the
> > other consistently in the format.
>
>       Underscores make sense, much more so in our context than
> CamelNotation, which is what you're used to in the C++ world. Let's stick
> with meaningful type names, I vote underscores.

I agree. But I guess it is the "our context" aspect that I am still
considering, both for case and for what the elements are named. It limits
adoption if things aren't done in a logical and common way.
The other parties may look and say, why on Earth are these underscores
instead of the usual format? Then others decide on a format to set a
standard, and the Plucker format is non-compliant. Or worse, there are 15
different standards, none compatible, and everyone has to waste time writing
an import and export for every format (which is the way it is now).
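
If it did come to import/export between the two naming styles, the mapping
itself is mechanical. A sketch, with hypothetical helper names, assuming
names use only plain ASCII letters and single underscores:

```python
import re


def to_camel(name):
    """Convert update_period style to updatePeriod style."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def to_underscores(name):
    """Convert updatePeriod style to update_period style."""
    # Insert an underscore before every capital letter except a leading one.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


if __name__ == "__main__":
    print(to_camel("update_period"))
    print(to_underscores("updatePeriod"))
```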

> > <?xml version="1.0" encoding="utf-8"?>
>
>       <?xml version="1.0" encoding="utf-8" standalone="yes" ?>
>
>       It's important to note that XML processors are only required to
> support UTF-8 and UTF-16. You could use the following encoding
> declaration, but it may not give you the results you want, depending on
> the parser used.
>
>       <?xml version="1.0" encoding="ISO-8859-1" standalone="yes" ?>

Yes, that is what will be used. Thanks for catching that.
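
A quick way to see what a given parser does with a declared encoding is to
hand it raw bytes. The sketch below uses Python's expat-backed ElementTree;
the sample document and read_title are made up for illustration. Other
parsers may behave differently, which is the caveat raised above:

```python
import xml.etree.ElementTree as ET

# Raw bytes as they would sit on disk: 0xE9 is e-acute in ISO-8859-1.
DOC = (
    b'<?xml version="1.0" encoding="ISO-8859-1" standalone="yes" ?>'
    b"<title>caf\xe9</title>"
)


def read_title(xml_bytes):
    """Parse bytes and return the root element's text as a decoded string."""
    return ET.fromstring(xml_bytes).text


if __name__ == "__main__":
    print(read_title(DOC) == "caf\u00e9")
```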

>         <link>http://www.advogato.org</link>
>
>       <html xmlns="http://www.advogato.org/"></html>
>
>       (Remember, you can't have // characters inside elements)

Hmmm. Never saw that before in any spec, but things are always changing and
I may well have missed it completely in the texts. Do you have a ref handy
by any chance?
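
In the meantime, one way to check is to feed the element to expat and see
whether it complains. This sketch, with a hypothetical is_well_formed helper,
only shows what expat accepts and is not a substitute for a reference to the
spec:

```python
import xml.parsers.expat


def is_well_formed(xml_text):
    """Return True if expat parses the text without raising an error."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(xml_text, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


if __name__ == "__main__":
    # A URL with // as element content.
    print(is_well_formed("<link>http://www.advogato.org</link>"))
    # A deliberately mismatched closing tag, to show the check can fail.
    print(is_well_formed("<link>http://www.advogato.org</lnk>"))
```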

>       <gather verbosity="1" close_error="1" close_exit="0"></gather>

As the number of attributes continues to grow (images has a pile already),
the individual element lines get very long, making it harder to read an XML
document without either scrolling around or linebreaking the element. XML is
nicer to edit by hand when it is written like source code, with everything
available on the screen at the same time.

Plus there is the question of how intuitive empty tags are. The ' />'
notation may be better in an all-attribute approach if there is no data to
be placed inside the tag.
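
For what it is worth, a parser treats the paired form, the empty-tag form,
and a linebroken attribute list the same way. A sketch using Python's
ElementTree (the parsed helper is hypothetical):

```python
import xml.etree.ElementTree as ET

PAIRED = '<gather verbosity="1" close_error="1" close_exit="0"></gather>'
EMPTY = '<gather verbosity="1" close_error="1" close_exit="0" />'
# Same element with one attribute per line, closer to a source-code layout.
WRAPPED = """<gather
    verbosity="1"
    close_error="1"
    close_exit="0" />"""


def parsed(xml_text):
    """Return (tag, attributes, text) so different spellings can be compared."""
    elem = ET.fromstring(xml_text)
    return elem.tag, dict(elem.attrib), elem.text or ""


if __name__ == "__main__":
    print(parsed(PAIRED) == parsed(EMPTY) == parsed(WRAPPED))
```

So the long-line problem can be handled by linebreaking inside the tag, at
the cost of a taller file.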


Good stuff, all in all.

Best wishes,
Robert

RE: Request for comments: RDF/XML descriptions for Plucker I/O with other datasources