Does anyone have an opinion on using an RDF/XML spec to describe channel content that can be imported into or exported from Plucker? Right now the Plucker Desktop showcase reads its descriptions from an HTML file, but XML is a better long-term strategy. One of the harder problems with a non-XML format such as an ini file is arrays of items: whatever character is chosen as the delimiter can no longer appear inside an entry. One of the most useful parts of a showcase file is the exclusion list: exclusions take effort to set up, so they are most valuable when they come premade and ready to go.
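To make the delimiter problem concrete, here is a minimal sketch in Python (not Plucker code). The two exclusion patterns, the [channel] section name and the choice of a comma as the delimiter are all assumptions made up for the illustration.

```python
import configparser
import xml.etree.ElementTree as ET

# Two hypothetical exclusion patterns; the second one contains a comma.
patterns = [r".*\.zip$", r".*/page\d{1,3}\.html$"]

# ini style: the array is joined with a delimiter (a comma, as an assumption).
ini = configparser.ConfigParser(interpolation=None)
ini["channel"] = {"exclusions": ",".join(patterns)}
ini_items = ini["channel"]["exclusions"].split(",")

# XML style: one element per entry, so no delimiter is needed.
root = ET.Element("exclusions")
for pattern in patterns:
    ET.SubElement(root, "exclusion_item").text = pattern
xml_text = ET.tostring(root, encoding="unicode")
xml_items = [item.text for item in ET.fromstring(xml_text)]

print(len(ini_items))         # 3: the comma inside {1,3} split one pattern in two
print(xml_items == patterns)  # True: both patterns survive the round trip
```

An ini format could add an escaping rule for the delimiter, but then every reader and writer of the file has to implement that rule, which XML already handles.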
I finished off an XML parser in C++ to read XML Plucker channel descriptions. Expat was chosen as the base because it is the fastest, validation isn't needed (or planned), and expat is already in the Plucker Desktop code, since the XML resource parser is based on it, so no extra library size is added to the final executable.

What remains is deciding what the XML file should look like. Options are:

[] Make it RDF-like.
[] Use namespaces from metadata vocabularies that already exist:
   --Dublin Core (dc) for the language the channel is in, author, and area of coverage (these are columns in the showcase dialog, so users can sort channels by their language and location).
   --Syndication (sy) for the update frequency, period and base.
[] Non-proprietary: elements aren't called plucker*.
[] Room for extension by later versions of Plucker and/or other types of programs.
[] Can be logically read by a human reader.
[] Organized as <some_property>1</some_property> instead of <some_property value="1" />
   --Except for the exclusion list, which is more readable with attributes, in the style of <exclusion_item action="include" priority="1">.*zip$</exclusion_item>
[] Either use unique element names such as <exclusion_item>, or use generic list and item elements such as <exclusion_list> <li>...</li> <li>...</li> </exclusion_list>
[] For new-user readability, either use namespaces, such as <images:max_compression>, or nest the elements inside an <images> tag, such as <images><max_compression>1</max_compression><bpp>1</bpp></images>. If element names are unique, the parser can simply ignore <images>: it doesn't do anything except make it more obvious what that group of elements relates to, instead of leaving a giant amorphous mass of data elements.
[] Use underscores or case for element names. I like underscores better, but case is usually the way the rest of the world works: others use updatePeriod instead of update_period. Better to use one style or the other consistently in the format.
[] Elements named in the positive, i.e. <include_url_info>1</include_url_info>, to avoid double-negative confusion.
[] Ask whether other interested parties (Mozilla, mazingo, open directory projects, etc.) are willing to participate in hammering out a standard, so that there aren't 99 versions of expressing the same data. The Dublin Core idea works well as a shared vocabulary for describing a document's library information, but no spec exists for handheld sites, so we might as well make one.
[] All descriptions must live inside a top-level node, so we need to decide what to call that too.

Below is an organization of elements nested by topic, with RDF currently removed for clarity. I think namespaces would work better though.

Best wishes,
Robert

<?xml version="1.0" encoding="utf-8"?>
<channel_list>
  <channel>
    <title>Advogato</title>
    <link>http://www.advogato.org</link>
    <spidering_options>
      <verbosity>1</verbosity>
      <close_on_exit>1</close_on_exit>
      <close_on_error>0</close_on_error>
    </spidering_options>
    <limits_options>
      <maxdepth>1</maxdepth>
      <stayonhost>1</stayonhost>
      <stayondomain>1</stayondomain>
      <exclusions>
        <exclusion_file>http://www.advogato.com/exclusionlist.rss</exclusion_file>
        <exclusion_item action="exclude" priority="0">http://www\.osdn\.com.*</exclusion_item>
      </exclusions>
    </limits_options>
    <security_options>
      <copyprevention_bit>0</copyprevention_bit>
      <owner_id>Bill</owner_id>
    </security_options>
    <images_options>
      <bpp>1</bpp>
      <maxheight>250</maxheight>
      <maxwidth>150</maxwidth>
      <alt_maxheight>1000000</alt_maxheight>
      <alt_maxwidth>1000000</alt_maxwidth>
      <image_compression_limit>50</image_compression_limit>
    </images_options>
    <output_options>
      <backup_bit>1</backup_bit>
      <launchable_bit>0</launchable_bit>
      <no_url_info>1</no_url_info>
      <categories>
        <category>News</category>
        <category>Linux</category>
      </categories>
      <compression>zlib</compression>
      <launcher:show_icon>1</launcher:show_icon>
      <launcher:large_icon_file>big.bpp</launcher:large_icon_file>
      <launcher:small_icon_file>small.bpp</launcher:small_icon_file>
    </output_options>
    <destinations>
      <doc_file>C:\windows\desktop\ethics</doc_file>
      <sync_users>
        <user>Rob O'Connor</user>
        <user>Steve Evans</user>
      </sync_users>
      <copy_to_directories>
        <directory>C:\output</directory>
        <directory>C:\publish</directory>
      </copy_to_directories>
    </destinations>
    <autoupdate_options>
      <sy:updatePeriod>daily</sy:updatePeriod>
      <sy:updateFrequency>2</sy:updateFrequency>
      <sy:updateBase>2000-01-01T12:00+00:00</sy:updateBase>
    </autoupdate_options>
    <!-- Dublin Core -->
    <!-- Needed? Already have a title -->
    <dc:title>Advogato</dc:title>
    <!-- TGN qualifier (Getty Thesaurus of Geographic Names) seems like the best bet here, since some -->
    <!-- channels are city-specific, others by country, continent, or world -->
    <dc:coverage>Boston, MA, USA</dc:coverage>
    <dc:description>The free software developer's advocate.</dc:description>
    <dc:subject>Free software, GPL, News</dc:subject>
    <!-- Needed? 99% are going to be text. Only perhaps a few image only -->
    <dc:type>text</dc:type>
    <dc:creator>Advogato team([EMAIL PROTECTED])</dc:creator>
    <dc:publisher>Independent</dc:publisher>
    <dc:rights>All rights reserved</dc:rights>
    <dc:contributor>John Spencer</dc:contributor>
    <!-- Qualifier here? Created, or updated? Relevant when there is an updateBase? -->
    <dc:date>2002-02-14</dc:date>
    <dc:language>en</dc:language>
    <!-- Future additions: -->
    <!-- Whether channel currently active or not -->
    <active>1</active>
    <!-- Strip out alt tags for uncrawled images -->
    <strip_alt_tags>1</strip_alt_tags>
    <!-- Future optionals:
         * An image for the channel, like in an RSS.
         * Number of pages after which to abort spider and build a pdb of what was done.
         * Size (in kb) of output file after which to abort spider and build the pdb.
         * Estimate of (kb) size of a channel (or combo with previous?)
    -->
  </channel>
</channel_list>
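
For anyone who wants to experiment with the layout, here is a rough sketch of how a non-validating expat reader could walk a file like the one above. It is not the C++ parser mentioned earlier: it uses Python's xml.parsers.expat binding as a stand-in, the SAMPLE document is a trimmed copy of the draft, and parse_channels is a hypothetical helper name. It assumes unique leaf element names, so grouping wrappers such as <limits_options> are simply skipped, and it leaves namespace processing off, which is why the undeclared sy: prefix is accepted as a plain element name.

```python
import xml.parsers.expat

# Trimmed copy of the draft format, for illustration only.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<channel_list>
  <channel>
    <title>Advogato</title>
    <link>http://www.advogato.org</link>
    <limits_options>
      <maxdepth>1</maxdepth>
      <exclusions>
        <exclusion_item action="exclude" priority="0">.*zip$</exclusion_item>
      </exclusions>
    </limits_options>
    <autoupdate_options>
      <sy:updatePeriod>daily</sy:updatePeriod>
    </autoupdate_options>
  </channel>
</channel_list>
"""


def parse_channels(text):
    """Return one dict per <channel>, keyed by leaf element name."""
    channels = []
    current = None    # dict for the <channel> being read, if any
    chunks = []       # character data of the most recently opened element
    open_attrs = {}   # attributes of the most recently opened element
    is_leaf = False   # True until the open element turns out to have children

    def start(name, attrs):
        nonlocal current, open_attrs, is_leaf
        if name == "channel":
            current = {"exclusions": []}
        chunks.clear()
        open_attrs = attrs
        is_leaf = True

    def end(name):
        nonlocal current, is_leaf
        if name == "channel":
            channels.append(current)
            current = None
        elif is_leaf and current is not None:
            value = "".join(chunks).strip()
            if name == "exclusion_item":
                # Exclusions keep their attributes: (action, priority, pattern).
                priority = int(open_attrs.get("priority", "0"))
                current["exclusions"].append(
                    (open_attrs.get("action"), priority, value))
            else:
                # Repeated names (e.g. <category>) would overwrite each other;
                # a fuller reader would collect those into lists.
                current[name] = value
        # Whatever closes next is a wrapper, not a leaf.
        is_leaf = False

    def chars(data):
        chunks.append(data)

    parser = xml.parsers.expat.ParserCreate()  # namespace processing is off
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(text, True)
    return channels


channels = parse_channels(SAMPLE)
print(channels[0]["title"], channels[0]["exclusions"])
```

Because wrappers are ignored, the same reader would accept a flat file with no <limits_options> or <autoupdate_options> grouping at all, which is the trade-off described in the readability option above. If the format moves to real namespaces, the prefixes would need xmlns declarations and the reader would need namespace processing switched on.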