I'll at least mention an issue we've discovered when examining existing Earth science data collections: the collections have a layered or hierarchical structure, with each layer having different kinds of metadata (or, to put it a bit more formally, different metadata types). In Formal Concept Analysis (FCA), this shows up as hierarchical attributes.
The easiest example is perhaps the NOAA Emergency Response Imagery collection, where the top layer divides about 250,000 digital photos into about twenty collections, one for each disastrous storm that needed an emergency response. Examples include Hurricane Sandy, Hurricane Katrina, and the Tuscaloosa Tornado. The top-level metadata fields are Storm Name, Storm Date, and Storm Location. One level below that, each storm collection bifurcates into image collections organized by airplane flight path and image collections organized by coarse geographic boxes. Every storm collection has this bifurcation, so once a user has decided which storm collection he or she is interested in, Storm Name (or Date or Location) is no help in distinguishing a flight path collection from a geographic collection.

On the flight path side, the next level down requires deciding which flight path is the one you want. Once you decide that, you can get a zipped file with several hundred jpg images. On the other hand, if you go the geographic box route, you can get individual images. The metadata types for the boxes are quite different from the metadata types for the flight paths.

The same kind of structure crops up for rock core archives (wells, boxes in a well, core fragments in a box), as well as for many of the satellite data collections.

The usual library science approach (e.g. Functional Requirements for Bibliographic Records, and its relatives) assumes all the inventoried objects in an archive should have the same types of metadata. With the Emergency Response Imagery collection, the "Responsible Party" field is something like "Dept. of Commerce, NOAA, National Ocean Survey, National Geodetic Survey, Emergency Response Imagery Project", and it's the same for every image or zipped image file. At least in this case, an "Author" metadata field (assuming that Dept. of Commerce is a Responsible Party and that a Responsible Party is equivalent to an Author) is of NO help in distinguishing which of the quarter million files you might want to get. In FCA, the algorithms would effectively ignore the field if applied to the whole collection, because an attribute shared by every object distinguishes nothing.

Some care would appear to be appropriate.

Bruce b.

On Sat, Jun 22, 2013 at 1:31 PM, Chris A. Mattmann (JIRA) <j...@apache.org> wrote:

>
> [ https://issues.apache.org/jira/browse/OODT-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Chris A. Mattmann resolved OODT-639.
> ------------------------------------
>
> Resolution: Fixed
>
> - fixed in r1495761. Added unit test and doc updates to all product types
> suggesting how to use the new versioner.
>
> > Add a versioner based on Product Type Metadata
> > ----------------------------------------------
> >
> > Key: OODT-639
> > URL: https://issues.apache.org/jira/browse/OODT-639
> > Project: OODT
> > Issue Type: New Feature
> > Components: file manager
> > Reporter: Chris A. Mattmann
> > Assignee: Chris A. Mattmann
> > Fix For: 0.6
> >
> >
> > Add a versioner that allows users to input the filePathSpec for the
> > MetadataBasedVersioner using ProductTypeMetadata, e.g.,
> > {code:xml}
> > <type id="foo" name="Foo">
> >   <typeMetadata>
> >     <keyval>
> >       <key>filePathSpec</key>
> >       <val>/[AcquisitionDate]/[Filename]</val>
> >     </keyval>
> >   </typeMetadata>
> >   ..
> > </type>
> > {code}
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA
> administrators
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>