Hey Bruce,

Totally agree. In Apache OODT, we already have the layered, hierarchical attribute approach, in which the first layer is the "Product Type" concept, capturing aggregate information about a set of products (name, id, classification, aggregate free-form metadata, sets of metadata extractors, the "versioning" or file-placement-on-disk policy, etc.).

The next level is the Product level, where we capture product-level (i.e., frequently changing) metadata, apply the versioning scheme for file placement, and extract metadata using the metadata extractors specified per product type. Attributes at the product level are hierarchical in the sense that they are defined by per-product-type policy (though we can accept others) and are specified in the OODT File Manager server through the particular ValidationLayer and RepositoryManager selected for the running system.
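
As a rough illustration of the two layers, here is a minimal sketch in Python. It is not the File Manager's actual Java API: the ProductType and Product classes and the resolve_file_path function are hypothetical names made up for this example, and the placeholder substitution is only an assumption about how a filePathSpec like the one in OODT-639 (quoted below) could be resolved against product-level metadata.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ProductType:
    """First layer (hypothetical): aggregate information for a set of products."""
    id: str
    name: str
    type_metadata: dict = field(default_factory=dict)


@dataclass
class Product:
    """Second layer (hypothetical): per-product, frequently changing metadata."""
    product_type: ProductType
    metadata: dict = field(default_factory=dict)


def resolve_file_path(product: Product) -> str:
    """Replace each [Key] in the type-level filePathSpec with product metadata."""
    spec = product.product_type.type_metadata["filePathSpec"]

    def lookup(match):
        key = match.group(1)
        if key not in product.metadata:
            raise KeyError(f"product metadata has no value for [{key}]")
        return str(product.metadata[key])

    return re.sub(r"\[([^\]]+)\]", lookup, spec)


# Type-level policy, mirroring the <typeMetadata> example in OODT-639.
foo = ProductType(
    id="foo",
    name="Foo",
    type_metadata={"filePathSpec": "/[AcquisitionDate]/[Filename]"},
)

# Product-level metadata (values invented for illustration).
granule = Product(foo, {"AcquisitionDate": "2013-06-26", "Filename": "granule.dat"})

print(resolve_file_path(granule))  # /2013-06-26/granule.dat
```

The point of the sketch is only that the placement policy lives once at the Product Type layer, while the values that fill it in come from each individual product.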

HTH!

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Bruce Barkstrom <brbarkst...@gmail.com>
Reply-To: "dev@oodt.apache.org" <dev@oodt.apache.org>
Date: Wednesday, June 26, 2013 2:50 PM
To: "dev@oodt.apache.org" <dev@oodt.apache.org>
Subject: Re: [jira] [Resolved] (OODT-639) Add a versioner based on Product Type Metadata

>I'll at least mention an issue we've discovered when examining existing
>Earth science data collections: the collections have a layered or
>hierarchical structure, with each layer having different kinds of
>metadata (or, if I did it a bit more formally, different types of
>metadata). In Formal Concept Analysis, it shows up as hierarchical
>attributes.
>
>The easiest example is perhaps the NOAA Emergency Response Imagery
>collection, where the top layer divides about 250,000 digital photos into
>about twenty collections - one for each disastrous storm that needed an
>emergency response. Examples include Hurricane Sandy, Hurricane
>Katrina, and the Tuscaloosa Tornado. The top level metadata fields
>are (Storm Name, Storm Date, and Storm Location). One level below
>that, the storm collections bifurcate into image collections organized
>by airplane flight path or organized by coarse geographic boxes.
>Each storm collection has this bifurcation - and so once a user
>has distinguished which storm collection he or she is interested in,
>Storm Name (or Date or Location) is not helpful in distinguishing
>a flight path collection from a geographic collection.
>
>The level below that for the flight path requires distinguishing which
>flight path is the one you want. Once you decide that, you can get
>a zipped file with several hundred jpg images. On the other hand,
>if you go the geographic boxes, you can get individual images.
>The metadata types for the boxes are quite different from the metadata
>for the flight paths.
>
>The same kind of structure crops up for rock core archives (wells, boxes
>in a well, core fragments in a box), as well as for many of the satellite
>data collections.
>
>The usual library science approach (e.g. Functional Requirements
>for Bibliographic Records, and its relatives) assumes all the inventoried
>objects in an archive should have the same types of metadata. With
>the Emergency Response Imagery collection, the "Responsible Party"
>field is sort of "Dept. of Commerce, NOAA, National Ocean Survey,
>National Geodetic Survey, Emergency Response Imagery Project"
>and it's the same for every image or zipped image file. At least in
>this case, an "Author" metadata field (assuming that Dept. of Commerce
>is a Responsible Party and that a Responsible Party is equivalent
>to an Author) is of NO help at distinguishing which of the quarter
>million files you might want to get. In FCA, the algorithms would
>ignore the field if applied to the whole collection.
>
>Some care would appear to be appropriate.
>
>Bruce b.
>
>On Sat, Jun 22, 2013 at 1:31 PM, Chris A. Mattmann (JIRA)
><j...@apache.org> wrote:
>
>>
>> [ https://issues.apache.org/jira/browse/OODT-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>>
>> Chris A. Mattmann resolved OODT-639.
>> ------------------------------------
>>
>>     Resolution: Fixed
>>
>> - fixed in r1495761. Added unit test and doc updates to all product types
>> suggesting how to use the new versioner.
>>
>> > Add a versioner based on Product Type Metadata
>> > ----------------------------------------------
>> >
>> >                 Key: OODT-639
>> >                 URL: https://issues.apache.org/jira/browse/OODT-639
>> >             Project: OODT
>> >          Issue Type: New Feature
>> >          Components: file manager
>> >            Reporter: Chris A. Mattmann
>> >            Assignee: Chris A. Mattmann
>> >             Fix For: 0.6
>> >
>> >
>> > Add a versioner that allows users to input the filePathSpec for the
>> > MetadataBasedVersioner using ProductTypeMetadata, e.g.,
>> > {code:xml}
>> > <type id="foo" name="Foo">
>> >   <typeMetadata>
>> >     <keyval>
>> >       <key>filePathSpec</key>
>> >       <val>/[AcquisitionDate]/[Filename]</val>
>> >     </keyval>
>> >   </typeMetadata>
>> >   ..
>> > </type>
>> > {code}
>>
>> --
>> This message is automatically generated by JIRA.
>> If you think it was sent incorrectly, please contact your JIRA
>> administrators
>> For more information on JIRA, see:
>> http://www.atlassian.com/software/jira
>>