I'll at least mention an issue we've discovered when examining existing
Earth science data collections: the collections have a layered or
hierarchical structure, with each layer having different kinds of
metadata (or, put a bit more formally, different types of metadata).
In Formal Concept Analysis, this shows up as hierarchical attributes.

The easiest example is perhaps the NOAA Emergency Response Imagery
collection, where the top layer divides about 250,000 digital photos into
about twenty collections - one for each disastrous storm that needed an
emergency response.  Examples include Hurricane Sandy, Hurricane
Katrina, and the Tuscaloosa Tornado.  The top-level metadata fields
are Storm Name, Storm Date, and Storm Location.  One level below
that, the storm collections bifurcate into image collections organized
by airplane flight path or organized by coarse geographic boxes.
Each storm collection has this bifurcation - and so once a user
has distinguished which storm collection he or she is interested in,
Storm Name (or Date or Location) is not helpful in distinguishing
a flight path collection from a geographic collection.

The level below that for the flight path requires distinguishing which
flight path is the one you want.  Once you decide that, you can get
a zipped file with several hundred jpg images.  On the other hand,
if you go by the geographic boxes, you can get individual images.
The metadata types for the boxes are quite different from the metadata
types for the flight paths.
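
To make the layering concrete, here is a rough Python sketch of the
structure described above.  Only the three storm-level field names come
from the collection as I described it; the lower-level field names
(flight_id, min_lat, and so on) and all of the values are made up for
illustration and are not the actual NOAA metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One collection (or item) in the hierarchy, with its own metadata."""
    kind: str
    metadata: dict
    children: list = field(default_factory=list)


def fields_by_kind(node, out=None):
    """Collect the set of metadata field names used by each kind of node."""
    if out is None:
        out = {}
    out.setdefault(node.kind, set()).update(node.metadata)
    for child in node.children:
        fields_by_kind(child, out)
    return out


# Hypothetical miniature of one storm collection; values are placeholders.
storm = Node(
    "storm",
    {"Storm Name": "Hurricane Sandy",
     "Storm Date": "placeholder",
     "Storm Location": "placeholder"},
    [
        # Branch 1: organized by airplane flight path, delivered as zip files.
        Node("flight_path", {"flight_id": "hypothetical-01"},
             [Node("zip_file", {"image_count": 300})]),
        # Branch 2: organized by coarse geographic box, delivered as images.
        Node("geo_box", {"min_lat": 0.0, "max_lat": 1.0,
                         "min_lon": 0.0, "max_lon": 1.0},
             [Node("image", {"file_name": "hypothetical.jpg"})]),
    ],
)

layers = fields_by_kind(storm)
for kind, names in layers.items():
    print(kind, sorted(names))
```

In this toy version the flight-path and geographic-box nodes share no
field names at all, which is the point: a single flat metadata schema
does not describe both branches.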

The same kind of structure crops up for rock core archives (wells, boxes
in a well, core fragments in a box), as well as for many of the satellite
data collections.

The usual library science approach (e.g. Functional Requirements
for Bibliographic Records, and its relatives) assumes all the inventoried
objects in an archive should have the same types of metadata.  With
the Emergency Response Imagery collection, the "Responsible Party"
field is something like "Dept. of Commerce, NOAA, National Ocean Survey,
National Geodetic Survey, Emergency Response Imagery Project"
and it's the same for every image or zipped image file.  At least in
this case, an "Author" metadata field (assuming that Dept. of Commerce
is a Responsible Party and that a Responsible Party is equivalent
to an Author) is of NO help at distinguishing which of the quarter
million files you might want to get.  In FCA, the algorithms would
ignore the field if applied to the whole collection.
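
A small Python sketch of why a constant field does not help.  This is
not an FCA implementation; it only counts distinct values per field,
under the assumption that every record carries every field.  The
records and the shortened "Responsible Party" value are invented for
illustration.

```python
def distinct_values(records):
    """Map each field name to the set of values it takes across records."""
    seen = {}
    for rec in records:
        for key, value in rec.items():
            seen.setdefault(key, set()).add(value)
    return seen


def discriminating_fields(records):
    """Fields with more than one value, i.e. fields that can split the set."""
    return sorted(k for k, v in distinct_values(records).items() if len(v) > 1)


# Hypothetical records: the responsible party never varies, the storm does.
records = [
    {"Responsible Party": "NOAA (illustrative)", "Storm Name": "Hurricane Sandy"},
    {"Responsible Party": "NOAA (illustrative)", "Storm Name": "Hurricane Katrina"},
    {"Responsible Party": "NOAA (illustrative)", "Storm Name": "Tuscaloosa Tornado"},
]

print(discriminating_fields(records))
```

A field whose value is shared by every object partitions the collection
into a single block, so it cannot separate one file from another.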

Some care would appear to be appropriate.

Bruce b.

On Sat, Jun 22, 2013 at 1:31 PM, Chris A. Mattmann (JIRA)
<j...@apache.org> wrote:

>
>      [
> https://issues.apache.org/jira/browse/OODT-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
>
> Chris A. Mattmann resolved OODT-639.
> ------------------------------------
>
>     Resolution: Fixed
>
> - fixed in r1495761. Added unit test and doc updates to all product types
> suggesting how to use the new versioner.
>
> > Add a versioner based on Product Type Metadata
> > ----------------------------------------------
> >
> >                 Key: OODT-639
> >                 URL: https://issues.apache.org/jira/browse/OODT-639
> >             Project: OODT
> >          Issue Type: New Feature
> >          Components: file manager
> >            Reporter: Chris A. Mattmann
> >            Assignee: Chris A. Mattmann
> >             Fix For: 0.6
> >
> >
> > Add a versioner that allows users to input the filePathSpec for the
> MetadataBasedVersioner using ProductTypeMetadata, e.g.,
> > {code:xml}
> > <type id="foo" name="Foo">
> >  <typeMetadata>
> >   <keyval>
> >     <key>filePathSpec</key>
> >     <val>/[AcquisitionDate]/[Filename]</val>
> >   </keyval>
> >  </typeMetadata>
> > ..
> > </type>
> > {code}
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA
> administrators
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
