suggestion for a re-implementation of Bundles in the DSpace data model 
-----------------------------------------------------------------------

                 Key: DS-893
                 URL: https://jira.duraspace.org/browse/DS-893
             Project: DSpace
          Issue Type: Improvement
          Components: DSpace API
    Affects Versions: 1.8.0
            Reporter: Bill Hays


Preliminary ideas for a new implementation of "Bundle" in the DSpace data model

Current database model relationships:        
   Item <- Item2Bundle -> Bundle <- Bundle2Bitstream -> Bitstream
Current java object model relationships:     
   Item <-> Bundle <-> Bitstream

Proposed database model relationships (1):   
  Item <- Item2Bitstream -> Bitstream(id, ..., bundlename, ...)
or even more succinctly:                    
  Item <- Bitstream(id, item_id, bundlename, ...)

In current DSpace, the container structure of Bundles adds complexity without
any realized benefit.
This first step in the proposal removes the Bundle table and associates each
Bitstream directly with an Item.  The concept of "bundle" is replaced by an enum
field in the Bitstream that identifies a bundle type (ORIGINAL, THUMBNAIL,
etc.).  Functionally this is very similar to what we get now:  a bitstream
belongs to one item and is associated with one bundle.  Today, however, bundle
names are not constrained, even though various parts of the codebase expect
particular names.
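A minimal Java sketch of proposal (1), for illustration only.  The BundleType
enum, its constant list, and the simplified Item and Bitstream classes are
hypothetical and are not the real org.dspace.content API.  It shows a bitstream
holding its item id and bundle type directly, "get bundle contents" as a simple
filter, and moving a bitstream between bundles as a single field update.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical bundle types; the exact constant set is an assumption.
enum BundleType {
    ORIGINAL, THUMBNAIL, TEXT, LICENSE;

    // Parses a bundle name (e.g. from an import file). A typo such as
    // "ORIGNAL" throws IllegalArgumentException instead of creating a new bundle.
    static BundleType parse(String name) {
        return valueOf(name.trim().toUpperCase(Locale.ROOT));
    }
}

// Sketch of a Bitstream row in proposal (1): Bitstream(id, item_id, bundlename, ...).
class Bitstream {
    private final int id;
    private final int itemId;   // direct reference to the owning Item; no Bundle table
    private BundleType bundle;  // replaces the Item2Bundle/Bundle2Bitstream joins

    Bitstream(int id, int itemId, BundleType bundle) {
        this.id = id;
        this.itemId = itemId;
        this.bundle = bundle;
    }

    int getId() { return id; }
    int getItemId() { return itemId; }
    BundleType getBundle() { return bundle; }

    // Moving to another bundle is one field update, not delete + re-add.
    void setBundle(BundleType bundle) { this.bundle = bundle; }
}

class Item {
    private final int id;
    private final List<Bitstream> bitstreams = new ArrayList<>();

    Item(int id) { this.id = id; }

    Bitstream addBitstream(int bitstreamId, BundleType bundle) {
        Bitstream b = new Bitstream(bitstreamId, id, bundle);
        bitstreams.add(b);
        return b;
    }

    // What used to be "find the bundle, then list its bitstreams" becomes a filter.
    List<Bitstream> getBitstreams(BundleType bundle) {
        return bitstreams.stream()
                .filter(b -> b.getBundle() == bundle)
                .collect(Collectors.toList());
    }
}
```

Because BundleType.parse delegates to Enum.valueOf, an unknown name fails fast
rather than silently producing a new, misspelled bundle.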
           
              
Proposed database model relationships (2):   
  Item <- Bitstream -> MBundle(id, name, collection_id, derivative ...)

This variation replaces the bundlename enum with a new class and database table,
"MBundle."  Bundles are still not containers; instead, each bitstream references
an MBundle that describes its type.  Because each MBundle can be associated with
a collection, bundle types can be managed per collection, with a default set
used otherwise.  Other properties of MBundle can be added to further enhance
management capabilities, e.g.:

   isDerivative - identifies bundles for Thumbnails and DerivativeText
   isVisible    - indicates that the related bitstreams should be visible in
                  display contexts
   isReserved   - e.g. for very large "source" objects that are not meant for
                  display or for filtermedia
     [needs work - how complex does bundle "metadata" need to be?]
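A hedged Java sketch of how the collection association and the proposed
properties might fit together.  MBundle, MBundleRegistry, the rule that a
collection's own set replaces the default set, and the eligibleForFiltering
policy are all illustrative assumptions, not existing DSpace code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical MBundle row: MBundle(id, name, collection_id, ...).
class MBundle {
    final int id;
    final String name;
    final Integer collectionId;  // null means "part of the default set"
    final boolean isDerivative;  // e.g. thumbnails, derivative text
    final boolean isVisible;     // related bitstreams shown in display contexts
    final boolean isReserved;    // e.g. very large "source" files

    MBundle(int id, String name, Integer collectionId,
            boolean isDerivative, boolean isVisible, boolean isReserved) {
        this.id = id;
        this.name = name;
        this.collectionId = collectionId;
        this.isDerivative = isDerivative;
        this.isVisible = isVisible;
        this.isReserved = isReserved;
    }

    // Assumed policy: media filtering reads only non-derivative, non-reserved bundles.
    boolean eligibleForFiltering() {
        return !isDerivative && !isReserved;
    }
}

class MBundleRegistry {
    private final Map<Integer, List<MBundle>> byCollection = new HashMap<>();
    private final List<MBundle> defaults = new ArrayList<>();

    MBundleRegistry(List<MBundle> all) {
        for (MBundle m : all) {
            if (m.collectionId == null) {
                defaults.add(m);
            } else {
                byCollection.computeIfAbsent(m.collectionId, k -> new ArrayList<>()).add(m);
            }
        }
    }

    // A collection's own set replaces the default set entirely (assumption).
    List<MBundle> bundlesFor(int collectionId) {
        return byCollection.getOrDefault(collectionId, defaults);
    }

    Optional<MBundle> find(int collectionId, String name) {
        return bundlesFor(collectionId).stream()
                .filter(m -> m.name.equals(name))
                .findFirst();
    }
}
```

Replacing rather than merging keeps per-collection configuration explicit; a
merge rule (collection entries overriding same-named defaults) would be an
equally plausible design.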
     
Issues:

Primary Bitstream Id:  This is currently used only for the ORIGINAL bundle, so
conceptually there is one per item.  Note that the current model (API and
database) allows multiple ORIGINAL bundles, and therefore multiple primary
bitstream ids; however, the implementation doesn't expose this possibility.
 Possible replacement API calls, depending on the implementation:
    item.setPrimaryBitstream(Bitstream b)
    bitstream.setPrimary(boolean b) / bitstream.isPrimary()
 Various database solutions:
    1. item.primaryBitstreamId - not standard database normalization, but
       consistent with DSpace practice
    2. item2bitstream.primaryBitstream - a boolean; standard normalization, but
       requires some management to avoid duplicates
    3. mbundle.primaryBitstreamId - not standard database normalization, but
       consistent with DSpace practice
    4. bitstream.primaryBitstream - a boolean; standard normalization, but
       requires some management to avoid duplicates

    If someone has used primaryBitstreamId in non-ORIGINAL bundles for special
    purposes, only options 3 or 4 would work.
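The boolean-flag options above "require some management to avoid duplicates."
A minimal Java sketch of that management, assuming the flag is stored on each
bitstream and that a primary is tracked per (item, bundle) so non-ORIGINAL
bundles can keep their own.  Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical bitstream carrying a "primary" boolean column.
class Bitstream {
    final int id;
    final String bundleName;
    boolean primary;

    Bitstream(int id, String bundleName) {
        this.id = id;
        this.bundleName = bundleName;
    }
}

class Item {
    private final List<Bitstream> bitstreams = new ArrayList<>();

    Bitstream add(int id, String bundleName) {
        Bitstream b = new Bitstream(id, bundleName);
        bitstreams.add(b);
        return b;
    }

    // Sets b as primary and clears the flag on every other bitstream in the
    // same bundle, so each (item, bundle) pair has at most one primary.
    void setPrimaryBitstream(Bitstream b) {
        if (!bitstreams.contains(b)) {
            throw new IllegalArgumentException("bitstream " + b.id + " does not belong to this item");
        }
        for (Bitstream other : bitstreams) {
            if (other.bundleName.equals(b.bundleName)) {
                other.primary = (other == b);
            }
        }
    }

    Optional<Bitstream> getPrimaryBitstream(String bundleName) {
        return bitstreams.stream()
                .filter(x -> x.primary && x.bundleName.equals(bundleName))
                .findFirst();
    }
}
```

In the database, the same invariant would have to be kept by updating the old
and new primary rows in one transaction (or enforced by a constraint, where
the database supports one).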

Affected Java code:  Item and Bitstream would need to be adjusted.  This change
is fairly low-level, so it should not be visible to much of the API.  Group
authorizations would need some work (this has not been fully analyzed).  Custom
code that uses the API might be affected.  Custom SQL, such as reporting
queries, might break, but the replacement queries are shorter.  Managing bundle
types per collection would need a new tab on the collection page (XMLUI).
 
Upgrading a DSpace instance:  The database can be migrated with SQL queries.
The assetstores are not affected.
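The upgrade itself would be SQL run against the database; the Java sketch below
only illustrates the row mapping such queries would perform, so the logic can be
checked in isolation.  The row classes are simplified, hypothetical stand-ins
for the bundle, item2bundle and bundle2bitstream tables (column names assumed).
Bitstream rows keep their ids, so the assetstore files are untouched.  A
bitstream linked from more than one bundle has no single bundle name in the new
model, so the sketch stops with an error rather than guessing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for rows of the current tables (hypothetical classes).
class BundleRow {
    final int bundleId;
    final String name;
    BundleRow(int bundleId, String name) { this.bundleId = bundleId; this.name = name; }
}

class Item2BundleRow {
    final int itemId;
    final int bundleId;
    Item2BundleRow(int itemId, int bundleId) { this.itemId = itemId; this.bundleId = bundleId; }
}

class Bundle2BitstreamRow {
    final int bundleId;
    final int bitstreamId;
    Bundle2BitstreamRow(int bundleId, int bitstreamId) { this.bundleId = bundleId; this.bitstreamId = bitstreamId; }
}

// Target row for proposal (1): Bitstream(id, item_id, bundlename).
class MigratedBitstream {
    final int bitstreamId;
    final int itemId;
    final String bundleName;
    MigratedBitstream(int bitstreamId, int itemId, String bundleName) {
        this.bitstreamId = bitstreamId;
        this.itemId = itemId;
        this.bundleName = bundleName;
    }
}

class BundleMigration {
    static List<MigratedBitstream> migrate(List<BundleRow> bundles,
                                           List<Item2BundleRow> item2bundle,
                                           List<Bundle2BitstreamRow> bundle2bitstream) {
        Map<Integer, String> nameByBundle = new HashMap<>();
        for (BundleRow b : bundles) {
            nameByBundle.put(b.bundleId, b.name);
        }
        // Assumes each bundle belongs to exactly one item.
        Map<Integer, Integer> itemByBundle = new HashMap<>();
        for (Item2BundleRow r : item2bundle) {
            itemByBundle.put(r.bundleId, r.itemId);
        }

        // Keyed by bitstream id; insertion order keeps the output stable.
        Map<Integer, MigratedBitstream> result = new LinkedHashMap<>();
        for (Bundle2BitstreamRow r : bundle2bitstream) {
            Integer itemId = itemByBundle.get(r.bundleId);
            if (itemId == null) {
                continue; // bundle not attached to any item; needs separate handling
            }
            MigratedBitstream row =
                    new MigratedBitstream(r.bitstreamId, itemId, nameByBundle.get(r.bundleId));
            if (result.putIfAbsent(r.bitstreamId, row) != null) {
                throw new IllegalStateException(
                        "bitstream " + r.bitstreamId + " is in more than one bundle");
            }
        }
        return new ArrayList<>(result.values());
    }
}
```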
   
Benefits:

   Simpler, more concise model that removes unused/unnecessary containment
     structure.
   Enhanced bitstream management through bundle properties.
   Enumeration of names instead of uncontrolled strings, preventing typos in
     bundle names (e.g. from ItemImport).
   Provides an easy way to make derivative bitstreams visible.
   Moving bitstreams between bundles does not require deleting and re-adding
     the bitstream.
   Fixes the data model problem with the primary bitstream when an item has
     multiple bundles with the same name.

Drawbacks:

   Not backward compatible:  this is a fundamental change to the data model.
   Custom SQL code that joins through the bundle tables will require rework.

Summary:
   Bundles are categories for Bitstreams and do not need to be implemented as
     containers.
   Bundles could be improved with added metadata and management features.
   Reworking the current Bundle implementation may not be a high enough
     priority to merit the work suggested.  However, the ideas above may be
     useful for other work, including metadata for all DSpace objects and
     exposing the data model to external systems (e.g. Fedora).


-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://jira.duraspace.org/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

_______________________________________________
Dspace-devel mailing list
Dspace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-devel
