We have two opposing problems that are glaring at each other with
fists raised:

o  Is 2GB enough?  Maybe not:  how big is an hour of video?  How big is
   the take from an instrument in a typical run on the Large Hadron
   Collider?

o  HTTP has problems with the uploading of huge quantities of data.

The resolution, to date, has been: "give us the file and all
particulars, and we'll put it in for you."  Some contributors probably
like that, and others probably dislike it.  Some sites probably wish
they didn't have to get so involved with submissions about which they
know very little.

A potential problem is that the item importer imports *items*, so
someone has to make up item batch-load descriptors for these mondo
submissions.  IMHO this is incorrect partitioning:  the contributor
could easily submit everything else, if not for the size of the
bitstream(s).  The admin's help is needed only for the bitstream(s).

Here are a few ideas to stir up our thinking:

o  Attach a bitstream by URL.  A submission could include a URL and
   credentials to pull a bitstream rather than push it.  Then a
   background process can take as much time as needed, use protocols
   like FTP, keep restarting from where the connection broke off until
   it gets the whole thing, etc.

   There's potential for abuse of the server to attack others' sites
   with file-transfer requests, so maybe "can request pulls" is an
   EPerson attribute that an administrator would have to enable for
   trusted users.

o  Attach a bitstream by token.  "Data to follow.  Give me a UUID to
   identify this bitstream when you see it again."  Then the
   contributor can fill out all the forms and launch his submission
   without wait or frustration.  DSpace can email copies of the UUIDs
   to the contributor so they don't get mislaid.  When the
   contributor's pallet-load of SSDs arrives from the loading dock and
   they're cabled up, the admin. tells DSpace, "that file over there
   is bitstream UGLY-UUID" and DSpace will associate it with the Item
   sitting in some workspace or workflow.

   This could be abstracted to facilitate the automation of uptake of
   files that arrive by some magical means behind DSpace's back.
   You'd wind up with a package descriptor file again, as with item
   registration, but look how simple it is: "UUID filename".  No
   dublin_core.xml, no descriptions, no collection-IDs, no EPerson-IDs
   but the admin's own (for the provenance).  No having to track new
   forms of metadata when they are implemented in the GUIs.
   Everything but the bitstream itself is already in an existing
   submission.
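
To make the "UUID filename" descriptor concrete, here is a rough sketch
in Python (not DSpace code).  Everything named in it is an assumption
for illustration only: parse_descriptor, attach_all, the "#" comment
syntax, and the "pending" map from token to waiting Item.  A real
version would look the token up in the workspace/workflow tables.

```python
import uuid
from typing import Dict, List, Tuple


def parse_descriptor(text: str) -> List[Tuple[uuid.UUID, str]]:
    """Parse lines of the form "UUID filename" into (token, path) pairs."""
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and "#" comments are skipped (assumed syntax)
        parts = line.split(None, 1)  # split once, so filenames may contain spaces
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'UUID filename'")
        # uuid.UUID raises ValueError if the token is malformed
        entries.append((uuid.UUID(parts[0]), parts[1]))
    return entries


def attach_all(
    entries: List[Tuple[uuid.UUID, str]],
    pending: Dict[uuid.UUID, str],
) -> Tuple[Dict[str, List[str]], List[uuid.UUID]]:
    """Match each token to the Item waiting for it.

    `pending` maps token -> Item identifier (hypothetical bookkeeping).
    Returns (files grouped by Item, tokens nobody is waiting for).
    """
    attached: Dict[str, List[str]] = {}
    unknown: List[uuid.UUID] = []
    for token, path in entries:
        item = pending.get(token)
        if item is None:
            unknown.append(token)
        else:
            attached.setdefault(item, []).append(path)
    return attached, unknown
```

The point of returning the unknown tokens separately is that the admin
can see which files arrived without a matching submission, rather than
having the whole batch fail.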

Probably we should define a bulk-transport provider interface and let
people cook up their own pluggable variations on the theme.
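
As a strawman for such an interface, here is a small Python sketch
(again, not DSpace code, and every name in it is hypothetical).  One
abstract method brings a bitstream into local storage; the example
provider is the pull-by-URL idea from above, restarting from the last
byte received.  The open_at callable stands in for whatever the real
transport offers to resume at an offset (FTP's REST command, an HTTP
Range request, etc.) and is assumed to raise ConnectionError on a drop.

```python
from abc import ABC, abstractmethod
from typing import BinaryIO, Callable, Dict, Iterator


class BulkTransportProvider(ABC):
    """Hypothetical plug-in contract: bring one bitstream into local storage."""

    @abstractmethod
    def fetch(self, source: str, dest: BinaryIO) -> int:
        """Copy the bitstream named by `source` into `dest`; return bytes written."""


# Registry of pluggable providers, keyed by a site-chosen name (an assumption).
PROVIDERS: Dict[str, BulkTransportProvider] = {}


def register(name: str, provider: BulkTransportProvider) -> None:
    PROVIDERS[name] = provider


class ResumingPullProvider(BulkTransportProvider):
    """Pull a bitstream, resuming from where the connection broke off."""

    def __init__(
        self,
        open_at: Callable[[str, int], Iterator[bytes]],
        max_attempts: int = 5,
    ) -> None:
        # open_at(source, offset) yields chunks starting at byte `offset`.
        self.open_at = open_at
        self.max_attempts = max_attempts

    def fetch(self, source: str, dest: BinaryIO) -> int:
        written = 0
        for _ in range(self.max_attempts):
            try:
                for chunk in self.open_at(source, written):
                    dest.write(chunk)
                    written += len(chunk)
                return written  # the stream ended cleanly
            except ConnectionError:
                continue  # try again, starting at `written`
        raise ConnectionError(
            f"gave up on {source} after {self.max_attempts} attempts"
        )
```

A production provider would also want back-off between attempts and a
checksum at the end; those are left out to keep the shape visible.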

-- 
Mark H. Wood, Lead System Programmer   mw...@iupui.edu
Asking whether markets are efficient is like asking whether people are smart.
