On Wed, 3 Nov 2010, John Hearns wrote:
What I think we should be doing is working towards a media-agnostic form of storing data.
Media agnostic, with OPEN codecs. A great curse on all such IT mechanisms is the closed codec. ASCII works because it is utterly open, as is its logical extension into Unicode/UTF-8. But what about music? What about movies? What about books? What about spreadsheets, word-processed documents, etc. etc.? Perhaps html5 will magically solve all such problems, but I doubt it.
A recognition that scientific data (and other forms, like movies and music etc.) will carry with them metadata and that the data will migrate through many types of physical media in its lifetime, and will from the outset have multiple copies made. I guess the HPC Grid computing types are doing this already, what I'm rather thinking about is a universal standard for this, and a way of carrying the metadata with the actual data in a way it cannot be lost.
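To make the "metadata that cannot be lost" idea concrete, here is a minimal sketch (my illustration, not a format proposed anywhere in this thread) of bundling metadata inseparably with the data it describes: payload, metadata, and a checksum travel together in one self-describing envelope, so any copy can be verified after migrating across media. The field names and the example metadata values are my own invention.

```python
import hashlib
import json

def wrap(payload: bytes, metadata: dict) -> bytes:
    """Produce a single envelope containing data, metadata, and a digest."""
    envelope = {
        "metadata": metadata,
        "payload_hex": payload.hex(),
        "sha256": hashlib.sha256(payload).hexdigest(),
    }
    return json.dumps(envelope).encode("utf-8")

def unwrap(blob: bytes):
    """Recover payload and metadata, verifying integrity along the way."""
    envelope = json.loads(blob)
    payload = bytes.fromhex(envelope["payload_hex"])
    if hashlib.sha256(payload).hexdigest() != envelope["sha256"]:
        raise ValueError("payload corrupted in transit")
    return payload, envelope["metadata"]

# Any copy of the blob, on any medium, carries its own provenance and
# integrity check; the metadata cannot be separated from the data.
blob = wrap(b"simulation output", {"creator": "rgb", "created": "2010-11-03"})
data, meta = unwrap(blob)
```

The point of the sketch is the invariant, not the encoding: whatever the container (JSON here for readability, but tar, HDF5, or anything else open would do), metadata and checksum live inside the same object that gets copied and migrated.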
I think this is all dead on correct, but bearing in mind the forces of darkness arrayed on the other side of this, concerned with everything from DRM to encryption to owning and controlling the codec, I personally am not holding my breath. There are also numerous purely technical issues -- for example, round-off problems in conversion between ogg and mp3, where switching between lossy compression algorithms produces artifacts and a nonlinear degradation of information. Similar issues arise when dealing with old VGA vs 1080p and so on. None of this will go away as the technology evolves. I'm not certain that this is a truly solvable problem.
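The generation-loss point can be illustrated with a toy model (mine, not anything from the thread): treat each lossy codec as a quantizer with its own step size, and repeatedly convert a signal back and forth. The step sizes 0.1 and 0.07 are arbitrary stand-ins for two incompatible codecs; the point is that information discarded in one generation is never recovered by a later one.

```python
def quantize(samples, step):
    """Round each sample to the nearest multiple of `step` (a lossy encode)."""
    return [round(s / step) * step for s in samples]

# A simple ramp signal standing in for audio samples.
original = [i * 0.013 for i in range(1000)]

signal = original
errors = []
for generation in range(10):
    signal = quantize(signal, 0.1)   # stand-in for codec A's quantizer
    signal = quantize(signal, 0.07)  # stand-in for codec B's quantizer
    max_err = max(abs(a, ) if False else abs(a - b) for a, b in zip(signal, original))
    errors.append(max_err)

# The error after the first round trip is already nonzero, and no later
# generation drives it back to zero: re-encoding cannot restore what a
# lossy step has thrown away.
print(errors)
```

Real codecs are far more complicated than uniform quantizers, of course, but the one-way character of the information loss is the same.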
It's also funny that I use the term "lifetime" -- I guess in the past we all assumed digital data would have an infinite lifetime, and as discussed above it has come to pass that decaying media, or reading apparatus becoming unavailable, give data a finite lifetime. The real point I am making here is that with cloud-type data storage over IP connections, even in HPC we will be seeing data accessed not on SCSI volumes (be that direct SCSI, fibrechannel, iSCSI, RAID etc.) but from an HTTP-accessed object store. You might then say "Hey -- performance matters, and that's why we still have SCSI" -- I would counter that you will see home users accessing data via ADSL, business users via gigabit, and those HPC-class systems will have 10 / 40 / 100 gigabit interfaces.
All of which is groovy and I would never argue, but that doesn't address the relative vulnerability of centralized data both to certain kinds of attack and to other kinds of accidents. Or to political control. If Google (ultimately) controls all the data, who controls Google? What happens if they use it for evil instead of for good? How could one *stop* them from using it for evil if they have your data and also provide you with all of the software you are using to access that data? Who will, after all, guard the guardians? rgb
_______________________________________________ Beowulf mailing list, [email protected] sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:[email protected]
