On Wed, 3 Nov 2010, John Hearns wrote:
What I think we should be doing is working towards a media-agnostic form of storing data.
Media agnostic, with OPEN codecs. A great curse on all such IT mechanisms is the closed codec. ASCII works because it is utterly open, as is its logical extension into Unicode/UTF-8. But what about music? What about movies? What about books? What about spreadsheets, word-processed documents, etc. etc.? Perhaps html5 will magically solve all such problems, but I doubt it.
A recognition that scientific data (and other forms, like movies and music etc.) will carry with them metadata and that the data will migrate through many types of physical media in its lifetime, and will from the outset have multiple copies made. I guess the HPC Grid computing types are doing this already, what I'm rather thinking about is a universal standard for this, and a way of carrying the metadata with the actual data in a way it cannot be lost.
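To make the "metadata that cannot be lost" idea concrete, here is a minimal sketch (my illustration, not a format proposed anywhere in this thread) of bundling metadata inseparably with the data it describes: payload, metadata, and a checksum travel together in one self-describing envelope, so any copy can be verified after migrating across media. The field names and the example metadata values are my own invention.

```python
import hashlib
import json

def wrap(payload: bytes, metadata: dict) -> bytes:
    """Produce a single envelope containing data, metadata, and a digest."""
    envelope = {
        "metadata": metadata,
        "payload_hex": payload.hex(),
        "sha256": hashlib.sha256(payload).hexdigest(),
    }
    return json.dumps(envelope).encode("utf-8")

def unwrap(blob: bytes):
    """Recover payload and metadata, verifying integrity along the way."""
    envelope = json.loads(blob)
    payload = bytes.fromhex(envelope["payload_hex"])
    if hashlib.sha256(payload).hexdigest() != envelope["sha256"]:
        raise ValueError("payload corrupted in transit")
    return payload, envelope["metadata"]

# Any copy of the blob, on any medium, carries its own provenance and
# integrity check; the metadata cannot be separated from the data.
blob = wrap(b"simulation output", {"creator": "rgb", "created": "2010-11-03"})
data, meta = unwrap(blob)
```

The point of the sketch is the invariant, not the encoding: whatever the container (JSON here for readability, but tar, HDF5, or anything else open would do), metadata and checksum live inside the same object that gets copied and migrated.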
I think this is all dead on correct, but bearing in mind the forces of darkness arrayed on the other side of this, concerned with everything from DRM to encryption to owning and controlling the codec, I personally am not holding my breath. There are also numerous purely technical issues -- for example, round-off problems in conversion between ogg and mp3, where switching between lossy compression algorithms produces artifacts and a nonlinear degradation of information. Similar issues arise when dealing with old VGA vs 1080p and so on. None of this will go away as the technology evolves. I'm not certain that this is a truly solvable problem.
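The generation-loss point can be illustrated with a toy model (mine, not anything from the thread): treat each lossy codec as a quantizer with its own step size, and repeatedly convert a signal back and forth. The step sizes 0.1 and 0.07 are arbitrary stand-ins for two incompatible codecs; the point is that information discarded in one generation is never recovered by a later one.

```python
def quantize(samples, step):
    """Round each sample to the nearest multiple of `step` (a lossy encode)."""
    return [round(s / step) * step for s in samples]

# A simple ramp signal standing in for audio samples.
original = [i * 0.013 for i in range(1000)]

signal = original
errors = []
for generation in range(10):
    signal = quantize(signal, 0.1)   # stand-in for codec A's quantizer
    signal = quantize(signal, 0.07)  # stand-in for codec B's quantizer
    max_err = max(abs(a, ) if False else abs(a - b) for a, b in zip(signal, original))
    errors.append(max_err)

# The error after the first round trip is already nonzero, and no later
# generation drives it back to zero: re-encoding cannot restore what a
# lossy step has thrown away.
print(errors)
```

Real codecs are far more complicated than uniform quantizers, of course, but the one-way character of the information loss is the same.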
It's also funny that I use the term "lifetime" -- I guess in the past we all assumed digital data would have an infinite lifetime, and as discussed above it has come to pass that decaying media, or reading apparatus becoming unavailable, give data a finite lifetime. The real point I am making here is that with cloud-type data storage over IP connections, even in HPC we will be seeing data accessed not on SCSI volumes (be that direct SCSI, fibrechannel, iSCSI, RAID etc.) but from an HTTP-accessed object store. You might then say "Hey -- performance matters, and that's why we still have SCSI" -- I would counter that you will see home users accessing data via ADSL, business users via gigabit, and those HPC-class systems will have 10 / 40 / 100 gigabit interfaces.
All of which is groovy and I would never argue, but that doesn't address the relative vulnerability of centralized data both to certain kinds of attack and to other kinds of accidents. Or to political control. If Google (ultimately) controls all the data, who controls Google? What happens if they use it for evil instead of for good? How could one *stop* them from using it for evil if they have your data and also provide you with all of the software you are using to access that data? Who will, after all, guard the guardians? rgb
_______________________________________________ Beowulf mailing list, [email protected] sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:[email protected]
