MIT to create digital library
Aims to save scholars' output
By Peter J. Howe, Globe Staff, 11/4/2002
CAMBRIDGE - Never shy about pursuing epic concepts, the Massachusetts Institute of Technology today is formally launching one of its boldest projects in years: an effort to create a long-term "digital library" encompassing virtually the entire intellectual output of MIT scholars and researchers.
Called DSpace, the joint venture between MIT and technology giant Hewlett-Packard Co. is aiming to create a "superarchive" to save trillions of bytes' worth of digital information. It will cover everything from recordings of classroom lectures and experiments to brain scans, ocean-floor surveys, and monitoring of interstellar space.
DSpace aims to solve the digital era's version of a problem that is plaguing conventional libraries holding troves of intellectual content stored in formats, such as Dictabelt recordings, 5-inch floppy disks, and rotting newsprint, that face the risk of becoming unusable by future generations.
MIT is hoping to lead a "federation" of universities around the world that would build systems using the DSpace technology that would make scholarly information available to any Internet-connected computer in the world.
At least eight other schools are expected to join DSpace by next September, including Cambridge University in England, Columbia, and the universities of Rochester, Toronto, and Washington. MIT is also working to make the system interconnect with a similar effort at California's university system and Ohio State.
MIT president Charles M. Vest said DSpace aims to "set the new standard for the stewardship of knowledge in the research environment."
Ann Wolpert, director of MIT's libraries, which already have more than 5 million conventional volumes including books and papers, said MIT officials have been working on the system since 1998.
More and more of professors' and researchers' intellectual work is now "born digitally" rather than as easily cataloged or scanned papers, Wolpert said.
As the library began to get an increasing number of requests to archive digital files like videos and huge research "data sets," Wolpert said, "We realized this was probably the tip of the iceberg. We thought if we want to be a library of the 21st century, we'd better start cracking."
Hewlett-Packard provided a $1.8 million grant to launch the project, which could create millions of dollars in new business for the company if other universities follow suit.
MIT expects to spend about $250,000 annually on maintaining and operating DSpace, which would include a Google-like search engine enabling visitors to search for information using content "tags" identifying files. The project is using freely available "open source" software to make it possible for other universities and organizations to join.
MacKenzie Smith, the DSpace project director, said about 1,000 items totaling over 2 terabytes of data have been archived already - comparable to the hard-disk memory of 200 high-end personal computers. In time, MIT expects to be saving petabytes of data, or thousands of terabytes.
All of the words in every book in the Library of Congress, excluding pictures, are often described as being equivalent to 20 terabytes, an indication of the enormous scale of the MIT project. In conducting a survey of potential demand for digital storage, Smith said, she found some MIT researchers own "data sets" totaling 30 terabytes.
Hal Abelson, an MIT computer science and electrical engineering professor who is helping lead the effort, said: "I think the problem that libraries are going to have is what they don't put into it" given the potentially limitless demand for storage capacity.
As MIT moves to put more and more course material online and available across the Net, Abelson said DSpace will prove to be a crucial way to provide an archive of each year's course material and course Web sites.
About 50 MIT classes now make their course materials available online, with another 150 expected to come online next year and all 1,500 to 2,000 yearly courses by 2008 or 2009.
A public symposium to launch DSpace is being held this morning from 8:30 a.m. to 12:30 p.m. at MIT's Bartos Theatre, room E15-070.
Peter J. Howe can be reached at [EMAIL PROTECTED]