Some thoughts. BTW, I'm new to the list: I'm a librarian working for a study-abroad program in Beijing, currently building a new catalog with Koha, and I previously did competitive intelligence for investors looking at China's IT industries. I appreciate Matt trying to start an open-ended conversation about innovation and thought I'd toss my own rant into the ring.
One of the things that really struck me about libraries while studying for my MLIS was how library systems were designed primarily for the back end, and weren't consumer-facing until the post-Internet era. They were also built and maintained by third parties who aren't practicing, or even trained, librarians (and who charge a pretty penny for it). A profession that outsourced these skill sets is now playing catch-up, rebuilding them through groups like CODE4LIB, so we may be behind the curve on innovation for a long time.
I'm not sure how much "Big Data" really comes into play for most libraries. You might need terabytes of cloud storage for a digital preservation project, but since the bulk of that would be the digitized images, videos, and recordings themselves, each with a metadata record, you don't necessarily have a very large or complex data structure. How many library projects are "beyond the ability of commonly used software tools to capture, curate, manage, and process the data within a tolerable elapsed time"? I'm honestly not sure, and I wonder about that nebulous definition. What counts as "commonly used"? Hadoop? On the other hand, preserving Big Data, say from the Large Hadron Collider, and creating discovery tools for future researchers is something librarians could potentially be involved in. But if CERN already built the database and discovery tools before the data reached the library, did we miss the game? Do Big Data projects say to themselves in the planning stage, "We need a librarian"? Should they? If so, are we ready?
Then there's the privacy issue. Even before Snowden, the ALA Code of Ethics bumped up against the power of crunching user data for recommendation systems and the like. Even if you adequately anonymize your data and use it only in aggregate, the practice goes against the grain of traditional library culture. Any discussion of retaining user social profiles, search history, or activity tracking means talking about patron rights to anonymity.
The goal I've been fixated on for library software development has been to deliver staff- and patron-friendly open-source cataloging, discovery, and curation tools that take back control of our systems from closed corporate vendors, provide a user experience that matches or exceeds the expectations set by the marketplace, and remain committed to the ethical standards and social contract that libraries have traditionally upheld in our society. When so much of the professional news industry delivers information discovery services using Drupal, Django, or WordPress, why can't there be robust ecosystems like these for libraries?
Hope I didn't bore anyone.
Dave Lyons
Digital Librarian
The Beijing Center for Chinese Studies