[CODE4LIB] Position announcement - Digital Library Architect
The Los Alamos National Laboratory (LANL) Research Library is one of the premier digital libraries in the world, providing state-of-the-art information technology tools to our research community and developing innovative web technologies to further information availability, accessibility, and digital preservation. The Research Library is seeking an analytical, creative individual interested in scientific research data and digital resources to review and lead implementation of the next-generation digital library.

The Digital Library Architect (Solutions Architect 3) will be responsible for researching, collaboratively planning, and delivering a forward-looking, sustainable infrastructure for the stewardship and delivery of scientific content by applying community-established best practices to cutting-edge technology. The new infrastructure will integrate diverse content repositories of both data and scientific literature to create a cohesive and extensible suite of access, discovery, preservation, curation, security, repository, archival, and storage services to underpin LANL's unique collections of scientific and technical information.

Key Requirements:

* Expert knowledge of standard web programming tools/frameworks, database application development, content and data management, hardware and systems programming technologies, and storage management.
* Demonstrated experience with, and commitment to, designing and developing resource-centric applications that adhere to the core architectural principles of the Web.
* Demonstrated experience using mainstream Web 2.0 technologies.
* Ongoing interest in Semantic Web technologies and concepts, including RDF and Linked Open Data.
* Experience with standards emerging from the digital library community (e.g., OAI-PMH, OAI-ORE, OpenURL, PREMIS and other metadata standards, DOI, etc.).
* Ability to provide project leadership from specification to launch.
* Demonstrated proficiency applying best practices to technical projects, including automated testing and the use of software development collaboration tools, build management, and version control software.
* Master's degree in Library/Information Science or Computer Science, or an equivalent combination of education and experience, highly desired.

Must have U.S. citizenship.

For more information or to apply, please see: http://www.hr.lanl.gov/JobListing/SingleJobAd.aspx?JobNumber=221760
[CODE4LIB] JOB - Team Leader for Digital Initiatives, Los Alamos National Lab Research Library
The Los Alamos National Laboratory Research Library is seeking a lead for its Digital Initiatives Team, with responsibility for researching and developing a forward-looking, sustainable infrastructure for the preservation, management, and delivery of scientific and institutional content by applying community-established best practices to cutting-edge technology. We seek a creative, inspiring manager who will lead a small team on projects aimed at tackling challenges of information interoperability and integration across our large-scale digital collections and growing repositories. The library is ambitiously looking forward to becoming embedded in, and integral to, LANL's information infrastructure by providing a robust repository for our unique collections, supporting the e-research needs of our world-class scientific community.

Key Requirements: Demonstrated experience/expertise in all of the following:

* designing and developing resource-centric applications that adhere to the core principles of the Web Architecture / REST;
* digital library community-based standards (OAI-PMH / OAI-ORE, OpenURL, etc.);
* standard web programming tools/frameworks and database query syntax; mainstream Web 2.0 technologies;
* interest in Semantic Web technologies and concepts (RDF, RDFS, OWL, triple stores, SPARQL, information resources vs. non-information resources, etc.);
* embracing concepts from the Social Web and the Linked Data effort;
* existing and emergent content and storage management standards/technologies;
* ability to lead complex, cross-organizational projects and guide diverse constituents toward common goals;
* management, leadership, motivational, and influencing skills.

For more information or to apply, please go to http://www.hr.lanl.gov/JobListing/SingleJobAd.aspx?JobNumber=221059
Re: [CODE4LIB] DIY aggregate index
And it's true that if you get the article metadata directly from the publishers, you avoid the issues with duplication that we have with the secondary databases, who all re-format and add data to each record they receive. However, I would guess this requires many more negotiations (many more publishers) than dealing with the A&I vendors.

Miriam
LANL

On 7/2/10 6:57 AM, "Laurence Lockton" wrote:

Eric is right, a few European institutions have been doing this for several years. At the University of Bath we've been using ELIN http://elin.lub.lu.se/elinInfo, which Lund University in Sweden had been operating since 2001 (until recently - it's now been effectively spun off). This is also what underlies the DOAJ site http://www.doaj.org/

It seems to me that there are two approaches to building these aggregated indexes: (1) load whole databases (mostly A&I) and catalogues, as an alternative to federated search, and (2) collect article-level metadata, mostly from primary publishers, to build an index of the library's e-journals collection, then possibly add the "print catalogue." LANL sounds like it's taken the first approach; ELIN and JournalTOCs http://www.journaltocs.hw.ac.uk/ are based on the second. The approach taken by the commercial vendors is somewhat blurred between the two, but I would suggest that EBSCO Discovery Service and OCLC WorldCat Local are broadly based on the first approach, and Serials Solutions Summon and Ex Libris Primo Central are more focussed on the second. I think this is an important consideration for anyone selecting a service, or contemplating building their own.

Laurence Lockton
University of Bath, UK
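[Editor's sketch: approach (2) above typically means paging through OAI-PMH ListRecords responses from each publisher. A minimal parser for one such response, extracting Dublin Core titles and the resumption token, might look like the following; the sample XML and the fields extracted are illustrative only, not any particular publisher's feed.]

```python
# Sketch: parse an OAI-PMH ListRecords response for article-level metadata.
# The sample response below is illustrative, not from a real repository.
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def parse_list_records(xml_text):
    """Return ([(identifier, title), ...], resumption_token or None)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI + "record"):
        ident = rec.find(OAI + "header").findtext(OAI + "identifier")
        md = rec.find(OAI + "metadata")
        # Dublin Core title lives inside the (namespaced) metadata wrapper.
        title_el = next(md.iter(DC + "title"), None) if md is not None else None
        records.append((ident, title_el.text if title_el is not None else None))
    # A non-empty resumptionToken means there are more pages to harvest.
    token_el = root.find(OAI + "ListRecords/" + OAI + "resumptionToken")
    token = token_el.text if token_el is not None else None
    return records, token

SAMPLE = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Article</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE)
```

A real harvester would loop, re-requesting with the returned resumptionToken until the repository stops returning one.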
Re: [CODE4LIB] DIY aggregate index
On 7/1/10 9:44 AM, "Jonathan Rochkind" wrote:

> the technical issues of maintaining the regular flow of updates from dozens of content providers, and normalizing all data to go in the same index, are non-trivial, I think now.

This is very much one of the hardest parts, Jonathan. Also, thinking about the kinds of services that users want from this data, we've found the biggest need is to focus on citation references, if you can get them (e.g., from ISI). And if you think the bibliographic metadata is poor quality, try matching on brief reference metadata (that which doesn't contain unique identifiers, of course). Complex fuzzy string matching, and it still is never really great. (This is part of the problem with cite counts being all over the map in the apps out there!)

My words to the wise are to NOT do local loading unless you have a lot of time and money. Vendors who are doing it have economies of scale. Individual institutions typically do not. If the community were to make agreements to have centralized management at a few institutions for this kind of "open" dataset, maybe. But, as someone noted, the middle-men ("value add" A&I producers - Thomson, EBSCO, etc.) are not going to love this idea.

Miriam Blake
Los Alamos National Laboratory Research Library
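[Editor's sketch: the fuzzy matching of brief reference metadata described above can be illustrated with normalized-string similarity. The normalization rules and use of difflib here are illustrative assumptions, not LANL's actual matching method.]

```python
# Sketch: fuzzy matching of brief reference metadata that lacks
# unique identifiers. Normalization rules are illustrative choices only.
import re
from difflib import SequenceMatcher

def normalize(citation):
    """Lowercase, strip punctuation, collapse whitespace."""
    s = re.sub(r"[^a-z0-9 ]", " ", citation.lower())
    return " ".join(s.split())

def match_score(ref_a, ref_b):
    """Similarity ratio between two normalized citations (0.0 to 1.0)."""
    return SequenceMatcher(None, normalize(ref_a), normalize(ref_b)).ratio()

# Two renderings of (plausibly) the same reference, and an unrelated one.
a = "Van de Sompel, H., et al. (2004). Resource harvesting. D-Lib Magazine 10(12)"
b = "van de Sompel H et al, Resource Harvesting, D-Lib Magazine, vol 10 no 12, 2004"
c = "Smith, J. (1999). Something unrelated entirely. Nature 401"

score_same = match_score(a, b)   # high: likely the same reference
score_diff = match_score(a, c)   # low: likely different references
```

The hard part in practice is exactly what the thread says: picking a threshold that separates `score_same` from `score_diff` reliably across millions of messy records, which is why cite counts diverge between apps.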
Re: [CODE4LIB] DIY aggregate index
We are one of those institutions that did this - negotiated for lots of content YEARS ago (before the providers really knew what they or we were in for). We have locally loaded records from the ISI databases, INSPEC, BIOSIS, and the Department of Energy (as well as from full-text publishers, but that is another story and system entirely).

Aside from the contracts, I can also attest to the major amount of work it has been. We have 95M bibliographic records, stored in > 75 TB of disk, and counting. It's all running on Solr, with a local interface and the distributed aDORe repository on the backend. ~2 FTE keep it running in production now. Over the 15 years we've been loading this, we've had to migrate it 3 times, and deal with all the dirty metadata, duplication, and other difficult issues around scale and lack of content provider "interest" in supporting the few of us who do this kind of stuff. We believe we have now achieved a standardized format (MPEG-21 DIDL and MARCXML, with some other standards mixed in), accessible through protocol-based services (OpenURL, REST, OAI-PMH, etc.), so we hope we won't have to mess with the data records again and can move on to other more interesting things.

It is nice to have, very fast - it very much beats federated search - and allows us (finally) to begin to build neat services (for licensed users only!). Data mining? Of course a goal, but talk about sticky areas of contract negotiation. And in the end, you never have everything someone needs when they want all content about something specific.

And yes, local loading is expensive, for a lot of reasons. Ex Libris, Summon, etc. are now getting into the game from this angle. We will so feel their pain, but I hope technology and content provider engagement have improved to make it a bit easier for them! And it definitely adds a level of usability much improved over federated search.
My .02,

Miriam Blake
Los Alamos National Laboratory Research Library

On 6/30/10 3:20 PM, "Rosalyn Metz" wrote:

i know that there are institutions that have negotiated contracts for just the content, sans interface. But those that I know of have TONS of money and are using a 3rd party interface that ingests the data for them. I'm not sure what the terms of that contract were or how they get the data, but it can be done.

On Wed, Jun 30, 2010 at 5:07 PM, Cory Rockliff wrote:

> We're looking at an infrastructure based on MarkLogic running on Amazon
> EC2, so the scale of data to be indexed shouldn't actually be that big of an
> issue. Also, as I said to Jonathan, I only see myself indexing a handful of
> highly-relevant resources, so we're talking millions, rather than 100s of
> millions, of records.
>
> On 6/30/2010 4:22 PM, Walker, David wrote:
>
>> You might also need to factor in an extra server or three (in the cloud or
>> otherwise) into that equation, given that we're talking 100s of millions of
>> records that will need to be indexed.
>>
>>> companies like iii and Ex Libris are the only ones with
>>> enough clout to negotiate access
>>
>> I don't think III is doing any kind of aggregated indexing, hence their
>> decision to try and leverage APIs. I could be wrong.
>>
>> --Dave
>>
>> ==
>> David Walker
>> Library Web Services Manager
>> California State University
>> http://xerxes.calstate.edu
>>
>> From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Jonathan
>> Rochkind [rochk...@jhu.edu]
>> Sent: Wednesday, June 30, 2010 1:15 PM
>> To: CODE4LIB@LISTSERV.ND.EDU
>> Subject: Re: [CODE4LIB] DIY aggregate index
>>
>> Cory Rockliff wrote:
>>
>>> Do libraries opt for these commercial 'pre-indexed' services simply
>>> because they're a good value proposition compared to all the work of
>>> indexing multiple resources from multiple vendors into one local index,
>>> or is it that companies like iii and Ex Libris are the only ones with
>>> enough clout to negotiate access to otherwise-unavailable database
>>> vendors' content?
>>
>> A little bit of both, I think. A library probably _could_ negotiate
>> access to that content... but it would be a heck of a lot of work. When
>> the staff time for negotiations comes in, it becomes a good value
>> proposition, regardless of how much the licensing would cost you. And
>> yeah, then the staff time to actually ingest and normalize and
>> troubleshoot data-flows for all that stuff on a regular basis -- I've
>> heard stories of libraries that tried to do that in the early 90s and it
>> was nightmarish.
>>
>> So, actually, I guess I've arrived at convincing myself it's mostly
>> "good value proposition", in that a library probably can't afford to do
>> that on their own, with or without licensing issues.
>>
>> But I'd really love to see you try anyway, maybe I'm wrong. :)
>>
>>> Can I assume that if a database vendor has exposed their content to me
>>> as a subscri
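[Editor's sketch: the "ingest and normalize" work this thread keeps returning to - getting provider records into one standardized format (MARCXML, in LANL's case) and into a Solr index - can be illustrated as below. The field mapping (245 to title, 100 to author) follows the usual MARC convention, but the target field names and the sample record are illustrative assumptions, not LANL's actual schema.]

```python
# Sketch: flatten a minimal MARCXML record into a Solr-style JSON document.
# The target field names ("id", "title", "author") are illustrative only.
import json
import xml.etree.ElementTree as ET

MARC = "{http://www.loc.gov/MARC21/slim}"

def marcxml_to_solr_doc(xml_text):
    """Map a single MARCXML record to a flat dict ready for Solr."""
    root = ET.fromstring(xml_text)
    doc = {}
    cf001 = root.find(MARC + "controlfield[@tag='001']")
    if cf001 is not None:
        doc["id"] = cf001.text
    for df in root.iter(MARC + "datafield"):
        value = " ".join(sf.text for sf in df.iter(MARC + "subfield") if sf.text)
        if df.get("tag") == "245":    # title statement
            doc["title"] = value
        elif df.get("tag") == "100":  # main entry, personal name
            doc["author"] = value
    return doc

SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">rec0001</controlfield>
  <datafield tag="100"><subfield code="a">Blake, Miriam</subfield></datafield>
  <datafield tag="245"><subfield code="a">Aggregate indexes at scale</subfield></datafield>
</record>"""

doc = marcxml_to_solr_doc(SAMPLE)
payload = json.dumps([doc])  # the body one would POST to Solr's /update handler
```

At 95M records, the dirty-metadata problems described in the thread all surface in mappings like this: missing 001s, repeated fields, and provider-specific quirks each need their own handling.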