[CODE4LIB] Position announcement - Digital Library Architect

2011-06-20 Thread Blake, Miriam E
The Los Alamos National Laboratory (LANL) Research Library is one of the 
premier digital libraries in the world, providing state-of-the-art information 
technology tools to our research community and developing innovative web 
technologies to further information availability, accessibility, and digital 
preservation.

The Research Library is seeking an analytical, creative individual interested 
in scientific research data and digital resources to review & lead 
implementation of the next-generation digital library. The Digital Library 
Architect (Solutions Architect 3) will be responsible for researching, 
collaboratively planning, and delivering a forward-looking, sustainable 
infrastructure for the stewardship and delivery of scientific content by 
applying community-established best practices to cutting-edge technology. 
The new infrastructure will integrate diverse content repositories of both data 
and scientific literature to create a cohesive and extensible suite of access, 
discovery, preservation, curation, security, repository, archival & storage 
services to underpin LANL's unique collections of scientific and technical 
information.

Key Requirements:


 *   Expert knowledge of standard web programming tools/frameworks, database 
application development, content and data management, hardware and systems 
programming technologies, and storage management.
 *   Demonstrated experience and commitment to designing and developing 
resource-centric applications that adhere to core architectural principles of 
the Web.
 *   Demonstrated experience with using mainstream Web 2.0 technologies.
 *   Ongoing interest in Semantic Web technologies and concepts including RDF 
and Linked Open Data.
 *   Experience with standards emerging from the digital library community 
(e.g., OAI-PMH, ORE, OpenURL, PREMIS and other metadata standards, DOI, etc.); 
a minimal OAI-PMH sketch follows this list.
 *   Ability to provide project leadership from specification to launch.
 *   Demonstrated proficiency applying best practices to technical projects, 
including automated testing and use of software development collaboration 
tools, build management, and version control software.
 *   Master's degree in Library/Information Science or Computer Science, or an 
equivalent combination of education and experience, highly desired.
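
For a concrete flavor of the OAI-PMH standard mentioned above, here is a
minimal ListRecords harvest in Python (a sketch only, not part of the
posting; the endpoint URL is a placeholder and the requests library is
assumed to be installed):

    # Minimal OAI-PMH ListRecords harvest; endpoint URL is a placeholder.
    import requests
    import xml.etree.ElementTree as ET

    OAI = "{http://www.openarchives.org/OAI/2.0/}"
    DC = "{http://purl.org/dc/elements/1.1/}"

    def harvest(base_url, metadata_prefix="oai_dc"):
        # Walk ListRecords responses, following resumptionTokens.
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        while True:
            root = ET.fromstring(requests.get(base_url, params=params).content)
            for record in root.iter(OAI + "record"):
                yield (record.findtext(".//" + DC + "title", default=""),
                       record.findtext(".//" + DC + "identifier", default=""))
            token = root.findtext(".//" + OAI + "resumptionToken")
            if not token:
                break
            params = {"verb": "ListRecords", "resumptionToken": token}

    for title, ident in harvest("https://example.org/oai"):
        print(ident, title)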

Must have U.S. citizenship.

For more information or to apply, please see: 
http://www.hr.lanl.gov/JobListing/SingleJobAd.aspx?JobNumber=221760


[CODE4LIB] JOB - Team Leader for Digital Initiatives, Los Alamos National Lab Research Library

2011-03-30 Thread Blake, Miriam E
The Los Alamos National Laboratory Research Library is seeking a lead for its 
Digital Initiatives Team, with responsibility for researching and developing a 
forward-looking, sustainable infrastructure for the preservation, management, 
and delivery of scientific and institutional content by applying 
community-established best practices to cutting-edge technology. We seek a 
creative, inspiring manager who will lead a small team on projects aimed at 
tackling challenges of information interoperability and integration across our 
large-scale digital collections and growing repositories. The library 
ambitiously aims to become embedded in, and integral to, LANL's information 
infrastructure by providing a robust repository for our unique collections, 
supporting the e-research needs of our world-class scientific community.


Key Requirements:

Demonstrated experience/expertise in all of the following:

*   designing & developing resource-centric applications that adhere to 
core principles of the Web Architecture / REST;
*   digital library community-based standards (OAI-PMH / ORE, OpenURL, 
etc.);
*   standard web programming tools/frameworks & database query syntax; 
mainstream Web 2.0 technologies;
*   interest in Semantic Web technologies & concepts (RDF, RDFS, OWL, 
triple stores, SPARQL, information resources vs. non-information resources, 
etc.), as sketched after this list;
*   embrace of concepts from the Social Web & the Linked Data effort;
*   existing and emergent content & storage management 
standards/technologies;
*   ability to lead complex, cross-organizational projects & guide diverse 
constituents toward common goals;
*   management, leadership, motivational & influencing skills.
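
For a flavor of the Semantic Web items above, a minimal sketch using the
rdflib Python library (the record URI and data are invented for
illustration):

    # Build a tiny RDF graph and query it with SPARQL; data is invented.
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import DC

    EX = Namespace("http://example.org/record/")
    g = Graph()
    g.add((EX["1"], DC.title, Literal("Repository Interoperability at Scale")))
    g.add((EX["1"], DC.creator, Literal("Blake, M.")))

    q = """
        PREFIX dc: <http://purl.org/dc/elements/1.1/>
        SELECT ?title WHERE { ?r dc:title ?title ; dc:creator "Blake, M." . }
    """
    for row in g.query(q):
        print(row.title)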


For more information or to apply, please go to 
http://www.hr.lanl.gov/JobListing/SingleJobAd.aspx?JobNumber=221059


Re: [CODE4LIB] DIY aggregate index

2010-07-02 Thread Blake, Miriam E
And it's true that if you get the article metadata directly from the publishers,
you avoid the issues with duplication that we have with the secondary databases,
which all reformat and add data to each record they receive.  However, I would
guess this requires many more negotiations (many more publishers) than
dealing with the A&I vendors.
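
To make the duplication problem concrete, here is a rough sketch of the kind
of normalized match key involved (illustrative only, not LANL's actual code):

    # Sketch: a normalized dedup key for records from multiple A&I sources.
    import re
    import unicodedata

    def match_key(title, year, first_author_last):
        # Collapse case, punctuation, and whitespace so duplicates collide.
        t = unicodedata.normalize("NFKD", title)
        t = t.encode("ascii", "ignore").decode()
        t = re.sub(r"[^a-z0-9 ]", "", t.lower())
        t = " ".join(t.split())
        return (t, str(year), first_author_last.strip().lower())

    # The same article as rendered by two different A&I vendors:
    a = match_key("The OAI-PMH  Protocol: An Overview", 2004, "Sompel")
    b = match_key("The OAI-PMH protocol -- an overview.", "2004", "SOMPEL")
    assert a == b  # formatting differences collapse to one key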

Miriam
LANL


On 7/2/10 6:57 AM, "Laurence Lockton" wrote:

Eric is right; a few European institutions have been doing this for
several years. At the University of Bath we've been using ELIN
http://elin.lub.lu.se/elinInfo, which Lund University in Sweden had been
operating since 2001 (until recently; it has now been effectively spun
off). This is also what underlies the DOAJ site http://www.doaj.org/

It seems to me that there are two approaches to building these
aggregated indexes:
  (1) load whole databases (mostly A&I) and catalogues, as an
alternative to federated search, and
  (2) collect article-level metadata, mostly from primary publishers, to
build an index of the library's e-journals collection, then possibly add
the "print catalogue" (a rough sketch of this approach follows below).
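
A rough sketch of approach (2), assuming a publisher exposes a journal TOC
feed and using the feedparser library (the feed URL is a placeholder):

    # Sketch: collect article-level metadata from publisher TOC feeds.
    import feedparser

    def collect_articles(feed_urls):
        # Accumulate simple article dicts, ready for a local search index.
        docs = []
        for url in feed_urls:
            feed = feedparser.parse(url)
            for entry in feed.entries:
                docs.append({
                    "title": entry.get("title", ""),
                    "link": entry.get("link", ""),
                    "journal": feed.feed.get("title", ""),
                })
        return docs

    docs = collect_articles(["https://example.org/journal/current.rss"])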

LANL sounds like it's taken the first approach; ELIN and Journal TOCs
http://www.journaltocs.hw.ac.uk/ are based on the second. The approach
taken by the commercial vendors is somewhat blurred between the two, but
I would suggest that EBSCO Discovery Service and OCLC WorldCat Local are
broadly based on the first approach and Serials Solutions Summon and Ex
Libris Primo Central are more focussed on the second. I think this is an
important consideration for anyone selecting a service, or contemplating
building their own.

Laurence Lockton
University of Bath
UK


Re: [CODE4LIB] DIY aggregate index

2010-07-01 Thread Blake, Miriam E
On 7/1/10 9:44 AM, "Jonathan Rochkind" wrote:

the technical issues of maintaining the regular flow of updates from
dozens of content providers, and normalizing all data to go in the same
index, are non-trivial, I think now.

This is very much one of the hardest parts, Jonathan.
Also, thinking about the kinds of services that users want from this data, we've
found the biggest need is to focus on citation references, if you can get them
(e.g., from ISI). And if you think the bibliographic metadata is poor quality, try
matching on brief reference metadata (that which doesn't contain unique
identifiers, of course). It takes complex fuzzy string matching, and the results
are still never really great; a rough sketch of the flavor follows below.
(This is part of the problem with citation counts being all over the map in the
apps out there!)
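
The sketch, using only the Python standard library (illustrative; real
citation matching is much messier than this):

    # Sketch of fuzzy reference matching with difflib.
    from difflib import SequenceMatcher

    def ref_similarity(ref_a, ref_b):
        # Crude similarity score between two brief reference strings.
        return SequenceMatcher(None, ref_a.lower().strip(),
                               ref_b.lower().strip()).ratio()

    score = ref_similarity(
        "Van de Sompel H, et al. Resource harvesting. D-Lib Mag. 2004;10(12)",
        "van de Sompel, H. Resource Harvesting, D-Lib Magazine 10(12), 2004")
    # A tuned threshold on scores like this (plus year/volume checks)
    # drives the match decision, and the tuning is never really done.
    print(round(score, 2))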

My words to the wise are to NOT do local loading unless you have a lot of time 
and money. Vendors who are doing it have economies of scale; individual 
institutions typically do not. If the community were to make agreements to have 
centralized management at a few institutions for this kind of "open" dataset, 
maybe. But, as someone noted, the middle-men (the "value add" A&I producers: 
Thomson, EBSCO, etc.) are not going to love this idea.

Miriam Blake
Los Alamos National Laboratory Research Library


Re: [CODE4LIB] DIY aggregate index

2010-06-30 Thread Blake, Miriam E
We are one of those institutions that did this: we negotiated for lots of 
content YEARS ago (before the providers really knew what they or we were in 
for).

We have locally loaded records from the ISI databases, INSPEC, BIOSIS, and the 
Department of Energy (as well as from full-text publishers, but that is another 
story and system entirely). Aside from the contracts, I can also attest to the 
major amount of work it has been. We have 95M bibliographic records, stored in 
>75 TB of disk, and counting. It's all running on Solr, with a local interface 
and the distributed aDORe repository on the back end. ~2 FTE keep it running in 
production now.

Over the 15 years we've been loading this, we've had to migrate it 3 times and 
deal with all the dirty metadata, duplication, and other difficult issues 
around scale and the lack of content provider "interest" in supporting the few 
of us who do this kind of stuff. We believe we have now achieved a standardized 
format (MPEG-21 DIDL and MARCXML, with some other standards mixed in), 
accessible through protocol-based services (OpenURL, REST, OAI-PMH, etc.), so 
that we hope we won't have to mess with the data records again and can move on 
to other more interesting things.
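
For illustration only (not our actual setup), querying a Solr-backed
bibliographic index might look like this with the pysolr client; the URL,
core name, and field names are invented:

    # Sketch: querying a Solr-backed bibliographic index with pysolr.
    import pysolr

    solr = pysolr.Solr("http://localhost:8983/solr/biblio", timeout=10)
    results = solr.search("title:harvesting AND year:2004", rows=10)
    for doc in results:
        print(doc.get("id"), doc.get("title"))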

It is nice to have, very fast (it very much beats federated search), and it 
allows us (finally) to begin to build neat services (for licensed users only!). 
Data mining? Of course a goal, but talk about sticky areas of contract 
negotiation. And in the end, you never have everything someone needs when they 
want all content about something specific. And yes, local loading is expensive, 
for a lot of reasons.

Ex Libris, Summon, etc. are now getting into the game from this angle. We so 
feel their pain, but I hope technology and content provider engagement have 
improved to make it a bit easier for them! And it definitely adds a level of 
usability much improved over federated search.

My .02,

Miriam Blake
Los Alamos National Laboratory Research Library




On 6/30/10 3:20 PM, "Rosalyn Metz" wrote:

I know that there are institutions that have negotiated contracts for just
the content, sans interface.  But those that I know of have TONS of money
and are using a third-party interface that ingests the data for them.  I'm not
sure what the terms of that contract were or how they get the data, but it
can be done.



On Wed, Jun 30, 2010 at 5:07 PM, Cory Rockliff wrote:

> We're looking at an infrastructure based on MarkLogic running on Amazon
> EC2, so the scale of data to be indexed shouldn't actually be that big of an
> issue. Also, as I said to Jonathan, I only see myself indexing a handful of
> highly-relevant resources, so we're talking millions, rather than 100s of
> millions, of records.
>
>
> On 6/30/2010 4:22 PM, Walker, David wrote:
>
>> You might also need to factor in an extra server or three (in the cloud or
>> otherwise) into that equation, given that we're talking 100s of millions of
>> records that will need to be indexed.
>>
>>
>>
>>> companies like iii and Ex Libris are the only ones with
>>> enough clout to negotiate access
>>>
>>>
>> I don't think III is doing any kind of aggregated indexing, hence their
>> decision to try to leverage APIs. I could be wrong.
>>
>> --Dave
>>
>> ==
>> David Walker
>> Library Web Services Manager
>> California State University
>> http://xerxes.calstate.edu
>> 
>> From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Jonathan
>> Rochkind [rochk...@jhu.edu]
>> Sent: Wednesday, June 30, 2010 1:15 PM
>> To: CODE4LIB@LISTSERV.ND.EDU
>> Subject: Re: [CODE4LIB] DIY aggregate index
>>
>> Cory Rockliff wrote:
>>
>>
>>> Do libraries opt for these commercial 'pre-indexed' services simply
>>> because they're a good value proposition compared to all the work of
>>> indexing multiple resources from multiple vendors into one local index,
>>> or is it that companies like iii and Ex Libris are the only ones with
>>> enough clout to negotiate access to otherwise-unavailable database
>>> vendors' content?
>>>
>>>
>>>
>> A little bit of both, I think. A library probably _could_ negotiate
>> access to that content... but it would be a heck of a lot of work. When
>> the staff time for negotiations is counted in, it becomes a good value
>> proposition, regardless of how much the licensing would cost you. And
>> yeah, then there's the staff time to actually ingest, normalize, and
>> troubleshoot data flows for all that stuff on a regular basis; I've
>> heard stories of libraries that tried to do that in the early 90s, and it
>> was nightmarish.
>>
>> So, actually, I guess I've arrived at convincing myself it's mostly a
>> "good value proposition", in that a library probably can't afford to do
>> that on their own, with or without licensing issues.
>>
>> But I'd really love to see you try anyway; maybe I'm wrong. :)
>>
>>
>>
>>> Can I assume that if a database vendor has exposed their content to me
>>> as a subscri