Re: [CODE4LIB] EAD in Blacklight (was: Re: [CODE4LIB] Batch loading in fedora)

Eric Lease Morgan Mon, 09 Aug 2010 16:21:47 -0700

  To index and display did-level EAD elements,
  Or to index finding aids as a whole.
  That is the question.


Seriously, this discussion surrounding the indexing and display of EAD files is 
extraordinarily timely, but the loose consensus on how to do it does not jibe 
with my experience. In short, I have been told by my archivist friends that I 
need to index and display each and every did-level element in my EAD files, and 
then provide a link to the finding aid as a whole. Let me explain.

Here at Notre Dame we are leading an effort we colloquially call the "Catholic 
Portal". [1] We use VUFind as our "discovery system" and thus Solr as the 
underlying indexer. Much of the metadata I index is MARC-based, but 
increasingly it is and will be EAD-based. Using VUFind to index MARC records is 
well-understood. Only very recently has it become truly feasible to index 
content other than MARC, such as EAD. A few months ago we spent time parsing 
EAD and stuffing it into the underlying Solr index. We took metadata from the 
EAD header and mapped it to Solr fields. We then free text indexed the balance. 
Thus a search for anything found in an EAD file returned that file, complete 
with its title, author, etc. Links to the original EAD were then provided. The process 
functioned, but it was not deemed good enough by the archivists in the crowd.
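
A minimal sketch of that first, collection-level approach might look like the 
following. It is not the code we actually ran. It assumes an EAD 2002 file 
without XML namespaces, and the field names (id, title, author, fulltext) are 
placeholders rather than the real VUFind/Solr schema.

```python
import xml.etree.ElementTree as ET

# Tiny, made-up EAD file used only for illustration.
SAMPLE_EAD = """<ead>
  <eadheader>
    <eadid>example-001</eadid>
    <filedesc>
      <titlestmt>
        <titleproper>Example Family Papers</titleproper>
        <author>Example Archives</author>
      </titlestmt>
    </filedesc>
  </eadheader>
  <archdesc level="collection">
    <did><unittitle>Example Family Papers</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
      </c01>
    </dsc>
  </archdesc>
</ead>"""


def text_of(root, path):
    """Return the whitespace-normalized text found at path, or ''."""
    node = root.find(path)
    if node is None:
        return ""
    return " ".join(" ".join(node.itertext()).split())


def collection_document(ead_xml):
    """Build ONE document per EAD file: header metadata plus free text."""
    root = ET.fromstring(ead_xml)
    return {
        # Field names are assumptions, not the VUFind schema.
        "id": text_of(root, "eadheader/eadid"),
        "title": text_of(root, "eadheader/filedesc/titlestmt/titleproper"),
        "author": text_of(root, "eadheader/filedesc/titlestmt/author"),
        # Everything in the file, flattened into one searchable blob.
        "fulltext": " ".join(" ".join(root.itertext()).split()),
    }


doc = collection_document(SAMPLE_EAD)
```

A search for "Correspondence" matches this document, but the hit carries only 
the collection's title and author, with nothing to say where in the finding 
aid the match occurred.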

As you know, EAD files are not structured like most MARC records. An EAD file 
represents an entire collection. Within that collection there may be 
sub-collections upon sub-collections. While the EAD's header and archdesc 
element may describe the collection as a whole, the sub-level and nested did 
elements are the real meat of the matter. Free text searches over the entire 
EAD that only return the over-arching metadata do not put search results in 
context, even if one does provide links out to the full finding aid. Instead 
(ideally), each and every did needs to be indexed and displayed in search 
results. Moreover (ideally), these search results need to be displayed in their 
hierarchical relationship with the balance of the EAD file.
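
To make the did-level idea concrete, here is a rough sketch that emits one 
document per did and records the titles of its ancestors so a result can be 
shown in context. It again assumes un-namespaced EAD 2002 with components 
tagged c or c01 through c12. The output field names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Matches the EAD component tags: c, c01, c02, ... c12.
COMPONENT = re.compile(r"^c(0[1-9]|1[0-2])?$")

# Tiny, made-up EAD file used only for illustration.
SAMPLE_EAD = """<ead>
  <eadheader><eadid>example-001</eadid></eadheader>
  <archdesc level="collection">
    <did><unittitle>Example Family Papers</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="file">
          <did>
            <unittitle>Letters, 1901-1905</unittitle>
            <container type="box">1</container>
          </did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>"""


def clean(node):
    """Whitespace-normalized text of node and its descendants, or ''."""
    if node is None:
        return ""
    return " ".join(" ".join(node.itertext()).split())


def did_documents(ead_xml):
    """Return one dict per did, each carrying its chain of ancestor titles."""
    root = ET.fromstring(ead_xml)
    eadid = clean(root.find("eadheader/eadid"))
    docs = []

    def walk(node, ancestors):
        did = node.find("did")
        path = ancestors
        if did is not None:
            title = clean(did.find("unittitle"))
            path = ancestors + [title]
            docs.append({
                "collection_id": eadid,  # link back to the whole finding aid
                "title": title,
                "hierarchy": path,       # collection > series > file ...
                "text": clean(did),
            })
        for child in node:
            # dsc is only a wrapper; it has no did, so path passes through.
            if child.tag == "dsc" or COMPONENT.match(child.tag):
                walk(child, path)

    archdesc = root.find("archdesc")
    if archdesc is not None:
        walk(archdesc, [])
    return docs


for d in did_documents(SAMPLE_EAD):
    print(" > ".join(d["hierarchy"]))
```

Each hit can then be displayed with its breadcrumb trail, and collection_id 
gives the interface something to build the link to the full finding aid from.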

We began work to implement this (ideal) solution [2], but the developer went on 
to a more permanent job here on campus.

Here is what I plan to do:

  1. acquire EAD files from "Catholic Portal" participants
  2. cache them locally
  3. pre-process each EAD file, making sure it has an eadid element
  4. pre-process each EAD file, making sure each did element contains
     a unitid element, and assigning one if it does not
  5. store and index each EAD file in Archon [3]
  6. parse each did from each EAD file and integrate the result
     into the VUFind/Solr index along with the MARC metadata
  7. use VUFind as the primary interface to the "Catholic Portal"
  8. use Archon as the means for displaying and navigating EAD files
  9. go to Step #1
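
Steps #3 and #4 could be sketched along these lines. This is only an 
illustration under assumptions: the EAD is un-namespaced, the fallback eadid 
is supplied by the caller, and the scheme for minting unitid values (the 
eadid plus a running number) is invented here, not an agreed-upon convention.

```python
import xml.etree.ElementTree as ET


def ensure_identifiers(ead_xml, fallback_eadid):
    """Return the EAD with an eadid and with a unitid inside every did."""
    root = ET.fromstring(ead_xml)

    # Step 3: make sure there is a non-empty eadid in the header.
    header = root.find("eadheader")
    if header is None:
        header = ET.Element("eadheader")
        root.insert(0, header)
    eadid = header.find("eadid")
    if eadid is None:
        eadid = ET.Element("eadid")
        header.insert(0, eadid)
    if not (eadid.text or "").strip():
        eadid.text = fallback_eadid
    prefix = eadid.text.strip()

    # Step 4: give every did lacking a unitid a generated one.
    # list() takes a snapshot so the tree is not modified while iterating.
    for count, did in enumerate(list(root.iter("did")), start=1):
        unitid = did.find("unitid")
        if unitid is None:
            unitid = ET.SubElement(did, "unitid")
        if not (unitid.text or "").strip():
            # Hypothetical scheme: eadid plus the did's position in the file.
            unitid.text = f"{prefix}-{count:04d}"

    return ET.tostring(root, encoding="unicode")


# Made-up input: no eadid, one did with a unitid and one without.
SAMPLE = (
    "<ead><eadheader><filedesc/></eadheader><archdesc>"
    "<did><unittitle>A</unittitle><unitid>MSS 1</unitid></did>"
    "<dsc><c01><did><unittitle>B</unittitle></did></c01></dsc>"
    "</archdesc></ead>"
)
fixed = ET.fromstring(ensure_identifiers(SAMPLE, "example-001"))
```

Existing unitid values are left alone. Because the generated ones depend on 
document order, they would shift if a participant later reorders or inserts 
components, which matters when the whole process loops back to Step #1.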

Actually, my plan is not very much different from everybody else's plan. I'm 
using Solr as my indexer but the VUFind/Solr schema instead of Blacklight's. 
For simplicity's sake, I'm using Archon for storing/displaying my EAD instead 
of Fedora. (You say tomāto. I say tomäto. [4]) The most significant difference 
is the level at which I am expected to index and display the EAD files. I see a 
whole lot of XPath queries in my future.


[1] Catholic Portal - http://www.catholicresearch.net
[2] indexing EAD - 
http://serials.infomotions.com/code4lib/archive/2010/201007/1957.html
[3] Archon - http://www.archon.org/
[4] (Don't ya just gotta love Unicode.)

-- 
Eric Lease Morgan
