To index and display did-level EAD elements,
Or to index finding aids as a whole.
That is the question.
Seriously, this discussion surrounding the indexing and display of EAD files is
extraordinarily timely, but the loose consensus on how to do it does not jibe
with my experience. In short, I have been told by my archivist friends that I
need to index and display each and every did-level element in my EAD files, and
then provide a link to the finding aid as a whole. Let me explain.
Here at Notre Dame we are leading an effort we colloquially call the "Catholic
Portal". [1] We use VUFind as our "discovery system" and thus Solr as the
underlying indexer. Much of the metadata I index is MARC-based, but
increasingly it is and will be EAD-based. Using VUFind to index MARC records is
well understood. Only very recently has it become truly feasible to index
content other than MARC, such as EAD. A few months ago we spent time parsing
EAD and stuffing it into the underlying Solr index. We took metadata from the
EAD header and mapped it to Solr fields. We then free-text indexed the balance.
Thus a search for anything found in an EAD file returned a hit complete with
the EAD title, author, etc. Links to the original EAD were then provided. The process
functioned, but it was not deemed good enough by the archivists in the crowd.
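For illustration, here is a minimal Python sketch of that first approach: one index record per finding aid, with header metadata mapped to fields and everything else lumped into free text. It is not the code we actually ran. The field names (id, title, author, fulltext) and the sample EAD are assumptions made up for the example, and it assumes un-namespaced EAD 2002 files. For simplicity it free-text indexes the whole file rather than only the balance.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed EAD 2002 file (no XML namespace assumed).
SAMPLE_EAD = """<ead>
  <eadheader>
    <eadid>example-001</eadid>
    <filedesc>
      <titlestmt>
        <titleproper>Example Family Papers</titleproper>
        <author>Finding aid prepared by A. Archivist</author>
      </titlestmt>
    </filedesc>
  </eadheader>
  <archdesc level="collection">
    <did><unittitle>Example Family Papers</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
      </c01>
    </dsc>
  </archdesc>
</ead>"""


def header_record(ead_xml):
    """Build a single index record describing a finding aid as a whole."""
    root = ET.fromstring(ead_xml)

    def text_of(path):
        # findtext() returns None when the element is absent.
        value = root.findtext(path)
        return value.strip() if value else ""

    return {
        # Field names are illustrative, not a real schema.
        "id": text_of("eadheader/eadid"),
        "title": text_of("eadheader/filedesc/titlestmt/titleproper"),
        "author": text_of("eadheader/filedesc/titlestmt/author"),
        # All text in the file, whitespace-normalized, as one blob.
        "fulltext": " ".join(" ".join(root.itertext()).split()),
    }


record = header_record(SAMPLE_EAD)
```

A search for "Correspondence" would match this record, but the hit could only be displayed as "Example Family Papers", which is exactly the loss of context the archivists objected to.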
As you know, EAD files are not structured like most MARC records. An EAD file
represents an entire collection. Within that collection there may be
sub-collections upon sub-collections. While the EAD's header and archdesc
element may describe the collection as a whole, the sub-level and nested did
elements are the real meat of the matter. Free text searches over the entire
EAD that only return the over-arching metadata do not put search results in
context, even if one does provide links out to the full finding aid. Instead
(ideally), each and every did needs to be indexed and displayed in search
results. Moreover (ideally), these search results need to be displayed in their
hierarchical relationship with the balance of the EAD file.
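A rough Python sketch of what did-level indexing might look like: one record per did, each carrying the titles of its ancestors so a search result can be shown in context. This is an assumption-laden illustration, not our implementation. It assumes un-namespaced EAD 2002, components named c or c01 through c12, and made-up field names (title, level, hierarchy).

```python
import re
import xml.etree.ElementTree as ET

# Matches <c> and the numbered components <c01> through <c12>.
COMPONENT = re.compile(r"^c(0[1-9]|1[0-2])?$")

# Hypothetical sample; real finding aids are far larger.
SAMPLE_EAD = """<ead>
  <archdesc level="collection">
    <did><unittitle>Example Family Papers</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="file">
          <did><unittitle>Letters, 1901-1910</unittitle></did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>"""


def did_records(ead_xml):
    """Return one record per did, in document order, with its ancestry."""
    root = ET.fromstring(ead_xml)
    archdesc = root.find("archdesc")
    if archdesc is None:
        raise ValueError("no archdesc element found")
    records = []

    def title_of(element):
        unittitle = element.find("did/unittitle")
        if unittitle is None:
            return "[untitled]"
        # itertext() also picks up text inside nested markup.
        return " ".join("".join(unittitle.itertext()).split())

    def components(element):
        # Components sit directly under a parent component, or under
        # <dsc> when the parent is <archdesc>.
        for container in [element] + element.findall("dsc"):
            for child in container:
                if COMPONENT.match(child.tag):
                    yield child

    def walk(element, ancestors):
        path = ancestors + [title_of(element)]
        records.append({
            "title": path[-1],
            "level": element.get("level", ""),
            "hierarchy": path,  # collection title first, this did last
        })
        for child in components(element):
            walk(child, path)

    walk(archdesc, [])
    return records


records = did_records(SAMPLE_EAD)
```

With the sample above, the record for the letters carries the hierarchy Example Family Papers, Correspondence, Letters, 1901-1910, which is enough to draw a breadcrumb next to the hit.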
We began work to implement this (ideal) solution [2], but the developer went on
to a more permanent job here on campus.
Here is what I plan to do:
1. acquire EAD files from "Catholic Portal" participants
2. cache them locally
3. pre-process each EAD making sure it has an eadid element
4. pre-process each EAD making sure each did element contains
a unitid element, and if one doesn't then assign it one
5. store and index each EAD file in Archon [3]
6. parse each did from each EAD file and integrate the result
into the VUFind/Solr index along with the MARC metadata
7. use VUFind as the primary interface to the "Catholic Portal"
8. use Archon as the means for displaying and navigating EAD files
9. go to Step #1
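Steps #3 and #4 could be sketched in Python along these lines. Treat it as a thought experiment, not finished code. The function name, the fallback eadid argument, and the generated unitid pattern (eadid plus a running number) are all hypothetical, and the sketch assumes un-namespaced EAD 2002 with an eadheader already present. Identifiers based on position will shift if a contributor later inserts a did, so a production version would want something more stable.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the first and third did lack a unitid.
SAMPLE_EAD = """<ead>
  <eadheader><eadid>example-001</eadid></eadheader>
  <archdesc level="collection">
    <did><unittitle>Example Family Papers</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unitid>A-1</unitid><unittitle>Correspondence</unittitle></did>
        <c02 level="file">
          <did><unittitle>Letters, 1901-1910</unittitle></did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>"""


def ensure_identifiers(ead_xml, fallback_eadid):
    """Return (xml_text, number_of_unitids_assigned)."""
    root = ET.fromstring(ead_xml)

    # Step 3: make sure there is a non-empty eadid.
    header = root.find("eadheader")
    if header is None:
        raise ValueError("no eadheader element found")
    eadid = header.find("eadid")
    if eadid is None:
        eadid = ET.Element("eadid")
        header.insert(0, eadid)
    if not (eadid.text or "").strip():
        eadid.text = fallback_eadid
    ead_id = eadid.text.strip()

    # Step 4: make sure every did has a non-empty unitid.
    # list() takes a snapshot so the tree can be modified safely.
    assigned = 0
    for position, did in enumerate(list(root.iter("did")), start=1):
        unitid = did.find("unitid")
        if unitid is None:
            unitid = ET.SubElement(did, "unitid")
        if not (unitid.text or "").strip():
            # Made-up pattern: eadid plus the did's position in the file.
            unitid.text = f"{ead_id}-did{position:05d}"
            assigned += 1

    return ET.tostring(root, encoding="unicode"), assigned


fixed_xml, assigned = ensure_identifiers(SAMPLE_EAD, "fallback-id")
```

Existing identifiers (A-1 in the sample) are left alone; only the gaps get filled, so the eadid and unitid together can serve as the key for a did-level record in the index.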
Actually, my plan is not very much different from everybody else's plan. I'm
using Solr as my indexer but the VUFind/Solr schema instead of Blacklight's.
For simplicity's sake, I'm using Archon for storing/displaying my EAD instead
of Fedora. (You say tomāto. I say tomäto. [4]) The most significant difference
is the level at which I am expected to index and display the EAD files. I see a
whole lot of XPath queries in my future.
[1] Catholic Portal - http://www.catholicresearch.net
[2] indexing EAD -
http://serials.infomotions.com/code4lib/archive/2010/201007/1957.html
[3] Archon - http://www.archon.org/
[4] (Don't ya just gotta love Unicode.)
--
Eric Lease Morgan