Hi, Ethan.

You can see another example of Blacklight being used to search and display EAD 
guides at 

http://nwda.projectblacklight.org/?f%5Bformat_facet%5D%5B%5D=Archival+Collection+Guide

I've used Solr and/or Lucene for EAD documents a few times, and here are some 
observations: 

> I've also heard about scalability issues with Solr and large XML documents,
> but I've never seen benchmarks.

Solr is incredibly scalable, so describing this as a Solr scalability issue 
isn't really accurate. It would be more accurate to say that Solr is designed 
for searching, while most people looking for an EAD solution are trying to get 
it to do a lot more than that. The problem is that you want to be able to 
discover and view an EAD guide at several levels, right? You want to be able to 
discover at the collection level, at the item level, and presumably at the 
level of some section of the EAD document (e.g., biographical history or 
whatever). Solr and Lucene really just know how to tell you whether a given 
document in the index matches a query you've entered, though. So if you want 
discovery at each of those levels, you have to index your document once to 
represent the collection, then again for each section you want to be 
independently discoverable, then again for each item you want to be 
discoverable. 

Creating a UI to represent a single EAD, which has now been transformed into 
potentially hundreds or thousands of independently discoverable items and EAD 
sections, is quite challenging. I liked what Matt Mitchell and I did for the 
Northwest Digital Archives, but I'm always interested in other ways one might 
approach this. 
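
As a rough illustration of the "index it once per level" idea above, here is a 
minimal Python sketch. It is not the NWDA code. It assumes an EAD file without 
XML namespaces, treats each c01 as an item, and uses made-up field names 
(level, collection_id, section_id) and a made-up id scheme; none of these come 
from Blacklight or Solr themselves.

```python
import xml.etree.ElementTree as ET

# Sections to expose as separate documents (an illustrative choice).
SECTION_TAGS = ("bioghist", "scopecontent", "dsc")


def text_of(elem):
    """Return all text under elem with whitespace collapsed."""
    return " ".join("".join(elem.itertext()).split())


def ead_to_docs(ead_xml):
    """Turn one EAD guide into collection, section, and item documents."""
    root = ET.fromstring(ead_xml)
    ead_id = text_of(root.find("./eadheader/eadid"))
    title = text_of(root.find("./archdesc/did/unittitle"))

    # One document standing in for the collection as a whole.
    docs = [{"id": ead_id, "level": "collection",
             "collection_id": ead_id, "title": title}]

    # One document per section that should be independently discoverable.
    for tag in SECTION_TAGS:
        section = root.find(f"./archdesc/{tag}")
        if section is not None:
            docs.append({"id": f"{ead_id}-{tag}", "level": "section",
                         "collection_id": ead_id, "title": title,
                         "text": text_of(section)})

    # One document per item, linked back to its collection and to the
    # section (here the container list, dsc) it came from.
    for n, item in enumerate(root.iterfind("./archdesc/dsc/c01"), start=1):
        docs.append({"id": f"{ead_id}-item-{n}", "level": "item",
                     "collection_id": ead_id,
                     "section_id": f"{ead_id}-dsc",
                     "title": text_of(item.find("./did/unittitle"))})
    return docs
```

Each dictionary returned would then be sent to the index as its own document, 
which is how one guide turns into hundreds or thousands of index entries. 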

We indexed each EAD guide into separate Lucene documents, one per EAD section, 
then collapsed them under the main EAD title in the search results. When you 
search for an archival collection you see the EAD guide represented only once, 
but each section of it is still independently viewable and bookmarkable. 

Here is the guide for the Bing Crosby Historical Society in a search result:

http://nwda.projectblacklight.org/catalog?q=crosby&qt=search&per_page=10&f%5Bformat_facet%5D%5B%5D=Archival+Collection+Guide&commit=search

But in order to look at the guide, you have to look at a specific part of it: 
http://nwda.projectblacklight.org/catalog/bcc_1-summary
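
The collapsing step can be sketched in a few lines of Python. This is only an 
illustration of the idea, done on the client side over a ranked list of hits; 
it is not necessarily how the NWDA site does it, and the field names 
(collection_id, collection_title) are hypothetical.

```python
def collapse_by_collection(hits):
    """Group ranked hits so each collection appears once.

    Groups keep the rank of their best-scoring hit, because hits are
    visited in rank order and dicts preserve insertion order.
    """
    groups = {}
    for hit in hits:
        key = hit["collection_id"]
        group = groups.setdefault(key, {
            "collection_id": key,
            "collection_title": hit.get("collection_title", key),
            "matches": [],
        })
        # Remember which section or item matched, so it can still be
        # linked to and bookmarked on its own.
        group["matches"].append(hit["id"])
    return list(groups.values())
```

Each group would be shown as one search result under the collection title, 
with its matches available as links to the individual sections or items. 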

Additionally, we treated each item as a first-class, independently discoverable 
object, but still linked each one to the section of the EAD document it came 
from:

http://nwda.projectblacklight.org/catalog/bcc_1-v

Matt and I were thinking it would be nice to allow Blacklight to handle all of 
the display of the EAD too, which is why we stored a lot of EAD markup in the 
Solr document. That can potentially cause scalability problems, because Lucene 
is not a database but we were treating it like one. This works, but it's a bit 
of a hack. 

Bess
