Also, specifically:

  1.  Using an XML database like eXist-db or BaseX with XPath/XQuery was
invaluable for analyzing issues and the impact of changes.
  2.  One of the tools I wrote, the EAD Checker, is available online:
https://eadchecker.lib.harvard.edu
It doesn’t catch this specific issue, but it does catch a number of other
issues, some of which cause corrupted data rather than a failed import.
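
A rough stand-in for the kind of XPath analysis described in item 1 (not eXist-db or BaseX themselves) is sketched below. It uses Python's standard-library ElementTree, which supports only a limited XPath subset, to list containers whose values look like ranges or lists. The regular expression and the function name `find_range_containers` are illustrative assumptions, not part of any existing tool.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Flags values that look like a range or list of numbers, e.g. "3-4", "3 - 4" or
# "3,4,7-10". Assumption: ranges use a hyphen or en dash, lists use commas.
RANGE_LIKE = re.compile(r"\d\s*[-\u2013,]\s*\d")


def find_range_containers(ead_xml: str) -> List[Tuple[str, str]]:
    """Return (type, value) for each <container> whose text looks like a range or list."""
    root = ET.fromstring(ead_xml)
    hits = []
    # "{*}container" matches <container> in any namespace or none (Python 3.8+),
    # so it works for namespaced EAD 2002 files as well as bare examples.
    for container in root.findall(".//{*}container"):
        value = (container.text or "").strip()
        if RANGE_LIKE.search(value):
            hits.append((container.get("type", ""), value))
    return hits
```

Running it over the text of every finding aid (for example via `pathlib.Path.read_text(encoding="utf-8")`) would give a rough count of affected files and of how many values are irregular before committing to an approach.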

--
Dave Mayo (he/him)
Senior Digital Library Software Engineer
Harvard University > HUIT > LTS

From: <archivesspace_users_group-boun...@lyralists.lyrasis.org> on behalf of 
"Mayo, Dave" <dave_m...@harvard.edu>
Reply-To: Archivesspace Users Group 
<archivesspace_users_group@lyralists.lyrasis.org>
Date: Thursday, June 18, 2020 at 9:23 AM
To: Archivesspace Users Group <archivesspace_users_group@lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] Top container ranges

So, with the caveat that we put a lot of resources into this (a good deal of
archivists’ time, plus a full year of a full-time developer (me!)), we had very
solid results. I think remediating issues prior to import is almost always worth
significant effort, particularly over a large corpus.

My main advice would be to be very, very careful about changes: version your
EADs, compare them before and after each script runs, and in general be very
systematic about how you find, report, and correct problems.
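
One lightweight way to compare files before and after a script runs is a unified diff per file. The sketch below assumes you keep the pre-script copy of each EAD alongside the output (a version-control system such as git gives you this, and `git diff` does much the same job); it uses only Python's standard-library difflib, and the function name `diff_versions` is hypothetical.

```python
import difflib


def diff_versions(before: str, after: str, name: str = "finding_aid.xml") -> str:
    """Return a unified diff of one EAD file before and after a script has run.

    An empty string means the script did not change this file.
    """
    lines = difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile=f"before/{name}",
        tofile=f"after/{name}",
        lineterm="",  # lines were split without newlines, so don't add them back here
    )
    return "\n".join(lines)
```

Reviewing (or at least counting) the non-empty diffs for each script run is one way to stay systematic and to notice when a script touched more files than expected.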

I don’t know if you’ve seen it, but Kate Bowers and I did a write-up of our
migration; it links to a number of open source tools I wrote for this kind of
work. They’re a bit involved to get running, but they work at basically any
scale, and I’m happy to help people get started with them.
https://journal.code4lib.org/articles/12239

--
Dave Mayo (he/him)
Senior Digital Library Software Engineer
Harvard University > HUIT > LTS

From: <archivesspace_users_group-boun...@lyralists.lyrasis.org> on behalf of 
"Lucas, Dawne Howard" <dawne_lu...@unc.edu>
Reply-To: Archivesspace Users Group 
<archivesspace_users_group@lyralists.lyrasis.org>
Date: Thursday, June 18, 2020 at 9:12 AM
To: Archivesspace Users Group <archivesspace_users_group@lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] Top container ranges

Thanks, Dave.  I guess I should have specified that changing the EAD isn’t a 
viable solution for us unless it’s automated. We do not plan to edit individual 
finding aids manually except in cases where the ranges aren’t regular.

If you’ve done this at Harvard, have there been any drawbacks? Anything we 
should be looking to avoid?

Thanks again,

Dawne


From: Mayo, Dave <dave_m...@harvard.edu>
Sent: Thursday, June 18, 2020 9:04 AM
To: Archivesspace Users Group <archivesspace_users_group@lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] Top container ranges

The two options I see here are essentially:

1. Change the EAD
2. Change the containers after they’re ingested.

Of the two, changing the EAD seems _easier_ to me; if you don’t mind explaining
why that isn’t viable for you, it might help us give better advice.

Either way, at 7000 finding aids, the solution basically needs to be automated.
If your box ranges are very regular (i.e. only a single number or a single
range, no “3,4,7-10” or similar), it wouldn’t be too difficult: split the range
on ‘-’, generate the list of numbers, and replace the container with multiple
containers.
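
The range-splitting step described above might look like the following minimal sketch. It assumes box values are plain integers separated by a hyphen, en dash or em dash; the function name `expand_box_range` is hypothetical, and anything irregular is returned as None so it can be routed to manual review rather than guessed at.

```python
import re
from typing import List, Optional

# Only "a single number or a single range" counts as regular.
# Assumption: a hyphen, en dash or em dash separates the endpoints.
SINGLE = re.compile(r"^\s*(\d+)\s*$")
RANGE = re.compile(r"^\s*(\d+)\s*[-\u2013\u2014]\s*(\d+)\s*$")


def expand_box_range(value: str) -> Optional[List[str]]:
    """Expand "3-4" to ["3", "4"] and "7" to ["7"].

    Returns None for anything irregular ("3,4,7-10", "12a", a reversed "5-3"),
    so those values can be flagged for manual review instead of guessed at.
    Leading zeros are not preserved ("03-04" -> ["3", "4"]).
    """
    single = SINGLE.match(value)
    if single:
        return [str(int(single.group(1)))]
    rng = RANGE.match(value)
    if rng:
        start, end = int(rng.group(1)), int(rng.group(2))
        if start <= end:
            return [str(n) for n in range(start, end + 1)]
    return None
```

Returning None rather than raising keeps a batch run over thousands of files going while still producing a list of the irregular cases to handle by hand.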

From: <archivesspace_users_group-boun...@lyralists.lyrasis.org> on behalf of 
"Lucas, Dawne Howard" <dawne_lu...@unc.edu>
Reply-To: Archivesspace Users Group 
<archivesspace_users_group@lyralists.lyrasis.org>
Date: Thursday, June 18, 2020 at 8:13 AM
To: Archivesspace Users Group <archivesspace_users_group@lyralists.lyrasis.org>
Subject: [Archivesspace_Users_Group] Top container ranges

Hi all,

We are formulating a plan to import our 7000+ EAD finding aids into
ArchivesSpace and are wondering how other institutions have handled top
container ranges.

For example, we have finding aids coded like this:

<c02><did><container type="box"
label="Box">3-4</container><unittitle>Photographs</unittitle></did></c02>

This imports into ASpace just fine (yay!), but of course it also creates a
single top container for Box 3-4 instead of separate ones for Box 3 and Box 4
(boo!). We assume this will be an issue later when we integrate with Aeon.

The most obvious solution to this problem appears to be changing the encoding
to:

<c02><did><container type="box"
label="Box">3</container><unittitle>Photographs</unittitle></did></c02>

<c02><did><container type="box"
label="Box">4</container><unittitle>Photographs</unittitle></did></c02>

For several reasons, this is not a viable solution for us. Have other 
institutions figured out a way to deal with this issue that does not include 
editing the EAD in individual finding aids?
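
The split encoding above could in principle be generated automatically rather than by editing finding aids by hand. The following is a minimal sketch, not a tested migration tool: it assumes Python 3 with the standard-library ElementTree and treats a component as safe to split only when its sole container is a box holding a plain numeric range (like the "3-4" example). Such a component is cloned once per box. Everything else, including nested components, box + folder pairs and lists such as "3,4,7-10", is left untouched for manual review. All function names, and the suffix used to keep @id values unique, are hypothetical.

```python
import copy
import re
import xml.etree.ElementTree as ET
from typing import List, Optional

# Hypothetical helpers for illustration, not part of any existing tool.
COMPONENT = re.compile(r"^c(0[1-9]|1[0-2])?$")        # <c>, <c01> ... <c12>
BOX_RANGE = re.compile(r"^\s*(\d+)\s*-\s*(\d+)\s*$")  # regular ranges only, e.g. "3-4"


def local(tag) -> str:
    """Drop any namespace: '{urn:isbn:1-931666-22-9}c02' -> 'c02'."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def boxes_to_split(component: ET.Element) -> Optional[List[str]]:
    """Return the box numbers if this component is safe to split, else None."""
    if any(COMPONENT.match(local(child.tag)) for child in component):
        return None  # nested components: leave for manual review
    did = next((c for c in component if local(c.tag) == "did"), None)
    if did is None:
        return None
    containers = [c for c in did if local(c.tag) == "container"]
    # Only handle a lone box container; e.g. box + folder is left alone.
    if len(containers) != 1 or containers[0].get("type", "").lower() != "box":
        return None
    match = BOX_RANGE.match(containers[0].text or "")
    if not match or int(match.group(1)) >= int(match.group(2)):
        return None
    return [str(n) for n in range(int(match.group(1)), int(match.group(2)) + 1)]


def split_box_ranges(root: ET.Element) -> int:
    """Replace each splittable component with one copy per box; return how many were split."""
    split = 0
    for parent in list(root.iter()):  # snapshot, since children are replaced below
        new_children, changed = [], False
        for child in parent:
            boxes = boxes_to_split(child) if COMPONENT.match(local(child.tag)) else None
            if not boxes:
                new_children.append(child)
                continue
            changed, split = True, split + 1
            for box in boxes:
                clone = copy.deepcopy(child)
                container = next(c for c in clone.iter() if local(c.tag) == "container")
                container.text = box
                if clone.get("id"):
                    # Hypothetical scheme to keep @id values unique across the copies.
                    clone.set("id", f"{clone.get('id')}_{box}")
                new_children.append(clone)
        if changed:
            parent[:] = new_children
    return split
```

For namespaced EAD 2002 files, calling `ET.register_namespace("", "urn:isbn:1-931666-22-9")` before writing the result keeps a default namespace instead of `ns0:` prefixes. ElementTree also drops the DOCTYPE and comments, so diffing its output against the original before import would be prudent.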

Thanks for your help,

Dawne

--
Dawne Howard Lucas (she/her/hers)
Technical Services Archivist

Wilson Special Collections Library
200 South Road, CB #3926
Chapel Hill, NC 27515
The University of North Carolina at Chapel Hill
P  919-966-1776   E  dawne_lu...@unc.edu

_______________________________________________
Archivesspace_Users_Group mailing list
Archivesspace_Users_Group@lyralists.lyrasis.org
http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group
