Thanks for providing detailed context about how specific versions of AS have 
handled XML, Andrew.  We are currently running 2.8.1.  Yes, I would like to try 
your fix.*  I would like to install it on our development instance and see what 
happens.  When you have a moment, please let me know how I would go about 
implementing your code.

*I don't know anything about EAD encoding, so I remain uncertain whether 
these apparently missing semicolons count as EAD-compliant.

Kyle Breneman
Integrated Digital Services Librarian
The University of Baltimore
kbrene...@ubalt.edu
I believe in freedom of thought and
freedom of speech. Do you?

From: archivesspace_users_group-boun...@lyralists.lyrasis.org On Behalf Of Andrew Morrison
Sent: Friday, January 7, 2022 5:19 AM
To: archivesspace_users_group@lyralists.lyrasis.org
Subject: Re: [Archivesspace_Users_Group] Advice on what to look for in AS logs 
after printing error?


My fix should prevent all "Failed to clean XML: The reference to entity..." 
errors triggered by EAD-compliant encoding. But, depending on what version of 
ArchivesSpace you are running, it may only make a difference in niche cases. As 
I understand it (although Blake may wish to correct me if I am wrong) the 
timeline is this:

Up to 2.7.0, PDFs generated by the PUI did not fail in this precise way, at 
least not if your records used EAD-compliant encoding of characters such as 
ampersands, greater-than, less-than, etc.

In 2.7.1, a change was made to allow people to include an HTML entity reference, 
specifically the one for non-breaking spaces (&nbsp;), in their records. That is 
not strictly EAD-compliant encoding, but some people use them for formatting 
purposes, or because their records are converted from old web pages. But that 
change broke generation of PDFs for records containing EAD-compliant encoding of 
ampersands which happened to be immediately followed by an uppercase letter 
(e.g. "B&amp;W").

In 2.8.1, the case of ampersands immediately followed by an uppercase letter 
was fixed, but PDFs will still fail if a record contains an ampersand 
immediately followed by a character which isn't an ASCII upper or lowercase 
alphabetic character or space. The specific case I've encountered is numbers in 
citations of printed resources (e.g. "Vols. 1&amp;2") but it could also happen 
with UTF-8 characters outside the ASCII range.
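
As a rough illustration of the 2.8.1 case described above, the following Python 
sketch scans exported EAD text for an encoded ampersand immediately followed by 
anything other than an ASCII letter or a space. The pattern is only a reading of 
the description in this message, not something taken from the ArchivesSpace 
source, and the function name and sample text are made up for the example.

```python
import re

# "&amp;" immediately followed by a character that is neither an ASCII letter
# nor a space. Assumption: this mirrors the failure condition as described in
# this thread; it is not copied from ArchivesSpace itself.
RISKY_AMPERSAND = re.compile(r"&amp;(?=[^A-Za-z ])")


def find_risky_ampersands(ead_text, context=15):
    """Return (line_number, snippet) pairs for each risky ampersand."""
    hits = []
    for match in RISKY_AMPERSAND.finditer(ead_text):
        # Line numbers are 1-based, counted from the start of the text.
        line_number = ead_text.count("\n", 0, match.start()) + 1
        start = max(0, match.start() - context)
        end = min(len(ead_text), match.end() + context)
        snippet = ead_text[start:end].replace("\n", " ")
        hits.append((line_number, snippet))
    return hits


# Hypothetical sample: only the second line should be reported.
sample = "<p>B&amp;W prints</p>\n<p>Vols. 1&amp;2</p>\n<p>A &amp; B</p>"
for line_number, snippet in find_risky_ampersands(sample):
    print(line_number, snippet)
```

Run against a full EAD export, this would only narrow down candidates; it says 
nothing about other causes of the same error.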

Now, my proposed fix should, I believe, prevent PDFs from breaking regardless of 
what immediately follows an ampersand. It may also fix other problems, such as 
records containing &lt; in certain contexts. Admittedly these are rare, but if 
you've got enough records they will occur somewhere, and they are fiendishly 
difficult to track down.

So, if you are running 2.7.1 or 2.8.0, and you are sure that your records only 
contain things like "B&amp;W", and never things like "Vols. 1&amp;2", then 
upgrading to 2.8.1 or higher would probably fix your problem.

If you're already running 2.8.1 or higher, my fix is currently untested by 
anyone but me, but if you want to give it a try, let me know.

Andrew.


On 06/01/2022 14:28, Kyle Breneman wrote:
Andrew, thank you for taking the time to point me to your GitHub fix.  I do see 
the "Failed to clean XML" error in my logs, but in each case it is seemingly 
upset about missing semicolons: "Failed to clean XML: The reference to entity 
"W" must end with the ';' delimiter."

If I understand your GitHub repo code correctly, it is narrowly targeted at 
situations where &amp is immediately followed by a digit, and so would not help 
in my situation.  Have I got that right?

Kyle Breneman

From: archivesspace_users_group-boun...@lyralists.lyrasis.org On Behalf Of Andrew Morrison
Sent: Thursday, January 6, 2022 5:14 AM
To: archivesspace_users_group@lyralists.lyrasis.org
Subject: Re: [Archivesspace_Users_Group] Advice on what to look for in AS logs 
after printing error?


If you do see that "Failed to clean XML" message in the logs, then you might be 
interested in this pull request I submitted recently:



https://github.com/archivesspace/archivesspace/pull/2553



If that is the error you are seeing, you are able to install plug-ins, and you 
are running 2.7.1 or newer, I could package the same fix as a plug-in.



It might be a different markup issue, but in my experience the logs never tell 
you which archival object the problem is in. They cannot, because by that point 
the collection has already been converted into a temporary HTML file, which is 
the intermediate step before converting to PDF. You could try exporting as EAD 
from the staff interface, then validating it in an XML editor, but if the issue 
is something which is valid in EAD, then it can be very difficult to trace. If 
you have a local development instance of ArchivesSpace, you can modify the code 
so it doesn't delete the temporary HTML files, then validate those.
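
If an XML editor is not to hand, a well-formedness check can be sketched with 
Python's standard library. This is only an illustration: it checks 
well-formedness, not validity against the EAD schema, and the function names are 
invented for the example. A strict XML parser like this one will also reject 
HTML-only entities such as &nbsp; unless they are declared.

```python
import xml.etree.ElementTree as ET


def first_xml_error(xml_text):
    """Return (line, column, message) for the first parse error, or None."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple; lines are 1-based.
        line, column = err.position
        return line, column, str(err)
    return None


def show_error_context(xml_text, radius=1):
    """Print the parser message and the lines around the first error."""
    error = first_xml_error(xml_text)
    if error is None:
        print("well-formed")
        return
    line, _column, message = error
    print(message)
    lines = xml_text.splitlines()
    first = max(1, line - radius)
    last = min(len(lines), line + radius)
    for number in range(first, last + 1):
        print(f"{number}: {lines[number - 1]}")


# Hypothetical fragment with a bare ampersand on its second line.
show_error_context("<c>\n<unittitle>B&W prints</unittitle>\n</c>")
```

The parser stops at the first problem, so a file with several bad records has to 
be fixed and re-checked one error at a time.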



Andrew.




On 05/01/2022 18:14, Blake Carver wrote:
It's going to be a bit of a "looking for a bunch of needles in a very short 
haystack" kinda thing.
The errors should have either FATAL or ERROR and something about pdf around 
there somewhere. Sometimes there will be a lotta other FATAL and ERROR lines 
around, so you'll need to narrow it down based on what each one says.
You could also look for "92", "126" and "21"; I think the resource number should 
show up around the error as well.
Also wouldn't surprise me to see this error in particular, but not always:


RuntimeError (Failed to clean XML: The entity name must immediately follow the 
'&' in the entity reference.):
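
The search described above could be sketched in Python roughly as follows. It 
keeps ERROR and FATAL lines only when "pdf" or the "Failed to clean XML" message 
appears on the line or within a couple of lines of it. The function name, the 
keyword list and the sample lines are invented for illustration; real 
ArchivesSpace log lines will look different, apart from the error message quoted 
above.

```python
import re

LEVEL = re.compile(r"\b(ERROR|FATAL)\b")


def find_error_lines(log_lines, keywords=("pdf", "Failed to clean XML"), context=2):
    """Yield (line_number, line) for ERROR/FATAL lines near a keyword.

    A line is reported if it contains ERROR or FATAL and any keyword appears
    on that line or within `context` lines on either side (case-insensitive).
    """
    lines = [line.rstrip("\n") for line in log_lines]
    lowered = [line.lower() for line in lines]
    wanted = [keyword.lower() for keyword in keywords]
    for index, line in enumerate(lines):
        if not LEVEL.search(line):
            continue
        window = lowered[max(0, index - context): index + context + 1]
        if any(keyword in nearby for keyword in wanted for nearby in window):
            yield index + 1, line


# Made-up log lines, not real ArchivesSpace output.
sample_log = [
    "INFO: request started",
    "ERROR: something unrelated",
    "INFO: a",
    "INFO: b",
    "INFO: c",
    "FATAL: RuntimeError (Failed to clean XML: The entity name must "
    "immediately follow the '&' in the entity reference.):",
]
for number, line in find_error_lines(sample_log):
    print(number, line)
```

An open log file can be passed in place of the list, and resource numbers such 
as "92" can be added to the keywords to narrow things further.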

________________________________
From: archivesspace_users_group-boun...@lyralists.lyrasis.org on behalf of Kyle Breneman <kbrene...@ubalt.edu>
Sent: Wednesday, January 5, 2022 12:36 PM
To: Archivesspace Users Group <archivesspace_users_group@lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] Advice on what to look for in AS logs 
after printing error?


Thank you for that reminder, Blake!  Another question: the print action was 
being run from the following pages.  Wouldn't clicking the AS print button 
itself register in the logs?  If so, how could I efficiently find those lines?



https://archivesspace.ubalt.edu/repositories/2/resources/92

https://archivesspace.ubalt.edu/repositories/2/resources/126

https://archivesspace.ubalt.edu/repositories/2/resources/21



Kyle Breneman




From: archivesspace_users_group-boun...@lyralists.lyrasis.org On Behalf Of Blake Carver
Sent: Wednesday, January 5, 2022 12:31 PM
To: Archivesspace Users Group <archivesspace_users_group@lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] Advice on what to look for in AS logs 
after printing error?



grep the logs for ERROR or FATAL

________________________________

From: archivesspace_users_group-boun...@lyralists.lyrasis.org on behalf of Kyle Breneman <kbrene...@ubalt.edu>
Sent: Wednesday, January 5, 2022 12:28 PM
To: archivesspace_users_group@lyralists.lyrasis.org
Subject: [Archivesspace_Users_Group] Advice on what to look for in AS logs 
after printing error?



Our archives staff have noticed that AS tends to get hung up when users click 
the Print button on some of our largest collections.  Campus IT tested this 
today.  The server did not hang for them, but the print action also did not 
complete.  They got a very, very generic error message (attached).



I have access to the ArchivesSpace files on the server, including the /logs 
directory, but I'm not sure how to parse the logs for clues.  Does anyone have 
advice for how I can sift through the logs?



Kyle Breneman







_______________________________________________
Archivesspace_Users_Group mailing list
Archivesspace_Users_Group@lyralists.lyrasis.org
http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group


