Trevor,

That would be fantastic!!!!!

Benn,

For your PDF error, I think that might be caused by a slightly different issue.
The new PUI “Print to PDF” process converts the ArchivesSpace JSON record to
HTML and then converts that HTML into a PDF file.  So, it doesn’t use the same
JSON --> EAD --> PDF process as the staff interface.  I’m assuming that a small
tweak to this file
https://github.com/archivesspace/archivesspace/blob/master/public/app/lib/xml_cleaner.rb
might allow it to still create the PDF successfully (assuming that
ArchivesSpace would want the application to handle both “b&w” and “b&amp;w”,
which might not be the case).

We should log this issue in JIRA at some point, regardless, just so that it’s 
captured there.  I don’t have time to do that right now, but I did update one 
of the files in the sandbox to illustrate the problem.  Here it is: 
http://public.archivesspace.org/repositories/2/resources/1008/


  *   Before I added the lone collection-level note to this record, the PDF 
printed fine.
  *   Once I added a note of “b&w”, it failed.
  *   When I change the note to “b&amp;w”, the PDF file works.
  *   It also still displays fine in the PUI, which might mean that the problem 
that I noted in my previous message only occurs when the note is in one of 
those “see more” / “see less” sections.

Mark




From: archivesspace_users_group-boun...@lyralists.lyrasis.org On Behalf Of
Trevor Thornton
Sent: Friday, 22 September, 2017 9:45 AM
To: Archivesspace Users Group <archivesspace_users_group@lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] ampersand issue with PDF button in 
2.1.2 public interface

The logic for converting ampersands in the EAD exporter is to only convert them 
if they are immediately followed by a space, otherwise they are assumed to be 
an entity. This is part of the process of sanitizing mixed content, which is 
actually applied to most fields. However, the ampersand conversion is included 
in the routine that handles line breaks (converting 2 line breaks into 
paragraphs as appropriate), and this is only applied to fields for which the 
corresponding EAD tag allows <p> as a child, which excludes unittitle,
abstract, etc.
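
To make the rule described above concrete, here is a minimal sketch in Python. It is not the Ruby code from the ArchivesSpace EAD exporter, and the function name is hypothetical; it only illustrates "escape an ampersand when a space immediately follows it" and shows why strings such as "b&w" or "AT&T" pass through unescaped.

```python
import re


def escape_ampersands_before_space(text: str) -> str:
    """Escape '&' only when the very next character is a space.

    Hypothetical illustration of the rule described in this thread:
    an ampersand followed by anything else is assumed to start an entity
    and is left untouched.
    """
    return re.sub(r"&(?= )", "&amp;", text)


if __name__ == "__main__":
    for sample in ["black & white", "b&w", "AT&T", "b&amp;w"]:
        print(repr(sample), "->", repr(escape_ampersands_before_space(sample)))
```

Under this rule only "black & white" changes; "b&w" and "AT&T" are left as they are, so they would still reach a later XML step with a bare ampersand.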

There's no good reason that I can think of why the ampersand conversion should 
be restricted in this way, so it can probably be moved to apply more broadly. 
Unfortunately, since the new EAD3 exporter is based on the existing EAD 
exporter, this problem persists in the EAD3 exporter, because I didn't really 
notice it until now. I'll try to fix it in both places and do a pull request.

On Fri, Sep 22, 2017 at 8:48 AM, Mayo, Dave <dave_m...@harvard.edu> wrote:
Hi Benn,

This is a recurring issue I hit over both Harvard and Smith’s collections – 
it’s a consequence of ASpace not really having a distinction between mixed 
content and plaintext content.

Unfortunately, there isn’t really a good solution.  The best approach, as far as
I’ve been able to figure, is to use the HTML/XML entity for ampersand (&amp;)
wherever it appears in a context that’s treated by the interface/etc as markup; 
title fields _definitely_ fall under that category.  There’s unfortunately no 
reliable guide to what fields are “mixed content” and what fields are 
“plaintext content” because, well, the underlying system doesn’t track that 
distinction – it’s up to how the fields are eventually displayed/used to build 
exports/etc.

As to _how_ to fix it – well, it depends somewhat on whether you can be 
ABSOLUTELY SURE you don’t have any HTML/XML entities in your title fields.  If 
you are ABSOLUTELY SURE of this, you should be able to make the change via API 
or on the SQL level, but if you DO have entities, it gets a lot harder, to the 
point where manual review is probably appropriate.
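
As a rough illustration of the kind of bulk change described above, the following Python sketch escapes only those ampersands that do not already begin something shaped like an entity or character reference. The function name and the pattern are assumptions made for this example, not part of ArchivesSpace or its API. Note that a string such as "AT&T;" looks like an entity reference to this pattern and is left alone, which is one reason manual review may still be needed when entities are present.

```python
import re

# Matches '&' unless it already starts a named reference (&amp;),
# a decimal reference (&#38;) or a hexadecimal reference (&#x26;).
BARE_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(text: str) -> str:
    """Replace bare ampersands with &amp;, leaving existing references as-is."""
    return BARE_AMPERSAND.sub("&amp;", text)


if __name__ == "__main__":
    # Hypothetical title values, for illustration only.
    titles = ["AT&T records", "b&w photographs", "Smith & Sons", "b&amp;w prints"]
    for title in titles:
        print(repr(title), "->", repr(escape_bare_ampersands(title)))
```

Running the function a second time on its own output changes nothing, so it is safe to re-apply to values that were already fixed.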
- Dave Mayo
ASpace Core Committer’s Group Member

From: <archivesspace_users_group-boun...@lyralists.lyrasis.org> on behalf of
Benn Joseph <benn.jos...@northwestern.edu>
Reply-To: Archivesspace Users Group
<archivesspace_users_group@lyralists.lyrasis.org>
Date: Thursday, September 21, 2017 at 4:21 PM
To: Archivesspace Users Group <archivesspace_users_group@lyralists.lyrasis.org>
Subject: [Archivesspace_Users_Group] ampersand issue with PDF button in 2.1.2
public interface


Hi all,

We've encountered an issue with the v2.1.2 Print-to-PDF button in the public 
interface--apparently for any resource record with an ampersand that is 
followed immediately by another character that is not a space (e.g. "b&w" or 
"AT&T"), the ampersand is misinterpreted and causes the Print-to-PDF button to 
fail with an error. For me, that error is just "something went wrong", but the 
log shows this (when it gets tripped up on "b&w"):



RuntimeError (Failed to clean XML: The reference to entity "w" must end with 
the ';' delimiter.):



So we're guessing ArchivesSpace is thinking "&w" should be "&w;", and so forth 
for any other string of text with an ampersand. I checked this by going into a 
record that wouldn’t print and changing the lone suspect ampersand (“AT&T” to 
“AT and T”) and the PDF generated just fine.
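
The behaviour described above can be reproduced outside ArchivesSpace with any strict XML parser. The sketch below uses Python's standard library parser as a stand-in; it is not the parser the public interface uses, the helper name is hypothetical, and the wording of its error message will differ from the log line quoted above. It only shows that a bare "&w" is rejected while "&amp;w" and "AT and T" are accepted.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML text inside a wrapper element."""
    try:
        ET.fromstring(f"<note>{fragment}</note>")
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    for sample in ["b&w", "AT&T", "b&amp;w", "AT and T"]:
        print(repr(sample), "well-formed:", is_well_formed(sample))
```

Once the ampersand is written as an entity, the parser decodes it back to a plain "&" in the resulting text, so the note still reads "b&w" to the user.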



This doesn't impact being able to just view resource records in the public 
interface, it's just the PDF function that isn't working. It's a problem, 
though, because we want to be able to use that PDF functionality but we also 
have a lot of ampersands in our resource records! Has anyone else experienced 
this issue or possibly come up with a fix?



Thanks,

--Benn

Benn Joseph
Head of Archival Processing
Northwestern University Libraries
Northwestern University
www.library.northwestern.edu
benn.jos...@northwestern.edu
847.467.6581


_______________________________________________
Archivesspace_Users_Group mailing list
Archivesspace_Users_Group@lyralists.lyrasis.org
http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group



--
Trevor Thornton
Applications Developer, Digital Library Initiatives
North Carolina State University Libraries
