Mark,

Thanks for the reply.  The information is being pulled from a MySQL database.  
These are old ETD entries that were entered into the system by students.  We 
are pulling specific fields to create the dublin_core.xml ingest file.
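
For illustration, here is a rough sketch of how that file could be generated so that stray "<" and "&" characters in the old data get escaped by the XML library instead of breaking the import. The tuples standing in for database rows and the helper name build_dublin_core are hypothetical; the actual MySQL query and field mapping are not shown in this thread.

```python
import xml.etree.ElementTree as ET

def build_dublin_core(fields):
    """Build a dublin_core.xml document from (element, qualifier, value) tuples.

    The tuples are a hypothetical stand-in for rows pulled from the database.
    """
    root = ET.Element("dublin_core")
    for element, qualifier, value in fields:
        dcvalue = ET.SubElement(
            root,
            "dcvalue",
            element=element,
            qualifier=qualifier or "none",
        )
        # Assigning to .text lets ElementTree escape &, < and > on output.
        dcvalue.text = value
    return ET.tostring(root, encoding="unicode")

# Hypothetical sample data with an unclosed tag and a bare ampersand.
row = [
    ("title", None, "R&D of <b>thin films"),
    ("contributor", "author", "Doe, Jane"),
]
xml_text = build_dublin_core(row)
print(xml_text)
```

Note that this only guarantees well-formed XML. Escaped markup is stored as literal text, so whether it displays as intended in DSpace is a separate question.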
 


-Dale

-----Original Message-----
From: Mark H. Wood,UL 0115A,+1 317 274 0749, <mw...@iupui.edu> On Behalf Of 
Mark H. Wood
Sent: Monday, May 18, 2020 10:18 AM
To: DSpace Technical Support <dspace-tech@googlegroups.com>
Subject: Re: [dspace-tech] CDATA use for imports

On Mon, May 18, 2020 at 01:11:17PM +0000, Poulter, Dale wrote:
> We are migrating several items from an older system to DSpace using the 
> simple item import.  As is often the case with older systems,  the data is 
> not as clean as we would like.  As a result several items fail due to bad 
> html (open tags no closing tags, and a few diacritic issues).  One way to 
> allow the data to migrate is to wrap the text in <![CDATA[[ ....]]> .  
> However, it appears the import ignores anything in the CDATA section.  Is 
> this expected behavior?

I assume that it was a typo, but a CDATA section opens with "<![CDATA[" not 
"<![CDATA[[".
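
As a minimal sketch of the correct syntax, the snippet below wraps a string in a CDATA section and parses it back with Python's standard XML parser. The helper name wrap_cdata and the sample text are assumptions made for illustration. It shows only how a generic XML parser treats CDATA, not how the DSpace importer behaves.

```python
import xml.etree.ElementTree as ET

def wrap_cdata(text):
    """Wrap text in a CDATA section (hypothetical helper).

    The sequence "]]>" cannot appear inside a CDATA section, so it is
    split across two adjacent sections.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

# Sample text with an unclosed tag, a bare ampersand and a literal "]]>".
raw = "Unclosed <i>tag and R&D ]]> end"
doc = (
    '<dcvalue element="description" qualifier="abstract">'
    + wrap_cdata(raw)
    + "</dcvalue>"
)

# A conforming parser returns the CDATA content as ordinary character data.
parsed = ET.fromstring(doc)
print(parsed.text)
```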

Are you talking about the content files or the metadata?  IOW, could you 
describe the problem more thoroughly?

A tool like HTML Tidy might help if you are ingesting HTML files.

For metadata, you should know that only some fields will be interpreted as 
HTML, and in those only a subset of HTML is processed.
I have a small and slowly growing set of substitution rules wired into my batch 
ingestion process, to take care of things like naked left brokets and "R&D".
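
The actual rules are not shown here, but a sketch of what such substitutions might look like follows. The patterns and the function name clean_value are assumptions for illustration, not the rules used in the process described above.

```python
import re

# "&" that does not begin a named or numeric entity such as "&amp;" or "&#233;".
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")

# "<" that does not look like the start of a tag, comment or declaration.
BARE_LT = re.compile(r"<(?![A-Za-z/!?])")

def clean_value(text):
    """Escape bare ampersands and bare left angle brackets (illustrative only)."""
    # Ampersands first, so the "&lt;" added next is not escaped a second time.
    text = BARE_AMP.sub("&amp;", text)
    text = BARE_LT.sub("&lt;", text)
    return text

for sample in ["R&D", "p < 0.05", "a &amp; b", "<i>ok</i>"]:
    print(repr(sample), "->", repr(clean_value(sample)))
```

These two rules leave anything that looks like a tag untouched, so unclosed tags would still need a separate fix.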

--
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu

--
All messages to this mailing list should adhere to the DuraSpace Code of 
Conduct: https://duraspace.org/about/policies/code-of-conduct/
---
You received this message because you are subscribed to the Google Groups 
"DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to dspace-tech+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/dspace-tech/20200518151820.GC16830%40IUPUI.Edu.
