Thanks.  I'll see if this helps. 

I'm sure IE was used to view the files 4.5 years ago. I don't think I looked at 
them, but we had super employees (recent grads from library school) that worked 
with the files and I trust that they would have noticed problems.  

Fortunately we only have 7 of these to try to fix. 

Wendy

-----Original Message-----
From: Code for Libraries [mailto:[email protected]] On Behalf Of Jon 
Gorman
Sent: Monday, December 09, 2013 3:17 PM
To: [email protected]
Subject: Re: [CODE4LIB] problem in old etd xml files

A lot of modern systems won't load entities (or will limit them somehow) 
because of the denial-of-service attack that is possible.  Search for "XML 
Entity Reference Denial of Service". I can't remember if PUBLIC declarations 
are treated any differently than SYSTEM ones. (I would have expected parsers to 
trust SYSTEM ones more, but they'd still be exploitable by the same bug.)


(There are also a fair number of other errors; I'm somewhat skeptical that the 
example worked in many browsers even then. It's possible IE was flexible enough 
that it would have worked.)

One thing you might want to do is take out the entities.

I can't remember why I had to do this, but xmllint seemed to do the trick.
(I found a snippet at
http://stackoverflow.com/questions/614067/how-to-resolve-all-entity-references-in-xml-and-create-a-new-xml-in-c,
but it's missing the necessary --loaddtd.)

xmllint --loaddtd --noent --dropdtd FRONT.xml > FRONT_nodtdent.xml

You don't need the DTD for validation anyway, particularly since, given the 
errors, I suspect the files may not validate.

It might make the files a little harder to read when reading the raw source, 
but I suspect that's not typically a problem.
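
If there are several files to fix, the same command could be wrapped in a 
small loop. This is only a rough sketch: it assumes xmllint (from libxml2) is 
on the PATH and that each file's DTD sits where its DOCTYPE points. The 
out_name helper and the _nodtdent suffix are just illustrative names that 
follow the command above.

```shell
#!/bin/sh
# Sketch: run the xmllint entity-expansion command over several files.
# Assumes xmllint is installed and each DTD is reachable from its XML file.

# Hypothetical helper: FRONT.xml -> FRONT_nodtdent.xml
out_name() {
    printf '%s_nodtdent.xml\n' "${1%.xml}"
}

# Process every file named on the command line.
for f in "$@"; do
    out=$(out_name "$f")
    # --loaddtd reads the DTD, --noent substitutes the entities,
    # --dropdtd removes the DOCTYPE from the output.
    if xmllint --loaddtd --noent --dropdtd "$f" > "$out"; then
        echo "wrote $out"
    else
        echo "xmllint failed on $f" >&2
    fi
done
```

Saved as, say, expand_entities.sh, it would be run as 
"sh expand_entities.sh FRONT.xml" from the folder holding the thesis files. 
The originals are left untouched, so the output can be compared before 
anything is replaced.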

Jon Gorman
University of Illinois



On Mon, Dec 9, 2013 at 2:10 PM, Robertson, Wendy C < [email protected]> 
wrote:

> Back in 1999-2002 a handful of our theses were submitted  as a 
> collection of xml files.  We posted the files in our repository 
> several years ago (we posted a zipped folder with all the files).  At 
> that time, if you opened front.xml you would be able to access the 
> thesis. We have not touched the files in the close to 5 years since we 
> posted them, but the files no longer open correctly. One of the problem 
> theses is http://ir.uiowa.edu/etd/189/.
>
> Front.xml begins
> <?xml version="1.0" encoding="UTF-8"?>
> <?xml:stylesheet type="text/css" href="UIowa2K1.css" ?>
> <!DOCTYPE thesis SYSTEM "UIowa2K.dtd">
>
> I have tried the following changes but they do not help
>
> 1)      Adding standalone="no"? to the xml declaration  -- <?xml
> version="1.0" " encoding="UTF-8" standalone="no"?>
>
> 2)      Changing the case of "UIowa2K1.css" and "UIowa2K.dtd" to match the
> files (which are in all caps)
>
> 3)      Changing xml:stylesheet to xml-stylesheet
>
> Chrome shows errors that entities are not defined, but they are 
> defined in the dtd.
>
> I would appreciate any assistance in making these documents available 
> again. Thanks!
>
> Wendy Robertson
> Digital Scholarship Librarian *  The University of Iowa Libraries
> 1015 Main Library  *  Iowa City, Iowa 52242 [email protected] 
> * 319-335-5821
>
