[CODE4LIB] eebo

Eric Lease Morgan Fri, 05 Jun 2015 05:31:25 -0700

Does anybody here have experience reading the SGML/XML files representing the 
content of EEBO?


I’ve gotten my hands on approximately 24 GB of SGML/XML files representing the 
content of EEBO (Early English Books Online). This data does not include page 
images. Instead it includes metadata of various ilks as well as the transcribed 
full text. I desire to reverse engineer the SGML/XML in order to: 1) provide an 
alternative search/browse interface to the collection, and 2) support various 
types of text mining services. 
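
A first step in the reverse engineering could be to tally which element names occur and how often. Below is a minimal sketch in Python, not a finished tool. It assumes the files are well-formed XML (true SGML would have to be converted first), that they end in .xml, and the names tally_elements, tally_directory, and SAMPLE are made up for illustration; the sample markup is not taken from the EEBO data.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def tally_elements(xml_data):
    """Count how often each element name occurs in one XML document.

    Accepts str or bytes; bytes let the parser honor the file's own
    encoding declaration.
    """
    root = ET.fromstring(xml_data)
    return Counter(element.tag for element in root.iter())


def tally_directory(directory):
    """Sum element counts over every *.xml file below a directory.

    The *.xml pattern is an assumption about how the files are named.
    """
    totals = Counter()
    for path in sorted(Path(directory).rglob("*.xml")):
        try:
            totals += tally_elements(path.read_bytes())
        except ET.ParseError as error:
            # SGML or otherwise non-well-formed files end up here.
            print(f"skipped {path}: {error}")
    return totals


# Made-up sample document, only to show the shape of the result.
SAMPLE = (
    "<book><header><title>A Sample</title></header>"
    "<text><p>One</p><p>Two</p></text></book>"
)
counts = tally_elements(SAMPLE)
for name, count in counts.most_common():
    print(name, count)
```

Running tally_directory over the whole collection would give a rough inventory of the markup, which may help decide which elements to index for searching and which to extract for text mining.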

While I am making progress with the data, it would be nice to learn of other 
people’s experience so I do not re-invent the wheel (too many times). Got 
ideas?

—
Eric Lease Morgan
University of Notre Dame
