Re: [CODE4LIB] Looking for a Word to EAD converter

2010-10-09 Thread Jason Ronallo
Daniel,
I don't have a solution directly from Word. If you can get the
container lists converted to a spreadsheet and exported as CSV, then a
Ruby gem I wrote might help [1]. It accepts a properly headed CSV file
and exports valid EAD XML. The implementation is basic right now, but
I'll be extending it soon to deal with more of the elements that
Archivists Toolkit can handle importing.
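
A minimal sketch of the CSV-to-EAD idea, written in Python rather than Ruby. It is not the stead gem's code, and the column headings (box, folder, title, date) are assumptions made for illustration; the gem defines its own headings. The output is only a <dsc> fragment, not a complete or validated EAD document.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical column headings; the stead gem's real headings may differ.
SAMPLE_CSV = """box,folder,title,date
1,1,Correspondence,1898-1902
1,2,Diaries,1903
"""

def csv_to_dsc(csv_text):
    """Build an EAD <dsc> element with one file-level <c01> per CSV row."""
    dsc = ET.Element("dsc")
    for row in csv.DictReader(io.StringIO(csv_text)):
        c01 = ET.SubElement(dsc, "c01", level="file")
        did = ET.SubElement(c01, "did")
        # Emit a <container> only for the container columns that are filled in.
        for kind in ("box", "folder"):
            if row.get(kind):
                container = ET.SubElement(did, "container", type=kind)
                container.text = row[kind]
        ET.SubElement(did, "unittitle").text = row["title"]
        if row.get("date"):
            ET.SubElement(did, "unitdate").text = row["date"]
    return dsc

if __name__ == "__main__":
    print(ET.tostring(csv_to_dsc(SAMPLE_CSV), encoding="unicode"))
```

The fragment would still need to be placed inside a full finding aid and checked by hand.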

I also started on a Rails 3 web application which uses the gem [2].
You can upload a CSV file and get back the EAD XML. I've got an early
version of this up on Heroku if anyone wants to give it a try.

Let me know if there's something I might be able to do to make this
work better for you.

Jason

[1] http://github.com/jronallo/stead
[2] http://github.com/jronallo/steady

On Thu, Oct 7, 2010 at 1:36 PM, Cornwall, Daniel D (EED) wrote:
> Hi All,
>
> While I think what I'm looking for doesn't exist, I wanted to ask some
> experts before making confident assertions.
>
> Our institution has a lot of finding aids for photo and manuscript
> collections in MS Word Format. They have pretty standard subheadings. An
> example can be found at
> www.library.state.ak.us/hist/hist_docs/finding_aids/MS220.doc
>
> I've had inquiries about getting these Word finding aids converted to
> EAD (Encoded Archival Description) through some sort of converter. I
> haven't been able to locate any such program, but maybe that's a
> reflection on my searching skills.
>
> There are a number of programs to create EAD finding aids from scratch
> and I've recommended acquiring one of these programs and getting staff
> to rekey/copy & paste from Word into the EAD finding aid program. Staff
> are not willing to do this at least until I can demonstrate that there
> is no automated way to convert our finding aids. Of course, if there is
> a converter, so much the better.
>
> Thanks in advance for any enlightenment you can give me. - Daniel
>
> ===
> Daniel Cornwall
> Head of Technical and Imaging Services
> Division of Libraries, Archives and Museums
> PO Box 110571
> Juneau, AK 99811-0571
> Phone (907) 465-6332
> Fax (907) 465-2665
> E-Mail: dan.cornw...@alaska.gov
>
> See Division resources at http://lam.alaska.gov
>
> Any opinions expressed in this e-mail are mine alone and not those of my
> employer unless explicitly stated.


Re: [CODE4LIB] Looking for a Word to EAD converter

2010-10-07 Thread Ethan Gruber
I think a lot of people have hit the nail on the head.  One has to weigh the
time and effort of creating or modifying the script and doing QA on the
result against the time it takes to encode the docs as text files from
scratch.  I think that someone who is well trained in EAD would be much
faster per document than the scripting method.  The script-generated EAD
files are likely to contain a number of errors that could take at least as
long to fix as encoding by hand.  You might be able to automate some of the
easier structural components, but each file has to be opened and checked by
a human to make sure it not only parses but is semantically correct.  I have
migrated ~5000 TEI files in the same manner.  Regular expression
find/replace can only get you so far.
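
The "parses" half of that check is the only part that can be automated cheaply. A rough Python sketch, assuming the generated files have already been read into a dict of name to XML text (the function names here are made up for illustration): it sorts documents into those that are well-formed and those that are not. It says nothing about whether the tagging is semantically correct, which still needs a person.

```python
import xml.etree.ElementTree as ET

def check_well_formed(xml_text):
    """Return None if xml_text parses as XML, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None

def triage(documents):
    """Split {name: xml_text} into names that parse and names that do not."""
    ok, broken = [], {}
    for name, xml_text in documents.items():
        error = check_well_formed(xml_text)
        if error is None:
            ok.append(name)
        else:
            broken[name] = error
    return ok, broken
```

Everything in the "ok" list still has to be opened and read; this only tells you which files to fix first.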

Ethan


Re: [CODE4LIB] Looking for a Word to EAD converter

2010-10-07 Thread Aaron Rubinstein
It might be worth experimenting with MS Word's keyboard macros.  If 
there is predictable spacing in your MS Word finding aids, a keyboard 
macro could save you a lot of keystrokes.  Some of us here at UMass use 
this method to encode container lists in EAD that were formatted in MS 
Word tables.


No matter what, you'll still have to do the conversion one document at a 
time since, as others have explained, Word documents and EAD are two 
totally different concepts.



Aaron

--
Aaron Rubinstein
Digital Project Manager
Special Collections and University Archives
University of Massachusetts, Amherst
Tel: (413)545-9637
Email: arubi...@library.umass.edu
Web: http://people.umass.edu/arubinst


Re: [CODE4LIB] Looking for a Word to EAD converter

2010-10-07 Thread Catalina Oyler
Daniel,

While I was at Arizona Archives Online I worked on a set of macros in Word
that can be used to automate the conversion of the container list portion of
finding aids.  The idea is that you paste the container list into a Word
template, quickly apply styles to the text using either key commands or find
and replace, then run a macro which applies tags based on the styles.  While
the Word documents have to be formatted very specifically, find and replace
can accomplish this quickly.  It can be a little hard to set up and figure
out if you haven't used macros before, but I wrote some basic user guides
that try to explain the process.  Also, I should mention that the original
macros are all from the Bentley Historical Library, which Nathan mentioned;
I just modified them and created a more detailed user guide.
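
To illustrate the style-to-tag step outside of Word, here is a small Python sketch. It is not the ASH or Bentley macro code, and the style names are invented for the example; it assumes each paragraph has already been extracted as a (style name, text) pair. Paragraphs with an unrecognized style are set aside for review rather than guessed at.

```python
from xml.sax.saxutils import escape

# Hypothetical Word style names; the actual macros use their own set.
STYLE_TO_TAG = {
    "Series Title": "unittitle",
    "Folder Title": "unittitle",
    "Date": "unitdate",
    "Box": "container",
}

def tag_paragraphs(paragraphs):
    """Wrap each (style, text) pair in the tag mapped to its style.

    Returns (tagged_lines, unmatched_texts); unmatched paragraphs are
    the ones a person needs to look at.
    """
    tagged, unmatched = [], []
    for style, text in paragraphs:
        tag = STYLE_TO_TAG.get(style)
        if tag is None:
            unmatched.append(text)
        else:
            # Escape &, < and > so the output stays well-formed XML.
            tagged.append("<%s>%s</%s>" % (tag, escape(text), tag))
    return tagged, unmatched
```

This is why the formatting has to be so exact: anything without a known style falls through.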

So that everyone can see the documents, I created a quick Google Site with
them on it at https://sites.google.com/site/wordtoeadmacros/
Start with the Installing ASH macros document, which will tell you how to
set up the macros (you have to do it on a PC; the macros don't work on Macs).
Then just use the conversion guide, skipping all the intro stuff and going
right to the section on running the macros.

Feel free to email me as you hit snags. There are lots of places for the
setup to go wrong and the macros are picky, but if you're converting a large
number of finding aids this will save you a lot of time once you have it
down.  These macros are in use at a dozen institutions in Arizona, and with a
little setup they should be able to work for you as well.

Catalina Oyler
Digital Initiatives Coordinator
The Five Colleges of Ohio


Re: [CODE4LIB] Looking for a Word to EAD converter

2010-10-07 Thread Robert Sanderson
Agreed.

The easiest way, IMHO after working with EAD for several years at the
Archives Hub (http://www.archiveshub.ac.uk), would be to set up a set of EAD
templates and then cut and paste the text into them from Word.
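
A rough idea of what a fill-in template could look like if it were driven from a script instead of cut and paste. This is a Python sketch under assumptions: the skeleton is deliberately abbreviated and has not been validated against the EAD DTD, and the placeholder names (eadid, title, scope) are invented for the example.

```python
from string import Template
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Abbreviated skeleton for illustration only; a production template would
# carry a fuller <eadheader> and more <archdesc> sections.
EAD_TEMPLATE = Template("""<ead>
  <eadheader>
    <eadid>$eadid</eadid>
    <filedesc><titlestmt><titleproper>$title</titleproper></titlestmt></filedesc>
  </eadheader>
  <archdesc level="collection">
    <did>
      <unittitle>$title</unittitle>
      <unitid>$eadid</unitid>
    </did>
    <scopecontent><p>$scope</p></scopecontent>
  </archdesc>
</ead>
""")

def fill_template(eadid, title, scope):
    """Substitute pasted text into the skeleton, escaping XML special characters."""
    values = {"eadid": eadid, "title": title, "scope": scope}
    return EAD_TEMPLATE.substitute({k: escape(v) for k, v in values.items()})

if __name__ == "__main__":
    print(fill_template("MS000", "Example collection", "Letters and photographs."))
```

The escaping matters because text pasted from Word often contains ampersands.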

Rob

On Thu, Oct 7, 2010 at 11:58 AM, Eric Lease Morgan wrote:

> On Oct 7, 2010, at 1:53 PM, Nathan Tallman wrote:
>
> > Chances are you would have to reformat all your finding aids to
> > the new format, which may be as time consuming as hand coding
> >
> > ...If faced with these two options, I would opt for hand coding.  You
> > will learn so much more about EAD and its potential.  If you totally
> > rely on the macro converter, you're limited to what the macros are
> > built to do.
>
> I concur.
>
> --
> Eric Morgan
> University of Notre Dame
>


Re: [CODE4LIB] Looking for a Word to EAD converter

2010-10-07 Thread Mark A. Matienzo
Daniel,

A number of institutions, including the Bentley Historical Library at
the University of Michigan, have used macros in Microsoft Word to
generate EAD. See [0] for example.

Arguably macros are difficult to maintain and require overly precise
markup in Word of things like headings. I can't begin to estimate how
long it would take to modify and maintain the macros. You may want to
consider using them for an "initial push", and then create/maintain
EAD finding aids using some other set of tools after you convert your
existing finding aids.

I would also urge you to consider posting your question to the EAD
Listserv as well. [1]

[0] http://bentley.umich.edu/EAD/bhlfiles.php
[1] http://listserv.loc.gov/cgi-bin/wa?SUBED1=ead&A=1

Mark A. Matienzo
Digital Archivist, Manuscripts and Archives
Yale University Library


Re: [CODE4LIB] Looking for a Word to EAD converter

2010-10-07 Thread Eric Lease Morgan
On Oct 7, 2010, at 1:53 PM, Nathan Tallman wrote:

> Chances are you would have to reformat all your finding aids to
> the new format, which may be as time consuming as hand coding
> 
> ...If faced with these two options, I would opt for hand coding.  You will
> learn so much more about EAD and its potential.  If you totally rely on the
> macro converter, you're limited to what the macros are built to do.

I concur.

-- 
Eric Morgan
University of Notre Dame


Re: [CODE4LIB] Looking for a Word to EAD converter

2010-10-07 Thread Nathan Tallman
Hi Daniel,

Converting from MS Word to EAD is technically achievable.  But all your MS
Word finding aids have to be formatted in a very specific way, as the
conversion is powered by macros that match specific formatting with specific
tagging.  Chances are you would have to reformat all your finding aids to
the new format, which may be as time consuming as hand coding.  Check out
the Bentley EAD Templates and Macros web page for specifics
<http://bentley.umich.edu/EAD/bhlfiles.php>.

If faced with these two options, I would opt for hand coding.  You will
learn so much more about EAD and its potential.  If you totally rely on the
macro converter, you're limited to what the macros are built to do.

Best,
Nathan


Re: [CODE4LIB] Looking for a Word to EAD converter

2010-10-07 Thread Dave Rice

Hi Daniel,
Word and EAD aren't really equivalent concepts. The sample Word document
that you posted looks like it follows some set of rules for its
structure. If you had a lot of these documents, one could potentially
write a script to convert each Word document to a raw text file
(assuming the formatting doesn't carry any semantic meaning), then use a
text parser to isolate the various discrete expressions from the Word
document and map them into an EAD structure. This may be tricky if the
Word documents don't all follow the same rules, or if a Word
document does not provide enough data to meet the minimal requirements
of an EAD expression.
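
As a sketch of that parse-and-map step, in Python and under assumptions: the Word documents are taken to be already converted to plain text, and the subheadings and their EAD equivalents below are invented for illustration, not taken from the sample finding aid. The result is an <archdesc> fragment with no <did>, so it is not a complete EAD record.

```python
import xml.etree.ElementTree as ET

# Hypothetical subheadings and the EAD elements they might map to.
HEADING_TO_TAG = {
    "HISTORICAL NOTE": "bioghist",
    "SCOPE AND CONTENTS": "scopecontent",
    "ARRANGEMENT": "arrangement",
}

def parse_sections(text):
    """Group non-blank lines under the most recent recognized heading.

    Lines that appear before the first recognized heading are ignored.
    """
    sections = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.upper() in HEADING_TO_TAG:
            current = stripped.upper()
            sections[current] = []
        elif stripped and current is not None:
            sections[current].append(stripped)
    return sections

def sections_to_archdesc(sections):
    """Map each parsed section to its EAD element with a <head> and one <p>."""
    archdesc = ET.Element("archdesc", level="collection")
    for heading, lines in sections.items():
        element = ET.SubElement(archdesc, HEADING_TO_TAG[heading])
        ET.SubElement(element, "head").text = heading.title()
        ET.SubElement(element, "p").text = " ".join(lines)
    return archdesc
```

Any document whose headings deviate from the expected list would come out with missing sections, which is the trickiness described above.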


David Rice
AudioVisual Preservation Solutions
350 7th Avenue, Suite 1603
New York, NY 10001

ph: 212-564-2140
cell: 347-213-3517
www.avpreserve.com


Re: [CODE4LIB] Looking for a Word to EAD converter

2010-10-07 Thread Houghton,Andrew
Don't know whether one exists or not, but the fact that the documents are in MS 
Word means that you could attach some VBA (Visual Basic for Applications) 
macros to the documents and run a macro that extracts and creates XML.

Andy.


Re: [CODE4LIB] Looking for a Word to EAD converter

2010-10-07 Thread Ethan Gruber
Hi Daniel,

I don't see how this will be possible.  A program can't make semantically
appropriate decisions for mapping prose to EAD tags.  You'll just have to go
with the copy-paste method in something like oXygen.

Ethan Gruber
