Re: [CODE4LIB] Anything Interesting Going on in Archival Metadata?

2016-05-24 Thread Charles Blair
I've been applying the Europeana Data Model with some success to
digital archives. Some work has already been done in this area:

Casarosa, Vittore; Meghini, Carlo; Gardasevic,
Stanislava. (2013). “Improving Online Access to Archival
Data”. Digital Libraries & Archives, pp. 153-162.

Hennicke, Steffen; Olensky, Marlies; de Boer, Victor; Isaac, Antoine;
Wielemaker, Jan. (2011). “Conversion of EAD into EDM Linked Data”. In:
Proceedings of the 1st International Workshop on Semantic Digital
Archives. <http://www-e.uni-magdeburg.de/predoiu/sda2011/sda2011_06.pdf>.

See also:

Gardasevic, Stanislava. (2011). “Opening Archives to the General
Public, a data modelling approach”. Master's thesis. International
Master in Digital Library Learning.

-- 
Charles Blair, Director, Digital Library Development Center, University of 
Chicago Library
1 773 702 8459 | c...@uchicago.edu | http://www.lib.uchicago.edu/~chas/


Re: [CODE4LIB] Why learn Unix?

2014-10-28 Thread Charles Blair
On Mon, Oct 27, 2014 at 10:02:18AM -0400, Siobhain Rivera wrote:
> I'm part of the ASIS&T Student Chapter and Indiana University, and we're
> putting together a series of workshops on Unix. We've noticed that a lot of
> people don't seem to have a good idea of why they should learn Unix,
> particularly the reference/non technology types. We're going to do some
> more research to make a fact sheet about the uses of Unix, but I thought
> I'd pose the question to the list - what do you think are reasons
> librarians need to know Unix, even if they aren't in particularly tech
> heavy jobs?
> 
> I'd appreciate any input. Have a great week!

I wouldn't necessarily typecast reference librarians: some of them are
the most tech-savvy non-IT "types" whom I have met.

I assume by "Unix" one means the Unix/Linux command line, and the
tools one can invoke from there, which gives a commonality to the
environment regardless of the implementation details of the particular
OS (which is of more importance to system administrators, who need to
understand the dialects in greater detail).

The traditional strength of the environment is in manipulating
arbitrary textual data: sed, (g)awk, (e)grep and their congeners make
some otherwise difficult tasks manageable.

All of our archivists are command-line savvy: years ago we moved them
off a (home-grown) Microsoft Windows-based solution for creating
finding aids to a (home-grown) Unix-based one without a problem. (We
use FreeBSD for non-commercial products; for commercial products we
use Red Hat Linux.) Whether they use Unix ordinarily is less relevant,
I think, than that they can when the need arises: the command-line
paradigm is not a barrier for them.

One cannot predict the trajectory one's future might take. If one day
one finds oneself talking to a faculty member working with big data
(from the sciences or from the humanities), one is likely to be
talking to someone with basic Unix/Linux command-line skills. Unless
one thinks that the future of librarianship has nothing to do with
such things, I would recommend learning (not necessarily liking,
let alone becoming fluent in) Unix/Linux skills, simply to extend
one's computer literacy, and to be able to represent the profession
capably should the occasion arise.

-- 
Charles Blair, Director, Digital Library Development Center, University of 
Chicago Library
1 773 702 8459 | c...@uchicago.edu | http://www.lib.uchicago.edu/~chas/


Re: [CODE4LIB] What is the real impact of SHA-256? - Updated

2014-10-03 Thread Charles Blair
Look at slide 15 here:
http://www.slideshare.net/DuraSpace/sds-cwebinar-1

I think we're worried about the cumulative effect over time of
undetected errors (at least, I am).
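
To make that concern concrete, here is a minimal sketch of a periodic
fixity audit in Python. It is an illustration under stated assumptions,
not a description of any repository software mentioned in this thread:
the function names (sha256_of, build_manifest, audit) are hypothetical,
and the manifest is simply an in-memory dictionary that a real system
would store separately from the files it describes.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file path (relative to `root`) to its SHA-256 digest."""
    manifest = {}
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(directory, name)
            manifest[os.path.relpath(full_path, root)] = sha256_of(full_path)
    return manifest


def audit(root, manifest):
    """Return the sorted relative paths that are missing or have changed.

    `manifest` is assumed to be the result of an earlier build_manifest()
    run over the same tree (hypothetical storage; a dict is used here).
    """
    current = build_manifest(root)
    return sorted(path for path, digest in manifest.items()
                  if current.get(path) != digest)
```

Running audit() on a schedule against a manifest built at ingest would
report any file whose content has drifted, including a change that
leaves the file size untouched, which a size-only comparison cannot see.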

On Fri, Oct 03, 2014 at 05:37:14AM -0700, Kyle Banerjee wrote:
> On Thu, Oct 2, 2014 at 3:47 PM, Simon Spero wrote:
> 
> > Checksums can be kept separate (tripwire style).
> > For JHU archiving, the use of MD5 would give false positives for duplicate
> > detection.
> >
> > There is no reason to use a bad cryptographic hash. Use a fast hash, or use
> > a safe hash.
> >
> 
> I have always been puzzled why so much energy is expended on bit integrity
> in the library and archival communities. Hashing does not accommodate
> modification of internal metadata or compression which do not compromise
> integrity. And if people who can access the files can also access the
> hashes, there is no contribution to security. Also, wholesale hashing of
> repositories scales poorly. My guess is that the biggest threats are staff
> error or rogue processes (i.e. bad programming). Any malicious
> destruction/modification is likely to be an inside job.
> 
> In reality, using file size alone is probably sufficient for detecting
> changed files -- if dup detection is desired, then hashing the few that dup
> out can be performed. Though if dups are an actual issue, it reflects
> problems elsewhere. Thrashing disks and cooking the CPU for the purposes
> libraries use hashes for seems way overkill, especially given that basic
> interaction with repositories for depositors, maintainers, and users is
> still in a very primitive state.
> 
> kyle
> 

-- 
Charles Blair, Director, Digital Library Development Center, University of 
Chicago Library
1 773 702 8459 | c...@uchicago.edu | http://www.lib.uchicago.edu/~chas/


Re: [CODE4LIB] Dewey code

2014-08-08 Thread Charles Blair
On Fri, Aug 08, 2014 at 08:46:24PM -0400, Tom Connolly wrote:
> >> Is there an open source way to format the dewey code for printing book
> >> labels? Or can someone tell me how to isolate just the dewey number from a
> >> marc file (I have MarcEdit; is there a better tool for this simple task?)
> >> so it is the only field sent to the printer? (I'm using Ubuntu 14.04 and
> >> printing to a Dymo 450) Thanks
> >> Tom Connolly

Something like this might work.

./print_marc.lisp foo.mrc | egrep -m 1 '^082|^083' | lpr -h

print_marc.lisp is:

#!/usr/bin/env clisp

;;; Program to print a file in MARC communications format
;;; human-readably.

;; The first command-line argument is the MARC file to read.
(defvar *file* (car *args*))

(defun print-directory-entries (input base-address-of-data limit ptr step)
  "Print the tag and data of LIMIT directory entries, starting at PTR."
  (cond ((= limit 0))
        (t (let* ((directory-entry (subseq input ptr (+ ptr step)))
                  (tag (subseq directory-entry 0 3))
                  (field-length (parse-integer (subseq directory-entry 3 7)))
                  (starting-character-position
                   (parse-integer (subseq directory-entry 7 12)))
                  (field-start (+ base-address-of-data
                                  starting-character-position))
                  (field-end (+ field-start field-length))
                  (data (subseq input field-start field-end)))
             (format t "~a ~a~%" tag data)
             ;; Recurse on the next directory entry.
             (print-directory-entries input base-address-of-data (1- limit)
                                      (+ ptr step) step)))))

(defun process (input)
  "A MARC record contains a leader, followed by a directory, followed
by data. The leader is 24 characters long. The first five characters
are the record length. Characters 12-16 indicate where the data
section begins. The first three characters of a directory entry
indicate the MARC tag. Characters 3-6 are the field length. Characters
7-11 are the starting character position of the corresponding data
relative to the base address of the data. Numbering begins from 0."
  (let* ((leader-length 24)
         (leader (subseq input 0 leader-length))
         (directory-record-length 12)
         (base-address-of-data (parse-integer (subseq input 12 17)))
         ;; The directory ends with a one-character field terminator.
         (length-of-directory (- base-address-of-data (+ 1 leader-length)))
         (number-of-fields (/ length-of-directory directory-record-length)))
    (format t "~a~%" leader)
    (print-directory-entries input base-address-of-data number-of-fields
                             leader-length directory-record-length)))

;; NOTE: READ-LINE splits on newlines, not on the MARC record
;; terminator, so a file without newlines is read as a single string
;; and only its first record is printed.
(with-open-file (stream *file*)
  (do ((input (read-line stream nil)
              (read-line stream nil)))
      ((null input))
    (process input)))

I'm not sure how you'd do it with MarcEdit.
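
For comparison, the record layout described in the docstring above
(leader, directory, data) can also be sketched in Python, going one
step further and isolating subfield $a of the first 082 or 083 field.
This is a rough sketch, not a replacement for a real MARC library: the
function names (fields, dewey_number, records) are made up for the
example, offsets are treated as byte counts, and field data is assumed
to decode as UTF-8.

```python
LEADER_LENGTH = 24            # fixed length of the MARC leader
DIRECTORY_ENTRY_LENGTH = 12   # 3-char tag + 4-char length + 5-char offset
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
SUBFIELD_DELIMITER = "\x1f"   # str, because it is applied after decoding


def records(data: bytes):
    """Split the bytes of a MARC file into individual records."""
    return [chunk + RECORD_TERMINATOR
            for chunk in data.split(RECORD_TERMINATOR) if chunk.strip()]


def fields(record: bytes):
    """Yield (tag, raw field bytes) for each directory entry of one record."""
    base_address = int(record[12:17])
    # The directory sits between the leader and a one-byte field terminator.
    directory = record[LEADER_LENGTH:base_address - 1]
    for offset in range(0, len(directory), DIRECTORY_ENTRY_LENGTH):
        entry = directory[offset:offset + DIRECTORY_ENTRY_LENGTH]
        tag = entry[0:3].decode("ascii")
        length = int(entry[3:7])
        start = base_address + int(entry[7:12])
        yield tag, record[start:start + length]


def dewey_number(record: bytes, tags=("082", "083")):
    """Return subfield $a of the first 082/083 field, or None if absent."""
    for tag, raw in fields(record):
        if tag not in tags:
            continue
        body = raw.rstrip(FIELD_TERMINATOR).decode("utf-8", errors="replace")
        # body is two indicator characters, then delimiter-prefixed subfields.
        for subfield in body.split(SUBFIELD_DELIMITER)[1:]:
            if subfield.startswith("a"):
                return subfield[1:]
    return None
```

Reading foo.mrc in binary mode, passing its contents to records(), and
printing dewey_number() for each record would then give one
classification number per line, ready to pipe to lpr.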

-- 
Charles Blair, Director, Digital Library Development Center, University of 
Chicago Library
1 773 702 8459 | c...@uchicago.edu | http://www.lib.uchicago.edu/~chas/


Re: [CODE4LIB] separate list for jobs

2014-05-07 Thread Charles Blair
I don't mind having them both in the same feed. They're easy enough to
tell apart even w/o a filter. The reason I say this is that when I see
something like "Job: Digital Assets Librarian", or "Job: Linked Data
Technologist, Metadata at Stanford University", just to pick two at
random, that's a good way for me (as a hiring manager) to see what new
kinds of positions are being posted (as opposed to those I'm already
familiar with), what new responsibilities they might entail, how a
position might be pitched in a new way, or, as in the case of
Stanford, what in particular they (as a leader in some of the sorts of
things I care about) might be up to. At the very least it adds useful
pieces to my current awareness in a convenient way, but it also has
the potential of influencing how we define the next position we post
here, and since we would like to hire from the community, it has
potential benefit for the community as well. Of course, I'm speaking
for myself, but I post this in case it is a useful perspective that
others might hold as well.

-- 
Charles Blair, Director, Digital Library Development Center, University of 
Chicago Library
1 773 702 8459 | c...@uchicago.edu | http://www.lib.uchicago.edu/~chas/


Re: [CODE4LIB] EAD vs. HTML for finding aids

2013-05-10 Thread Charles Blair
On Fri, May 10, 2013 at 08:25:05PM -0400, Eric Lease Morgan wrote:
> Create EAD files to describe the collections in your archives
> because EAD is the MARC of the archives world. There are no two ways
> about it.  --ELM

That might not be the best way of putting it given the full extent of
the original question (see below).

EAD vs HTML? No question (EAD). Can MARC be used to describe archival
collections? Fact is, it is.

This leads to a followup question: What is the current uptake of EAD
in the archival world? I did some looking and couldn't find
percentages stated. What I'm asking is, of those archival collections
described using electronic finding aids, what percentage are EAD? What
percentage are MARC? What percentage are other? (I don't care to
distinguish HTML from PDF, for example.) Is anyone doing anything
creative with, say, linked data? (What I really mean is RDF, plain and
simple.)

I realize this is going off topic, but the original topic is pretty
much dead in the water if we confine it to EAD vs HTML. But if we add
the bit about "Or are both [EAD and MARC] going the way of the
dinosaurs?", then the answer is no, not (quite) yet, but I do recall
at the very first DLF meeting, when EAD was presented, mutterings from
about half the audience that this should be a database
application. Since then we've seen more than one viable XML database,
or software that can handle XML nicely (e.g., XTF), so while that
objection disappears, nevertheless I think that the observation
underlying those mutterings still stands: is there a better way to
do this (at least, conceptually), and, is anyone working on such a way?
Today EAD is clearly the answer, but if EAD was questionable even
then, I'm wondering what a viable successor might be tomorrow (which
doesn't, of course, affect what implementation decisions we make
today).

-- 
Charles Blair   
Director, Digital Library Development Center, University of Chicago Library
1 773 702 8459 | c...@uchicago.edu | http://www.lib.uchicago.edu/~chas/


Re: [CODE4LIB] Writing good documentation

2012-11-01 Thread Charles Blair
Our shop uses Org mode, http://orgmode.org/ . It makes you want to
write documentation (and we do). :-)

-- 
Charles Blair   
Director, Digital Library Development Center, University of Chicago Library
1 773 702 8459 | c...@uchicago.edu | http://www.lib.uchicago.edu/~chas/


[CODE4LIB] Job Posting: Programmer/Analyst 2, The University of Chicago Library

2011-12-28 Thread Charles Blair
University Title: IT Programmer 2
Departmental Title: Programmer/Analyst 2
Department: DLDC
Classification: IT Exempt
Work Schedule: Full-time

GENERAL SUMMARY:

We envision an interoperable digital library which integrates
information resources with instruction and research, and a digital
archive which ensures the persistence and usability of these resources
over time. 

The Digital Library Development Center collaborates with librarians,
faculty, university departments and other groups, and colleagues at
other institutions, to develop and maintain networked information
systems for use today, and to ensure the long-term preservation of
information resources delivered through these systems for continued
availability in the future.

Our core activities include designing, building, and maintaining web
sites, dynamic information resources, and digital collections;
installing and supporting vended systems that provide access to
information resources; administering the network of information server
computers which underlie our digital library and which support related
initiatives on campus; researching, evaluating, and implementing new
technologies; participating in national and international digital
library initiatives; and documenting and sharing procedures, policies,
and discoveries. 

Development work has included archives and manuscripts finding aids
databases (XML, XQuery), online digital collections (XML, XSLT, PHP,
MySQL, HTML, Javascript, CSS), and administrative databases (PHP,
MySQL). 

See Digital Library Collections and Activities,
http://www.lib.uchicago.edu/e/digital/ , for some examples of our
work. A departmental description is available at
http://dldc.lib.uchicago.edu/

ESSENTIAL FUNCTIONS

Develops, implements, customizes, tests and maintains technological
solutions to support the University of Chicago Library's digital
collections and other digital library systems for which the DLDC is
responsible.

Researches end-user requirements.
Develops design specifications.
Installs, evaluates and tests software.
Writes programs.
Designs and manages a workflow.
Documents work.  


QUALIFICATIONS:

Bachelor's degree required. At least three years' experience with one
or more high-level programming languages, including one scripting
language (e.g., Python, Ruby) required. Ability to program at an
intermediate to advanced level in at least one programming language
required. Fluency with XML technologies (XSLT, XPath and XQuery)
required. Ability to interface to relational and XML databases from
within a programming language required. Demonstrable Unix/Linux
literacy (e.g., must be able to use sed, awk, grep, etc. effectively
from a Unix/Linux command line to accomplish small tasks)
required. Knowledge of HTML and cgi-bin programming
required. Familiarity with a web application framework (e.g., Django,
Ruby on Rails) required. Ability to work with Web Developers to
incorporate CSS and Javascript into programs required. Ability to
install and evaluate software against requirements quickly
required. Excellent verbal and written communication skills
required. Excellent interpersonal skills and ability to work well with
others required. Ability to identify and solve problems on own
initiative and as part of a team required. Ability to manage complex
technical details required. Ability to communicate technical concepts
to non-technical staff required.

Experience working in a library, academic or other research
environment preferred. Experience working in a digital library setting
preferred. Experience working in a production Unix/Linux environment
preferred. Experience working with Semantic Web technologies (RDF
triplestores; SPARQL queries; RDFa) preferred.

PHYSICAL REQUIREMENTS:

Ability to sit for 4 hours or more.  Ability to use computers
extensively for 4 hours or more.

The University of Chicago is an Affirmative Action/Equal Opportunity
Employer.

To apply for this position submit a profile along with a resume and
cover letter to
https://jobopportunities.uchicago.edu/applicants/jsp/shared/Welcome_css.jsp
