Re: [CODE4LIB] Anything Interesting Going on in Archival Metadata?
I've been applying the Europeana Data Model with some success to digital archives. Some work has already been done in this area:

Casarosa, Vittore; Meghini, Carlo; Gardasevic, Stanislava. (2013). “Improving Online Access to Archival Data”. Digital Libraries & Archives, pp. 153-162.

Hennicke, Steffen; Olensky, Marlies; de Boer, Victor; Isaac, Antoine; Wielemaker, Jan. (2011). “Conversion of EAD into EDM Linked Data”. In: Proceedings of the 1st International Workshop on Semantic Digital Archives. <http://www-e.uni-magdeburg.de/predoiu/sda2011/sda2011_06.pdf>.

See also:

Gardasevic, Stanislava. (2011). “Opening Archives to the General Public, a data modelling approach”. Master's thesis. International Master in Digital Library Learning.

-- 
Charles Blair, Director, Digital Library Development Center, University of Chicago Library
1 773 702 8459 | c...@uchicago.edu | http://www.lib.uchicago.edu/~chas/
Re: [CODE4LIB] Why learn Unix?
On Mon, Oct 27, 2014 at 10:02:18AM -0400, Siobhain Rivera wrote:
> I'm part of the ASIS&T Student Chapter and Indiana University, and we're
> putting together a series of workshops on Unix. We've noticed that a lot of
> people don't seem to have a good idea of why they should learn Unix,
> particularly the reference/non technology types. We're going to do some
> more research to make a fact sheet about the uses of Unix, but I thought
> I'd pose the question to the list - what do you think are reasons
> librarians need to know Unix, even if they aren't in particularly tech
> heavy jobs?
>
> I'd appreciate any input. Have a great week!

I wouldn't necessarily typecast reference librarians: some of them are the most tech-savvy non-IT "types" whom I have met.

I assume by "Unix" one means the Unix/Linux command line and the tools one can invoke from there, which give the environment a commonality regardless of the implementation details of the particular OS (those details matter more to system administrators, who need to understand the dialects in greater detail). The traditional strength of the environment is in manipulating arbitrary textual data: sed, (g)awk, (e)grep and their congeners make some otherwise difficult tasks manageable.

All of our archivists are command-line savvy: years ago we moved them off a (home-grown) Microsoft Windows-based solution for creating finding aids to a (home-grown) Unix-based one without a problem. (We use FreeBSD for non-commercial products; for commercial products we use Red Hat Linux.) Whether they use Unix ordinarily is less relevant, I think, than that they can when the need arises: the command-line paradigm is not a barrier for them.

One cannot predict the trajectory one's future might take. If one day one finds oneself talking to a faculty member working with big data (from the sciences or from the humanities), one is likely to be talking to someone with basic Unix/Linux command-line skills. Unless one thinks that the future of librarianship has nothing to do with such things, I would recommend learning (not necessarily liking, let alone becoming fluent in) Unix/Linux skills, simply to extend one's computer literacy and to be able to represent the profession capably should the occasion arise.
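As a small, hypothetical illustration of the kind of text-munging the post has in mind, the function below tallies the values in one column of a tab-delimited export using grep, awk, and sed together. The function name `tally_locations` and the file layout (a barcode column, then a shelving-location column, with `#` comment lines) are assumptions made up for the example.

```shell
# tally_locations FILE (hypothetical): count items per shelving location in a
# tab-delimited export whose second column is the location.
tally_locations() {
    grep -v '^#' "$1" |              # drop comment/header lines
        awk -F '\t' '{ print $2 }' | # keep only the location column
        sed 's/^ *//; s/ *$//' |     # trim stray leading/trailing spaces
        sort | uniq -c | sort -rn    # count, most frequent first
}
```

For an export with two "Stacks" rows (one with a trailing space) and one "Reference" row, `tally_locations items.tsv` reports a count of 2 for Stacks on the first line, then 1 for Reference. Each tool does one small job, which is the point of the paradigm.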
Re: [CODE4LIB] What is the real impact of SHA-256? - Updated
Look at slide 15 here: http://www.slideshare.net/DuraSpace/sds-cwebinar-1

I think we're worried about the cumulative effect over time of undetected errors (at least, I am).

On Fri, Oct 03, 2014 at 05:37:14AM -0700, Kyle Banerjee wrote:
> On Thu, Oct 2, 2014 at 3:47 PM, Simon Spero wrote:
>
> > Checksums can be kept separate (tripwire style).
> > For JHU archiving, the use of MD5 would give false positives for duplicate
> > detection.
> >
> > There is no reason to use a bad cryptographic hash. Use a fast hash, or use
> > a safe hash.
> >
>
> I have always been puzzled why so much energy is expended on bit integrity
> in the library and archival communities. Hashing does not accommodate
> modification of internal metadata or compression which do not compromise
> integrity. And if people who can access the files can also access the
> hashes, there is no contribution to security. Also, wholesale hashing of
> repositories scales poorly, My guess is that the biggest threats are staff
> error or rogue processes (i.e. bad programming). Any malicious
> destruction/modification is likely to be an inside job.
>
> In reality, using file size alone is probably sufficient for detecting
> changed files -- if dup detection is desired, then hashing the few that dup
> out can be performed. Though if dups are an actual issue, it reflects
> problems elsewhere. Thrashing disks and cooking the CPU for the purposes
> libraries use hashes for seems way overkill, especially given that basic
> interaction with repositories for depositors, maintainers, and users is
> still in a very primitive state.
>
> kyle
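A toy demonstration of why file size alone can miss the kind of silent, cumulative error in question: overwrite one byte in place and the size stays the same, while a stored SHA-256 digest no longer matches. This is a sketch, not a fixity system; it assumes GNU coreutils' `sha256sum` (BSD systems ship differently named tools), and the helper names `file_size`, `record_fixity`, and `verify_fixity` are hypothetical.

```shell
# Hypothetical helpers; assume GNU coreutils' sha256sum is on PATH.

# file_size FILE: print the size of FILE in bytes (tr strips BSD wc padding)
file_size() {
    wc -c < "$1" | tr -d ' '
}

# record_fixity FILE MANIFEST: write FILE's SHA-256 digest to MANIFEST
record_fixity() {
    sha256sum "$1" > "$2"
}

# verify_fixity MANIFEST: succeed only if every file listed still matches
verify_fixity() {
    sha256sum -c "$1" > /dev/null 2>&1
}

# Demonstration: corrupt one byte without changing the file's length.
dir=$(mktemp -d)
printf 'Finding aid for a hypothetical collection.\n' > "$dir/item.txt"
size_before=$(file_size "$dir/item.txt")
record_fixity "$dir/item.txt" "$dir/manifest.sha256"

# overwrite byte 5 in place (conv=notrunc keeps the rest of the file)
printf 'X' | dd of="$dir/item.txt" bs=1 seek=5 conv=notrunc 2>/dev/null

size_after=$(file_size "$dir/item.txt")
if [ "$size_before" = "$size_after" ]; then
    echo "size check: no change detected"
fi
if ! verify_fixity "$dir/manifest.sha256"; then
    echo "checksum check: file has changed"
fi
rm -rf "$dir"
```

Run as is, it prints both messages: the size comparison passes while the checksum comparison catches the flipped byte. Whether that cost is worth paying across a whole repository is exactly the trade-off debated in the thread.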
Re: [CODE4LIB] Dewey code
On Fri, Aug 08, 2014 at 08:46:24PM -0400, Tom Connolly wrote:
> >> Is there an open source way to format the dewey code for printing book
> >> labels? Or can someone tell me how to isolate just the dewey number from a
> >> marc file (I have MarcEdit; is there a better tool for this simple task?)
> >> so it is the only field sent to the printer? (I'm using Ubuntu 14.04 and
> >> printing to a Dymo 450) Thanks
> >> Tom Connolly

Something like this might work.

    ./print_marc.lisp foo.mrc | egrep -m 1 '^082|^083' | lpr -h

print_marc.lisp is:

    #!/usr/bin/env clisp
    ;;; Program to print a file in MARC communications format
    ;;; human-readably, one "TAG DATA" line per field.

    (defvar *file* (car *args*))

    (defun print-directory-entries (input base-address-of-data limit ptr step)
      (cond ((= limit 0))
            (t (let* ((directory-entry (subseq input ptr (+ ptr step)))
                      (tag (subseq directory-entry 0 3))
                      (field-length (parse-integer (subseq directory-entry 3 7)))
                      (starting-character-position
                        (parse-integer (subseq directory-entry 7 12)))
                      (field-start (+ base-address-of-data starting-character-position))
                      (field-end (+ field-start field-length))
                      (data (subseq input field-start field-end)))
                 (format t "~a ~a~%" tag data)
                 (print-directory-entries input base-address-of-data
                                          (1- limit) (+ ptr step) step)))))

    (defun process (input)
      "A MARC record contains a leader, followed by a directory, followed by
    data. The leader is 24 characters long. The first five characters are the
    record length. Characters 12-16 indicate where the data section begins.
    The first three characters of a directory entry indicate the MARC tag.
    Characters 3-6 are the field length. Characters 7-11 are the starting
    character position of the corresponding data relative to the base address
    of the data. Numbering begins from 0."
      ;; Note: MARC lengths and offsets count bytes, so records containing
      ;; multibyte (e.g. UTF-8) characters may be misaligned here.
      (let* ((leader (subseq input 0 24))
             (leader-length 24)
             (directory-record-length 12)
             (base-address-of-data (parse-integer (subseq input 12 17)))
             ;; the directory ends with a one-character field terminator
             (length-of-directory (- base-address-of-data (+ 1 leader-length)))
             (number-of-fields (/ length-of-directory directory-record-length)))
        (format t "~a~%" leader)
        (print-directory-entries input base-address-of-data number-of-fields
                                 leader-length directory-record-length)))

    (defun read-record (stream)
      "Read characters up to and including the record terminator (0x1D).
    Return NIL at end of file."
      (let ((chars (loop for c = (read-char stream nil)
                         while c
                         collect c
                         until (char= c (code-char #x1D)))))
        (when chars (coerce chars 'string))))

    ;; MARC records end with 0x1D rather than a newline, so split on that.
    (with-open-file (stream *file*)
      (do ((input (read-record stream) (read-record stream)))
          ((null input))
        ;; skip stray trailing text (e.g. a final newline) shorter than a leader
        (when (>= (length input) 24)
          (process input))))

I'm not sure how you'd do it with MarcEdit.
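The pipeline in the post sends the whole 082 line to the printer: tag, indicators, the MARC subfield delimiter (0x1F) before each subfield code, and the field terminator (0x1E). A minimal sketch of one extra awk filter that keeps only subfield $a, assuming those raw delimiters survive in print_marc.lisp's output; `dewey_from_line` is a hypothetical name. It also drops the "/" segmentation (prime) marks that 082 $a may contain, on the assumption that a spine label should not show them.

```shell
# dewey_from_line (hypothetical helper): read "TAG DATA" lines as printed by
# print_marc.lisp and print subfield $a of the first line that has one.
# Assumes the raw MARC delimiters are still present in DATA:
#   0x1F (octal 037) precedes each subfield code,
#   0x1E (octal 036) terminates the field.
dewey_from_line() {
    awk -F '\037' '{
        # field 1 is "082 <indicators>"; subfields start at field 2
        for (i = 2; i <= NF; i++) {
            if (substr($i, 1, 1) == "a") {
                v = substr($i, 2)      # strip the subfield code
                gsub(/\036/, "", v)    # drop the field terminator, if attached
                gsub(/\//, "", v)      # drop segmentation marks, e.g. 823/.914
                print v
                exit
            }
        }
    }'
}
```

Dropped into the original pipeline it would read `./print_marc.lisp foo.mrc | egrep -m 1 '^082|^083' | dewey_from_line | lpr -h`, so that only the bare number (e.g. `823.914`) reaches the label printer. Splitting on the delimiter byte rather than on spaces keeps it working even when a subfield value itself contains spaces.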
Re: [CODE4LIB] separate list for jobs
I don't mind having them both in the same feed. They're easy enough to tell apart even without a filter. The reason I say this is that when I see something like "Job: Digital Assets Librarian" or "Job: Linked Data Technologist, Metadata at Stanford University", just to pick two at random, that's a good way for me (as a hiring manager) to see what new kinds of positions are being posted (as opposed to those I'm already familiar with), what new responsibilities they might entail, how a position might be pitched in a new way, or, as in the case of Stanford, what in particular they (as a leader in some of the sorts of things I care about) might be up to. At the very least it adds useful pieces to my current awareness in a convenient way, but it also has the potential of influencing how we define the next position we post here, and since we would like to hire from the community, it has potential benefit for the community as well.

Of course, I'm speaking only for myself, but I post this in case it's a useful perspective that others might share.
Re: [CODE4LIB] EAD vs. HTML for finding aids
On Fri, May 10, 2013 at 08:25:05PM -0400, Eric Lease Morgan wrote:
> Create EAD files to describe the collections in your archives because EAD is the MARC of the archives world. There are no two ways about it. --ELM

That might not be the best way of putting it, given the full extent of the original question (see below). EAD vs HTML? No question (EAD). Can MARC be used to describe archival collections? Fact is, it is.

This leads to a follow-up question: What is the current uptake of EAD in the archival world? I did some looking and couldn't find percentages stated. What I'm asking is, of those archival collections described using electronic finding aids, what percentage are EAD? What percentage are MARC? What percentage are other? (I don't care to distinguish HTML from PDF, for example.) Is anyone doing anything creative with, say, linked data? (What I really mean is RDF, plain and simple.)

I realize this is going off topic, but the original topic is pretty much dead in the water if we confine it to EAD vs HTML. But if we add the bit about "Or are both [EAD and MARC] going the way of the dinosaurs?", then the answer is no, not (quite) yet. Still, I do recall that at the very first DLF meeting, when EAD was presented, about half the audience muttered that this should be a database application. Since then we've seen more than one viable XML database, or software that can handle XML nicely (e.g., XTF), so that objection has disappeared. Nevertheless, I think the observation underlying those mutterings still stands: is there a better way to do this (at least, conceptually), and is anyone working on such a way? Today EAD is clearly the answer, but if EAD was questionable even then, I'm wondering what a viable successor might be tomorrow (which doesn't, of course, affect what implementation decisions we make today).
Re: [CODE4LIB] Writing good documentation
Our shop uses Org mode, http://orgmode.org/ . It makes you want to write documentation (and we do). :-)
[CODE4LIB] Job Posting: Programmer/Analyst 2, The University of Chicago Library
University Title: IT Programmer 2
Departmental Title: Programmer/Analyst 2
Department: DLDC
Classification: IT Exempt
Work Schedule: Full-time

GENERAL SUMMARY:
We envision an interoperable digital library which integrates information resources with instruction and research, and a digital archive which ensures the persistence and usability of these resources over time. The Digital Library Development Center collaborates with librarians, faculty, university departments and other groups, and colleagues at other institutions, to develop and maintain networked information systems for use today, and to ensure the long-term preservation of information resources delivered through these systems for continued availability in the future.

Our core activities include designing, building, and maintaining web sites, dynamic information resources, and digital collections; installing and supporting vended systems that provide access to information resources; administering the network of information server computers which underlie our digital library and which support related initiatives on campus; researching, evaluating, and implementing new technologies; participating in national and international digital library initiatives; and documenting and sharing procedures, policies, and discoveries.

Development work has included archives and manuscripts finding aids databases (XML, XQuery), online digital collections (XML, XSLT, PHP, MySQL, HTML, Javascript, CSS), and administrative databases (PHP, MySQL). See Digital Library Collections and Activities, http://www.lib.uchicago.edu/e/digital/ , for some examples of our work. A departmental description is available at http://dldc.lib.uchicago.edu/ .

ESSENTIAL FUNCTIONS:
Develops, implements, customizes, tests and maintains technological solutions to support the University of Chicago Library's digital collections and other digital library systems for which the DLDC is responsible.
Researches end-user requirements.
Develops design specifications.
Installs, evaluates and tests software.
Programming.
Designs and manages a workflow.
Documents work.

QUALIFICATIONS:
Bachelor's degree required.
At least three years' experience with one or more high-level programming languages, including one scripting language (e.g., Python, Ruby) required.
Ability to program at an intermediate to advanced level in at least one programming language required.
Fluency with XML technologies (XSLT, XPath and XQuery) required.
Ability to interface to relational and XML databases from within a programming language required.
Demonstrable Unix/Linux literacy (e.g., must be able to use sed, awk, grep, etc. effectively from a Unix/Linux command line to accomplish small tasks) required.
Knowledge of HTML and cgi-bin programming required.
Familiarity with a web application framework (e.g., Django, Ruby on Rails) required.
Ability to work with Web Developers to incorporate CSS and Javascript into programs required.
Ability to install and evaluate software against requirements quickly required.
Excellent verbal and written communication skills required.
Excellent interpersonal skills and ability to work well with others required.
Ability to identify and solve problems on own initiative and as part of a team required.
Ability to manage complex technical details required.
Ability to communicate technical concepts to non-technical staff required.
Experience working in a library, academic or other research environment preferred.
Experience working in a digital library setting preferred.
Experience working in a production Unix/Linux environment preferred.
Experience working with Semantic Web technologies (RDF triplestores; SPARQL queries; RDFa) preferred.

PHYSICAL REQUIREMENTS:
Ability to sit for 4 hours or more.
Ability to use computers extensively for 4 hours or more.

The University of Chicago is an Affirmative Action/Equal Opportunity Employer.
To apply for this position submit a profile along with a resume and cover letter to https://jobopportunities.uchicago.edu/applicants/jsp/shared/Welcome_css.jsp