Re: [CODE4LIB] Hours of Operation on Website - management tool

2015-07-01 Thread Jon Stroop

Most of the rough edges are around some of the one-time administrative actions 
like setting up new libraries, locations, and term schedules, although there’s 
also some UI improvements in our near future.


FWIW, we just 'finished' a first pass at a little Rails engine around 
managing location data:


https://github.com/pulibrary/locations

It, too, is a little rough around the edges (esp wrt views) and has some 
site-specific stuff, like a gazillion 'location codes' to make it work 
with existing systems...but that's sorta why we built it.


-Jon

--
Jon Stroop
Application Development Manager
Princeton University Library
jstr...@princeton.edu



On 07/01/2015 11:35 AM, Chris Beer wrote:

Hi Ken,

We’ve recently been working on rebuilding an application for managing our 
hours. It’s Ruby on Rails, not-yet-in-production, full of rough edges, and has 
some Stanford-specific business logic, but it’s relatively simple and 
(probably) works for us:

https://github.com/sul-dlss/library_hours_rails/releases/tag/v0.0.1 

Currently, it’s envisioned as a backend service for staff to add and manage 
hours, with downstream consumers using the API to present the hours as 
appropriate. Our initial consumers include the main library website, our 
library catalog, and some other business process applications. We’ve also 
started thinking about embeddable HTML views of the hours to replace some of 
the clunky processing we’re currently doing in Drupal, but haven’t pursued that 
yet.

Interesting features include:

- JSON-API view of a location’s hours; (what I assume is a bespoke..) Drupal 
calendar feed; import and export for spreadsheets of hours;
- multiple library (and location-within-a-library) support;
- granular access control for updating hours; we have the notion of global 
hours administrators, but expect to also support library- and location-specific 
authorization, allowing library managers to set and update the hours for a 
subset of our locations [1];
- support for setting operating hours for a term and/or exceptions for 
particular days (e.g. holidays and the like) using an in-place editor;
- we have a notion of location-specific messages associated with exceptions to 
the normal schedule (e.g. the Art library is closed this week for Y), which can 
be reflected in applications that consume the library hours
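
As a rough illustration of how term hours and dated exceptions like those above might combine, here is a minimal Python sketch. It is not taken from library_hours_rails; the names TERM_HOURS, EXCEPTIONS, and hours_for are hypothetical, and the schedule data is made up.

```python
from datetime import date

# Hypothetical regular schedule for one location during a term,
# keyed by weekday (Monday=0 ... Sunday=6); None means closed.
TERM_HOURS = {
    0: ("08:00", "22:00"),
    1: ("08:00", "22:00"),
    2: ("08:00", "22:00"),
    3: ("08:00", "22:00"),
    4: ("08:00", "18:00"),
    5: ("10:00", "18:00"),
    6: None,
}

# Hypothetical dated exceptions; each may carry a message that
# downstream consumers can display.
EXCEPTIONS = {
    date(2015, 7, 3): {"hours": None, "message": "Closed for the holiday"},
}


def hours_for(day):
    """Return (hours, message) for a date; an exception overrides the term."""
    if day in EXCEPTIONS:
        exc = EXCEPTIONS[day]
        return exc["hours"], exc["message"]
    return TERM_HOURS[day.weekday()], None
```

A consumer would call hours_for(date.today()) and show the message, if any, next to the hours. Checking exceptions first keeps the regular term schedule untouched when a one-off closure is added.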

Most of the rough edges are around some of the one-time administrative actions 
like setting up new libraries, locations, and term schedules, although there’s 
also some UI improvements in our near future.


Thanks,
Chris Beer
Digital Library Systems and Services
Stanford University Libraries


[1] Although I’m more interested in allowing any staff member to update the 
hours, and provide better notifications when a location’s hours change; that 
said, strong access control is much easier to reason about and codify..


On Jul 1, 2015, at 6:01 AM, Ken Irwin kir...@wittenberg.edu wrote:

Hi folks,

I'm hoping to find some sort of web-based app that can manage the library's 
hours of operations, including:

* Displaying today's hours

* Displaying an upcoming schedule of hours

* Updatable through a GUI interface by non-techy library staff

* Able to update our Google Places account hours (which, I note, 
currently lists our school-year hours as our open hours today), perhaps on a 
daily basis

* Preferably a stand-alone thing that can provide data on an ad hoc 
basis (as opposed to a CMS-specific thing like a WP plugin or a Drupal module)

* PHP preferred but not necessary

* OSS / free preferred but not necessary

I feel certain that someone else has already wanted this enough to create it. 
Anyone have a solution they're happy with?

Thanks
Ken


Re: [CODE4LIB] very large image display?

2014-07-25 Thread Jon Stroop

Jonathan,

We use an image server I wrote, Loris, plus OpenSeadragon. Here's an 
example:


http://libimages.princeton.edu/osd-demo/?feedme=pudl0123%2F8172070%2F01%2F0001.jp2

That image is 152500 x 4000 px:

http://libimages.princeton.edu/loris/pudl0123%2F8172070%2F01%2F0001.jp2/info.json

Loris is on Github: https://github.com/pulibrary/loris
as is OpenSeadragon: https://github.com/openseadragon/openseadragon

More generally, this is one of many problems IIIF (International Image 
Interoperability Framework) exists to try to solve. You might want to 
check out our site, which has links to other tools as well: http://iiif.io/
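
For anyone new to IIIF: the request URLs follow a fixed pattern, which is what lets a client ask for only the region it needs. Here is a minimal Python sketch of building one, assuming the Image API 1.1 path order (identifier/region/size/rotation/quality.format). The function name iiif_url and the example region and size are made up for illustration.

```python
from urllib.parse import quote


def iiif_url(base, identifier, region="full", size="full",
             rotation="0", quality="native", fmt="jpg"):
    """Build an IIIF Image API request URL (1.1-style path segments)."""
    # Slashes in the identifier are percent-encoded so they are not
    # mistaken for path separators.
    encoded = quote(identifier, safe="")
    return "{}/{}/{}/{}/{}/{}.{}".format(
        base.rstrip("/"), encoded, region, size, rotation, quality, fmt)


# Hypothetical request: a 1024x1024 px region from the top-left
# corner, scaled down to 256 px wide.
url = iiif_url("http://libimages.princeton.edu/loris",
               "pudl0123/8172070/01/0001.jp2",
               region="0,0,1024,1024", size="256,")
print(url)
```

A viewer like OpenSeadragon issues many requests of this shape, one per visible tile, rather than fetching the whole image.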


Hope this helps,
-Jon

On 07/25/2014 11:36 AM, Jonathan Rochkind wrote:

Does anyone have a good solution to recommend for display of very large images 
on the web?  I'm thinking of something that supports pan and scan, as well as 
loading only certain tiles for the current view to avoid loading an entire 
giant image.

A URL to more info to learn about things would be another way of answering this 
question, especially if it involves special server-side software.  I'm not sure 
where to begin. Googling around I can't find any clearly good solutions.

Has anyone done this before and been happy with a solution?

Thanks for any info!

Jonathan


Re: [CODE4LIB] iiif compatible servers

2014-07-25 Thread Jon Stroop
Eric,
FWIW, an HTTP resolver that could be used with Fedora has been a big topic for 
Loris recently, and a few of us are trying to spec out what that would look 
like.

The discussion/proposal is here: https://github.com/pulibrary/loris/issues/98 
and spreads to a few other linked issues. I'd be happy to hear what you think. 

-Jon

Sent from my mobile.  Please excuse typos. 

-Original Message-
From: James, Eric eric.ja...@yale.edu
To: CODE4LIB@LISTSERV.ND.EDU
Sent: Fri, 25 Jul 2014 17:39
Subject: [CODE4LIB] iiif compatible servers

Looking to implement a iiif compatible server, primarily for jp2s in fcrepo3.

Just read the 'very large image display?' thread and looking at the 
http://iiif.io/technical-details.html, it appears options include:

loris: https://github.com/pulibrary/loris
IIP: http://iipimage.sourceforge.net/documentation/server/
djatoka iiif: ( https://github.com/jronallo/djatoka)

The iiif djatoka gem immediately caught my eye as I've implemented djatoka w/ 
fcrepo3 in a previous project, but am interested if there are any opinions in 
choosing any one of these over another.

Thanks,
Eric

From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of Esmé Cowles 
[escow...@ticklefish.org]
Sent: Friday, July 25, 2014 4:44 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] very large image display?

We previously used the Zoomify Flash applet, but now use Leaflet.js with the 
Zoomify tileset plugin:

https://github.com/turban/Leaflet.Zoomify

One thing I like about this approach is that it minimizes the amount of 
Javascript code the clients have to load, since we use Leaflet.js for our maps 
and it's already loaded.

-Esme

 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of 
 Jonathan Rochkind
 Sent: Friday, July 25, 2014 10:36 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: [CODE4LIB] very large image display?

 Does anyone have a good solution to recommend for display of very large 
 images on the web?  I'm thinking of something that supports pan and scan, as 
 well as loading only certain tiles for the current view to avoid loading an 
 entire giant image.

 A URL to more info to learn about things would be another way of answering 
 this question, especially if it involves special server-side software.  I'm 
 not sure where to begin. Googling around I can't find any clearly good 
 solutions.

 Has anyone done this before and been happy with a solution?

 Thanks for any info!

 Jonathan


Re: [CODE4LIB] LC Call # splitting/sorting scripts?

2014-07-11 Thread Jon Stroop

This?

https://code.google.com/p/library-callnumber-lc/
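
If that library isn't a fit, the basic idea can be sketched in a few lines of Python. This is only a rough illustration, not the algorithm used by library-callnumber-lc: it assumes call numbers that start with one to three class letters and a class number, and it compares everything after that (cutters, dates) as plain uppercase text. The names LC_PATTERN and lc_sort_key are made up.

```python
import re

# Class letters, class number, optional decimal part, then the rest.
LC_PATTERN = re.compile(r"""
    ^\s*
    (?P<letters>[A-Z]{1,3})\s*
    (?P<number>\d{1,4})
    (?:\.(?P<decimal>\d+))?
    \s*(?P<rest>.*)$
""", re.VERBOSE | re.IGNORECASE)


def lc_sort_key(call_number):
    """Return a tuple that sorts LC call numbers in rough shelf order."""
    match = LC_PATTERN.match(call_number)
    if match is None:
        # Unparseable values sort after every real class ("~" > "Z").
        return ("~", 0.0, call_number)
    letters = match.group("letters").upper()
    # The class number sorts numerically, so QA9 comes before QA76.
    number = float(match.group("number") + "." + (match.group("decimal") or "0"))
    rest = match.group("rest").upper()
    return (letters, number, rest)


items = ["QA76.9 .D3", "QA9 .A5", "Q11 .B2", "QA76.73 .P98"]
print(sorted(items, key=lc_sort_key))
```

To sort a table of items, pass the same key function to sorted() with a lambda that pulls the call number out of each row.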

On 07/11/2014 12:01 PM, Robert Dumas wrote:

Hey all:

Does anyone know of any scripts (preferably in Ruby or Python) which can slice 
up an LC call number and sort a table of items by LC call number?
  


Re: [CODE4LIB] College Question!

2014-05-29 Thread Jon Stroop

Riley,

First, I wonder if there's anyone on this list who doesn't wish they had 
your foresight! You already have a rare opportunity in that you're 
thinking about this now and not in your mid-20s, so way to go!


We spoke about this a little @ the c4l conference, but I'll say more. I 
majored in music performance and even did a masters in it as well, which 
means that practically speaking I have a high school education. :-) I 
don't really mean that, but until you've had the experience it's 
difficult to explain (or at least I find it difficult) how relevant a 
degree in the arts/humanities can be to a job in technology--and there's 
no shortage of people who have taken this exact path.


I did do an MLS, but see above re: high school education. At the time 
(~13 yrs ago) I felt like I needed to do it to get a job (I also didn't 
necessarily expect to wind up in systems, but that's another story), 
but, honestly, everything I know I learned on the job, or /a/ job, or 
the overnight hours between going to said job, which leads me to my 
point: Wherever you go to school, and regardless of your major, if you 
ultimately want to wind up working in a library, you should start now. 
Any brick and mortar university is going to have student jobs available 
(work study or otherwise) at the library, and while it may just be as a 
desk clerk or whatever, keep your ears open (we already know you're not 
shy): at some point there's going to be some stats that need munging, 
some Access (or even worse) database that needs migration, some web work 
to be done, or whatever and, et voilà, you're off!


The point is, professional degree != professional experience, 
and--frankly--you probably don't want to be working at a place that 
requires a systems librarian to have an MLIS anyway, and certainly not 
in 4-5 years. Get as much experience as possible, do a CS degree, but 
also learn how to write and communicate OR do an arts degree, but also 
learn how to program (etc.), and you'll be fine.


-Jon

On 05/28/2014 11:17 PM, Riley Childs wrote:

I was curious about the type of degrees people had. I am heading off to college 
next year (class of 2015) and am trying to figure out what to major in. I want 
to be a systems librarian, but I can't tell what to major in! I wanted to hear 
about what paths people took and how they ended up where they are now.

BTW Y'All at NC State need a better tour bus driver (not the c4l tour, the 
admissions tour) ;) the bus ride was like a rickety roller coaster...   

Also, if you know of any scholarships please let me know ;) you would be my BFF 
:P


Riley Childs
Student
Asst. Head of IT Services
Charlotte United Christian Academy
(704) 497-2086
RileyChilds.net
Sent from my Windows Phone, please excuse mistakes


[CODE4LIB] New IIIF API specifications drafts published

2014-05-29 Thread Jon Stroop
The IIIF Editors are pleased to announce draft revisions of the 
International Image Interoperability Framework Image and Presentation 
(formerly 'Metadata') API specifications.


 * http://iiif.io/api/image/2.0/
 * http://iiif.io/api/presentation/2.0/

These releases reflect a significant amount of input from both the IIIF 
working groups and the larger library, archives, and museum communities 
following roughly a year of experience either implementing or 
experimenting with the previous versions.


A complete list of the changes can be found on the IIIF website:

 * http://iiif.io/api/image/2.0/change-log.html
 * http://iiif.io/api/presentation/2.0/change-log.html

We welcome your feedback, questions, and use cases, and encourage you to 
submit them to the IIIF Discussion Listserv: 
iiif-disc...@googlegroups.com. Drafts will be kept open for comment 
until the beginning of August, with the goal of final release in 
September. However, we would appreciate feedback early in order to work 
on and gain consensus for any necessary changes.


Sincerely,

The IIIF Image and Presentation API Editors:
Benjamin Albritton
Michael Appleby
Robert Sanderson
Stuart Snydman
Jon Stroop
Simeon Warner

--
Jon Stroop
Digital Initiatives Developer/Analyst
Princeton University Library
jstr...@princeton.edu


[CODE4LIB] Newcomer dinner - Pit Group 4

2014-03-24 Thread Jon Stroop

Group 4 for The Pit:
It seems like there will be a sizable exodus from the conf hotel to the 
restaurant around 6 PM, so let's plan to meet then or shortly before in 
the lobby so that we can get ourselves organized. I'll find a way to 
make myself known to you.

-Jon


Re: [CODE4LIB] ruby-marc api design feedback wanted

2013-11-20 Thread Jon Stroop
Coming from nowhere on this...is there a place where it would be 
convenient to flag which behavior the user (of the library) wants? I 
think you're correct that most of the time you'd just want to blow 
through it (or replace it), but for the situation where this isn't the 
case, I think the Right Thing to do is raise the exception. I don't 
think you would want to bury it in some assumption made internal to the 
library unless that assumption can be turned off.
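
To make the "flag" idea concrete, here is a minimal sketch in Python rather than Ruby. It decodes UTF-8, not MARC-8, so it only illustrates the shape of the API: raising is the default and the caller has to opt in to anything lossy. The function name to_utf8_text and the on_invalid parameter are hypothetical, not part of ruby-marc.

```python
def to_utf8_text(raw, on_invalid="raise"):
    """Decode bytes, letting the caller choose how bad bytes are handled."""
    # Map the caller-facing option onto Python's built-in error handlers.
    handlers = {"raise": "strict", "replace": "replace", "omit": "ignore"}
    if on_invalid not in handlers:
        raise ValueError(
            "on_invalid must be one of: " + ", ".join(sorted(handlers)))
    return raw.decode("utf-8", errors=handlers[on_invalid])


print(to_utf8_text(b"abc\xff", on_invalid="replace"))
```

With the default, to_utf8_text(b"abc\xff") raises UnicodeDecodeError, so bad input is never recovered from silently unless the caller asks for that.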


-Jon


On 11/19/2013 07:51 PM, Jonathan Rochkind wrote:

ruby-marc users, a question.

I am working on some Marc8 to UTF-8 conversion for ruby-marc.

Sometimes, what appears to be an illegal byte will appear in the Marc8 
input, and it can not be converted to UTF8.


The software will support two alternatives when this happens: 1) 
Raising an exception. 2) Replacing the illegal byte with a replacement 
char and/or omitting it.


I feel like most of the time, users are going to want #2.  I know 
that's what I'm going to want nearly all the time.


Yet, still, I am feeling uncertain whether that should be the default. 
Which should be the default behavior, #1 or #2?  If most people most 
of the time are going to want #2 (is this true?), then should that be 
the default behavior?   Or should #1 still be the default behavior, 
because by default bad input should raise, not be silently recovered 
from, even though most people most of the time won't want that, heh.


Jonathan


Re: [CODE4LIB] Loris

2013-11-08 Thread Jon Stroop

Ed,

I added support for IIIF syntax to OpenSeadragon:

https://github.com/openseadragon/openseadragon/blob/master/src/iiif1_1tilesource.js

so it just works. Not sure if Ian has cut a release recently, but it's 
on the master branch anyway.


-Js

On 11/08/2013 04:00 PM, Edward Summers wrote:

On Nov 8, 2013, at 3:05 PM, Jon Stroop jstr...@princeton.edu wrote:

And here's a sample of the server backing OpenSeadragon[2]: http://goo.gl/Gks6lR

Thanks for sharing that Jon. Did you have to do much to get OpenSeadragon to 
talk iiif?

//Ed


Re: [CODE4LIB] Loris

2013-11-08 Thread Jon Stroop

Whoops, wait.
I wrote a formula for Chris Thatcher to add support for IIIF 1.0 to add 
support for OSd. Then I made some changes and added support for 1.1. 
Credit where credit is due

-Js

On 11/08/2013 04:40 PM, Jon Stroop wrote:

Ed,

I added support for IIIF syntax to OpenSeadragon:

https://github.com/openseadragon/openseadragon/blob/master/src/iiif1_1tilesource.js

so it just works. Not sure if Ian has cut a release recently, but 
it's on the master branch anyway.


-Js

On 11/08/2013 04:00 PM, Edward Summers wrote:

On Nov 8, 2013, at 3:05 PM, Jon Stroop jstr...@princeton.edu wrote:

And here's a sample of the server backing OpenSeadragon[2]: http://goo.gl/Gks6lR

Thanks for sharing that Jon. Did you have to do much to get OpenSeadragon to 
talk iiif?

//Ed




Re: [CODE4LIB] Loris

2013-11-08 Thread Jon Stroop

Bleh. You know what I meant.

On 11/8/13 5:13 PM, Jon Stroop wrote:

Whoops, wait.
I wrote a formula for Chris Thatcher to add support for IIIF 1.0 to 
add support for OSd. Then I made some changes and added support for 
1.1. Credit where credit is due

-Js

On 11/08/2013 04:40 PM, Jon Stroop wrote:

Ed,

I added support for IIIF syntax to OpenSeadragon:

https://github.com/openseadragon/openseadragon/blob/master/src/iiif1_1tilesource.js

so it just works. Not sure if Ian has cut a release recently, but 
it's on the master branch anyway.


-Js

On 11/08/2013 04:00 PM, Edward Summers wrote:

On Nov 8, 2013, at 3:05 PM, Jon Stroop jstr...@princeton.edu wrote:

And here's a sample of the server backing OpenSeadragon[2]: http://goo.gl/Gks6lR

Thanks for sharing that Jon. Did you have to do much to get OpenSeadragon to 
talk iiif?

//Ed






Re: [CODE4LIB] Loris

2013-11-08 Thread Jon Stroop

Seriously!

On 11/8/13 6:21 PM, Michael J. Giarlo wrote:

Stick to Python, Jon. ;)


On Fri, Nov 8, 2013 at 3:17 PM, Jon Stroop jstr...@princeton.edu wrote:


Bleh. You know what I meant.


On 11/8/13 5:13 PM, Jon Stroop wrote:


Whoops, wait.
I wrote a formula for Chris Thatcher to add support for IIIF 1.0 to add
support for OSd. Then I made some changes and added support for 1.1. Credit
where credit is due
-Js

On 11/08/2013 04:40 PM, Jon Stroop wrote:


Ed,

I added support for IIIF syntax to OpenSeadragon:

https://github.com/openseadragon/openseadragon/blob/master/src/iiif1_1tilesource.js

so it just works. Not sure if Ian has cut a release recently, but it's
on the master branch anyway.

-Js

On 11/08/2013 04:00 PM, Edward Summers wrote:


On Nov 8, 2013, at 3:05 PM, Jon Stroop jstr...@princeton.edu wrote:


And here's a sample of the server backing OpenSeadragon[2]: http://goo.gl/Gks6lR


Thanks for sharing that Jon. Did you have to do much to get
OpenSeadragon to talk iiif?

//Ed





Re: [CODE4LIB] Loris

2013-11-08 Thread Jon Stroop
It aims to do the same thing...serve big JP2s (and other images) over 
the web, so from that perspective, yes. But, beyond that, time will 
tell. One nice thing about coding against a well-thought-out spec is 
that there are lots of implementations from which you can choose[1]--though as 
far as I know Loris is the only one that supports the IIIF syntax 
natively (maybe IIP?). We still have Djatoka floating around in a few 
places here, but, as many people have noted over the years, it takes a 
lot of shimming to scale it up, and, as far as I know, the project has 
more or less been abandoned.


I haven't done too much in the way of benchmarking, but to date don't 
have any reason to think Loris can't perform just as well. The demo I 
sent earlier is working against a very large jp2 with small tiles[2] 
which means a lot of rapid hits on the server, and between that, (a 
little bit of) JMeter and ab testing, and a fair bit of concurrent use 
from the c4l community this afternoon, I feel fairly confident about it 
being able to perform as well as Djatoka in a production environment.


By the way, you can page through some other images here: 
http://libimages.princeton.edu/osd-demo/


Not much of an answer, I realize, but, as I said, time and usage will tell.

-Js

1. http://iiif.io/apps-demos.html
2. 
http://libimages.princeton.edu/loris/pudl0052%2F6131707%2F0001.jp2/info.json



On 11/8/13 8:07 PM, Peter Murray wrote:

A clarifying question: is Loris effectively a Python-based replacement for the 
Java-based djatoka [1] server?


Peter

[1] http://sourceforge.net/apps/mediawiki/djatoka/index.php?title=Main_Page


On Nov 8, 2013, at 3:05 PM, Jon Stroop jstr...@princeton.edu wrote:


c4l,
I was reminded earlier this week at DLF (and a few minutes ago by Tom
and Simeon) that I hadn't ever announced a project I've been working on for
the last year or so to this list. I showed an early version in a
lightning talk at code4libcon last year.

Meet Loris: https://github.com/pulibrary/loris

Loris is a Python based image server that implements the IIIF Image API
version 1.1 level 2[1].

http://www-sul.stanford.edu/iiif/image-api/1.1/

It can take JP2 (if you make Kakadu available to it), TIFF, or JPEG
source images, and hand back JPEG, PNG, TIF, and GIF (why not...).

Here's a demo of the server directly: http://goo.gl/8XEmjp

And here's a sample of the server backing OpenSeadragon[2]:
http://goo.gl/Gks6lR

-Js

1. http://www-sul.stanford.edu/iiif/image-api/1.1/
2. http://openseadragon.github.io/

--
Jon Stroop
Digital Initiatives Programmer/Analyst
Princeton University Library
jstr...@princeton.edu

--
Peter Murray
Assistant Director, Technology Services Development
LYRASIS
peter.mur...@lyrasis.org
+1 678-235-2955
800.999.8558 x2955


[CODE4LIB] Job: Digital Repository Software Developer at Princeton University

2013-10-11 Thread Jon Stroop
Note: this job is in Academic Services at Princeton, not in the Library, 
though we do work together from time to time. The full posting is here:


http://jobs.princeton.edu/applicants/Central?quickFind=64011

Cross-posted. Please excuse any duplicate copies you receive.

*Princeton University seeks Digital Repository Software Developer*

In September of 2011 the Faculty of Princeton University approved an 
open access policy intended to make faculty's scholarly articles 
available to a wider public. Princeton is now in the process of ramping 
up its efforts to implement the policy. These efforts will include the 
development of the repository that will hold the scholarly articles. The 
Office of Information Technology seeks a Digital Repository Software 
Developer to establish and enhance digital repositories to house 
academic publications, research data, and related digital assets.  The 
primary focus of the position will be to develop software and systems 
for collecting and depositing academic journal articles subject to 
Princeton University's Open Access Policy for Faculty Publications into 
an open access repository.  This repository will enhance both the 
preservation and dissemination of scholarship at Princeton.


The Digital Repository Software Developer will report to the Digital 
Repository Architect and will work closely with the University's 
Scholarly Communications Librarian and other IT and Library staff.


--
Jon Stroop
Digital Initiatives Programmer/Analyst
Princeton University Library
jstr...@princeton.edu


Re: [CODE4LIB] A Proposal to serialize MARC in JSON

2013-09-03 Thread Jon Stroop

It looks like it's there in pymarc as well:

https://github.com/edsu/pymarc/blob/master/pymarc/record.py#L386


On 09/03/2013 03:02 PM, Bill Dueber wrote:

I can see where you might think that no progress has been made because
the only real document of the format is that old, old blog post.

The problem, however, is not a lack of progress but a lack of documentation
of that progress. File_MARC (PHP), MARC::Record (perl), ruby-marc (ruby)
and marc4j (java) will all deal, to one extent or another, either with the
JSON directly or with a hash/map data structure that maps directly to that
JSON structure.

[BTW, can anyone summarize the state of pymarc wrt marc-in-json?]





On Tue, Sep 3, 2013 at 5:09 AM, dasos ili dasos_...@yahoo.gr wrote:


It is exactly three years back, and no real progress has been made
concerning  this proposal to serialize MARC in JSON:


http://dilettantes.code4lib.org/blog/2010/09/a-proposal-to-serialize-marc-in-json/


Meanwhile new tools for searching and retrieving records have come in,
such as Solr and Elasticsearch. Any ideas on how one could alter (or
propose a new format) more suited to the mechanisms of these two search
platforms?

Any example implementations would be also really appreciated,

thank you in advance






Re: [CODE4LIB] MARC record model to be inserted in mongodb

2013-07-05 Thread Jon Stroop
Have you seen Ross' post: 
http://dilettantes.code4lib.org/blog/2010/09/a-proposal-to-serialize-marc-in-json/ 
?


pymarc can get you this json, e.g.:

```
import pymarc

records = pymarc.parse_xml_to_array('/path/to/some/marc.xml')

# one marc-in-json string per record
json_file = [record.as_json() for record in records]

```

or, for that matter, if you happen to be using Mongo's Python API, you 
/may/ be able to call `as_dict()` when you store the record:


```
my_mongo_collection.insert(record.as_dict())
```

It looks like ruby-marc does something similar, and presumably the Mongo 
API for Ruby uses Ruby hashes the way that the Python API uses dicts, so 
a similar approach is probably possible in Ruby.


As for "...an efficient way so as to get results with the appropriate 
queries." I guess that all depends on what you're trying to do.
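
To give a feel for what you'd be querying, here is a small Python sketch of the marc-in-json shape as I understand it from Ross' post: a leader plus a list of single-key field objects. The record itself is made up, and subfield_values is a hypothetical helper, not part of pymarc.

```python
# A tiny, made-up record in the marc-in-json shape.
record = {
    "leader": "00000nam a2200000 a 4500",
    "fields": [
        {"001": "12345"},
        {"245": {"ind1": "1", "ind2": "0",
                 "subfields": [{"a": "An example title /"},
                               {"c": "A. Author."}]}},
    ],
}


def subfield_values(rec, tag, code):
    """Collect every value of subfield `code` in fields tagged `tag`."""
    values = []
    for field in rec["fields"]:
        for field_tag, content in field.items():
            if field_tag != tag or not isinstance(content, dict):
                continue  # skip other tags and control fields
            for subfield in content["subfields"]:
                if code in subfield:
                    values.append(subfield[code])
    return values


print(subfield_values(record, "245", "a"))
```

Fields and subfields are lists because MARC allows repeats and order matters, which is also why a lookup has to walk them. Whether that shape suits your queries is something to test against real data.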


-Jon

--
Jon Stroop
Digital Initiatives Programmer/Analyst
Princeton University Library
jstr...@princeton.edu



On 07/05/2013 05:47 AM, dasos ili wrote:

Could you please give us any suggestions on a data model example regarding a 
MARC record? The goal is to be able to store it in mongodb, in an efficient way 
so as to get results with the appropriate queries.

thank you in advance



Re: [CODE4LIB] Regular expression for maximum 4-digit number

2013-07-02 Thread Jon Stroop
I have zero Excel skills, but chances are you could do this with any 
scripting language if you were to export the file as text (e.g. CSV).
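
For example, a minimal sketch in Python, assuming "4-digit number" means exactly four digits set off by word boundaries. The name latest_year is made up, and the sample string is invented rather than real 863 data.

```python
import re

# Exactly four digits, not part of a longer run of digits or letters.
FOUR_DIGITS = re.compile(r"\b\d{4}\b")


def latest_year(text):
    """Return the largest standalone 4-digit number in text, or None."""
    matches = [int(m) for m in FOUR_DIGITS.findall(text)]
    return max(matches) if matches else None


print(latest_year("v.1-12 1987-1999, 2001-2004"))
```

Reading the exported file with the csv module and calling latest_year on the relevant column would give one estimate per row.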

-Jon

On 07/02/2013 11:02 AM, Harper, Cynthia wrote:

Is there a way to return (in Excel, if possible) the largest 4-digit number (by 
word boundaries) in a string?  I've extracted the 863 fields from Millennium 
for my active periodicals, and want to find the latest year in each run.  I'm 
willing to estimate it by taking the largest 4-digit number in the string. I'm 
doing this in Excel.  Any help?

Cindy Harper
Electronic Services and Serials Librarian
Virginia Theological Seminary
3737 Seminary Road
Alexandria VA 22304
703-461-1794
char...@vts.edu


Re: [CODE4LIB] XML Parsing and Python

2013-03-05 Thread Jon Stroop

Mike,
I haven't used minidom extensively but my guess is that 
doc.toprettyxml(indent="  ", encoding="utf-8") isn't actually changing the 
encoding because it can't parse the string in your content variable. I'm 
surprised that you're not getting tossed a UnicodeError, but the docs 
for Node.toxml() [1] might shed some light:


To avoid UnicodeError exceptions in case of unrepresentable text data, 
the encoding argument should be specified as “utf-8”.


So what happens if you're not explicit about the encoding, i.e. just 
doc.toprettyxml()? This would hopefully at least move your exception to 
a more appropriate place.


In any case, one solution would be to scrub the string in your content 
variable to get rid of the invalid characters (hopefully they're 
insignificant). Maybe something like this:


def unicode_filter(char):
    try:
        unicode(char, encoding='utf-8', errors='strict')
        return char
    except UnicodeDecodeError:
        return ''

content = 'abc\xFF'
content = ''.join(map(unicode_filter, content))
print content
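
Another possibility, going by the lxml message, is that the offending character is a control character that is valid UTF-8 but not allowed in XML 1.0. As far as I know minidom writes such characters out without checking them. Here is a minimal sketch of stripping them before calling setAttribute, written for Python 3 (unlike the snippet above) and assuming the content is already a text string. The names ILLEGAL_XML_CHARS and strip_illegal_xml_chars are made up.

```python
import re

# Matches anything outside the XML 1.0 "Char" production:
# tab, LF, CR, and the ranges U+0020-U+D7FF, U+E000-U+FFFD,
# U+10000-U+10FFFF are allowed; everything else is removed.
ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_xml_chars(text):
    """Remove characters that an XML 1.0 parser would reject."""
    return ILLEGAL_XML_CHARS.sub("", text)


# A form feed is used here only as an example of a control character.
print(strip_illegal_xml_chars("Buffalo\x0cLaunch"))
```

Calling this on each content value before word.setAttribute(...) should keep the output parseable, at the cost of silently dropping those characters.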

Not really my area of expertise, but maybe worth a shot
-Jon

1. 
http://docs.python.org/2/library/xml.dom.minidom.html#xml.dom.minidom.Node.toxml


--
Jon Stroop
Digital Initiatives Programmer/Analyst
Princeton University Library
jstr...@princeton.edu




On 03/04/2013 03:00 PM, Michael Beccaria wrote:

I'm working on a project that takes the ocr data found in a pdf and places it 
in a custom xml file.

I use Python scripts to create the xml file. Something like this (trimmed down 
a bit):

from xml.dom.minidom import Document
doc = Document()
Page = doc.createElement("Page")
doc.appendChild(Page)
f = StringIO(txt)
lines = f.readlines()
for line in lines:
    word = doc.createElement("String")
    ...
    word.setAttribute("CONTENT", content)
    Page.appendChild(word)
return doc.toprettyxml(indent="  ", encoding="utf-8")


This creates a file, simply, that looks like this:
<?xml version="1.0" encoding="utf-8"?>
<Page HEIGHT="3296" WIDTH="2609">
   <String CONTENT="BuffaloLaunch" />
   <String CONTENT="Club" />
   <String CONTENT="Offices" />
   <String CONTENT="Installed" />
   ...
</Page>

I am able to get this document to be created ok and saved to an xml file. The 
problem occurs when I try and have it read using the lxml library:

from lxml import etree
doc = etree.parse(filename)


I am running across errors like XMLSyntaxError: Char 0x out of allowed range, 
line 94, column 19. Which when I look at the file, is true. There is a 0X 
character in the content field.

How is a file able to be created using minidom (which I assume would create a 
valid xml file) and then failing when parsing with lxml? What should I do to 
fix this on the encoding side so that errors don't show up on the parsing side?
Thanks,
Mike

Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
mbecca...@paulsmiths.edu
Become a friend of Paul Smith's Library on Facebook today!


Re: [CODE4LIB] Mentorship Buddies

2012-11-28 Thread Jon Stroop

Having a sort of speed dating setup might help make better fits between 
mentors and mentees, as well.
+1, not only to satisfy the 'room full of nerds' case, but also the fact 
that people spend their free time @ code4libcon in a variety of ways, 
and not everyone might want to, e.g., wind up in the hospitality suite.



On 11/28/2012 09:45 AM, Ross Singer wrote:

On Nov 27, 2012, at 9:33 PM, Cynthia Ng cynthia.s...@gmail.com wrote:


Getting traction for mentoring online is always difficult, but what
about starting that mentorship at code4libcon?


+1 - being face-to-face might help ease the tension.

Having a sort of speed dating setup might help make better fits between 
mentors and mentees, as well.

That is, a roomful of nerds deferring passively to one another might not get us 
very far :)  Something more structured about what people want to learn and what 
mentors know and how they get along together would probably make for a more 
productive outcome.

-Ross.


Maybe almost like a buddy system, so that the first meeting between a
mentor and mentee is at a code4libcon (national, regional, or
otherwise) if possible.

This might simply be a good idea for first timers who are not going
with colleagues too.

Just throwing out some ideas here...

On Tue, Nov 27, 2012 at 7:49 PM, Nick Ruest rue...@gmail.com wrote:

Matt McCollow proposed something like this a while back. We have a page up
and everything! But, it never got much traction.

http://www.mail-archive.com/code4lib@listserv.nd.edu/msg14270.html
http://wiki.code4lib.org/index.php/Mentorship

-nruest

On 12-11-27 07:30 PM, Bess Sadler wrote:

+1 to this idea. I have benefited tremendously over the years from kind
people taking me under their wings. Many of us try to do this one-on-one,
but some kind of introduction service would be a huge benefit for the
community, I would think.

Mentorship is a great example of a robust solution - a solution that
addresses more than one problem at once. I suspect that this would not only
improve our diversity as a community, it might also solve some tech
leadership / succession planning problems and maybe expose some training
needs.

Bess

On Nov 27, 2012, at 4:20 PM, Nathan Tallman ntall...@gmail.com wrote:


This is a slightly different topic, but relates to Kelley's post: Does
code4lib have a mentor program where more inexperienced geeks can pair up
with someone to guide their development? I don't have anyone like that in
my network, but would really like to. I don't mean to discount the
existing
resources on code4lib or this list, which both have been very useful. I'm
sure I could just start by attending some of the conferences, but for
more
inexperienced people they can be a bit intimidating, albeit inspiring.

It would also be a way to directly engage minorities.

Just a thought.

Nathan


On Tue, Nov 27, 2012 at 6:20 PM, Kelley McGrath kell...@uoregon.edu
wrote:


I'll second the idea of approaching people individually and explicitly
asking them to participate. It worked on me. I never would have written
my
first article for the Code4Lib Journal or become a member of the
editorial
committee if someone hadn't encouraged me individually (Thanks
Jonathan!).

It would also be good to find a way to somehow target the pool of
lurkers
who maybe aren't already connected to someone and get them more
involved.

As far as anonymous proposals go, we recently had a very good workshop on
implicit bias here. Someone brought up that studies found significant
changes in the gender proportions in symphony orchestras after candidates
started auditioning behind screens. There are also lots of studies about the
different responses to the same resume/application depending on whether a
stereotypically male/female or white/black name was used. Probably it's
impossible to make proposals completely anonymous, but it would be an
interesting experiment to leave off the names.

Kelley

PS Interestingly, I wouldn't instinctively self-identify as a member of
the Code4Lib community, although my first thought is that that has more to
do with not being a coder than with being a woman.


**
Kelley McGrath
Metadata Management Librarian
University of Oregon Libraries
1299 University of Oregon
Eugene, OR 97403

541-346-8232
kell...@uoregon.edu


--
-nruest


Re: [CODE4LIB] anti-harassment policy for code4lib?

2012-11-26 Thread Jon Stroop
It's sad that we have to address this formally (as formal as c4l gets 
anyway), but that's reality, so yes, bess++ indeed, and mjgiarlo++, 
anarchivist++ for the quick assist.


The responses to the list in the past couple of hours alone suggest that 
this is something much of the community would want to get behind. To 
that end, and as a show of (positive) force--not to mention how cool our 
community is--I think it might be neat if we could find a way to make 
whatever winds up being drafted something we can sign; i.e. attach our 
personal names. I don't know how that would work exactly...maybe via the 
wiki (where it seems to me a lot of good info goes to die) or the 
code4lib Github (slightly better since you could link to your 
credentials in an environment much larger than our own, and everyone 
could have a copy), but something along those lines. I'm happy to help 
if I can.


Anyway, just a thought.
-Jon

--
Jon Stroop
Digital Initiatives Programmer/Analyst
Princeton University Library

jstr...@princeton.edu

http://pudl.princeton.edu
http://findingaids.princeton.edu


On 11/26/12 6:33 PM, Michael J. Giarlo wrote:

All,

Building on what Bess and others have written, and on the GitHub repo that
anarchivist set up, I've contributed a rough draft of a Code4Lib code of
conduct:

https://github.com/code4lib/antiharassment-policy/blob/master/code_of_conduct.md

This strawperson code of conduct is based on DLF Forum's, which is based on
the Ada Initiative's sample policy. It is modified slightly to reflect a
broader scope of the conference, conference social events, the IRC channel,
and the mailing list.

Throw darts, rinse, repeat.

-Mike


On Mon, Nov 26, 2012 at 6:10 PM, Robert Sanderson azarot...@gmail.com wrote:


+1, of course :)

You might wish to consider some further derivatives/related pages:
 http://www.diglib.org/about/code-of-conduct/
 http://wikimediafoundation.org/wiki/Friendly_space_policy
 https://thestrangeloop.com/about/policies
 http://www.apache.org/foundation/policies/anti-harassment.html

Rob



On Mon, Nov 26, 2012 at 3:57 PM, Mariner, Matthew 
matthew.mari...@ucdenver.edu wrote:


+1 for all of the below

Matthew C. Mariner
Head of Special Collections and Digital Initiatives
Assistant Professor
Auraria Library
1100 Lawrence Street, Denver, CO 80204-2041
matthew.mari...@ucdenver.edu
http://library.auraria.edu :: http://archives.auraria.edu





On 11/26/12 3:51 PM, Tom Cramer tcra...@stanford.edu wrote:


+1 for Bess's motion
+1 for Roy's expansion to C4L online interactions as well as face to face
+1 for Karen's focus on general inclusivity and fair play


For me the hardest thing is how one monitors and resolves issues that
arise. As a group with no formal management, I suppose the conference
organizers become the deciders if such a necessity arises. If it's
elsewhere (email, IRC) -- that's a bit trickier. The Ada project's
detailed guides should help, but if there is a policy it seems that
there necessarily has to be some responsible body -- even if ad hoc.


It seems to me that there would be tremendous benefit in having

1.) an explicit statement of the community norms around harassment and
fair play in general. In the best case, this would help avoid
uncomfortable or inappropriate situations before they occur.

2.) a defined process for handling any incidents that do arise, which in
the case of this community I would imagine would revolve around
reporting, communication, negotiation and arbitration rather than
adjudication by a standing body (which I agree is hard to see in this
crowd). I know several high schools have adopted peer arbitration
networks for conflict resolution rather than referring incidents to the
Principal's Office--perhaps therein lies a model for us for any incidents
that may not be resolved simply through dialogue.

- Tom



On Nov 26, 2012, at 2:32 PM, Karen Coyle wrote:


Bess and Code4libbers,

I've only been to one c4l conference and it was a very positive
experience for me, but I also feel that this is too valuable of a
community for us to risk it getting itself into crisis mode over some
unintended consequences or a bad apple incident. For that reason I
would support the adoption of an anti-harassment policy in part for its
consciousness-raising value. Ideally this would be not only about sexual
harassment but would include general goals for inclusiveness and fair
play within the community. And it would also serve as an acknowledgment
that none of us is perfect, but we can deal with it.

For me the hardest thing is how one monitors and resolves issues that
arise. As a group with no formal management, I suppose the conference
organizers become the deciders if such a necessity arises. If it's
elsewhere (email, IRC) -- that's a bit trickier. The Ada project's
detailed guides should help, but if there is a policy it seems that
there necessarily has to be some responsible body -- even if ad hoc.

kc


On 11/26/12 2:16 PM, Bess

Re: [CODE4LIB] extracting tiff info

2012-11-19 Thread Jon Stroop
If you want everything in that RDF, you're probably wanting to extract 
the XMP data. Have a look at exiv2: http://www.exiv2.org/


Basically:

 exiv2 -px your_image.tif

will dump what you want to stdout.
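
To get from there to the spreadsheet Kyle is after, a minimal Python sketch
might look like the one below. It assumes exiv2 is on the PATH and that each
line of the -px output has four whitespace-separated columns (key, type,
count, value); the function names and the CSV layout are made up for
illustration, so check them against what your files actually produce.

```python
import csv
import subprocess
from pathlib import Path


def parse_xmp_lines(text):
    """Parse `exiv2 -px` output into a {key: value} dict.

    Assumption: each line is "key  type  count  value", separated by
    whitespace. Repeated keys are joined with "; ".
    """
    fields = {}
    for line in text.splitlines():
        parts = line.split(None, 3)
        if len(parts) < 3:
            continue  # skip blank or unexpected lines
        key = parts[0]
        value = parts[3].strip() if len(parts) == 4 else ""
        if fields.get(key):
            if value:
                fields[key] += "; " + value
        else:
            fields[key] = value
    return fields


def xmp_for_file(path):
    """Run exiv2 on one image in place and return its XMP properties."""
    result = subprocess.run(
        ["exiv2", "-px", str(path)],
        capture_output=True, text=True, check=False,
    )
    return parse_xmp_lines(result.stdout)


def write_csv(image_dir, out_path):
    """Write one row per .tif file, one column per XMP key seen anywhere."""
    rows = {p.name: xmp_for_file(p) for p in sorted(Path(image_dir).glob("*.tif"))}
    columns = sorted({key for fields in rows.values() for key in fields})
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["file"] + columns)
        for name, fields in rows.items():
            writer.writerow([name] + [fields.get(c, "") for c in columns])
```

Since exiv2 reads the files where they sit, nothing has to be copied off the
network drive first; only the (small) text output moves.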
-Jon

--
Jon Stroop
Digital Initiatives Programmer/Analyst
Princeton University Library

On 11/19/2012 04:31 PM, Kyle Banerjee wrote:

Howdy all,

I need to extract all the metadata from a few thousand images on a network
drive and put it into spreadsheet. Since the files are huge (each is
100MB+) and my connection isn't that fast, I strongly prefer to not move
them before working on them -- i.e. I'm using cygwin and/or windows.

Just eyeballing these things, I see the headers contain everything I need
in purty rdf. What's the best way to extract this? I thought tiffinfo would
do the trick, but it's just giving me technical info. Of course I can just
parse the files with perl but I'm thinking there just has to be a slicker
way to do this. What's my best option? Thanks,

kyle


Re: [CODE4LIB] haititrust

2012-08-03 Thread Jon Stroop
You can do an empty query in their catalog, and use the Original 
Location facet to filter to a holding library. Programmatically, I'm not 
sure, but you'd probably need to use the Hathi files: 
http://www.hathitrust.org/hathifiles.
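
If the Hathi files route works out, the filtering step could be sketched
roughly like this in Python. It assumes the files are gzip-compressed,
tab-delimited text; which column carries the contributing library, and what
code identifies your library, are NOT known here, so both are passed in as
parameters and need to be checked against the hathifiles description.

```python
import csv
import gzip


def rows_from_library(hathifile_path, source_column, source_code):
    """Yield rows whose source column matches source_code (case-insensitive).

    source_column is a zero-based column index; it and source_code are
    placeholders to be filled in from the hathifiles documentation.
    """
    with gzip.open(hathifile_path, "rt", encoding="utf-8", newline="") as fh:
        reader = csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) <= source_column:
                continue  # short or malformed line
            if row[source_column].lower() == source_code.lower():
                yield row
```

Reading row by row keeps memory use flat even though the full files are
large.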


-Jon

On 08/03/2012 11:07 AM, Eric Lease Morgan wrote:

If I needed/wanted to know what materials held by my library were also in the 
HathiTrust, then programmatically how could I figure this out? In other words, 
do you know of a way to query the HathiTrust and limit the results to items my 
library owns? --Eric Lease Morgan


Re: [CODE4LIB] My crazed idea about dealing with registration limitations

2011-12-22 Thread Jon Stroop
Maybe keynotes happen on the middle day; the one time where the whole 
group comes together, though it would require a 2x size space... This 
could also reduce the length to 4.5 days.


On 12/22/2011 10:05 AM, Peter Murray wrote:

That is a crazy idea.  I don't know about putting the speakers on the hook for 
two days -- particularly keynote speakers.  Still, it would be interesting for 
a site to flesh this out and propose something along these lines.


Peter

On Dec 21, 2011, at 6:44 PM, Fleming, Declan wrote:

Hi - so I know this is nuts.

If we start with a couple premises for the code4lib conference:

1.  Single thread is crucial.
2.  250 is about the top limit of a single threaded conference.
3.  400+ people want to attend.
4.  The conference takes 2.5 days.

What if we ran the 2.5 day conference twice in one week?

1.  Session 1 runs from Monday until noon on Weds.
2.  Session 2 runs from 1p on Weds until the end of Friday.
3.  Every one of the 23 accepted talks is given twice, once in each Session, in 
the same order.
4.  Each Session is attended by a different set of attendees.

We could serve 500 attendees this way.

If everyone came for the week, there could be parallel seminars, hack fests, 
BootCamps, THATcamps, CURATEcamps, c4lcamps, etc... for the half of the 500 
that wasn't in the main conference.  People could also just decide to come for 
the 2.5 day main conference, I guess.

I SAID it was crazy.  ;)

D





Re: [CODE4LIB] Models of MARC in RDF

2011-11-28 Thread Jon Stroop
You may know about this one already, but the BL exposed the British 
National Bibliography as RDF last summer. The project has a page[1] with 
a good amount of info--the data model[2] might be a good place to start.

-Jon

1. http://www.bl.uk/bibliographic/datafree.html
2. http://www.bl.uk/bibliographic/pdfs/datamodelv1_01.pdf

On 11/26/2011 10:58 AM, Karen Coyle wrote:
A few of the code4lib talk proposals mention projects that have or 
will transform MARC records into RDF. If any of you have documentation 
and/or examples of this, I would be very interested to see them, even 
if they are under construction.


Thanks,
kc



--
Jon Stroop
Metadata Analyst
Firestone Library
Princeton University
Princeton, NJ 08544

Email: jstr...@princeton.edu
Phone: (609)258-0059
Fax: (609)258-0441

http://pudl.princeton.edu
http://findingaids.princeton.edu
http://www.cpanda.org


[CODE4LIB] Fwd: [semweb-25] Metropolitan Musem of Art hiring a Semantic Web Developer

2011-11-28 Thread Jon Stroop

May be of interest to someone on this list.

-------- Original Message --------
Subject: [semweb-25] Metropolitan Musem of Art hiring a Semantic Web Developer
Date: Thu, 24 Nov 2011 11:01:27 -0500
From: don undeen donund...@yahoo.com
Reply-To: semweb...@meetup.com
To: semweb...@meetup.com



Hello,
Hoping that this isn't a spam, but the Metropolitan Museum of Art's 
Digital Media Department is hiring for an Information Systems Developer.
This position will be involved in advanced data architecture solutions, 
to support a variety of web and in-gallery technology.


This work may entail:
- Setting up and administering triple stores, NoSQL dbs, and CMSs like Drupal
- designing interfaces, modules, and workflows for same
- Implementing collective intelligence algorithms,
- experimenting with new technologies, developing prototypes and 
proofs-of-concept
- and (to be honest) some drudgery, like data delivery, ETL, and report 
generation


See the application on linkedin, here:
http://www.linkedin.com/jobs?viewJob=jobId=2157751srchIndex=0trk=njsrch_hitsgoback=%2Efjs_information+systems+developer_*1_*1_I_us_*1_*1_1_R_true_*2_*2_*2_*2_*2_*2_*2_*2 



I know many of you do more than just SemWeb work, and many of you are on 
this list because you like to find new ways to tackle vexing problems. 
That's what we're looking for.


If you choose to submit a resume, please send it to the email address 
provided, but also cc me:

don.und...@metmuseum.org

I look forward to hearing from you.

yours,
Don Undeen
Manager, Media Lab
Digital Media Department
Metropolitan Museum of Art






--
Jon Stroop
Metadata Analyst
Firestone Library
Princeton University
Princeton, NJ 08544

Email: jstr...@princeton.edu
Phone: (609)258-0059
Fax: (609)258-0441

http://pudl.princeton.edu
http://findingaids.princeton.edu
http://www.cpanda.org


Re: [CODE4LIB] TIFF Metadata to XML?

2011-07-18 Thread Jon Stroop

Edward,
JHOVE (1)  should be able to do this, and I believe you can pass the 
included shell script a directory and have it extract data for 
everything it finds and can parse inside.
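
For what it's worth, a batch run could be wrapped in a few lines of Python
like the sketch below. It assumes the launcher script is on the PATH as
"jhove" and uses the -m (module), -h (output handler), and -o (output file)
options; the helper names and the paths in the usage note are made up for
illustration.

```python
import subprocess


def jhove_command(target, output_xml, module="TIFF-hul"):
    """Build a JHOVE command line that writes XML for a file or directory."""
    return ["jhove", "-m", module, "-h", "xml", "-o", str(output_xml), str(target)]


def run_jhove(target, output_xml):
    """Run JHOVE; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(jhove_command(target, output_xml), check=True)
```

Calling run_jhove("scans", "scans-jhove.xml") would then leave one XML report
for everything under the (hypothetical) scans directory.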

-Jon

On 07/18/2011 09:18 AM, Edward M. Corrado wrote:

Hello All,

Before I re-invent the wheel or try many different programs, does
anyone have a suggestion on a good way to extract embedded Metadata
added by cameras and (more importantly) photo-editing programs such as
Photoshop from TIFF files and save it as XML? I have 60k photos
that have metadata including keywords, descriptions, creator, and
other fields embedded in them and I need to extract the metadata so I
can load them into our digital archive.

Right now, after looking at a few tools and doing a number of
Google searches, I haven't found anything that seems to do what I
want. As of now I am leaning towards extracting the metadata using
exiv2 and creating a script (shell, perl, whatever) to put the fields
I need into a pseudo-Dublin Core XML format. I say pseudo because I
have a few fields that are not Dublin Core. I am assuming there is a
better way. (Although part of me thinks it might be easier to do that
than exporting to XML and using XSLT to transform the file since I
might need to do a lot of cleanup of the data regardless.)
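
The exiv2-plus-script route described above could be sketched roughly as
follows in Python. It assumes the XMP fields have already been read into a
dict keyed by exiv2-style names; the mapping table, the "record" wrapper, the
local "city" element, and the function name are all hypothetical and stand in
for whatever fields the files really carry.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical mapping from XMP keys to output element names.
FIELD_MAP = {
    "Xmp.dc.title": "{%s}title" % DC_NS,
    "Xmp.dc.creator": "{%s}creator" % DC_NS,
    "Xmp.dc.description": "{%s}description" % DC_NS,
    "Xmp.dc.subject": "{%s}subject" % DC_NS,
    # A non-Dublin Core field kept as a plain local element
    # (the "pseudo" part of pseudo-Dublin Core).
    "Xmp.photoshop.City": "city",
}


def to_pseudo_dc(filename, xmp_fields):
    """Return one <record> element, serialized as a string, for one image."""
    record = ET.Element("record", {"file": filename})
    for key, element_name in FIELD_MAP.items():
        value = xmp_fields.get(key, "").strip()
        if value:  # cleanup hook: empty or whitespace-only values are dropped
            ET.SubElement(record, element_name).text = value
    return ET.tostring(record, encoding="unicode")
```

Doing the cleanup in the same script that builds the XML is the appeal of
this route; any per-field normalization can go where the value is stripped.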

Anyway, before I go any further, does anyone have any
thoughts/ideas/suggestions?

Edward


Re: [CODE4LIB] MARCXML to MODS: 590 Field

2011-05-19 Thread Jon Stroop

I'm going to guess that it's because 59x fields are defined for local use:

http://www.loc.gov/marc/bibliographic/bd59x.html

...but someone from LC should be able to confirm.
-Jon

--
Jon Stroop
Metadata Analyst
Firestone Library
Princeton University
Princeton, NJ 08544

Email: jstr...@princeton.edu
Phone: (609)258-0059
Fax: (609)258-0441

http://pudl.princeton.edu
http://diglib.princeton.edu
http://diglib.princeton.edu/ead
http://www.cpanda.org/cpanda



On 05/19/2011 11:45 AM, Richard, Joel M wrote:

Dear hive-mind,

Does anyone know why the Library of Congress-supplied MARCXML to MODS XSLT [1] 
does not handle the MARC 590 Local Notes field? It seems to handle everything 
else, not that I've done an exhaustive search... :)

Granted, I could copy/create my own XSLT and add this functionality in myself, 
but I'm curious as to whether or not there's some logic behind this decision to 
not include it. Logic that I would not naturally understand since I'm not 
formally trained as a librarian.
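
As an illustration of the kind of mapping in question, here is a rough sketch
in Python rather than XSLT (so it is not a patch to the LC stylesheet). It
pulls each 590 field out of a MARCXML record and wraps its subfields in a
MODS note element; the type="local" attribute value and the function name are
assumptions made for the example.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
MODS_NS = "http://www.loc.gov/mods/v3"


def local_notes_as_mods(marcxml_record):
    """Return a list of MODS <note> elements, one per MARC 590 field."""
    record = ET.fromstring(marcxml_record)
    notes = []
    for field in record.iter("{%s}datafield" % MARC_NS):
        if field.get("tag") != "590":
            continue
        # Join all subfields of the 590 into a single note string.
        text = " ".join(
            (sf.text or "").strip()
            for sf in field.findall("{%s}subfield" % MARC_NS)
        )
        note = ET.Element("{%s}note" % MODS_NS, {"type": "local"})
        note.text = text
        notes.append(note)
    return notes
```

The resulting elements could be appended to a MODS record produced by the
stock stylesheet, which keeps the local handling separate from LC's code.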

Thanks!
--Joel

[1] http://www.loc.gov/standards/mods/v3/MARC21slim2MODS3-4.xsl


Joel Richard
IT Specialist, Web Services Department
Smithsonian Institution Libraries | http://www.sil.si.edu/
(202) 633-1706 | richar...@si.edu


[CODE4LIB] [Reminder] Job posting: Princeton University Library, Library Application and Database Manager/Developer

2010-02-17 Thread Jon Stroop

Library Application and Database Manager/Developer

Princeton University Library
Requisition # 165

The Princeton University Library, one of the world's most respected research
institutions, serves a diverse community of 7,200 students and 1,100 faculty
members, with more than 6 million printed volumes, 5 million manuscripts, and 2
million nonprint items. The holdings in its central library and 9 specialized
libraries range from ancient papyri and incunabula to the most advanced
electronic databases and digital collections. The Library employs a dedicated
and knowledgeable staff of more than 300 professional and support personnel,
complemented by a large student and hourly workforce. More information can be
found at the Library's Web site:  http://libweb.princeton.edu/

Description: Princeton University Library seeks a Library Application and
Database Manager/Developer to maintain, enhance and create applications for
Library users and staff. In addition, this person will help develop web-based
library services for patrons and help with implementation and enhancements to
Library NextGen user interfaces.

Responsibilities: The primary function for this position is to maintain and
enhance current locally developed library applications and to create new ones as
new needs arise. The locally developed applications include various specialized
catalogs, specialized user applications such as E-Reserves and Audio-Reserves,
and internal workflow applications for managing staff travel, staff lists, and
guest access to the libraries, and many more. This position will also help with
library web services development, including maintaining and creating machine to
machine interfaces, as well as user interfaces. This position will also be
assigned other projects as needed.

Qualifications:
Required:  BA or BS from an accredited institution. At least 3 years of
experience working with: Visual Basic and the .NET Framework and application
development within the Visual Studio IDE (integrated development environment).
Experience developing Web applications for the MS Internet Information Server.
Knowledge of Perl, Javascript, relational database design and the MS Access and
SQL Server database platforms and queries. Experience with library
bibliographic, holdings and authorities data in MARC and XML format. Knowledge
of HTML, XSL and XSLT formats. Excellent communications skills, analytical and
problem solving abilities and the ability to collaborate with colleagues and
other staff as needed. Ability to function with minimal supervision.

Preferred: Familiarity and experience with: MySql, PHP, Java, Python, REST
principles, UNIX operating systems. Masters degree in Computer or Library
Science and previous experience in library systems and application development.
Previous project management experience preferred.

Compensation and Benefits:
Compensation will be competitive and commensurate with experience and
accomplishments.  Twenty-four (24) vacation days a year, plus eleven (11) paid
holidays. Annuity program (TIAA/CREF), group life insurance, health coverage
insurance, disability insurance, and other benefits are available.

Nominations and Applications:
Review of applications will begin immediately and will continue until the
position is filled. Nominations and applications (cover letter, resume and the
names, titles, addresses and phone numbers of three references) will be accepted
only from the Jobs at Princeton website: http://www.princeton.edu/jobs

PRINCETON UNIVERSITY IS AN EQUAL OPPORTUNITY/AFFIRMATIVE ACTION EMPLOYER.

For information about applying to Princeton, please link to
http://www.princeton.edu/jobs


[CODE4LIB] Job posting: Princeton University Library, Library Application and Database Manager/Developer

2010-01-28 Thread Jon Stroop

Dear code4libbers,
FYI.
I can try to answer questions about this position, but have limited 
interactions with this office.

-Jon

--
Library Application and Database Manager/Developer
Princeton University Library
Requisition # 165


The Princeton University Library, one of the world's most respected 
research institutions, serves a diverse community of 7,200 students and 
1,100 faculty members, with more than 6 million printed volumes, 5 
million manuscripts, and 2 million nonprint items. The holdings in its 
central library and 9 specialized libraries range from ancient papyri 
and incunabula to the most advanced electronic databases and digital 
collections. The Library employs a dedicated and knowledgeable staff of 
more than 300 professional and support personnel, complemented by a 
large student and hourly workforce. More information can be found at the 
Library's Web site:  http://libweb.princeton.edu/


Description: Princeton University Library seeks a Library Application and 
Database Manager/Developer to maintain, enhance and create applications 
for Library users and staff. In addition, this person will help develop 
web-based library services for patrons and help with implementation and 
enhancements to Library NextGen user interfaces.


Responsibilities: The primary function for this position is to maintain 
and enhance current locally developed library applications and to create 
new ones as new needs arise. The locally developed applications include 
various specialized catalogs, specialized user applications such as 
E-Reserves and Audio-Reserves, and internal workflow applications for 
managing staff travel, staff lists, and guest access to the libraries, 
and many more. This position will also help with library web services 
development, including maintaining and creating machine to machine 
interfaces, as well as user interfaces. This position will also be 
assigned other projects as needed.


Qualifications:
Required:  BA or BS from an accredited institution. At least 3 years of 
experience working with: Visual Basic and the .NET Framework and 
application development within the Visual Studio IDE (integrated 
development environment). Experience developing Web applications for the 
MS Internet Information Server. Knowledge of Perl, Javascript, 
relational database design and the MS Access and SQL Server database 
platforms and queries. Experience with library bibliographic, holdings 
and authorities data in MARC and XML format. Knowledge of HTML, XSL and 
XSLT formats. Excellent communications skills, analytical and problem 
solving abilities and the ability to collaborate with colleagues and 
other staff as needed. Ability to function with minimal supervision.


Preferred: Familiarity and experience with: MySql, PHP, Java, Python, 
REST principles, UNIX operating systems. Masters degree in Computer or 
Library Science and previous experience in library systems and 
application development. Previous project management experience preferred.


Compensation and Benefits:
Compensation will be competitive and commensurate with experience and 
accomplishments.  Twenty-four (24) vacation days a year, plus eleven 
(11) paid holidays. Annuity program (TIAA/CREF), group life insurance, 
health coverage insurance, disability insurance, and other benefits are 
available.


Nominations and Applications:
Review of applications will begin immediately and will continue until 
the position is filled. Nominations and applications (cover letter, 
resume and the names, titles, addresses and phone numbers of three 
references) will be accepted only from the Jobs at Princeton website: 
http://www.princeton.edu/jobs


PRINCETON UNIVERSITY IS AN EQUAL OPPORTUNITY/AFFIRMATIVE ACTION EMPLOYER.

For information about applying to Princeton, please link to
http://www.princeton.edu/jobs


Re: [CODE4LIB] Q: what is the best open source native XML database

2010-01-19 Thread Jon Stroop

Godmar,
We're using eXist for a couple of apps here, and like it quite a bit.

The full text search extensions in the 1.4 release are backed by Lucene, 
and it's pretty quick once you've tuned it (try some searches here: 
http://diglib.princeton.edu/ead/ -- this is running on a beta of 1.4) 
and set up the indexing properly. Performance will not be good until 
you've configured some indexes and tweaked the JVM settings. There is a 
bit of a learning curve involved here, but the documentation is decent, 
and the community and developers are quite active and accessible.


You can GET and PUT and DELETE documents very easily, or POST xqueries 
to get fragments.  You can also GET fragments or documents by supplying 
parameters to an xquery stored in the database--they call this their 
REST-style API[1].  There are a few other ways to get content in and 
out[2], and Java integration isn't a problem via the xml:db API[3].  You 
can also write extension modules in Java.
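
To make the REST-style calls a bit more concrete, here is a small Python
sketch that only builds the HTTP requests. The base URL is an assumed default
for a local install, the collection and document names in the usage note are
invented, and the POST wrapper element is written from memory of the REST
guide[1], so verify it there before relying on it.

```python
import urllib.parse
import urllib.request
from xml.sax.saxutils import escape

BASE = "http://localhost:8080/exist/rest"  # assumption: adjust to your install
EXIST_NS = "http://exist.sourceforge.net/NS/exist"


def _url(collection_path, name=None):
    url = BASE + collection_path
    if name is not None:
        url += "/" + urllib.parse.quote(name)
    return url


def put_document(collection_path, name, xml_bytes):
    """Request that stores (or replaces) a document."""
    return urllib.request.Request(
        _url(collection_path, name), data=xml_bytes, method="PUT",
        headers={"Content-Type": "application/xml"},
    )


def get_document(collection_path, name):
    """Request that fetches a whole document."""
    return urllib.request.Request(_url(collection_path, name), method="GET")


def delete_document(collection_path, name):
    """Request that removes a document."""
    return urllib.request.Request(_url(collection_path, name), method="DELETE")


def post_xquery(collection_path, xquery):
    """Request that POSTs an ad hoc XQuery against a collection."""
    body = '<query xmlns="%s"><text>%s</text></query>' % (EXIST_NS, escape(xquery))
    return urllib.request.Request(
        _url(collection_path), data=body.encode("utf-8"), method="POST",
        headers={"Content-Type": "application/xml"},
    )
```

Each helper returns a Request that can be handed to urllib.request.urlopen,
e.g. urlopen(get_document("/db/ead", "C0001.xml")). Writes will normally need
credentials, which are left out here.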


-Jon

1. http://exist.sourceforge.net/devguide_rest.html
2. http://exist.sourceforge.net/devguide.html
3. http://exist.sourceforge.net/devguide_xmldb.html


On 01/16/2010 11:15 AM, Godmar Back wrote:

Hi,

we're currently looking for an XML database to store a variety of
small-to-medium sized XML documents. The XML documents are
unstructured in the sense that they do not follow a schema or DTD, and
that their structure will be changing over time. We'll need to do
efficient searching based on elements, attributes, and full text
within text content. More importantly, the documents are mutable.
We'd like to bring documents or fragments into memory in a DOM
representation, manipulate them, then put them back into the database.
Ideally, this should be done in a transaction-like manner. We need to
efficiently serve document fragments over HTTP, ideally in a manner
that allows for scaling through replication. We would prefer strong
support for Java integration, but it's not a must.

Have others encountered similar problems, and what have you been using?

So far, we're researching: eXist-DB (http://exist.sourceforge.net/ ),
Base-X (http://www.basex.org/ ), MonetDB/XQuery
(http://www.monetdb.nl/XQuery/ ), Sedna
(http://modis.ispras.ru/sedna/index.html ). Wikipedia lists a few
others here: http://en.wikipedia.org/wiki/XML_database
I'm wondering to what extent systems such as Lucene, or even digital
object repositories such as Fedora could be coaxed into this usage
scenario.

Thanks for any insight you have or experience you can share.

  - Godmar
   


--
Jon Stroop
Metadata Analyst
C-17-D2 Firestone Library
Princeton University
Princeton, NJ 08544

Email: jstr...@princeton.edu
Phone: (609)258-0059
Fax: (609)258-0441

http://diglib.princeton.edu
http://diglib.princeton.edu/ead


Re: [CODE4LIB] djatoka

2008-11-14 Thread Jon Stroop
Another possibility, if no one steps up with a presentation, would be a 
'hacking djatoka' [pre\-un]*conference activity.

-Jon

Jon Stroop
Metadata Analyst
C-17-D2 Firestone Library
Princeton University
Princeton, NJ 08544

Email: [EMAIL PROTECTED]
Phone: (609)258-0059
Fax: (609)258-0441

http://diglib.princeton.edu
http://diglib.princeton.edu/ead



Kevin S. Clarke wrote:

On Fri, Nov 14, 2008 at 6:10 AM, Birkin James Diana
[EMAIL PROTECTED] wrote:

  

If any of you have had the good fortune to experiment with it or implement
it into some workflow, get over to the code4libcon09 presentation-proposal
page pronto! And if you're as jazzed about it as I am, and know it'll be as
big in our community as I think it will, consider a pre-conf proposal, too.



Birkin++

I'm very interested in this, too, and hope someone who has had a
chance to play with djatoka will volunteer to do a pres (or better a
pre-conf!)  You'd definitely get my vote.

Kevin


  


Re: [CODE4LIB] presentation files

2008-03-04 Thread Jon Stroop

+1

Jon Stroop
Digital Library Specialist
C-19-E Firestone Library
Princeton University
Princeton, NJ 08544

Email: [EMAIL PROTECTED]
Phone: (609)258-0059
Fax: (609)258-0441

http://diglib.princeton.edu



Bess Sadler wrote:

On Mar 4, 2008, at 9:44 AM, Dan Scott wrote:


There seemed to be general support on IRC for using the Internet
Archive as the destination of choice for the code4lib videos, but
perhaps this is a good time to call for broader discussion.

Dan


+1

I think this is a great idea.

Bess


[CODE4LIB] Position Announcement

2008-01-24 Thread Jon Stroop

(Please excuse cross-postings)

Digital Library Specialist (Temporary 12 month position)
Princeton University Library
Requisition # 0700801

The Princeton University Library, one of the world's most respected
research institutions, serves a diverse community of 6,600 students and
1,100 faculty members with more than 6 million printed volumes, 5
million manuscripts, and 2 million nonprint items. The holdings in its
central library and 15 specialized libraries range from ancient papyri
and incunabula to the most advanced electronic databases and digital
collections. The Library employs a dedicated and knowledgeable staff of
more than 300 professional and support personnel, complemented by a
large student and hourly workforce. More information can be found at the
Library's Web site: http://libweb.princeton.edu

Available: Immediately

Description: Princeton University Library seeks a creative professional
to fill the temporary position of Digital Library Specialist. This
position will be responsible for the programming and improvement of the
Library’s Digital Collections' website and related web services. This is
a temporary twelve (12) month position with the possibility of renewal.

Responsibilities: The Digital Library Specialist will be instrumental in
assisting with development needs in a Java/XML environment. This
position will help develop future software and middleware, as well as
support existing technologies and applications (in Java and Ruby). The
candidate should have experience in using Java to process XML in the
library environment and an understanding of XML's related standards and
technologies. The incumbent may also assist with the provision of XQuery
and XSLT stylesheets to generate XML and XHTML markup for a variety of
library-specific outputs. The successful candidate will report to the
Digital Initiatives Coordinator. Job duration will be twelve (12) months
with possibility of renewal.

Qualifications:
Required: Bachelors degree from an accredited institution. Experience
programming XML using the Java programming language. Demonstrated
knowledge and application of XML standards (XPath, XML Schema, RELAX NG,
and XML Namespaces). Familiarity with web usability trends and knowledge
of XHTML, CSS, and W3C Web Usability Guidelines. Experience with Linux
and/or UNIX. Must have excellent oral and written communication skills.
Ability to work collaboratively and collegially in a team and with
diverse groups. Experience in a production oriented environment.

Preferred: MLS/MIS degree from an accredited institution. Experience
with native XML databases (e.g., Exist-db, Berkeley DB XML, X-Hive,
etc.), Eclipse IDE, Maven, Jetty/Tomcat, and XQuery. Experience with
Ruby. Knowledge of AJAX. Knowledge of library metadata standards.

Compensation and Benefits: Salary based on experience and
qualifications. Two (2) vacation days per month, designated paid
holidays. Medical and other benefits available.

Nominations and Applications: Review of applications will begin
immediately and will continue until the position is filled. Nominations
and applications (cover letter, resume and the names, titles, addresses
and phone numbers of three references) will be accepted only from the
Jobs at Princeton website: http://www.princeton.edu/jobs

PRINCETON UNIVERSITY IS AN EQUAL OPPORTUNITY/AFFIRMATIVE ACTION EMPLOYER.
For information about applying to Princeton, please link to
http://www.princeton.edu/jobs

--
Jon Stroop
Digital Library Specialist
C-19-E Firestone Library
Princeton University
Princeton, NJ 08544

Email: [EMAIL PROTECTED]
Phone: (609)258-0059
Fax: (609)258-0441

http://diglib.princeton.edu