[CODE4LIB] PDF manipulation

2012-08-06 Thread Yong Tang

Hi,

I am a full-time information science student and a part-time LAMP server 
administrator. I was recently thrown into a file dumpster containing 
hundreds of old PDF files. My job is to clean the dumpster up by 
putting the right files into the right folders. I am having some difficulty 
writing a Perl script to get the job done, and I would appreciate any 
help you can offer.


First of all, what tool or tools do you use to manipulate PDF files 
directly in a script? I tried some Perl modules such as CAM::PDF and 
PDF::API2. The results were not pretty: the original text formatting was lost.


I regret that I did not take an XML class last semester, because my 
intuition is that the best way to do this job is to save the PDFs 
as XML and then work on the XML with a script. Instead, I have to 
save the PDFs as plain text. I found that PDFedit and Adobe Acrobat X Pro 
both kept the original text formatting after the 
conversion. However, I have no idea how to use them to convert multiple 
PDFs to plain text at once. I googled for answers but had no 
luck. Does anybody know how to do it?
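One possible answer to the batch question, sketched under assumptions: instead of PDFedit or Acrobat, this uses the `pdftotext` command-line tool from Poppler (or Xpdf), which is not mentioned in the thread and must be installed separately. Its `-layout` option asks it to preserve the physical page layout, which may address the lost formatting. The script name `pdf2txt.py` and the helpers `pdftotext_command`/`convert_all` are hypothetical; this is a minimal sketch, not a tested tool.

```python
import subprocess
import sys
from pathlib import Path


def pdftotext_command(pdf_path, out_dir):
    """Build a pdftotext call that writes <name>.txt into out_dir.

    -layout asks pdftotext to keep the physical layout of each page
    (columns, indentation) rather than reflowing the text.
    """
    txt_path = Path(out_dir) / (Path(pdf_path).stem + ".txt")
    return ["pdftotext", "-layout", str(pdf_path), str(txt_path)]


def convert_all(src_dir, out_dir):
    """Convert every PDF in src_dir to text; return the files that failed."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for pdf in sorted(Path(src_dir).iterdir()):
        # Match .pdf and .PDF alike; old dumps often mix the two.
        if not pdf.is_file() or pdf.suffix.lower() != ".pdf":
            continue
        result = subprocess.run(pdftotext_command(pdf, out_dir),
                                capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(pdf)
    return failed


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: pdf2txt.py SRC_DIR OUT_DIR", file=sys.stderr)
    else:
        for pdf in convert_all(sys.argv[1], sys.argv[2]):
            print("could not convert:", pdf, file=sys.stderr)
```

The same loop is only a few lines of Perl with `system()`. Poppler also ships `pdftohtml`, whose `-xml` option may be closer to the XML route mentioned above.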


I am new to text processing. Maybe I am heading in the wrong direction for 
this project? Any input is appreciated.
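Once the PDFs are plain text, the sorting step may not need XML at all. A minimal sketch, assuming each PDF has a `.txt` file with the same base name and that folders can be told apart by a few keywords. The folder names and phrases in `RULES` are invented placeholders, and `choose_folder`/`plan_moves` are hypothetical helpers. It builds a plan rather than moving anything, so a person can check it first.

```python
from pathlib import Path

# Hypothetical rules: target folder -> phrases that suggest it.
# Phrases must be lowercase, because the text is lowercased before matching.
RULES = {
    "minutes": ["meeting minutes", "agenda"],
    "invoices": ["invoice", "amount due"],
    "reports": ["annual report", "executive summary"],
}


def choose_folder(text, rules=RULES, default="unsorted"):
    """Return the folder whose phrases occur most often in text.

    Ties go to the rule listed first; text with no matches goes to
    `default` for manual review.
    """
    lowered = text.lower()
    best, best_hits = default, 0
    for folder, phrases in rules.items():
        hits = sum(lowered.count(p) for p in phrases)
        if hits > best_hits:
            best, best_hits = folder, hits
    return best


def plan_moves(txt_dir, pdf_dir):
    """Map each PDF to a target folder, based on its extracted .txt twin."""
    plan = {}
    for txt in sorted(Path(txt_dir).glob("*.txt")):
        text = txt.read_text(encoding="utf-8", errors="replace")
        plan[Path(pdf_dir) / (txt.stem + ".pdf")] = choose_folder(text)
    return plan
```

After reviewing the plan, the moves themselves could be done with `shutil.move`. Anything landing in "unsorted" is a hint that the rules need another keyword.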


Yong Tang
A student


[CODE4LIB] Job: Digital Scholarship Outreach Librarian at Indiana University-Purdue University Indianapolis

2012-08-06 Thread jobs
IUPUI University Library has an opening for a full-time, tenure track Digital
Scholarship Outreach Librarian. Minimum salary is $42,000 with benefits
including 22 vacation days, health/dental insurance options, and retirement
contributions to either TIAA-CREF or Fidelity plans. Final position offer is
contingent upon the continued availability of funding. The
anticipated start date is January 2013.

  
POSITION DESCRIPTION

  
The Digital Scholarship Outreach Librarian is a people-focused individual who
will thrive working in a team-oriented environment to develop and coordinate
digital scholarship projects through active cultivation of partnerships with
internal and external constituents. The Librarian will assist in promoting the
Program of Digital Scholarship's services and actively spread this message
to IUPUI and our civic partners within Indianapolis and beyond. Additionally,
the successful candidate will assume subject liaison responsibilities for the
IUPUI Department of Geography.

  
Primary Duties

  
The Digital Scholarship Outreach Librarian will be responsible for:

  * Seeking external funding to support the creation of digital projects and 
managing the funds received
  * Developing digital projects in collaboration with internal and external 
partners
  * Overseeing a digitization team of 4-6 part time staff
  * Working with a variety of digital management systems
  * Actively communicating with Library subject liaisons to academic 
departments about the Program of Digital Scholarship and its possible 
applications in the disciplines they cover
  * Developing publicity materials in collaboration with the Director of 
External Relations to inform target audiences and the public at large about the 
Program of Digital Scholarship
  * Organizing small programs or conferences for digital project stakeholders 
to explore possibilities for collaborative projects and strategies for 
statewide standards, resource sharing, and evaluation

Secondary Duties

  
As liaison to the Department of Geography, the Librarian will be responsible
for:

  * Selecting print and electronic information resources that support the 
Department's teaching and scholarship
  * Providing information literacy and skills instruction
  * Providing research consultations upon request
  * Supporting transformation of scholarly output into sustainable 
communication models and formats
  * Participating in the provision of reference services
  
EDUCATION AND EXPERIENCE

  
Minimum Qualifications:

  * Earned Master's degree from an ALA-accredited Library/Information Science 
program.
  * Experience with digital publishing tools such as DSpace, OJS, Fedora, 
Islandora, CONTENTdm, Drupal, WordPress, bepress, etc.
  * Demonstrated service commitment
  * Experience identifying and writing grants and managing awarded 
grant funds
  * Demonstrated ability to achieve results in a collaborative work environment 
including demonstrated ability to collaborate with external partners to 
accomplish a common goal
  
Desired Qualifications:

  * Experience collaborating with university faculty and other campus 
stakeholders
  * Experience with collection development and information literacy instruction
  * Excellent oral and written communication skills
  * Educational background in relevant field
  * Understanding of GIS and its applications
  
ENVIRONMENT

  
Indiana University Purdue University Indianapolis (IUPUI) is an urban research
and academic health science university with 20 schools and academic units.
Located in downtown Indianapolis, a diverse and affordable mid-sized city,
IUPUI enrolls more than 30,000 students. The vision of the IUPUI University
Library is to be the innovative leader among urban university libraries. It
has a staff of approximately 80 and a budget in excess of $9.5 million.

  
TO APPLY

  
Please forward letter of application, resume, and contact information for four
references (include name, title, address, telephone number, and email address)
electronically to Teresa McCurry at the following email address:
uli...@iupui.edu. Applications will be accepted until September 3, 2012. IUPUI
University Library is an Affirmative Action/Equal Opportunity employer that
values and encourages diversity in its students, faculty and staff.

  
Email uli...@iupui.edu to apply for this job.



Brought to you by code4lib jobs: http://jobs.code4lib.org/job/1793/


[CODE4LIB] Job: Digital Initiatives Librarian at Georgia State University

2012-08-06 Thread jobs
Georgia State University Library seeks an enthusiastic, collegial, self-
starter to serve as the manager of the Digital Archive @ Georgia State
University, the University's institutional repository, which highlights the
research and scholarly productivity of members of the University community.
The University Library administers the archive to collect, organize,
disseminate, and preserve the digital scholarly output of Georgia State
University faculty, students and staff. The repository is
hosted on the Digital Commons platform and currently includes ETDs, conference
materials and journals.

  
The Digital Initiatives Librarian will be responsible for overseeing services
related to the Library's institutional repository, including but not limited
to:

  * Identifying and recruiting new content, and editing existing content as 
needed
  * Developing, implementing and evaluating marketing and promotion efforts
  * Providing support for the library's scholarly communication activities
  * Raising awareness of Open Access to the GSU community
  * Providing Library and University faculty with information, training and 
assistance in depositing materials into the Archive, including establishing and 
maintaining Selected Works author pages
  * Analyzing policies, procedures and workflows to ensure consistency and 
accuracy of metadata
  * Supporting established institutional repository guidelines, including 
submission rules and the development of new communities and series
  * Creating new guidelines as necessary
  * This position reports to the Associate Dean for Collections.
  
Qualifications:

  
REQUIRED:

  * ALA-accredited Master's degree in Library and/or Information Science
  * Excellent communication, presentation, and interpersonal skills
  * Familiarity with current trends and emerging issues regarding copyright, 
open access and scholarly communication as they relate to institutional 
repositories
  * Knowledge of current metadata schemas, standards, and digital content 
management systems
  * Coursework or experience resulting in knowledge of principles and practices 
governing institutional repositories
  * Academic library experience
  * Experience working directly with faculty at a research university
  * Ability to meet requirements for promotion
  * Ability to pass a background check
  
PREFERRED:

  * Experience working with Digital Commons
  * Project management experience
  * Experience developing, implementing and evaluating marketing and promotion 
efforts
  
Salary and Rank

Minimum salary of $48,000 for 12 months. Salary is
commensurate with the candidate's education and experience.
Appointment at a faculty rank, on a contract renewal basis.

  
Submit a cover letter addressing the above qualifications; resume; name,
address and phone number of three references, including immediate
supervisor. Review of materials will begin September 7,
2012 and continue until the position is filled. Send
materials to:

  
Human Resources Coordinator

University Library

Georgia State University

100 Decatur Street, SE, Atlanta, GA 30303-3202

(404) 413-2700

lib...@gsu.edu



Brought to you by code4lib jobs: http://jobs.code4lib.org/job/1792/


[CODE4LIB] Job: Librarian - Electronic Resources at College of Western Idaho

2012-08-06 Thread jobs
The College of Western Idaho seeks a creative, enthusiastic librarian to play
an integral role in building and maintaining a vibrant and user-friendly suite
of electronic resources at a young, rapidly growing community college library.
The position will work closely with the Director and other library staff to
create and realize a vision of the ideal academic library, with a particular
focus on electronic resources management. The successful candidate will be
challenged to optimize the Library's web presence, ILS, online databases, and
ebook/ejournal collection, to integrate electronic resources across campus,
and to ensure ongoing continuous improvement. The position will also
participate in reference, instruction, and collection development activities.

  
The College of Western Idaho, the first community college in southwestern
Idaho, was established in 2007 and began offering courses in 2009. The
institution offers expansive opportunities for professional growth and
development in a welcoming community. CWI Library emphasizes agility and
innovation in support of the teaching and learning goals of the college.

  
This is a full-time, exempt position with a hiring range of $35,200 to
$44,500.

  
Essential Functions

  * Manage the library's web presence using Drupal, LibGuides, and social 
media, and in coordination with library staff and the campus Communications 
department.
  * Oversee all aspects of electronic resources acquisition and stewardship, 
including vendor negotiations, licensing, renewal, configuration, optimization, 
technical support, and assessment for the Library's ILS, online databases, and 
ebook/ejournal collection.
  * Collaborate with the campus IT department to ensure optimal user 
authentication for library electronic resources.
  * Monitor trends in electronic resources and emerging technologies to make 
creative recommendations for improvements and future directions.
  * Assist students, faculty, and staff with research questions in person and 
via email, chat, and phone.
  * Provide information literacy instruction for individuals and courses.
  * Select and recommend the purchase of print and electronic resources to 
increase the library's collection in designated areas.
  * Convey by words and actions the values expected by CWI.
  * Other duties as assigned.

Qualifications

  * Master's in Library Science from an American Library Association accredited 
program (in hand or anticipated by December 2012)
  * Experience in the management or configuration of electronic resources
  * Strong customer service orientation
  * Ability to work independently, solve problems creatively, and take 
initiative with minimal supervision
  * Knowledge of current trends and techniques in librarianship
  * Experience with OCLC's WorldShare Management Services, WorldCat Local, 
Knowledge Base link resolver, EZproxy Hosted authentication service, LibGuides, 
QuestionPoint and/or Drupal
  * Academic library experience
  * Public services experience
  * Higher education teaching experience
  * Track record of innovation and creativity in providing library services



Brought to you by code4lib jobs: http://jobs.code4lib.org/job/1787/


[CODE4LIB] Job: UI Engineer - Lead at PeerJ

2012-08-06 Thread jobs
The user experience has been an afterthought in most academic journals. We
believe that every interaction and experience needs to be iterated upon,
studied, and made pixel perfect if we are going to accelerate scientific
communication. It could be as simple as submitting a form, browsing on a
mobile device, or as complex as designing an entire workflow, but it needs to
put the user at the center. We are looking for an experienced UI engineer who
is looking to apply their past experience to lead the development of PeerJ's
look and interactive workflows.

  
The ideal UI engineering lead has ample experience in user-driven and tested
design and in user interaction design, demonstrable knowledge of the differences
in behavior between mobile and desktop, and knowledge of Python, Ruby or PHP
frameworks, and excels at Photoshop, CSS, Ajax, HTML and cross-browser support.
You take pride not only in knowing, but also in contributing to, best practices
within your trade, and in owning projects and milestones.

  
Send us your CV, examples of work, and describe how you would change academic
publishing to j...@peerj.com. No recruiting agencies.



Brought to you by code4lib jobs: http://jobs.code4lib.org/job/1791/


[CODE4LIB] Job: Backend Applications Engineer at PeerJ

2012-08-06 Thread jobs
Located close to the "Silicon Roundabout" in the Shoreditch area of London, we
are looking for a seasoned professional to join one of the most exciting
startups happening at the intersection of tech and science today. You'll be
helping to shape the future of PeerJ's systems and define our architecture. As
one of the first engineers on board, you'll help set the tone for all future
hires and how things are accomplished. If you enjoy setting up and configuring
Redis instances just as much as debugging loggers, then this is the job for
you. This is a fantastic opportunity to be part of an exciting time in the
history of science as we rethink how science should be communicated. You'll
work side by side with a technical CEO who knows what it is like to be a
developer and writes code daily.

  
The ideal candidate brings expert knowledge or experience in working with AWS,
database management (both SQL and noSQL), and SaaS. Additional background in
Python or PHP frameworks is desired, so that you can help hack your way around
tying our services together.

  
Send us your CV and describe how you would change academic publishing to
j...@peerj.com. No recruiting agencies.

  
http://peerj.com/careers/



Brought to you by code4lib jobs: http://jobs.code4lib.org/job/1790/


[CODE4LIB] XTF 3.1 Now Available

2012-08-06 Thread Lisa Schiff
FOR IMMEDIATE RELEASE

Contact: Lisa Schiff
California Digital Library
University of California, Office of the President
415 20th St., 4th Floor
Oakland, CA 94612
(510) 987-0881
lisa.sch...@ucop.edu
http://xtf.cdlib.org

California Digital Library Announces Release of XTF Version 3.1

Oakland, CA, August 6, 2012 - The California Digital Library (CDL) is pleased 
to announce the release of version 3.1 of XTF 
(http://xtf.cdlib.org/), an open source, highly flexible software application 
that supports the search, browse and display of heterogeneous digital content.  
XTF provides efficient and practical methods for creating customized end-user 
interfaces for distinct digital content collections and is used by institutions 
worldwide.

Major features in the 3.1 release include:

* Improved schema handling for EAD finding aids. In addition to the EAD 
2002 DTD, XTF now provides support for search and display of:

  * EAD 2002 schema and EAD 2002 RelaxNG finding aids
  * Output from Archivists' Toolkit and Archon

* Better OAI 2.0 conformance
* Dynamic site maps to support optimal search engine indexing

See the 3.1 change log 
(http://xtf.cdlib.org/documentation/changelog/#3.1) for further details.

XTF is a combination of Java and XSLT 2.0 that indexes, queries, and displays 
digital objects and is based on open source software (e.g. Lucene and Saxon).  
XTF can be downloaded from the XTF website 
(http://xtf.cdlib.org/download/) or from the XTF Project page on 
SourceForge 
(http://sourceforge.net/projects/xtf/), where the source code can also be found.

The XTF website also provides a self-guided tutorial and a sample of the
default installation (http://xtf.cdlib.org:8080/xtf/search), demonstrating
the capabilities of the tool out of the box. Both of these resources provide
a quick view of the capabilities of XTF prior to download.

Offering a suite of customizable features that support diverse intellectual 
access to content, XTF interfaces can be designed to support the distinct tools 
and presentations that are useful and meaningful to specific audiences.  In 
addition, XTF offers the following core features:


  *   Easy to deploy: Drops directly into a Java application server such as 
Tomcat or Resin; has been tested on Solaris, Mac, Linux, and Windows operating 
systems.
  *   Easy to configure: Can create indexes on any XML element or attribute; 
entire presentation layer is customizable via XSLT.
  *   Robust: Optimized to perform well on large documents (e.g., a single text 
that exceeds 10MB of encoded text); scales to perform well on collections of 
millions of documents; provides full Unicode support.
  *   Extensible:
 *   Works well with a variety of authentication systems (e.g., IP address 
lists, LDAP, Shibboleth).
 *   Provides an interface for external data lookups to support 
thesaurus-based term expansion, recommender systems, etc.
 *   Can power other digital library services (e.g., XTF contains an 
OAI-PMH data provider that allows others to harvest metadata, and an SRU 
interface that exposes searches to federated search engines).
 *   Can be deployed as separate, modular pieces of a third-party system 
(e.g., the module that displays snippets of matching text).
  *   Powerful for the end user:
 *   Spell checking of queries
 *   Faceted displays for browsing
 *   Dynamically updated browse lists
 *   Session-based bookbags

These basic features can be tuned and modified. For instance, the same bookbag 
feature that allows users to store links to entire books can also store links 
to citable elements of an object, such as a note or other reference.
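To illustrate what "allows others to harvest metadata" involves on the harvester's side, here is a minimal sketch that parses one page of a standard OAI-PMH 2.0 `ListRecords` response in the `oai_dc` format. The namespace URIs come from the OAI-PMH and Dublin Core specifications; the sample response is made up, `titles_from_list_records` is a hypothetical helper, and the OAI endpoint URL of any particular XTF installation is not given here.

```python
import xml.etree.ElementTree as ET

# Namespace URIs from the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def titles_from_list_records(xml_text):
    """Return (identifier, title) pairs from one ListRecords response page.

    Records without a dc:title get an empty string.
    """
    root = ET.fromstring(xml_text)
    pairs = []
    for record in root.iterfind(".//oai:record", NS):
        ident = record.findtext("oai:header/oai:identifier", default="",
                                namespaces=NS)
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        pairs.append((ident.strip(), title.strip()))
    return pairs


# Made-up sample response, trimmed to the elements used above.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:ead-1</identifier>
        <datestamp>2012-08-06</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample finding aid</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    for ident, title in titles_from_list_records(SAMPLE):
        print(ident, "|", title)
```

A real harvester would fetch each page over HTTP with `verb=ListRecords&metadataPrefix=oai_dc`, then keep requesting with `verb=ListRecords&resumptionToken=...` until the response carries no further token.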

Examples of XTF-based applications both within and outside of the CDL include:


  *   eScholarship (http://www.escholarship.org), 
the University of California's open access scholarly publishing and research 
platform.
  *   Mark Twain Project Online 
(http://www.marktwainproject.org), developed by the Mark Twain Papers Project, 
the CDL and the University of California Press.
  *   Calisphere 
(http://calisphere.universityofcalifornia.edu/), a curated collection of 
primary sources keyed to the curriculum standards of California's K-12 
community, developed by the CDL.

  *   SNAC: The Social Networks and Archival Context Project (prototype) 
(http://socialarchive.iath.virginia.edu/xtf/search), linking together 
descriptions of people from finding aids using the new standard Encoded 
Archival Context-Corporate Bodies, Persons, and Families (EAC-CPF), developed 
by IATH, University of Virginia

[CODE4LIB] Job: RFP For Front-End Portal Design and Development at Digital Public Library of America

2012-08-06 Thread jobs
The Digital Public Library of America (DPLA) Secretariat is delighted to
release a request for proposals (RFP) for the design and development of a
prototype front-end portal for the DPLA.

  
The DPLA seeks a skilled interactive agency to design and develop a website to
facilitate the creative discovery, sharing, and use of multimedia library
materials among the general public. This prototype website will serve as a
gesture toward the possibilities for a future, fully built out DPLA.

  
Proposals should be submitted electronically by 11:59 PM ET on Monday, August
20, 2012 to Rebekah Heacock (dpla at cyber.law.harvard.edu). Questions can be
directed via email to Rebekah Heacock.

  
Follow-up conversations with prospective teams will take place from August
21-26; we plan to announce our decision on August 27 in order to begin work by
September 4, 2012.

  
About the Project

The DPLA front end website will serve as a creative, interactive platform for
the general public to discover, share, and use books, photographs, audio and
video recordings, and cultural heritage objects online. The most important
function is the display and consumption of digital library content and
metadata.

  
Proposals should describe the process applicants would follow to define,
design, and build a website that will delight users and gesture toward the
possibilities of a highly interactive, fully built out national digital
library. The website should build upon the back-end platform currently under
development (see the full proposal for more information).

  
More information is available at
http://blogs.law.harvard.edu/dplaalpha/2012/08/06/dpla-releases-rfp-for-front-end-design-and-development/



Brought to you by code4lib jobs: http://jobs.code4lib.org/job/1786/


[CODE4LIB] edUi Early Bird Deadline

2012-08-06 Thread EdUI Conference
Code4Lib folks,

The edUi conference early bird deadline is this Friday. We posted our
full schedule a few weeks back, but in case you missed it,
here are some sessions by library web professionals you might be interested
in:

Speed Is a Feature: Performance on the Mobile Web - Eric Petteplace
Leveraging Student Data to Create Website Personalization - Ian Chan
Using SVG in HTML5 for a Library Floor Plan - Jeff Proehl
What the !$#% Was This Plugin for Anyway?! Maintaining a Sustainable
WordPress Multisite Install - Juliana Perry
Too Many Cooks in the Web Kitchen? A Successful Case of Herding Cats to
Improve the User Experience - Rebecca Blakiston

There's a lot more to edUi too including presentations by some of the top
names in the web industry and half-day workshops on topics like Responsive
Design, jQuery and Web Accessibility.

Early Bird tickets are $450, and the
price goes up to $550 after Friday.

Hope to see you there!

-Trey


Re: [CODE4LIB] Recommendations for a teaching OPAC?

2012-08-06 Thread Doreva Belfiore
DIALOG was still being taught at the library school I attended in
2008-2011, and from what I hear it still is.

+1 on the comment about this awesome survey class.

Doreva Belfiore, MSLIS
Temple University Libraries
Philadelphia, PA

On Fri, Aug 3, 2012 at 11:05 AM, Bohyun Kim  wrote:

> Amen to this! I suspect DIALOG is still being taught, believe it or
> not...
>
>
> "By the way, this looks like an awesome survey class. The headaches it
> would have saved me if someone had covered this stuff 10 years ago when I
> was in school, instead of teaching me how to search DIALOG!"
>
>
>
> ---
> Bohyun Kim, MA, MSLIS
> Digital Access Librarian
> bohyun@fiu.edu
> 305-348-1471
> Medical Library, College of Medicine
> Florida International University
> http://medlib.fiu.edu
> http://medlib.fiu.edu/m (Mobile)
>
> 
> From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of Joseph
> Montibello [joseph.montibe...@dartmouth.edu]
> Sent: Friday, August 03, 2012 10:56 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Recommendations for a teaching OPAC?
>
> Hi,
>
> When you talk about the OPAC, do you want them to be working with a full
> ILS or really just the front-end piece? If it's just the patron-facing
> search, you could probably do worse than to install Blacklight.  It
> probably doesn't really meet the "simple" criteria - there's a lot more to
> it than I could talk about.  But getting it out of the box, turned on, and
> searching against a few records is something that you and students could
> probably manage. I've got a year of unix/ssh/command line experience and
> with a bit of mucking about, googling, and asking for help I was able to
> get a local (non-production) instance up and running, so it's definitely
> easy enough.
>
> By the way, this looks like an awesome survey class. The headaches it
> would have saved me if someone had covered this stuff 10 years ago when I
> was in school, instead of teaching me how to search DIALOG!
>
> Joe Montibello, MLIS
> Library Systems Manager
> Dartmouth College Library
> 603.646.9394
> joseph.montibe...@dartmouth.edu
>
>
>
>
>
>
> On 8/2/12 1:54 PM, "David E Mussulman"  wrote:
>
> >Hi everyone,
> >
> >I teach an intro to IT survey class for the LIS school at Illinois. The
> >one-major-topic-a-week syllabus doesn't really give us time to deep dive
> >into IT topics, but it lets us explore them and give contextual
> >understanding to the building block pieces. Ideally, every topic has
> >some sort of hands-on exercise that gives real life experience with the
> >concepts/technologies. The exercises are usually independent, but I've
> >been kicking around the idea of using a simple OSS OPAC to teach
> >different elements of the class as a semester-long big cascading lesson.
> >Examples:
> >
> >Lesson: Linux, ssh and the command shell
> >Exercise: Installing Ubuntu, getting comfortable with that environment
> >
> >Lesson: OSS and software ecosystems
> >Exercise: Get a LAMP stack setup on the OS, install the OPAC
> >
> >Lesson: Interfaces, usability, accessibility
> >Exercise: Use the OPAC, populate it with some data, assess its usability
> >
> >Lesson: HTML/CSS
> >Exercise: Use CSS to skin the OPAC, customize the HTML for your "site"
> >
> >Lesson: Data management, search, IR
> >Exercise: See if we can peek under the hood at how the OPAC's search
> >works
> >
> >Lesson: Interfaces to data: databases, XML, SQL
> >Exercise: Use the OPAC as a living example to work with those interfaces
> >
> >Lesson: Cloud computing, 2.0/social network integration
> >Exercise: Not sure yet...
> >
> >This idea primarily came from trying to get some simple XML/SQL
> >exercises that didn't suck (the setup for these environments is almost
> >as involved as the exercises themselves), and the fact that previous classes
> >really liked dissecting the nextgen catalogs we've explored from a
> >software selection and 2.0 integration perspective.
> >
> >But here's the catch, and this is why I need your experience, Code4Lib.
> >I'm not an OPAC admin, and have no experience running or hacking them.
> >I'm looking for recommendations for software that would help me with the
> >goals above, without being too difficult or overwhelming for the
> >students or me. :) It doesn't have to be a good/complete OPAC,
> >necessarily -- just a teaching tool to give experience with the lessons
> >above.
> >
> >Should I be looking at Koha and Evergreen and the big ones, or are there
> >small projects that you're aware of that might be better? My preference
> >would be MySQL and PHP, but as long as the supplemental tools and
> >documentation are good, I'm flexible. For example, if there are tools as
> >good as phpMyAdmin to browse PostgreSQL, I don't think it really
> >matters. I'm willing to sacrifice "good" for "simple and transparent". I
> >don't think Rails is a good place to go with this because I don't want
> >to teach MVC/Rails. (Maybe I'm wrong?)
> >
> 

Re: [CODE4LIB] It's all job postings!

2012-08-06 Thread Ed Summers
150 people responded about whether jobs.code4lib.org postings should
come to the discussion list:

yes: 132
no: 10
who cares: 8

93% in support or agnostic (132 yes plus 8 "who cares" is 140 of 150)
seems to be a good indicator that the postings should continue to come
to the list for now.

//Ed