[CODE4LIB] CfP: Crowdsourcing workshop at DH 2016

2016-04-15 Thread Ben Brumfield
nces of
participants. Outcomes from the workshop might include a whitepaper and/or
the further development of or support for a peer network for humanities
crowdsourcing.

The workshop is organised by Mia Ridge (British Library), Meghan Ferriter
(Smithsonian Transcription Centre), Christy Henshaw (Wellcome Library) and
Ben Brumfield (FromThePage).

We anticipate accepting 30 participants. You can apply to attend at
https://docs.google.com/forms/d/1l05Rba3EqMyy-X4UVmU9z7hQ-jlK2x2kLGvNtJfgtgQ/viewform

On notification of acceptance, we will send detailed instructions for
formal registration.

For more information, please contact benwb...@gmail.com and mia.ri...@bl.uk,
who will be in contact with the rest of the organisers.

Regards,

Ben W. Brumfield
http://fromthepage.com/
http://manuscripttranscription.blogspot.com


Re: [CODE4LIB] separate list for jobs

2014-05-08 Thread Ben Brumfield
I suspect I'm not the only mostly-lurker who subscribes to CODE4LIB in digest 
mode, finding value in a glance over the previous day's discussions each 
morning, then (very) occasionally weighing in on individual threads via the web 
interface.  I find this to be more effective and efficient than 
filtering-and-foldering individual messages, at least for my goal of having 
some idea of the content of the conversations here, although--not being a 
full-time library technologist--I'm really just skimming.

I also suspect that I'm not the only digest-mode subscriber who would see 
value in a digest-mode option that excluded job postings.  

Ben Brumfield
http://manuscripttranscription.blogspot.com/


[CODE4LIB] Biodiversity Specimen Label Transcription Hackathon, applications due Nov 1

2013-10-30 Thread Ben Brumfield
For those interested in exploring crowdsourcing, transcription tools, and OCR, 
this is a really neat opportunity to see what's going on in natural science 
collections.

I attended the Augmenting OCR hackathon in February and learned a tremendous 
amount about OCR.  Better yet, one of the tools I developed for processing 
entomology labels was re-used successfully by folks at the Early Modern OCR 
Project for their work dealing with 18th-century English printed books.
I wrote up the experience here: 
http://manuscripttranscription.blogspot.com/search/label/hackathon

Ben Brumfield
http://fromthepage.com/

Forwarded announcement:
iDigBio (www.idigbio.org) and Zooniverse's Notes from Nature Project 
(www.notesfromnature.org) are pleased to announce a hackathon to further 
enable public participation in online transcription of biodiversity specimen 
labels.  There are approximately 1 billion specimens of this type in US 
collections alone, but it is estimated that information from just 10% of them 
is currently digitized and online.  Digitization of natural history collections 
grants researchers access to vast quantities of information in their 
investigations of timely subjects such as climate change, invasive species, and 
the extinction crisis.  The magnitude of the task of bringing those collections 
into digital format exceeds that of any single organization and will require 
new, Internet-scale approaches to engage the public.  This is an exciting 
opportunity to work on a ground-breaking citizen-science endeavor with 
immediate and strong impacts in the areas of biodiversity research and applied 
conservation.

The event will occur from December 16-20, 2013, at iDigBio in Gainesville, FL.  
There is up to $1200 for support of travel and lodging for each participant.  

The hackathon will produce new functionality and interoperability for 
Zooniverse's Notes from Nature (www.notesfromnature.org) and similar 
transcription tools.  There are four areas of development that will be 
progressively addressed throughout the week.  On Monday, the focus will be (1) 
linking images registered to the iDigBio Cloud to transcription tools to create 
efficiency and alleviate storage issues.  Starting on Tuesday, topics will 
include (2) transcription QA/QC and the reconciliation of replicate 
transcriptions, (3) integration of OCR into the transcription workflow, and (4) 
new UI features and novel incentive approaches for public engagement.  
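One common way to approach topic (2), reconciling replicate transcriptions, is a field-by-field majority vote. The sketch below is only an illustration under stated assumptions, not the method used by Notes from Nature or the hackathon: it assumes each volunteer's transcription of a label arrives as a dict of field name to text, that case and whitespace differences should be ignored, and that a reading must be chosen by more than half of the replicates to be accepted. Fields without such agreement come back as None so they can be routed to expert review. The sample label data is hypothetical.

```python
from __future__ import annotations

from collections import Counter


def normalize(value: str) -> str:
    # Collapse runs of whitespace and ignore case so that trivial
    # differences between volunteers do not split the vote.
    return " ".join(value.split()).lower()


def reconcile_field(values: list[str], min_agreement: float = 0.5) -> tuple[str | None, float]:
    """Return (consensus value, agreement ratio) for one field.

    The value is None when no normalized reading is chosen by more than
    min_agreement of the replicates, which flags the field for review.
    """
    if not values:
        return None, 0.0
    counts = Counter(normalize(v) for v in values)
    winner, votes = counts.most_common(1)[0]
    ratio = votes / len(values)
    return (winner if ratio > min_agreement else None), ratio


def reconcile_record(transcriptions: list[dict[str, str]]) -> dict[str, tuple[str | None, float]]:
    # Vote separately on every field that any volunteer filled in.
    fields = sorted({name for t in transcriptions for name in t})
    return {
        name: reconcile_field([t[name] for t in transcriptions if name in t])
        for name in fields
    }


# Hypothetical replicate transcriptions of one specimen label.
replicates = [
    {"collector": "J. Smith", "county": "Leon"},
    {"collector": "J.  Smith", "county": "Leon"},
    {"collector": "J. Smith", "county": "Lean"},
]
print(reconcile_record(replicates))
```

A strict "more than half" threshold means a two-way split between two volunteers is never auto-accepted; real projects often collect more replicates or weight volunteers by past accuracy instead.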

We expect that most participants will arrive on Monday afternoon and depart on 
Friday late afternoon/evening or Saturday morning.  There will be a social at 
the Florida Museum of Natural History on Wednesday, December 18.  There will be 
opportunities to narrow the focus in each category of activity in a 
teleconference tentatively scheduled for early in the week of November 25.  

**If you wish to be considered for one of about ten open invitations (of a 
total of about 30), please send (1) your CV/resume, (2) a short description 
(<250 words) of your relevant expertise (citing example products where 
appropriate), (3) the development areas that interest you (of the four numbered 
above), and (4) the days that you can attend to Austin Mast (am...@bio.fsu.edu) 
by Friday, November 1, for assured consideration.  At least 3 slots will be 
reserved for qualified graduate students.**

With best regards,

Austin and Rob Guralnick (UC-Boulder), co-organizers

Austin Mast


Associate Professor · Director, Robert K. Godfrey Herbarium · Associate Editor, 
Systematic Biology and Systematic Botany · Treasurer, American Society of Plant 
Taxonomists · Steering Committee Member, iDigBio, The National Resource for 
Advancing Digitization of Biodiversity Collections

Department of Biological Science · 319 Stadium Drive · Florida State University 
· Tallahassee, FL 32306-4295 · U.S.A.

Office is King Life Science Building, room 4065 · Lab is King Life Science 
Building, rooms 4068 and 4084 · Herbarium is Biological Science Unit One, room 
100

Voice: 1 (850) 645-1500 · Fax:  1 (850) 645-8447 · am...@bio.fsu.edu


Re: [CODE4LIB] Python and Ruby

2013-08-01 Thread Ben Brumfield
The PyCon announcement reminds me of what may be the biggest difference between 
Python and Ruby:  if you speak at a Ruby conference, your registration fee (and 
often other expenses) is waived in gratitude for your effort.  If you speak at 
a Python conference, you pay full price in recognition of the privilege you 
have (to market yourself or something).

I have very strong opinions on this, but anyone else interested might want to 
read the links and comment thread at Marty Haught's post: 
http://martyhaught.com/articles/2011/06/07/conference-organizing-and-speakers/

Ben Brumfield
http://manuscripttranscription.blogspot.com/


[CODE4LIB] Call for Participation: Open Source Indexing

2013-04-02 Thread Ben Brumfield
From http://opensourceindexing.org/

The Challenge
Historic documents often contain handwriting, old fonts, or other text formats 
that OCR software can't handle. We need humans--from volunteers to paid 
staff--to read the document images and transcribe what they see into databases 
which can be searched, analyzed, crawled, and used by researchers. Until now 
those efforts have required organizations either to outsource indexing to 
external partners or to cobble together their own off-line or on-site systems.

Our goal is to build a tool that can be used by libraries, archives, museums, 
historical sites, genealogy and heritage societies to run their own indexing 
projects, under their own control.

The Invitation
We'd like to invite libraries, archives, and museums; historical, genealogy, 
and heritage societies to participate in the project. Right now we need advice 
and examples of indexing projects that real organizations would like to run. 
This would allow us to work with an eye on real data outside the UK parish 
registers and English census records which have been driving our development up 
to the present.

What we need from you

Project definitions including:
  *  Sample image files (around 5 per project, in the format you'd use for 
     access copies),
  *  A maximal spec for the data you'd like to collect,
  *  A minimal set of required fields you need, and
  *  A description of the material and goals of the project.

In addition to example indexing project definitions, we need:
  *  Funding to continue development. Our top priority is building a tool for 
     our funders' indexing projects at FreeREG and FreeCEN. Building features 
     outside of the needs common to those projects will require more funds. 
  *  Code contributions and help with design and programming.
  *  Publicity and endorsement to spread the word about Open Source Indexing.

The Tool
We're basing our online indexing tool on Scribe, a tool developed by the 
Citizen Science Alliance from their Old Weather project and deployed by the 
Bodleian Library for What's the score at the Bodleian. More recently, Scribe 
has been customized by New York Public Library Labs for their Ensemble database 
of the performing arts.

We're augmenting the Scribe transcription system by adding a database that 
allows users to search and view records created by the indexing tool. We're 
also adding support for offline/legacy transcripts imported via CSV files. 
Improvements to the data-entry UI and a system for reporting on indexing 
activity and managing volunteers will round out the effort. (See the data flow 
diagram.)
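A rough sketch of the CSV import path described above, using only Python's standard `csv` and `sqlite3` modules. It is an illustration under assumptions, not code from the Open Source Indexing project: the three required field names (standing in for a project's "minimal set of required fields"), the single `records` table, the sample rows, and the choice to reject rather than repair rows missing a required field are all hypothetical.

```python
from __future__ import annotations

import csv
import io
import sqlite3

# Hypothetical "minimal set of required fields" for a register-style project.
REQUIRED_FIELDS = ("surname", "forename", "event_date")


def import_legacy_csv(conn: sqlite3.Connection, csv_text: str) -> tuple[int, list[int]]:
    """Load legacy transcripts into the records table.

    Returns (rows imported, CSV line numbers rejected for missing
    required fields).
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records "
        "(surname TEXT, forename TEXT, event_date TEXT, extra TEXT)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    imported, rejected = 0, []
    for row in reader:
        if any(not (row.get(f) or "").strip() for f in REQUIRED_FIELDS):
            rejected.append(reader.line_num)  # physical line where this row ended
            continue
        # Keep optional columns (the "maximal spec") as key=value text.
        extra = "; ".join(
            f"{k}={v}" for k, v in row.items() if k not in REQUIRED_FIELDS and v
        )
        conn.execute(
            "INSERT INTO records VALUES (?, ?, ?, ?)",
            (row["surname"].strip(), row["forename"].strip(),
             row["event_date"].strip(), extra),
        )
        imported += 1
    conn.commit()
    return imported, rejected


def search_surname(conn: sqlite3.Connection, surname: str) -> list[tuple[str, str, str]]:
    # Case-insensitive surname lookup, oldest records first.
    return conn.execute(
        "SELECT surname, forename, event_date FROM records "
        "WHERE surname COLLATE NOCASE = ? ORDER BY event_date",
        (surname,),
    ).fetchall()


sample = (
    "surname,forename,event_date,parish\n"
    "Smith,John,1801-03-02,St Mary\n"
    ",Jane,1802-01-01,St Mary\n"
    "smith,Anne,1799-12-25,\n"
)
conn = sqlite3.connect(":memory:")
print(import_legacy_csv(conn, sample))
print(search_surname(conn, "SMITH"))
```

Rejecting incomplete rows and reporting their line numbers keeps legacy data from silently entering the search database with gaps; a project could instead queue those rows for volunteers to finish in the online tool.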

The entire system will be released under an Apache license. (In fact, the 
source code under development already is.)


Ben Brumfield
http://manuscripttranscription.blogspot.com/


Re: [CODE4LIB] Handwriting and ocr

2013-03-13 Thread Ben Brumfield
Let me echo Jim in suggesting a transcription tool rather than OCR for 
handwritten texts.  However, a lot depends on the kinds of material you're 
working with and the uses you plan for the transcripts.  Is it structured 
data, like census records, account books, or an index-card database?  
Is it free-form text like diaries or letters? Does the text contain a lot of 
genetic elements like strike-throughs, careted insertions and 
marginalia?  Do you want to index terms so that readers can view all
mentions of banjos within the text? 

At present, there is no one tool that supports all of these.  I built and 
maintain one (AGPL) tool for transcribing and indexing free-form text
[Self-promotion: http://fromthepage.com/ is the tool; source is at
http://github.com/benwbrum/fromthepage/ ] and have spent the last
year building another (Apache) tool for converting tabular records into
a search database.  I think they're great, and am really excited about
them both.  Nevertheless, last week I pointed a project at Jim's 
T-PEN instead of my own tools, because the manuscripts were medieval 
Arabic donation records which needed line-based transcription.  

I maintain a list of transcription tools used in crowdsourcing 
projects here: http://tinyurl.com/TranscriptionToolGDoc 
Currently there are around 30 that I know of, and I'd be happy
to give my opinion of what's appropriate for your project on or off
list.

Ben Brumfield
http://manuscripttranscription.blogspot.com/


Re: [CODE4LIB] web-based ocr

2013-03-13 Thread Ben Brumfield
The idea of an API-driven OCR service came up at last month's iDigBio 
Augmenting OCR Hackathon.

I wasn't involved in the team that built it, as I got distracted detecting 
handwritten sources from OCR output, so I'm afraid I don't know very much about 
how far they got.  Nevertheless, I'd recommend taking a look at the 
documentation for the REST API they developed:

https://github.com/idigbio-aocr/RESTAPI/tree/master/doc

Ben Brumfield
http://manuscripttranscription.blogspot.com/