[CODE4LIB] CfP: Crowdsourcing workshop at DH 2016
nces of participants. Outcomes from the workshop might include a whitepaper and/or the further development of or support for a peer network for humanities crowdsourcing. The workshop is organised by Mia Ridge (British Library), Meghan Ferriter (Smithsonian Transcription Centre), Christy Henshaw (Wellcome Library) and Ben Brumfield (FromThePage). We anticipate accepting 30 participants. You can apply to attend at https://docs.google.com/forms/d/1l05Rba3EqMyy-X4UVmU9z7hQ-jlK2x2kLGvNtJfgtgQ/viewform On notification of acceptance, we will send detailed instructions for formal registration. For more information, please contact benwb...@gmail.com and mia.ri...@bl.uk, who will be in contact with the rest of the organisers. Regards, Ben W. Brumfield http://fromthepage.com/ http://manuscripttranscription.blogspot.com
Re: [CODE4LIB] separate list for jobs
I suspect I'm not the only mostly-lurker who subscribes to CODE4LIB in digest mode, finding value in a glance over the previous day's discussions each morning, then (very) occasionally weighing in on individual threads via the web interface. I find this to be more effective and efficient than filtering-and-foldering individual messages, at least for my goal of having some idea of the content of the conversations here, although--not being a full-time library technologist--I'm really just skimming. I also suspect that I'm not the only digest-mode subscriber who would see value in a digest-mode option that excluded job postings. Ben Brumfield http://manuscripttranscription.blogspot.com/
[CODE4LIB] Biodiversity Specimen Label Transcription Hackathon, applications due Nov 1
For those interested in exploring crowdsourcing, transcription tools, and OCR, this is a really neat opportunity to see what's going on in natural science collections. I attended the Augmenting OCR hackathon in February and learned a tremendous amount about OCR. Better yet, one of the tools I developed for processing entomology labels was re-used successfully by folks at the Early Modern OCR Project for their work dealing with 18th-century English printed books. I wrote up the experience here: http://manuscripttranscription.blogspot.com/search/label/hackathon Ben Brumfield http://fromthepage.com/ Forwarded announcement: iDigBio (www.idigbio.org) and Zooniverse's Notes from Nature Project (www.notesfromnature.org) are pleased to announce a hackathon to further enable public participation in online transcription of biodiversity specimen labels. There are approximately 1 billion specimens of this type in US collections alone, but it is estimated that information from just 10% of them is currently digitized and online. Digitization of natural history collections grants researchers access to vast quantities of information in their investigations of timely subjects such as climate change, invasive species, and the extinction crisis. The magnitude of the task of bringing those collections into digital format exceeds that of any single organization and will require new, Internet-scale approaches to engage the public. This is an exciting opportunity to work on a ground-breaking citizen-science endeavor with immediate and strong impacts in the areas of biodiversity research and applied conservation. The event will occur from December 16-20, 2013, at iDigBio in Gainesville, FL. There is up to $1200 for support of travel and lodging for each participant. The hackathon will produce new functionality and interoperability for Zooniverse's Notes from Nature (www.notesfromnature.org) and similar transcription tools. 
There are four areas of development that will be progressively addressed throughout the week. On Monday, the focus will be (1) linking images registered to the iDigBio Cloud to transcription tools to create efficiency and alleviate storage issues. Starting on Tuesday, topics will include (2) transcription QA/QC and the reconciliation of replicate transcriptions, (3) integration of OCR into the transcription workflow, and (4) new UI features and novel incentive approaches for public engagement. We expect that most participants will arrive on Monday afternoon and depart on Friday late afternoon/evening or Saturday morning. There will be a social at the Florida Museum of Natural History on Wednesday, December 18. There will be opportunities to narrow the focus in each category of activity in a teleconference tentatively scheduled for early in the week of November 25. **If you wish to be considered for one of about ten open invitations (of a total of about 30), please send (1) your CV/resume, (2) a short description (<250 words) of your relevant expertise (citing example products where appropriate), (3) the development areas that interest you (of the four numbered above), and (4) the days that you can attend to Austin Mast (am...@bio.fsu.edu) by Friday, November 1, for assured consideration. At least 3 slots will be reserved for qualified graduate students.** With best regards, Austin and Rob Guralnick (UC-Boulder), co-organizers Austin Mast Associate Professor · Director, Robert K. Godfrey Herbarium · Associate Editor, Systematic Biology and Systematic Botany · Treasurer, American Society of Plant Taxonomists · Steering Committee Member, iDigBio, The National Resource for Advancing Digitization of Biodiversity Collections Department of Biological Science · 319 Stadium Drive · Florida State University · Tallahassee, FL 32306-4295 · U.S.A. 
Office is King Life Science Building, room 4065 · Lab is King Life Science Building, rooms 4068 and 4084 · Herbarium is Biological Science Unit One, room 100 Voice: 1 (850) 645-1500 · Fax: 1 (850) 645-8447 · am...@bio.fsu.edu
Re: [CODE4LIB] Python and Ruby
The PyCon announcement reminds me of what may be the biggest difference between Python and Ruby: if you speak at a Ruby conference, your registration fee (and often other expenses) is waived in gratitude for your effort. If you speak at a Python conference, you pay full price in recognition of the privilege you have (to market yourself or something). I have very strong opinions on this, but anyone else interested might want to read the links and comment thread at Marty Haught's post: http://martyhaught.com/articles/2011/06/07/conference-organizing-and-speakers/ Ben Brumfield http://manuscripttranscription.blogspot.com/
[CODE4LIB] Call for Participation: Open Source Indexing
From http://opensourceindexing.org/

The Challenge

Historic documents often contain handwriting, old fonts, or other text formats that OCR software can't handle. We need humans--from volunteers to paid staff--to read the document images and transcribe what they see into databases which can be searched, analyzed, crawled, and used by researchers. Until now those efforts have required organizations either to outsource indexing to external partners or to cobble together their own off-line or on-site systems. Our goal is to build a tool that can be used by libraries, archives, museums, historical sites, genealogy and heritage societies to run their own indexing projects, under their own control.

The Invitation

We'd like to invite libraries, archives, and museums; historical, genealogy, and heritage societies to participate in the project. Right now we need advice and examples of indexing projects that real organizations would like to run. This would allow us to work with an eye on real data outside the UK parish registers and English census records which have been driving our development up to the present.

What we need from you

Project definitions including:
* Sample image files (around 5 per project in the format you'd use for access copies),
* A maximal spec for the data you'd like to collect,
* A minimal set of required fields you need, and
* A description of the material and goals of the project.

In addition to example indexing project definitions, we need:
* Funding to continue development. Our top priority is building a tool for our funders' indexing projects at FreeREG and FreeCEN. Building features outside of the needs common to those projects will require more funds.
* Code contributions and help with design and programming.
* Publicity and endorsement to spread the word about Open Source Indexing.
The Tool

We're basing our online indexing tool on Scribe, a tool developed by the Citizen Science Alliance from their Old Weather project and deployed by the Bodleian Library for What's the score at the Bodleian. More recently, Scribe has been customized by New York Public Library Labs for their Ensemble database of the performing arts. We're augmenting the Scribe transcription system by adding a database that allows users to search and view records created by the indexing tool. We're also adding support for offline/legacy transcripts imported via CSV files. Improvements to the data-entry UI and a system for reporting on indexing activity and managing volunteers will round out the effort. (See the data flow diagram.) The entire system will be released under an Apache license. (In fact, the source code under development already is.)

Ben Brumfield http://manuscripttranscription.blogspot.com/
Re: [CODE4LIB] Handwriting and ocr
Let me echo Jim in suggesting a transcription tool rather than OCR for handwritten texts. However, a lot depends on the kinds of material you're working with and the uses you plan for the transcripts. Is it structured data, like census records, account books, or an index cards database? Is it free-form text like diaries or letters? Does the text contain a lot of genetic elements like strike-throughs, careted insertions and marginalia? Do you want to index terms so that readers can view all mentions of banjos within the text? At present, there is no one tool that supports all of these. I built and maintain one (AGPL) tool for free-form text to be used in indexing [Self-promotion: http://fromthepage.com/ is the tool; source is at http://github.com/benwbrum/fromthepage/ ] and have spent the last year building another (Apache) tool for converting tabular records into a search database. I think they're great, and am really excited about them both. Nevertheless, last week I pointed a project at Jim's T-PEN instead of my own tools, because the manuscripts were medieval Arabic donation records which needed line-based transcription. I maintain a list of transcription tools used in crowdsourcing projects here: http://tinyurl.com/TranscriptionToolGDoc Currently there are around 30 that I know of, and I'd be happy to give my opinion of what's appropriate for your project on or off list. Ben Brumfield http://manuscripttranscription.blogspot.com/
Re: [CODE4LIB] web-based ocr
The idea of an API-driven OCR service came up at last month's iDigBio Augmenting OCR Hackathon. I wasn't involved in the team that built it, as I got distracted detecting handwritten sources from OCR output, so I'm afraid I don't know very much about how far they got. Nevertheless, I'd recommend taking a look at the documentation for the REST API they developed: https://github.com/idigbio-aocr/RESTAPI/tree/master/doc Ben Brumfield http://manuscripttranscription.blogspot.com/