[CODE4LIB] OCR PDFs

2008-10-17 Thread James Tuttle
I wonder if any of you might have experience with creating text PDFs
from TIFFs.  I've been using tiffcp to stitch TIFFs together into a
single image and then using tiff2pdf to generate PDFs from the single
TIFF.  I've had to pass this image-based PDF to someone with Acrobat to
use its batch processing facility to OCR the text and save a text-based
PDF.  I wonder if anyone has suggestions for software I can integrate
into the script (Python on Linux) I'm using.
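
A minimal sketch of the two steps described above, written as Python that shells out to the same tools. The function names, the working-directory layout, and the output file names are assumptions made for illustration; only the tiffcp and tiff2pdf invocations themselves come from the workflow in the message.

```python
import subprocess
from pathlib import Path


def build_commands(page_tiffs, work_dir, name):
    """Return the tiffcp and tiff2pdf command lines for one document.

    page_tiffs, work_dir and name are hypothetical parameters chosen
    for this sketch.
    """
    work_dir = Path(work_dir)
    combined_tiff = work_dir / f"{name}.tif"
    image_pdf = work_dir / f"{name}.pdf"
    # tiffcp copies every input file into the TIFF named last.
    stitch = ["tiffcp", *map(str, page_tiffs), str(combined_tiff)]
    # tiff2pdf wraps that TIFF in an image-only PDF; -o names the output.
    to_pdf = ["tiff2pdf", "-o", str(image_pdf), str(combined_tiff)]
    return [stitch, to_pdf]


def run_pipeline(page_tiffs, work_dir, name):
    """Run both steps in order, raising on the first failure."""
    for command in build_commands(page_tiffs, work_dir, name):
        subprocess.run(command, check=True)
```

Keeping command construction separate from execution makes it easy to log or inspect the commands before running them, and leaves room to add an OCR step after the PDF is built.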

Thanks,
James

-- 
James Tuttle
Digital Repository Librarian

NCSU Libraries, Box 7111
North Carolina State University
Raleigh, NC 27695-7111
[EMAIL PROTECTED]

(919)513-0651 Phone
(919)515-3031  Fax



Re: [CODE4LIB] OCR PDFs

2008-10-17 Thread Terry Harrison
You might want to look at ABBYY FineReader 9.0 Professional, which can be 
driven from the command line.  FineReader is used at the Library of 
Congress.  Here is an info link to get you started (search for "command"):

http://www.scanstore.com/Scanning/Document_Imaging/Software/OCR_Software/Nuance/omnipage_review.asp

Regards,
Terry 

 
Terry Harrison 
Project Manager 
CACI 
5505 Robin Hood Road, Suite F 
Norfolk, Va. 23508 
Ph: 757.321.9120 x232 
Fax: 757.321.8797 
[EMAIL PROTECTED] 


Re: [CODE4LIB] OCR PDFs

2008-10-17 Thread Bridger Dyson-Smith
If you haven't already, take a look at tesseract (
http://code.google.com/p/tesseract-ocr/). There's some discussion of using
tesseract and shell scripting to work with tiffs to pdfs to ocr'd text,
which isn't exactly what you're wanting to do, I know, but may prove helpful
(http://www.groklaw.net/articlebasic.php?story=20061210115516438).
Cheers!
Bridger Dyson-Smith
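
A minimal sketch of calling tesseract from a Python script, one page image at a time. The helper names are hypothetical. The basic form, tesseract followed by the image and an output base name, writes a plain-text file next to the image. The optional trailing "pdf" argument is an assumption about the installed version: it is accepted by more recent Tesseract releases than the one available when this thread was written, and asks for a searchable PDF instead of plain text.

```python
import subprocess
from pathlib import Path


def tesseract_command(tiff_path, searchable_pdf=False):
    """Build the tesseract command line for a single page image."""
    tiff_path = Path(tiff_path)
    # Tesseract adds the extension itself, so pass the name without one.
    output_base = tiff_path.with_suffix("")
    command = ["tesseract", str(tiff_path), str(output_base)]
    if searchable_pdf:
        # Assumption: the installed Tesseract supports the "pdf" output config.
        command.append("pdf")
    return command


def ocr_pages(tiff_paths, searchable_pdf=False):
    """OCR each page in turn, raising if tesseract reports an error."""
    for tiff_path in tiff_paths:
        subprocess.run(tesseract_command(tiff_path, searchable_pdf), check=True)
```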



Re: [CODE4LIB] OCR PDFs

2008-10-17 Thread Jonathan Brinley
This is somewhat off-topic, since you asked for something you can use
on Linux. In any case...

I've been using OmniPage 16, and I'm sorry to say I can't recommend
it. You can't run it from the command line, so you can't really
integrate it into a script. It does have a batch manager, so you can
set it to do whole folders at a time. Just make sure your folder's not
too large; it crashes fairly reliably after about 10-40 pages.

If you do use OmniPage to make your PDFs, I've found that it works
best to convert a single TIFF into a single-page PDF, then use
pdftk[1] (along with a [language of your choice] script) to put those
PDFs together however you want them.
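
A minimal sketch of the "one PDF per page, then join with pdftk" approach, assuming the single-page PDFs already exist. The helper names and the file-naming pattern are hypothetical; the pdftk invocation uses its cat operation, which concatenates the listed inputs in order.

```python
import re
import subprocess


def natural_key(name):
    """Sort key that orders page2.pdf before page10.pdf."""
    # Splitting on a captured digit run alternates text and number parts.
    parts = re.split(r"(\d+)", name)
    return [int(part) if part.isdigit() else part for part in parts]


def pdftk_cat_command(page_pdfs, output_pdf):
    """Build the pdftk command that joins page_pdfs into output_pdf."""
    if not page_pdfs:
        raise ValueError("at least one input PDF is required")
    return ["pdftk", *map(str, page_pdfs), "cat", "output", str(output_pdf)]


def join_pages(page_pdfs, output_pdf):
    """Join the pages in natural filename order."""
    ordered = sorted(page_pdfs, key=natural_key)
    subprocess.run(pdftk_cat_command(ordered, output_pdf), check=True)
```

Sorting with a natural key matters here because a plain string sort would place page10.pdf ahead of page2.pdf and scramble the page order.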

Have a nice day,
Jonathan

[1] http://www.accesspdf.com/pdftk/

-- 
Jonathan M. Brinley
Metadata & Digital Initiatives Developer
Ball State University

[EMAIL PROTECTED]
http://xplus3.net/




[CODE4LIB] eXtensible Catalog - New Website

2008-10-17 Thread Dibelius, Steven
***Cross-posted; apologies for duplication***

 

The eXtensible Catalog Project is pleased to announce that we have
launched our new website at http://www.extensiblecatalog.org/.  This new
website will be the main vehicle for distributing our open-source
software once it is released in 2009.  In the meantime, the website
contains a wealth of information regarding the project, including
publications, an overview of the software we are developing and the
technologies that software will use, and a blog that has already been in
use.

 

The eXtensible Catalog (XC) Project is working to design and develop a
set of open-source applications that will provide libraries with an
alternative way to reveal their collections to library users. XC will
provide easy access to all resources (both digital and physical
collections) across a variety of databases, metadata schemas and
standards, and will enable library content to be revealed through other
services that libraries may already be using, such as content management
systems and learning management systems. XC will also make library
collections more web-accessible by revealing them through web search
engines.

 

Since XC software will be open source, it will be available for download
at no cost. Libraries will be able to adopt, customize and extend the
software to meet local needs. In addition, a not-for-profit organization
will be formed to provide the infrastructure to incorporate community
contributions to the code base, encourage collaboration, and provide
maintenance and upgrades.

 

The project is hosted at the University of Rochester and funded through
a generous grant from the Andrew W. Mellon Foundation Scholarly
Communications Program as well as through significant contributions from
and in collaboration with XC partner institutions.  The project is in a
design and development phase until July 2009, at which point the
software will be released under an open-source license.

 

 

Steven Dibelius

Deployment Engineer, eXtensible Catalog Project

University of Rochester

[EMAIL PROTECTED]


Re: [CODE4LIB] registry of databases

2008-10-17 Thread White,Joanna
Hello all,

My name is Joanna White and I am the Product Manager for the WorldCat
Registry. The WorldCat Registry is a directory of libraries and the
services they provide. Through a secure web tool, libraries can manage
and share information about their institutional identity and make
institutional metadata available to both OCLC and non-OCLC services.
Currently, the WorldCat Registry does not include the type of database
information Stephen mentioned in his original message. However, we are
always interested in the community's ideas and needs. We follow lists
like [CODE4LIB], and you can also send ideas to our mailbox at
registries at oclc dot org

You can follow the WorldCat Registry's developments via the OCLC
Newsletter or on the DevNet Blog at http://worldcat.org/devnet/blog/
You can also learn more about our API offerings under WorldCat Registry
Search, WorldCat Registry Detail and OpenURL Gateway here:
http://www.worldcat.org/wcpa/content/affiliate/default.jsp


Thank you, Joanna White
OCLC
WorldCat Registry, http://worldcat.org/registry/institutions
mailto: Whitej at oclc dot org


-Original Message-
From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
Stephen Francoeur
Sent: Thursday, October 16, 2008 2:49 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] registry of databases

Despite my best efforts to save things to delicious that catch my eye, I
can't seem to find an item that I know I read in the past two weeks.
Someone mentioned an effort to create a registry of databases in which
you could see what libraries had subscribed to which database. Is there
such a project or is this a figment of my fevered imagination?
I know I'm not thinking of how some libraries include databases in their
catalogs, which then gets passed on to WorldCat if the library is an
OCLC member. What I recall reading, though, may have made some reference
to the WorldCat Registry
(http://www.worldcat.org/registry/Institutions).

Any help here?

Stephen Francoeur
Information Services Librarian
Newman Library
Baruch College
151 E. 25th Street
New York, NY 10010

http://www.retaggr.com/Card/stephenfrancoeur


Re: [CODE4LIB] Vote for NE code4lib meetup location

2008-10-17 Thread Barnett, Jeffrey
I joined the group just today, too late to vote, but what I see is 23 
votes for Boston and 43 for anywhere else.  Shouldn't there at least be a 
runoff?

-Original Message-
From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Jay Luker
Sent: Wednesday, October 15, 2008 4:48 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Vote for NE code4lib meetup location

Sorry to leave you all in suspense all day. The results are in:

23 Boston, MA
18 Northampton, MA
14 Concord, NH
11 Portland, ME

Michael Klein has said he will now check when a suitable space will be
available at BPL. Then we'll update the WhenIsGood page and hope for
some availability intersection goodness.

--jay


[CODE4LIB] Job posting: Analyst Programmer Intermediate - Georgia State University Library

2008-10-17 Thread Douglas Goans
Vacancy Number: 0600774 
Position Title: Analyst Programmer Intermediate  
Type of Position: Regular Staff  
Department: Library

Duties: Reporting to the Web Development Librarian, the Analyst Programmer 
develops, maintains, and troubleshoots web based applications in support of the 
University Library's goals. Responsibilities include scripting and programming 
applications developed in-house, customization and enhancement of open-source 
and vendor applications, working with vendor or open-source Application 
Programming Interfaces (APIs), and management of in-house databases. The 
position works with project stakeholders as needed to further develop or 
enhance application design for scheduled and prioritized projects. The Analyst 
Programmer works collaboratively with library Systems personnel to implement 
and configure web servers in support of web development activities, 
authentication technologies and server security.  

Minimum Qualifications: Bachelor's degree and two years of related experience; 
or a combination of education and experience.  

Preferred Qualifications: Bachelor's degree in Computer Science or a related 
field and three years of related experience. Working knowledge of 
programming/scripting web applications in languages such as PHP, Perl, and 
JavaScript. Experience working in a Linux/Unix environment and working with the 
Apache web server.  

Posting Date:  09-29-2008  
Closing Date: Open Until Filled
Special Instructions to Applicants: An application, resume and cover letter are 
required for consideration. An offer of employment will be conditional on 
background verification.  

Apply online:
https://jobs.gsu.edu/applicants/jsp/shared/frameset/Frameset.jsp?time=1224255051703
 

Job posting link:
http://www.library.gsu.edu/jobs/ 




Doug Goans
Web Development Librarian
Georgia State University Library
100 Decatur St.
Atlanta, GA 30303
Tel: (404) 413 2772 
Fax: (404) 651-4315


[CODE4LIB] FW: NAF notification service from OCLC

2008-10-17 Thread Ya'aqov Ziso
FYI: note below sent out to Karen Calhoun in the [EMAIL PROTECTED]
 'OCLC would be required to work with the Library of Congress as the producer
of the NAF data before OCLC could create the NAF notification service' 

Greetings Karen,
 
Per Roy's statement at the top, I have received several questions and
am forwarding them to you. NACO contributors from participating
libraries produce (create or modify) most of the name authority records
listed in the NAF updates. They do that during their work hours at their
respective institutions, work hours paid for by those respective
institutions. The Library of Congress evidently has a role in promulgating
these records in the NAF updates; however:
1. Why do libraries interested in NAF updates have to pay for these
updates?
2. Why isn't the work of NACO contributors recompensed by allowing them to
access, at the least, a notification of updates to the NAF, to which they
have contributed?
3. What is the role of OCLC in these processes?
4. Does OCLC pay for NAF? If not, could CODE4LIB obtain NAF and NAF updates
on a similar basis?
Kind thanks for your attention and forthcoming replies,

Ya'aqov Ziso
[EMAIL PROTECTED]



 Dear Ya'aqov Ziso,
 
 Your email request/proposal of 4 October 2008 to Roy Tennant ("My proposal to
 you is that OCLC will start offering a NEW service to its members/subscribers.
 That service will be a simple listing of the 010 fields for Name authority
 records that have been CHANGED that week in the OCLC NAF, and 010 for the new
 Name authority records that have been ADDED to NAF.") has been referred by
 OCLC Research to the OCLC Metadata Services product group for consideration.
 
 We are pleased to receive your suggestion for a new service. We will add this
 suggestion to our list of potential new services and enhancements for
 consideration in our next round of planning for development in fiscal year
 2010.  
 
 Thank you for sharing your ideas with us.
 
 Karen 
 
 Karen Calhoun 
 Vice President, WorldCat and Metadata Services
 6565 Kilgour Place
 Dublin OH 43017 
 800-848-5878 x6441
 614-764-6441 
 FAX: 614-718-7457
 [EMAIL PROTECTED]
 
 






-- End of Forwarded Message


Re: [CODE4LIB] OCR PDFs

2008-10-17 Thread Binkley, Peter
And beyond Tesseract is Ocropus (http://code.google.com/p/ocropus/),
which uses Tesseract (and eventually other ocr engines) to generate
positional OCR in an HTML format. I wonder if you could process that
HTML slightly to put the TIFF in the background, then use an HTML to PDF
tool to generate your final PDF. Or something like that. Googling
"ocropus pdf" finds a few projects and discussions that might be
helpful.
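
A minimal sketch of the first half of that idea: pulling each line's text and position out of positional-OCR HTML so that it could later be placed over the page image. It assumes the HTML follows the hOCR convention, in which a line is an element with class "ocr_line" and a title attribute containing "bbox x0 y0 x1 y1"; whether a given OCR tool's output matches that layout should be checked against a real file. The class and function names are hypothetical.

```python
import re
from html.parser import HTMLParser

# Assumed hOCR convention: title="bbox x0 y0 x1 y1; ..." in pixel units.
BBOX = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class LineBoxParser(HTMLParser):
    """Collect (bounding box, text) pairs for every ocr_line element.

    Assumes well-nested markup with no unclosed void tags inside a line.
    """

    def __init__(self):
        super().__init__()
        self.lines = []
        self._bbox = None
        self._chunks = []
        self._depth = 0

    def handle_starttag(self, tag, attrs):
        if self._bbox is not None:
            # Already inside a line: just track nesting depth.
            self._depth += 1
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        match = BBOX.search(attributes.get("title") or "")
        if "ocr_line" in classes and match:
            self._bbox = tuple(int(value) for value in match.groups())
            self._chunks = []
            self._depth = 1

    def handle_data(self, data):
        if self._bbox is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if self._bbox is None:
            return
        self._depth -= 1
        if self._depth == 0:
            # Line element closed: normalise whitespace and store it.
            text = " ".join("".join(self._chunks).split())
            self.lines.append((self._bbox, text))
            self._bbox = None


def extract_lines(hocr_html):
    """Return a list of ((x0, y0, x1, y1), text) tuples."""
    parser = LineBoxParser()
    parser.feed(hocr_html)
    return parser.lines
```

Placing the extracted text over the TIFF and writing the final PDF would still need a separate PDF-generation step, which this sketch does not attempt.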

Peter 



[CODE4LIB] Fwd: Please disseminate - Release of Version 1.0 Production OAI Object Reuse and Exchange Specifications

2008-10-17 Thread Tim DiLauro

Forwarded on behalf of Carl Lagoze and the OAI-ORE authoring team...

Begin forwarded message:


From: Carl Lagoze [EMAIL PROTECTED]
Date: October 17, 2008 4:02:14 PM EDT
To: Tim DiLauro [EMAIL PROTECTED]
Subject: Please disseminate - Release of Version 1.0 Production OAI  
Object Reuse and Exchange Specifications


(The full copy of this Press Release is at http://www.openarchives.org/documents/ore-production-press-release.pdf)


Over the past two years the Open Archives Initiative (OAI), in a  
project called Object Reuse and Exchange (OAI-ORE), has gathered  
international experts from the publishing, web, library, repository,  
and eScience communities to develop standards for the identification  
and description of aggregations of Web resources.   These standards  
provide the foundation for applications and services that can  
visualize, preserve, transfer, summarize, and improve access to the  
aggregations that people use in their daily Web interaction:  
including multiple page Web documents, multiple format documents in  
institutional repositories, scholarly data sets, and online photo  
and music collections.   The OAI-ORE standards leverage the core Web  
architecture and concepts emerging from related efforts including  
the semantic web, linked data, and Atom syndication.  As a result,  
they integrate both with the emerging machine-readable web, Web 2.0,  
and the future evolution of networked information.


The production versions of the OAI-ORE specifications and  
implementation documents are now available to the public, with a  
table of contents page at http://www.openarchives.org/ore/toc.  This  
public release is the culmination of several months of testing and  
review of initial alpha and beta releases. The participation and  
feedback from the wider OAI-ORE community, especially the OAI-ORE  
technical committee, was instrumental to the process leading up to  
this production release.


The documents in the release describe a data model to introduce  
aggregations as resources with URIs on the web. They also detail the  
machine-readable descriptions of aggregations expressed in the  
popular Atom syndication format, in RDF/XML, and RDFa.  The  
documents included in the release are:


 ·   ORE User Guide Documents
       o   Primer
       o   Resource Map Implementation in Atom
       o   Resource Map Implementation in RDF/XML
       o   Resource Map Implementation in RDFa
       o   HTTP Implementation
       o   Resource Map Discovery

 ·   ORE Specification Documents
       o   Abstract Data Model
       o   Vocabulary

 ·   Tools and Additional Resources
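
To make the data model mentioned above more concrete, here is a minimal sketch that emits the core statements of a Resource Map as N-Triples: the map describes an Aggregation, and the Aggregation aggregates a set of resources. The function names and the example.org URIs are hypothetical, and the sketch leaves out the metadata (such as creator and modification time) that a complete Resource Map would carry, so it should be read against the specification documents rather than in place of them.

```python
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def resource_map_triples(rem_uri, aggregation_uri, resource_uris):
    """Return (subject, predicate, object) URI triples for a Resource Map."""
    triples = [
        (rem_uri, RDF_TYPE, ORE + "ResourceMap"),
        # The Resource Map is the document that describes the Aggregation.
        (rem_uri, ORE + "describes", aggregation_uri),
        (aggregation_uri, RDF_TYPE, ORE + "Aggregation"),
    ]
    # One ore:aggregates statement per member resource.
    triples += [
        (aggregation_uri, ORE + "aggregates", uri) for uri in resource_uris
    ]
    return triples


def to_ntriples(triples):
    """Serialise URI-only triples in N-Triples syntax."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    # Hypothetical URIs for a two-page scanned document.
    print(to_ntriples(resource_map_triples(
        "http://example.org/rem/doc1",
        "http://example.org/aggregation/doc1",
        ["http://example.org/doc1/page1.tif",
         "http://example.org/doc1/page2.tif"],
    )))
```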



Carl Lagoze - Cornell University - [EMAIL PROTECTED]

Herbert Van de Sompel - Los Alamos National Laboratory - [EMAIL PROTECTED]



Re: [CODE4LIB] FW: NAF notification service from OCLC

2008-10-17 Thread Mark A. Matienzo
Ya'aqov,

Why don't you consider contacting the NACO program at the Library of
Congress? They would be more equipped to answer your questions.

Mark Matienzo
Applications Developer, Digital Experience Group
The New York Public Library


Re: [CODE4LIB] eXtensible Catalog - New Website

2008-10-17 Thread Cloutman, David
Same for me on FF3. Also, the same error on IE 7 and Safari 3 for
Windows. All browsers are identified as IE 6.

Windows XP SP 2.





---
David Cloutman [EMAIL PROTECTED]
Electronic Services Librarian
Marin County Free Library 

-Original Message-
From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
Mark A. Matienzo
Sent: Friday, October 17, 2008 1:11 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] eXtensible Catalog - New Website


I'm using Firefox 3 on OS X and the project's website is claiming I'm
using IE 6 on Windows XP and thus not letting me access the site. Fix
this, please?

Mark Matienzo
Applications Developer, Digital Experience Group
The New York Public Library


Email Disclaimer: http://www.co.marin.ca.us/nav/misc/EmailDisclaimer.cfm


Re: [CODE4LIB] eXtensible Catalog - New Website

2008-10-17 Thread Brenda Chawner
I'm having the same problem with Safari 3.1.1 on OS X, which the site also 
thinks is IE 6 on Windows XP. I haven't encountered this problem in years!

-- 
Brenda Chawner
Senior Lecturer & LIM Programmes Director
School of Information Management
Victoria University of Wellington
P O Box 600, Wellington  NEW ZEALAND
(04) 463 5780 | fax (04) 463 5446 | Room EA201 | [EMAIL PROTECTED]





Re: [CODE4LIB] eXtensible Catalog - New Website

2008-10-17 Thread Chris Alhambra
I used Internet Explorer 7 to go to this website, and I get the message "You
are using *Internet Explorer* version *6.0* on *Windows XP*"

-Chris Alhambra





Re: [CODE4LIB] eXtensible Catalog - New Website

2008-10-17 Thread Ethan Gruber
I'm running FF3 on Ubuntu.   No dice.

Tried it in Opera 9.x in Ubuntu.  Still doesn't work.





Re: [CODE4LIB] eXtensible Catalog - New Website

2008-10-17 Thread Custer, Mark
The site was working fine earlier, as I was able to view it with Opera
(now, of course, I've the same problems). 

For the time being, this should get you there:
http://www.extensiblecatalog.org/node/59



-Original Message-
From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
Chris Alhambra
Sent: Friday, October 17, 2008 4:18 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] eXtensible Catalog - New Website

I used Internet Explorer 7 to go this website, and I get the message
You
are using *Internet Explorer* version *6.0* on *Windows XP*

-Chris Alhambra


On Fri, Oct 17, 2008 at 4:11 PM, Mark A. Matienzo [EMAIL PROTECTED]
wrote:

 I'm using Firefox 3 on OS X and the project's website is claiming I'm
 using IE 6 on Windows XP and thus not letting me access the site. Fix
 this, please?

 Mark Matienzo
 Applications Developer, Digital Experience Group
 The New York Public Library

 On Fri, Oct 17, 2008 at 10:31 AM, Dibelius, Steven
 [EMAIL PROTECTED] wrote:
  ***Cross-posted; apologies for duplication***
 
 
 
  The eXtensible Catalog Project is pleased to announce that we have
  launched our new website at http://www.extensiblecatalog.org/.  This new
  website will be the main vehicle for distributing our open-source
  software once it is released in 2009.  In the mean time, the website
  contains a wealth of information regarding the project, including
  publications, an overview of the software we are developing and the
  technologies that software will use, and a blog that has already been in
  use.
 
 
 
  The eXtensible Catalog (XC) Project is working to design and develop a
  set of open-source applications that will provide libraries with an
  alternative way to reveal their collections to library users. XC will
  provide easy access to all resources (both digital and physical
  collections) across a variety of databases, metadata schemas and
  standards, and will enable library content to be revealed through other
  services that libraries may already be using, such as content management
  systems and learning management systems. XC will also make library
  collections more web-accessible by revealing them through web search
  engines.
 
 
 
  Since XC software will be open source, it will be available for download
  at no cost. Libraries will be able to adopt, customize and extend the
  software to meet local needs. In addition, a not-for-profit organization
  will be formed to provide the infrastructure to incorporate community
  contributions to the code base, encourage collaboration, and provide
  maintenance and upgrades.
 
 
 
  The project is hosted at the University of Rochester and funded through
  a generous grant from the Andrew W. Mellon Foundation Scholarly
  Communications Program as well as through significant contributions from
  and in collaboration with XC partner institutions.  The project is in a
  design and development phase until July 2009, at which point the
  software will be released under an open-source license.
 
 
 
 
 
  Steven Dibelius
 
  Deployment Engineer, eXtensible Catalog Project
 
  University of Rochester
 
  [EMAIL PROTECTED]
 



Re: [CODE4LIB] OCR PDFs

2008-10-17 Thread James Tuttle
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Yes, I've tried tesseract and found it to be pretty accurate, but I
don't believe there is a way to integrate the text back into the PDF.
It's easy to pull text out of image-based PDFs, but not to put the text
back in.  Driving me crazy...
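
A minimal sketch of the text-extraction half only, assuming the
tesseract binary is on the PATH and is invoked as
"tesseract <image> <output base>", which writes <output base>.txt.
The function names are hypothetical, and this does not put the text
back into the PDF.

```python
import subprocess
from pathlib import Path


def tesseract_command(tiff_path, output_base):
    """Build the argv for one page; tesseract adds the .txt suffix itself."""
    return ["tesseract", str(tiff_path), str(output_base)]


def ocr_pages(tiff_paths, out_dir):
    """Run tesseract on each TIFF and return the expected text file paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    text_files = []
    for tiff in map(Path, tiff_paths):
        base = out_dir / tiff.stem
        # check=True raises CalledProcessError if tesseract exits non-zero.
        subprocess.run(tesseract_command(tiff, base), check=True)
        text_files.append(Path(str(base) + ".txt"))
    return text_files
```

Calling ocr_pages(["page1.tif", "page2.tif"], "ocr_out") would leave one
plain-text file per page in ocr_out (hypothetical file names).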

Thanks for the tips,
James

Bridger Dyson-Smith wrote:
 If you haven't already, take a look at tesseract (
 http://code.google.com/p/tesseract-ocr/). There's some discussion of using
 tesseract and shell scripting to work with tiffs to pdfs to ocr'd text,
 which isn't exactly what you're wanting to do, I know, but may prove helpful
 (http://www.groklaw.net/articlebasic.php?story=20061210115516438).
 Cheers!
 Bridger Dyson-Smith
 
 
 On Fri, Oct 17, 2008 at 8:28 AM, Terry Harrison [EMAIL PROTECTED] wrote:
 
 You might want to look at ABBYY FineReader 9.0 Professional, which can be
 driven from the command line.  FineReader is used at the Library of
 Congress.  Here is an info link to get you started (search command):


 http://www.scanstore.com/Scanning/Document_Imaging/Software/OCR_Software/Nuance/omnipage_review.asp

 Regards,
 Terry

 
 Terry Harrison
 Project Manager
 CACI
 5505 Robin Hood Road, Suite F
 Norfolk, Va. 23508
 Ph: 757.321.9120 x232
 Fax: 757.321.8797
 [EMAIL PROTECTED]


- --
- ---
James Tuttle
Digital Repository Librarian

NCSU Libraries, Box 7111
North Carolina State University
Raleigh, NC 27695-7111
[EMAIL PROTECTED]

(919)513-0651 Phone
(919)515-3031  Fax

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFI+QuEKxpLzx+LOWMRAhSyAJ9+lQ/1J5SP/23XQrVrlsoNRZyKxQCfYTGw
qUBK6A9mkiLy88buUz7Wngg=
=DyZk
-END PGP SIGNATURE-


Re: [CODE4LIB] OCR PDFs

2008-10-17 Thread James Tuttle
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Thanks for the tip.  Especially the part where you make it clear that
OmniPage doesn't really work.  Back to Acrobat, I guess.

Thanks all!

Jonathan Brinley wrote:
 This is somewhat off-topic, since you asked for something you can use
 on Linux. In any case...
 
 I've been using OmniPage 16, and I'm sorry to say I can't recommend
 it. You can't run it from the command line, so you can't really
 integrate it into a script. It does have a batch manager, so you can
 set it to do whole folders at a time. Just make sure your folder's not
 too large; it crashes fairly reliably after about 10-40 pages.
 
 If you do use OmniPage to make your PDFs, I've found that it works
 best to convert a single TIFF into a single-page PDF, then use
 pdftk[1] (along with a [language of your choice] script) to put those
 PDFs together however you want them.
 
 Have a nice day,
 Jonathan
 
 [1] http://www.accesspdf.com/pdftk/
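
 The pdftk step described above can be sketched in Python roughly as
 follows. This assumes pdftk is installed and on the PATH and uses its
 "cat ... output" operation; the function names and file names are
 hypothetical, and the single-page PDFs are assumed to exist already.

```python
import subprocess


def pdftk_cat_command(page_pdfs, output_pdf):
    """Build the argv that concatenates single-page PDFs in the given order."""
    if not page_pdfs:
        raise ValueError("need at least one input PDF")
    return ["pdftk", *map(str, page_pdfs), "cat", "output", str(output_pdf)]


def merge_pages(page_pdfs, output_pdf):
    """Run pdftk; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(pdftk_cat_command(page_pdfs, output_pdf), check=True)
```

 For example, merge_pages(["p1.pdf", "p2.pdf"], "book.pdf") would run
 "pdftk p1.pdf p2.pdf cat output book.pdf".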
 

- --
- ---
James Tuttle
Digital Repository Librarian

NCSU Libraries, Box 7111
North Carolina State University
Raleigh, NC 27695-7111
[EMAIL PROTECTED]

(919)513-0651 Phone
(919)515-3031  Fax

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFI+QviKxpLzx+LOWMRAp1gAJ9ipNqWDxNPubPIl9qoo00XWqrn0gCgkR1R
fDkLic6eBVmRr6G4rvVSU3s=
=ySuL
-END PGP SIGNATURE-