Re: [Dspace-tech] Searching PDF-scanned documents: Adobe Capture a solution?

2007-07-04 Thread Graham Triggs
Hi,

The problem with your scanning attempts is that you are just capturing
an image of the page. To have searchable content, you need to perform
optical character recognition on the images.
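
As a rough way to tell whether a scanned file still lacks a text layer, the sketch below looks at the raw PDF bytes. It is only a heuristic resting on an assumption that is not from this thread: a page image appears as an image XObject, and any searchable text (including a hidden OCR layer) needs a font resource. The function name looks_image_only is made up for illustration. PDFs that store their object dictionaries in compressed object streams (PDF 1.5 and later) can hide the font entries and be misreported, so treat the result as a hint, not a verdict.

```python
import re


def looks_image_only(pdf_bytes: bytes) -> bool:
    """Heuristic: True if the file embeds images but declares no fonts.

    Assumption: object dictionaries are stored uncompressed, as is
    common for plain scanner output. This does not parse the PDF.
    """
    # Scanned pages are stored as image XObjects.
    has_image = re.search(rb"/Subtype\s*/Image", pdf_bytes) is not None
    # Any text, visible or hidden, must reference a font.
    has_font = b"/Font" in pdf_bytes
    return has_image and not has_font


if __name__ == "__main__":
    import sys

    # Usage: python check_pdf.py scanned_report.pdf
    with open(sys.argv[1], "rb") as handle:
        data = handle.read()
    if looks_image_only(data):
        print("No font resources found: probably needs OCR")
    else:
        print("Fonts present (or no images): may already be searchable")
```

A file flagged by this check is a candidate for OCR before it is submitted to the repository; a file that passes may still contain poor-quality recognised text, so it does not answer the accuracy question.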

According to:
http://www.adobe.com/uk/products/acrcapture/

yes, Adobe Capture will create PDFs that contain searchable words. As
with all OCR solutions, though, there is the question of accuracy, and
for that you would need the opinion of someone with experience of using
the product.

G

On Wed, 2007-07-04 at 12:55 +0200, Jennifer Ash wrote:
 Dear Community Members
 
  
 
 The Water Research Commission (WRC, South Africa) is currently
 assessing a pilot installation of DSpace.
 
 We want to use DSpace to store, search and retrieve all our WRC
 research reports and Water SA (a scientific publication, 4 issues pa)
 issues (this is the primary goal; other collections will most likely
 be added over time).
 
 We are faced with a problem in that most of our older publications are
 not in electronic format and will have to be scanned.
 
 Scanning and saving as PDF does not provide a full text searchable
 document in DSpace; I've tried it.
 
  
 
 A product, Adobe Capture, is advertised as a 'tool that teams with
 your scanner to convert volumes of paper documents into searchable
 Adobe Portable Document Format (PDF) files'.
 
 We are keen to investigate this product but there are no trial
 downloads offered by Adobe.
 
 Do you have any knowledge of this product? Can you advise on a
 suitable technology solution for our problem? Our backlog is vast and
 spans many years, so there are loads of documents that need to be
 scanned.
 
  
 
 I do hope someone can give me advice.
 
  
 
 Kind regards
 
  
 
  
 
 Jennifer Ash 
 ……
 Business Systems Manager
 Water Research Commission 
 Private Bag X03 
 GEZINA (Pretoria) 
 0031 
 Tel: (012) 330-9036 / 330-0340 
 Fax: (012) 330-9010 / 331-2565 
 E-mail: [EMAIL PROTECTED] 
 
  
 
 
  
 DISCLAIMER AND CONFIDENTIALITY NOTE: All factual and other information
 within this e-mail, including any attachments relating to the official
 business of the Water Research Commission (WRC), is the property of
 the WRC. It is confidential, legally privileged and protected against
 unauthorized use. The WRC neither owns nor endorses any other content.
 Views and opinions are those of the senders unless clearly stated as
 being that of the WRC. The addressee in the e-mail is the intended
 recipient. Please notify the sender immediately if it has
 unintentionally reached you and do not read, disclose or use the
 content in any way whatsoever. The WRC cannot assure that the
 integrity of this communication has been maintained nor that it is
 free of errors, viruses, interception or interferences.
 
  
 
  
 
 
 -
 This SF.net email is sponsored by DB2 Express
 Download DB2 Express C - the FREE version of DB2 express and take
 control of your XML. No limits. Just data. Click to get it now.
 http://sourceforge.net/powerbar/db2/
 ___ DSpace-tech mailing list 
 DSpace-tech@lists.sourceforge.net 
 https://lists.sourceforge.net/lists/listinfo/dspace-tech 
 
 
This e-mail is confidential and should not be used by anyone who is not the 
original intended recipient. BioMed Central Limited does not accept liability 
for any statements made which are clearly the sender's own and not expressly 
made on behalf of BioMed Central Limited. No contracts may be concluded on 
behalf of BioMed Central Limited by means of e-mail communication. BioMed 
Central Limited Registered in England and Wales with registered number 3680030 
Registered Office Middlesex House, 34-42 Cleveland Street, London W1T 4LB



Re: [Dspace-tech] Searching PDF-scanned documents: Adobe Capture a solution?

2007-07-04 Thread Cory Snavely
Another way to get experience with the quality of Acrobat OCR is to use Acrobat 
Pro, which can do functionally the same thing, with a less batch-oriented 
interface. We ended up using this at a fairly large scale to meet a similar 
need.

We have documentation on preparing PDFs that we supply for submitters, and that 
you may find useful, at

http://deepblue.lib.umich.edu/html/2027.42/40244/PDF-Best_Practice.html

The section toward the bottom provides instructions on making image PDF files 
searchable.

Cory Snavely
University of Michigan Library IT Core Services