What about just a Google site search?

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Nathan 
Tallman
Sent: Wednesday, February 20, 2013 12:54 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] Providing Search Across PDFs

My institution is looking for ways to provide search across PDFs through our 
website. Specifically, PDFs linked from finding aids. Ideally searching within 
a collection's PDFs or possibly across all PDFs linked from all finding aids.

We do not have a CMS or a digital repository. A digital repository is on the 
horizon, but it's a ways out and we need to offer the search sooner.
I've looked into Swish-e but haven't had much luck getting anything off the 
ground.

One way we know we can do this is through our discovery layer, VuFind, using 
its ability to full-text index a website based on a sitemap (which would 
include PDFs linked from finding aids). Facets could be created for 
collections, and we may be able to create a search box on the finding aid nav 
that searches specifically within that collection.

But I'm not sure how scalable that solution is. The indexing agent cannot 
discern when a page was updated, so it has to re-scrape everything, every 
night. The impetus collection is going to have over 1,000 PDFs, and that's 
just to start. Creating the index will start to take a long, long time.
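
To illustrate the re-scraping concern, here is a minimal sketch of how a nightly job might skip unchanged PDFs. It is not how VuFind's indexer works. It assumes the sitemap carries <lastmod> values and that the job keeps a record of what it saw on the previous run. The function name changed_urls and the previous-state dictionary are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def changed_urls(sitemap_xml, previous):
    """Return (urls_to_reindex, new_state).

    sitemap_xml: sitemap document as a string.
    previous: dict mapping URL -> lastmod string seen on the last run
              (hypothetical state kept by the nightly job).
    A URL is re-indexed if it is new, has no <lastmod>, or its
    <lastmod> differs from the stored value.
    """
    root = ET.fromstring(sitemap_xml)
    to_reindex = []
    state = {}
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS).strip()
        if not loc:
            continue
        state[loc] = lastmod
        # Without a lastmod we cannot tell, so fall back to re-indexing.
        if not lastmod or previous.get(loc) != lastmod:
            to_reindex.append(loc)
    return to_reindex, state
```

With something like this, only the returned URLs would be handed to the indexer each night, and new_state would be saved for the next run.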

Does anyone have any ideas or know of any useful tools for this project?
Doesn't have to be perfect, quick and dirty may work. (The OCR's dirty anyway 
:-)

Thanks,
Nathan
