On Tue, Jul 28, 2009 at 1:58 PM, Torsten Bronger <bron...@physik.rwth-aachen.de> wrote:
>
> Hello!
>
> Russell Keith-Magee writes:
>
>> [...]
>>
>> As one data point - I use Sphinx fairly extensively at work. During
>> development, I looked at Lucene as well, in the form of Solr (which is
>> a nice wrapper around Lucene).
>
> I program a bibliography manager with Django.  Users are able to
> upload the papers as PDFs, and I'd like to add full-text search.  So
> far, I planned to use Solr Cell for it, but can one also do that
> with Sphinx?

Sphinx doesn't provide any native utilities for handling PDF (that I
am aware of), so you'll need to find a way to extract the text from
the PDF and put it into your database ready for indexing.
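As a rough illustration of that extraction step, here is a minimal sketch. It assumes Poppler's pdftotext command-line tool is installed and on the PATH. The papers table, its body_text column, and the store_fulltext helper are hypothetical names, and sqlite3 only stands in for whatever database your Sphinx source actually reads from.

```python
import sqlite3
import subprocess


def extract_text(pdf_path):
    """Return the text layer of a PDF via Poppler's pdftotext CLI."""
    # Passing "-" as the output file makes pdftotext write to stdout.
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def store_fulltext(conn, paper_id, text):
    """Save extracted text in a (hypothetical) papers table for the indexer."""
    conn.execute(
        "UPDATE papers SET body_text = ? WHERE id = ?",
        (text, paper_id),
    )
    conn.commit()
```

You would call extract_text() once per uploaded file (for example from the upload handler or a batch job) and pass the result to store_fulltext(), so the indexer only ever sees plain text.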

There are plenty of tools available to do this sort of processing.
Their effectiveness depends very much on the quality of the PDF going
in. PDFs that have been digitally produced (e.g. via a PDF printer)
generally work pretty well; PDFs containing bitmaps of scanned pages
are a minefield.
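
One crude way to spot the scanned kind is to check how much text comes out per page. The sketch below assumes the text was extracted with a tool such as Poppler's pdftotext, which separates pages with form-feed characters by default. The pages_without_text name and the 50-character threshold are arbitrary choices for illustration, not a tested rule.

```python
def pages_without_text(extracted, min_chars=50):
    """Return 1-based numbers of pages whose text layer looks empty.

    Assumes pages are separated by form feeds ("\f").
    """
    pages = extracted.split("\f")
    # A form feed after the last page leaves an empty trailing chunk; drop it.
    if pages and pages[-1].strip() == "":
        pages.pop()
    return [
        number
        for number, page in enumerate(pages, start=1)
        if len(page.strip()) < min_chars
    ]
```

Pages flagged this way are candidates for OCR or manual review; a page that is legitimately short (a title page, say) will be flagged too, so treat the result as a hint only.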

Yours,
Russ Magee %-)
