Check whether the PDFs are stored using GridFS. If so, you can simply use
the mongofiles command to copy each PDF to a local file and index that file
in Solr, either with Solr Cell (which runs Tika inside Solr) or by extracting
the text with Tika yourself.
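
For illustration, here is a rough Python sketch of that two-step flow. It assumes the mongofiles tool is on the PATH, that Solr has the extracting request handler enabled at /update/extract, and that the file name is acceptable as the Solr document id. The database name, core name, paths, and function names are all hypothetical placeholders, so treat this as a starting point rather than a tested recipe.

```python
import subprocess
import urllib.parse
import urllib.request


def mongofiles_get_command(db, filename, local_path):
    """Build the mongofiles call that copies one GridFS file to local disk."""
    return ["mongofiles", "--db", db, "get", filename, "--local", local_path]


def solr_extract_url(solr_base, core, doc_id):
    """Build the Solr Cell URL; literal.id sets the id of the new document."""
    query = urllib.parse.urlencode({"literal.id": doc_id, "commit": "true"})
    return f"{solr_base}/{core}/update/extract?{query}"


def index_pdf(db, filename, local_path, solr_base, core):
    """Pull the PDF out of GridFS, then post the raw bytes to Solr Cell."""
    # Step 1: retrieve the file from GridFS (needs a reachable MongoDB).
    subprocess.run(mongofiles_get_command(db, filename, local_path), check=True)

    # Step 2: send the PDF to Solr, which extracts and indexes the text.
    with open(local_path, "rb") as fh:
        request = urllib.request.Request(
            solr_extract_url(solr_base, core, filename),
            data=fh.read(),
            headers={"Content-Type": "application/pdf"},
        )
    with urllib.request.urlopen(request) as response:
        return response.status
```

A call might look like index_pdf("mydb", "report.pdf", "/tmp/report.pdf", "http://localhost:8983/solr", "docs"), with every value adjusted to your own setup. If the PDFs turn out to be stored as plain binary fields rather than in GridFS, the first step would need to read the field with a driver instead.
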
See:
http://blog.mongodb.org/post/183689081/storing-large-objects-and-files-in-mongodb
https://docs.mong
Does anyone have any experience indexing PDFs stored in binary form in MongoDB?
Gabe Arnett
Senior Director
Moody's Analytics