Sorry, I forgot to include the attachment.

2011/5/23 Antonio Perez-Aranda <aperezara...@yaco.es>:
> Indexing mail attachments with Dovecot + Solr.
>
> This patch has been tested with these versions:
> * dovecot 2.0.9
> * apache-solr 1.4.1
>
> This is a patch for the fts-solr plugin (which indexes mail messages
> for Dovecot with Solr). In mainline, the plugin does not index
> attachments. With this patch, you can index mails and their
> attachments (pdf, docs, openoffice docs...). The patch and the
> provided Solr config also give you other goodies, such as synonyms
> and stemming (Spanish by default).
>
> Attachment indexing is provided by Solr Cell and Tika
> (ExtractingRequestHandler):
> * http://wiki.apache.org/solr/ExtractingRequestHandler
>
> Synonyms and stemming are provided by SnowballPorterFilterFactory from
> Solr Language Analysis:
> * http://wiki.apache.org/solr/LanguageAnalysis
>
> We have tested Solr with Tomcat and Jetty. Tomcat is better at
> handling UTF-8 and larger POST requests.
>
> Supported attachment file formats:
> * http://tika.apache.org/0.9/formats.html
>
> At present, attachments inside attachments (for example, attachments
> in forwarded "eml" attachments) are not indexed. Also, keep in mind
> that there are many types of files, and many variants of the same file
> type. For example, some PDF files are "not readable" by Solr's PDF
> reader.
>
> Config:
>
> There are two new options for the fts_solr setting:
> * index-attachments
>   Enables attachment indexing.
> * manual-update
>   Disables indexing on user search. You can trigger indexing with
>   the doveadm search or doveadm index commands.
>
> There is a new setting in the plugin section to filter the MIME types
> that you want to index:
> * fts_solr_mimetype
>   Files with one of these MIME types will be sent to Solr.
>
> After integrating the provided solr directory into your Solr config,
> and building Dovecot with fts-solr support and with
> fts-solr-attachments-r885.patch applied, update your Dovecot config by
> adding the following to your dovecot.conf:
>
> ...
> mail_plugins = $mail_plugins fts fts_solr
>
> plugin {
>   fts = solr
>   fts_solr = url=http://solrhost:8983/solr/ break-imap-search index-attachments
>   fts_solr_mimetype = application/x-pdf application/vnd.openxmlformats-officedocument.wordprocessingml.document
> }
> ...
>
> --
> Antonio Pérez-Aranda Alcaide
> aperezara...@yaco.es
>
> Yaco Sistemas S.L.
> http://www.yaco.es/
> C/ Rioja 5, 41001 Sevilla
> Teléfono +34 954 50 00 57
> Fax +34 954 50 09 29
>
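
For anyone who wants to see how the fts_solr_mimetype filter and Solr Cell could fit together, here is a rough Python sketch. It is not taken from the patch. The function names are hypothetical, the matching rule (exact, case-insensitive, ignoring Content-Type parameters) is an assumption, and so is the document id format. The /update/extract path and the literal.id parameter are the ones described on the ExtractingRequestHandler wiki page; your solrconfig.xml may register the handler elsewhere.

```python
from urllib.parse import urlencode

# Value copied from the fts_solr_mimetype setting in the quoted config.
FTS_SOLR_MIMETYPE = (
    "application/x-pdf "
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
)


def parse_mimetypes(setting):
    """Split a whitespace-separated setting value into a set of lowercase types."""
    return {item.lower() for item in setting.split()}


def should_index(content_type, allowed):
    """Return True if an attachment's Content-Type is in the allowed set.

    Assumption: exact, case-insensitive match on the bare type, with
    parameters such as '; name="report.pdf"' ignored.
    """
    base_type = content_type.split(";", 1)[0].strip().lower()
    return base_type in allowed


def extract_url(solr_url, doc_id):
    """Build a Solr Cell request URL for one attachment.

    Assumption: the handler is registered at /update/extract, and
    doc_id is a hypothetical unique id for the attachment.
    """
    query = urlencode({"literal.id": doc_id})
    return solr_url.rstrip("/") + "/update/extract?" + query


if __name__ == "__main__":
    allowed = parse_mimetypes(FTS_SOLR_MIMETYPE)
    for ctype in ('application/x-pdf; name="report.pdf"', "image/png"):
        print(ctype, "->", should_index(ctype, allowed))
    print(extract_url("http://solrhost:8983/solr/", "msg-42-part-2"))
```

Note that with the example setting only application/x-pdf matches, so a PDF sent as application/pdf would be skipped under this exact-match assumption unless that type is added to the list.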
-- 
Antonio Pérez-Aranda Alcaide
aperezara...@yaco.es

Yaco Sistemas S.L.
http://www.yaco.es/
C/ Rioja 5, 41001 Sevilla
Teléfono +34 954 50 00 57
Fax +34 954 50 09 29
fts-solr-attachments-r885.tar.gz
Description: GNU Zip compressed data