Sorry, I forgot to include the attachment.

2011/5/23 Antonio Perez-Aranda <aperezara...@yaco.es>:
> Indexing mail attachments with Dovecot + Solr.
>
> This patch has been tested with these versions:
>  * dovecot 2.0.9
>  * apache-solr 1.4.1
>
> This is a patch for the fts-solr plugin (which indexes mail messages
> for Dovecot with Solr). In upstream Dovecot, the plugin does not index
> attachments. With this patch, you can index mails and their
> attachments (PDF, doc, OpenOffice documents...). You also get other
> goodies with this patch and the provided Solr config, such as
> synonyms and stemming (Spanish by default).
>
> Attachment indexing is provided by Solr Cell and Tika 
> (ExtractingRequestHandler)
>  * http://wiki.apache.org/solr/ExtractingRequestHandler
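
To illustrate what a Solr Cell request looks like, here is a minimal
Python sketch that builds (but does not send) a POST to the extract
handler. It is not taken from the patch: the /update/extract path is
the one used in the Solr example config and may differ in your setup,
and the document id format and helper name are hypothetical.

```python
import urllib.request
from urllib.parse import urlencode


def build_extract_request(solr_url, doc_id, data, content_type):
    """Build a POST request for Solr Cell (ExtractingRequestHandler).

    Assumption: the handler is mapped to /update/extract, as in the
    Solr example solrconfig.xml. The request is only built, not sent.
    """
    # literal.id sets the "id" field of the resulting Solr document.
    params = urlencode({"literal.id": doc_id, "commit": "true"})
    url = solr_url.rstrip("/") + "/update/extract?" + params
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": content_type},
        method="POST",
    )


# Hypothetical id and payload, for illustration only.
req = build_extract_request(
    "http://solrhost:8983/solr/",
    "user/INBOX/42/1",
    b"%PDF-1.4 ...",
    "application/pdf",
)
print(req.get_method(), req.full_url)
```

Passing req to urllib.request.urlopen() would actually send the
attachment bytes to Solr, where Tika extracts the text to index.
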
>
> Synonyms and Stemming are provided by SnowballPorterFilterFactory from
> Solr Language Analysis:
>  * http://wiki.apache.org/solr/LanguageAnalysis
>
> We have tested Solr with Tomcat and Jetty. Tomcat handles UTF-8 and
> large POST requests better.
>
> Supported attachment file formats:
>  * http://tika.apache.org/0.9/formats.html
>
> At present, attachments inside attachments (for example, attachments
> in forwarded "eml" attachments) are not indexed. Also, keep in mind
> that there are many types of files, and many variants of the same file
> type. For example, some PDF files cannot be read by Solr's PDF
> reader.
>
> Config:
>
> Two new options are added to the fts_solr setting:
>  * index-attachments
>       Enable attachment indexing.
>  * manual-update
>       Do not index when a user searches. You can trigger indexing with
>       the doveadm search or doveadm index commands.
>
> There is a new setting for the plugin section that filters the MIME
> types you want to index:
>  * fts_solr_mimetype
>       Files with one of these MIME types will be sent to Solr.
>
> After integrating the provided solr directory into your Solr config, and
> building Dovecot with fts-solr support and with
> fts-solr-attachments-r885.patch applied, update your Dovecot config by
> adding the following to your dovecot.conf:
>
> ...
> mail_plugins = $mail_plugins fts fts_solr
>
> plugin {
>   fts = solr
>   fts_solr = url=http://solrhost:8983/solr/ break-imap-search index-attachments
>   fts_solr_mimetype = application/x-pdf application/vnd.openxmlformats-officedocument.wordprocessingml.document
> }
> ...
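
As a rough illustration of the MIME type filter, the Python sketch
below checks an attachment's Content-Type against a
fts_solr_mimetype value. It assumes the value is a
whitespace-separated list matched exactly and case-insensitively; the
patch's real matching rules may differ, and both function names are
hypothetical.

```python
def parse_mimetypes(setting):
    """Split a whitespace-separated setting value into a set of
    lowercase MIME types (assumed format, not confirmed by the patch)."""
    return {m.lower() for m in setting.split()}


def should_index(content_type, allowed):
    """Return True if the Content-Type, ignoring any parameters such
    as name= or charset=, is one of the allowed MIME types."""
    base = content_type.split(";", 1)[0].strip().lower()
    return base in allowed


allowed = parse_mimetypes(
    "application/x-pdf "
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
)

print(should_index('application/x-pdf; name="report.pdf"', allowed))
print(should_index("image/png", allowed))
```

With the example value from the config above, the first check passes
and the second does not, so only the PDF would be sent to Solr.
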
>
>
>
> --
> Antonio Pérez-Aranda Alcaide
> aperezara...@yaco.es
>
> Yaco Sistemas S.L.
> http://www.yaco.es/
> C/ Rioja 5, 41001 Sevilla
> Teléfono +34 954 50 00 57
> Fax      +34 954 50 09 29
>




Attachment: fts-solr-attachments-r885.tar.gz
Description: GNU Zip compressed data
