[ 
https://issues.apache.org/jira/browse/SOLR-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934800#action_12934800
 ] 

Peter Sturge commented on SOLR-2245:
------------------------------------

This latest version of the updated MailEntityProcessor adds a few new features:

1. Incorporated SOLR-1958 (exception if fetchMailsSince isn't specified) into 
this patch
2. Added a hacky version of delta mail retrieval for scheduled import runs:
       The new property is called 'deltaFetch'. If it is 'true', the first 
import run reads the 'fetchMailsSince' property and imports as normal.
       On subsequent runs (within the same process session), the import 
fetches only mail since the last run.
       The last_index_time is held in a runtime system property and is not 
persisted, so when the server is restarted it is lost and the original 
fetchMailsSince value is used again.
       Because I couldn't find exposed APIs for the dataimport.properties file 
(all the methods are private or package-protected), persistence is not 
included in this version of the patch.
3. Added support for including shared folders in the import
4. Added support for including personal folders (other folders) in the import
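
The deltaFetch behaviour described in item 2 can be sketched roughly as 
follows. This is a minimal illustration, not the code in the patch: the class 
name, the method names and the system property key 
'mail.import.last_index_time' are hypothetical, and it assumes that a missing 
fetchMailsSince simply means "no date filter". It uses java.time for brevity.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Illustrative sketch only; names and the property key are assumptions.
class DeltaFetchSketch {
    // Hypothetical key for the runtime system property holding the last run time.
    static final String LAST_INDEX_PROP = "mail.import.last_index_time";
    // Same layout as the fetchMailsSince attribute, e.g. "2010-08-01 00:00:00".
    static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Returns the cut-off date for this import run, or null for "no date filter".
    static LocalDateTime resolveFetchSince(String fetchMailsSince, boolean deltaFetch) {
        String last = System.getProperty(LAST_INDEX_PROP);
        if (deltaFetch && last != null) {
            // A previous run happened in this process: fetch only newer mail.
            return LocalDateTime.parse(last, FORMAT);
        }
        // First run, or after a restart: fall back to the configured value.
        return fetchMailsSince == null ? null : LocalDateTime.parse(fetchMailsSince, FORMAT);
    }

    // Records the start time of a run; the value is lost when the JVM exits.
    static void recordRun(LocalDateTime runStart) {
        System.setProperty(LAST_INDEX_PROP, runStart.format(FORMAT));
    }
}
```

Calling recordRun() at the start of each import and resolveFetchSince() before 
querying the mail server gives the behaviour above; since a system property 
does not survive a restart, the first run after one uses fetchMailsSince again.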

A typical {{<entity>}} element in data-config.xml might 
look something like this:

{code:xml}
    <entity name="email"
      user="u...@mydomain.com"
      password="userpwd"
      host="imap.mydomain.com"
      fetchMailsSince="2010-08-01 00:00:00"
      deltaFetch="true"
      include=""
      exclude=""
      recurse="false"
      folders="INBOX,Inbox,inbox"
      includeContent="true"
      processAttachments="true"
      includeOtherUserFolders="true"
      includeSharedFolders="true"
      batchSize="100"
      processor="MailEntityProcessor"
      protocol="imap"/>
{code}


> MailEntityProcessor Update
> --------------------------
>
>                 Key: SOLR-2245
>                 URL: https://issues.apache.org/jira/browse/SOLR-2245
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4, 1.4.1
>            Reporter: Peter Sturge
>            Priority: Minor
>             Fix For: 1.4.2
>
>         Attachments: SOLR-2245.patch
>
>
> This patch addresses a number of issues in the MailEntityProcessor 
> contrib-extras module.
> The changes are outlined here:
> * Added an 'includeContent' entity attribute to allow specifying content to 
> be included independently of processing attachments
>      e.g. <entity includeContent="true" processAttachments="false" . . . /> 
> would include message content, but not attachment content
> * Added a 'processAttachments' property as a synonym for the mis-spelled 
> (and singular) 'processAttachement' property; the two behave identically. 
> Default = 'true'; if either is false, attachments are not processed. Note 
> that only one of these should really be specified in a given <entity> tag.
> * Added a FLAGS.NONE value, so that if an email has no flags (i.e. it is 
> unread, not deleted etc.), there is still a property value stored in the 
> 'flags' field (the value is the string "none")
> Note: there is a potential backward-compatibility issue with FLAGS.NONE for 
> clients that expect the absence of the 'flags' field to mean 'Not read'. I 
> expect this to be extremely rare, and relying on it is inadvisable in any 
> case, as user flags can be set arbitrarily, so fixing it now will ensure 
> that future client access is consistent.
> * The folder name of an email is now included as a field called 'folder' 
> (e.g. folder=INBOX.Sent). This is quite handy for search and post-indexing 
> processing.
> * The addPartToDocument() method that processes attachments is significantly 
> re-written, as there looked to be no real way the existing code would ever 
> actually process attachment content and add it to the row data
> Tested on the 3.x trunk with a number of popular IMAP servers.
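
The attribute rules quoted above (attachments are processed unless either 
spelling is set to false, and an email with no flags still gets the string 
"none" in its 'flags' field) can be sketched as below. This is only an 
illustration under assumptions: the class and method names are hypothetical, 
and entity attributes are modelled as a plain Map rather than the 
DataImportHandler context.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; not the patch's implementation.
class EntityAttributeSketch {
    // Reads a boolean attribute, falling back to a default when it is absent.
    static boolean flag(Map<String, String> attrs, String name, boolean dflt) {
        String value = attrs.get(name);
        return value == null ? dflt : Boolean.parseBoolean(value.trim());
    }

    // Both spellings default to true; attachments are skipped if either is false.
    static boolean shouldProcessAttachments(Map<String, String> attrs) {
        return flag(attrs, "processAttachments", true)
                && flag(attrs, "processAttachement", true);
    }

    // Always yields a value for the 'flags' field, even when no flag is set.
    static List<String> flagsFieldValue(List<String> flags) {
        return flags.isEmpty() ? Collections.singletonList("none") : flags;
    }
}
```

With this rule, an entity that sets only processAttachement="false" still 
skips attachments, which matches the "if either is false" wording above.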
