[jira] Commented: (SOLR-947) QueryParsing.toString() should check ConstantScoreRangeQuery before RangeQuery

2009-01-01 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660193#action_12660193
 ] 

Yonik Seeley commented on SOLR-947:
---

Ah, this was introduced when Lucene changed ConstantScoreRangeQuery to inherit 
from RangeQuery.
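
The effect of that inheritance change on an instanceof chain can be sketched as follows. The classes here are empty stand-ins that only mirror the hierarchy described above; they are not the Lucene classes, and describeBuggy/describeFixed are hypothetical names, not the QueryParsing code.

```java
public class InstanceofOrder {
    // Stand-in hierarchy (assumption): the subclass relationship is the only
    // thing taken from the issue; nothing else about Lucene is modelled.
    static class Query {}
    static class RangeQuery extends Query {}
    static class ConstantScoreRangeQuery extends RangeQuery {}

    // Superclass tested first: a ConstantScoreRangeQuery is also a RangeQuery,
    // so the second branch can never be taken.
    static String describeBuggy(Query q) {
        if (q instanceof RangeQuery) {
            return "RangeQuery";
        } else if (q instanceof ConstantScoreRangeQuery) {
            return "ConstantScoreRangeQuery";
        }
        return "Query";
    }

    // Subclass tested first: each type reaches its own branch.
    static String describeFixed(Query q) {
        if (q instanceof ConstantScoreRangeQuery) {
            return "ConstantScoreRangeQuery";
        } else if (q instanceof RangeQuery) {
            return "RangeQuery";
        }
        return "Query";
    }

    public static void main(String[] args) {
        Query q = new ConstantScoreRangeQuery();
        System.out.println(describeBuggy(q));
        System.out.println(describeFixed(q));
    }
}
```

With the superclass check first, the subclass branch is dead code, which is why the order of the two checks matters once one class extends the other.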


> QueryParsing.toString() should check ConstantScoreRangeQuery before RangeQuery
> --
>
> Key: SOLR-947
> URL: https://issues.apache.org/jira/browse/SOLR-947
> Project: Solr
> Issue Type: Bug
> Components: search
> Affects Versions: 1.3
> Reporter: Koji Sekiguchi
> Assignee: Koji Sekiguchi
> Priority: Minor
> Fix For: 1.4
>
> Attachments: SOLR-947.patch
>
>
> This
> {code:title=QueryParsing.toString()}
> if (query instanceof TermQuery) {
> :
> } else if (query instanceof RangeQuery) {
> :
> } else if (query instanceof ConstantScoreRangeQuery) {
> :
> }
> :
> {code}
> should be
> {code:title=QueryParsing.toString()}
> if (query instanceof TermQuery) {
> :
> } else if (query instanceof ConstantScoreRangeQuery) {
>   :
> } else if (query instanceof RangeQuery) {
>   :
> }
> :
> {code}
> This causes an NPE when an open-ended range query (price:[1 TO *]) is run 
> with debugQuery=on.
> This is reported on the thread:
> http://www.nabble.com/http-internal-error-if-i-enable-debugQuery%3Don-td21210570.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-934) Enable importing of mails into a solr index through DIH.

2009-01-01 Thread Preetam Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Preetam Rao updated SOLR-934:
-

Attachment: SOLR-934.patch

Most of the features are implemented now.
Test cases also updated.

> Enable importing of mails into a solr index through DIH.
> 
>
> Key: SOLR-934
> URL: https://issues.apache.org/jira/browse/SOLR-934
> Project: Solr
> Issue Type: New Feature
> Components: contrib - DataImportHandler
> Affects Versions: 1.4
> Reporter: Preetam Rao
> Assignee: Shalin Shekhar Mangar
> Fix For: 1.4
>
> Attachments: SOLR-934.patch, SOLR-934.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> Enable importing of mails into solr through DIH. Take one or more mailbox 
> credentials, download and index their content along with the content from 
> attachments. The folders to fetch can be made configurable based on various 
> criteria. Apache Tika is used for extracting content from different kinds of 
> attachments. JavaMail is used for mail box related operations like fetching 
> mails, filtering them etc.
> The basic configuration for one mail box is as below:
> {code:xml}
> 
> password="something" host="imap.gmail.com" protocol="imaps"/>
> 
> {code}
> The below is the list of all configuration available:
> {color:green}Required{color}
> -
> *user* 
> *pwd* 
> *protocol*  (only "imaps" supported now)
> *host* 
> {color:green}Optional{color}
> -
> *folders* - comma-separated list of folders. 
> If not specified, the default folder is used. Nested folders can be specified 
> like a/b/c
> *recurse* - index subfolders. Defaults to true.
> *exclude* - comma-separated list of patterns. 
> *include* - comma-separated list of patterns.
> *batchSize* - number of mails to fetch at once in a given folder. 
> Only headers can be prefetched in JavaMail IMAP.
> *readTimeout* - defaults to 60000ms
> *connectTimeout* - defaults to 30000ms
> *fetchSize* - IMAP config. 32KB default
> *fetchMailsSince* -
> date/time in milliseconds; mails received after it will be fetched. Useful 
> for delta import.
> *customFilter* - class name.  
> {code}
> import javax.mail.Folder;
> import javax.mail.search.SearchTerm;
> import javax.mail.search.SubjectTerm;
>
> // MyFilter is a placeholder name; use your own class.
> public class MyFilter implements MailEntityProcessor.CustomFilter {
>   public SearchTerm getCustomSearch(Folder folder) {
>     return new SubjectTerm("solr"); // example: subject contains "solr"
>   }
> }
> {code}
> *processAttachement* - defaults to true




[jira] Updated: (SOLR-934) Enable importing of mails into a solr index through DIH.

2009-01-01 Thread Preetam Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Preetam Rao updated SOLR-934:
-

   Description: 
Enable importing of mails into solr through DIH. Take one or more mailbox 
credentials, download and index their content along with the content from 
attachments. The folders to fetch can be made configurable based on various 
criteria. Apache Tika is used for extracting content from different kinds of 
attachments. JavaMail is used for mail box related operations like fetching 
mails, filtering them etc.

The basic configuration for one mail box is as below:

{code:xml}

   

{code}

The below is the list of all configuration available:

{color:green}Required{color}
-
*user* 
*pwd* 
*protocol*  (only "imaps" supported now)
*host* 

{color:green}Optional{color}
-
*folders* - comma-separated list of folders. 
If not specified, the default folder is used. Nested folders can be specified like 
a/b/c
*recurse* - index subfolders. Defaults to true.
*exclude* - comma-separated list of patterns. 
*include* - comma-separated list of patterns.
*batchSize* - number of mails to fetch at once in a given folder. 
Only headers can be prefetched in JavaMail IMAP.
*readTimeout* - defaults to 60000ms
*connectTimeout* - defaults to 30000ms
*fetchSize* - IMAP config. 32KB default
*fetchMailsSince* -
date/time in milliseconds; mails received after it will be fetched. Useful 
for delta import.
*customFilter* - class name.  
{code}
import javax.mail.Folder;
import javax.mail.search.SearchTerm;
import javax.mail.search.SubjectTerm;

// MyFilter is a placeholder name; use your own class.
public class MyFilter implements MailEntityProcessor.CustomFilter {
  public SearchTerm getCustomSearch(Folder folder) {
    return new SubjectTerm("solr"); // example: subject contains "solr"
  }
}
{code}
*processAttachement* - defaults to true

  was:
Enable importing of mails into solr through DIH. Take one or more mailbox 
credentials, download and index their content along with the content from 
attachments.

The folders to fetch can be made configurable based on various criteria.

Apache Tika can be used for extracting content from different kinds of 
attachments.
JavaMail can be used for mail box related operations like fetching mails, 
filtering them etc.

The basic configuration for one mail box can look something like this:
{code:xml}

   

{code}

- This can be enhanced with timeouts, list to be read from a file, folder 
filters, delta import etc.

Remaining Estimate: 24h  (was: 120h)
 Original Estimate: 24h  (was: 120h)




[jira] Issue Comment Edited: (SOLR-934) Enable importing of mails into a solr index through DIH.

2009-01-01 Thread Preetam Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660194#action_12660194
 ] 

preetam edited comment on SOLR-934 at 1/1/09 7:00 AM:
--

Most of the features are implemented now.
Test cases also updated.

- recursion supported.
- folders can be selected/excluded by list of comma separated patterns
- mails can be fetched since a predefined receive date/time
- custom filters can be plugged in
- batching supported

TODO
- currently the testbed needs to be set up manually. Create folders in testcase 
setup().
- support POP3
- any reviews/feedback/cleanup

Attaching all the dependency jars as an attachment so that one does not have to 
search for them. Maybe they should be integrated through ant-maven tasks or maven 
directly.
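
The folder selection by include/exclude patterns mentioned above could be sketched roughly like this. Everything here is an assumption made for illustration, not taken from the patch: the FolderFilter class name, treating each pattern as a Java regex matched against the whole folder name, letting exclude win over include, and accepting every folder when no include pattern is given.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper; the real MailEntityProcessor may behave differently.
public class FolderFilter {
    private final List<Pattern> include = new ArrayList<>();
    private final List<Pattern> exclude = new ArrayList<>();

    FolderFilter(String includeCsv, String excludeCsv) {
        compileInto(include, includeCsv);
        compileInto(exclude, excludeCsv);
    }

    // Splits a comma-separated attribute value and compiles each entry.
    private static void compileInto(List<Pattern> target, String csv) {
        if (csv == null || csv.trim().isEmpty()) {
            return;
        }
        for (String p : csv.split(",")) {
            target.add(Pattern.compile(p.trim()));
        }
    }

    boolean accepts(String folderName) {
        // Assumed precedence: an exclude match always rejects the folder.
        for (Pattern p : exclude) {
            if (p.matcher(folderName).matches()) {
                return false;
            }
        }
        // Assumed default: no include patterns means every folder is accepted.
        if (include.isEmpty()) {
            return true;
        }
        for (Pattern p : include) {
            if (p.matcher(folderName).matches()) {
                return true;
            }
        }
        return false;
    }
}
```

For example, new FolderFilter("INBOX.*", ".*Spam.*") would accept INBOX/work and reject INBOX/Spam and Drafts under these assumptions.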

  was (Author: preetam):
Most of the features are implemented now.
Test cases also updated.
  



[jira] Updated: (SOLR-934) Enable importing of mails into a solr index through DIH.

2009-01-01 Thread Preetam Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Preetam Rao updated SOLR-934:
-

Description: 
Enable importing of mails into solr through DIH. Take one or more mailbox 
credentials, download and index their content along with the content from 
attachments. The folders to fetch can be made configurable based on various 
criteria. Apache Tika is used for extracting content from different kinds of 
attachments. JavaMail is used for mail box related operations like fetching 
mails, filtering them etc.

The basic configuration for one mail box is as below:

{code:xml}

   

{code}

The below is the list of all configuration available:

{color:green}Required{color}
-
*user* 
*pwd* 
*protocol*  (only "imaps" supported now)
*host* 

{color:green}Optional{color}
-
*folders* - comma-separated list of folders. 
If not specified, the default folder is used. Nested folders can be specified like 
a/b/c
*recurse* - index subfolders. Defaults to true.
*exclude* - comma-separated list of patterns. 
*include* - comma-separated list of patterns.
*batchSize* - number of mails to fetch at once in a given folder. 
Only headers can be prefetched in JavaMail IMAP.
*readTimeout* - defaults to 60000ms
*connectTimeout* - defaults to 30000ms
*fetchSize* - IMAP config. 32KB default
*fetchMailsSince* -
date/time in milliseconds; mails received after it will be fetched. Useful 
for delta import.
*customFilter* - class name.  
{code}
import javax.mail.Folder;
import javax.mail.search.SearchTerm;
import javax.mail.search.SubjectTerm;

// MyFilter is a placeholder name; use your own class.
public class MyFilter implements MailEntityProcessor.CustomFilter {
  public SearchTerm getCustomSearch(Folder folder) {
    return new SubjectTerm("solr"); // example: subject contains "solr"
  }
}
{code}
*processAttachement* - defaults to true

The below are the indexed fields.

{code}
  // Fields To Index
  // single valued
  private static final String SUBJECT = "subject";
  private static final String FROM = "from";
  private static final String SENT_DATE = "sentDate";
  private static final String XMAILER = "xMailer";
  // multi valued
  private static final String TO_CC_BCC = "allTo";
  private static final String FLAGS = "flags";
  private static final String CONTENT = "content";
  private static final String ATTACHMENT = "attachement";
  private static final String ATTACHMENT_NAMES = "attachementNames";
  // flag values
  private static final String FLAG_ANSWERED = "answered";
  private static final String FLAG_DELETED = "deleted";
  private static final String FLAG_DRAFT = "draft";
  private static final String FLAG_FLAGGED = "flagged";
  private static final String FLAG_RECENT = "recent";
  private static final String FLAG_SEEN = "seen";
{code}

  was:
Enable importing of mails into solr through DIH. Take one or more mailbox 
credentials, download and index their content along with the content from 
attachments. The folders to fetch can be made configurable based on various 
criteria. Apache Tika is used for extracting content from different kinds of 
attachments. JavaMail is used for mail box related operations like fetching 
mails, filtering them etc.

The basic configuration for one mail box is as below:

{code:xml}

   

{code}

The below is the list of all configuration available:

{color:green}Required{color}
-
*user* 
*pwd* 
*protocol*  (only "imaps" supported now)
*host* 

{color:green}Optional{color}
-
*folders* - comma-separated list of folders. 
If not specified, the default folder is used. Nested folders can be specified like 
a/b/c
*recurse* - index subfolders. Defaults to true.
*exclude* - comma-separated list of patterns. 
*include* - comma-separated list of patterns.
*batchSize* - number of mails to fetch at once in a given folder. 
Only headers can be prefetched in JavaMail IMAP.
*readTimeout* - defaults to 60000ms
*connectTimeout* - defaults to 30000ms
*fetchSize* - IMAP config. 32KB default
*fetchMailsSince* -
date/time in milliseconds; mails received after it will be fetched. Useful 
for delta import.
*customFilter* - class name.  
{code}
import javax.mail.Folder;
import javax.mail.search.SearchTerm;
import javax.mail.search.SubjectTerm;

// MyFilter is a placeholder name; use your own class.
public class MyFilter implements MailEntityProcessor.CustomFilter {
  public SearchTerm getCustomSearch(Folder folder) {
    return new SubjectTerm("solr"); // example: subject contains "solr"
  }
}
{code}
*processAttachement* - defaults to true



[jira] Commented: (SOLR-934) Enable importing of mails into a solr index through DIH.

2009-01-01 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660208#action_12660208
 ] 

Noble Paul commented on SOLR-934:
-

Looks good. A few observations.
* init() must call super.init()
* Right before returning from nextRow(), call super.applyTransformer(row)
* Returning null signals the end of rows. Close any connections or do cleanup at that point
* 'exclude' and 'include' should either allow for escaping commas (between 
multiple regexes) or just take one regex for the time being
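
One way the comma escaping suggested in the last point might work is sketched below. The EscapedCommaSplitter class and the choice of "\," as the escape sequence are assumptions for illustration only; nothing in the patch is known to use them. All other backslashes are passed through untouched so regex escapes such as \d keep working.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits on commas, treating "\," as a literal comma.
public class EscapedCommaSplitter {
    static List<String> split(String value) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\\' && i + 1 < value.length() && value.charAt(i + 1) == ',') {
                // Escaped comma: keep the comma, drop the backslash.
                current.append(',');
                i++;
            } else if (c == ',') {
                // Unescaped comma: end of the current pattern.
                parts.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString().trim());
        return parts;
    }
}
```

A value such as a{1\,3},b.* would then yield the two regexes a{1,3} and b.* instead of being cut inside the quantifier.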




[jira] Commented: (SOLR-934) Enable importing of mails into a solr index through DIH.

2009-01-01 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660210#action_12660210
 ] 

Grant Ingersoll commented on SOLR-934:
--

Would it make more sense for DIH to farm out its content acquisition to a 
library like Droids?  Then, we could have real crawling, etc. all through a 
pluggable connector framework.




[jira] Commented: (SOLR-934) Enable importing of mails into a solr index through DIH.

2009-01-01 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660214#action_12660214
 ] 

Noble Paul commented on SOLR-934:
-

bq. Would it make more sense for DIH to farm out its content acquisition to a 
library like Droids

It would be great. It should be possible to have a DroidEntityProcessor one 
day.  





[jira] Issue Comment Edited: (SOLR-934) Enable importing of mails into a solr index through DIH.

2009-01-01 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660208#action_12660208
 ] 

noble.paul edited comment on SOLR-934 at 1/1/09 11:22 AM:
--

Looks good. A few observations.
* init() must call super.init()
* Right before returning from nextRow(), call super.applyTransformer(row)
* Returning null signals the end of rows. Close any connections or do cleanup at that point
* 'exclude' and 'include' should either allow for escaping commas (between 
multiple regexes) or just take one regex for the time being
* For fetchMailsSince use the format yyyy-MM-dd HH:mm:ss. There is already an 
instance DataImporter.DATE_TIME_FORMAT
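
Parsing a fetchMailsSince value given as a date-time string could look roughly like the sketch below. The FetchSinceFormat class is hypothetical, and the pattern yyyy-MM-dd HH:mm:ss is written out here on the assumption that it matches what DataImporter.DATE_TIME_FORMAT uses; real code would reuse that existing instance as suggested above.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper; the pattern is an assumption about DATE_TIME_FORMAT.
public class FetchSinceFormat {
    static final String PATTERN = "yyyy-MM-dd HH:mm:ss";

    // A new SimpleDateFormat per call, because the class is not thread-safe.
    static Date parse(String value) throws ParseException {
        SimpleDateFormat fmt = new SimpleDateFormat(PATTERN);
        fmt.setLenient(false); // reject values such as month 13
        return fmt.parse(value);
    }

    static String format(Date date) {
        return new SimpleDateFormat(PATTERN).format(date);
    }
}
```

The resulting Date uses the JVM default time zone, so a configuration shared between machines in different zones would need an explicit zone.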

  was (Author: noble.paul):
Looks good. A few observations.
* init() must call super.init()
* Right before returning from nextRow(), call super.applyTransformer(row)
* Returning null signals the end of rows. Close any connections or do cleanup at that point
* 'exclude' and 'include' should either allow for escaping commas (between 
multiple regexes) or just take one regex for the time being
  
> Enable importing of mails into a solr index through DIH.
> 
>
> Key: SOLR-934
> URL: https://issues.apache.org/jira/browse/SOLR-934
> Project: Solr
>  Issue Type: New Feature
>  Components: contrib - DataImportHandler
>Affects Versions: 1.4
>Reporter: Preetam Rao
>Assignee: Shalin Shekhar Mangar
> Fix For: 1.4
>
> Attachments: SOLR-934.patch, SOLR-934.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Enable importing of mails into solr through DIH. Take one or more mailbox 
> credentials, download and index their content along with the content from 
> attachments. The folders to fetch can be made configurable based on various 
> criteria. Apache Tika is used for extracting content from different kinds of 
> attachments. JavaMail is used for mail box related operations like fetching 
> mails, filtering them etc.
> The basic configuration for one mail box is as below:
> {code:xml}
> 
> password="something" host="imap.gmail.com" protocol="imaps"/>
> 
> {code}
> The below is the list of all configuration available:
> {color:green}Required{color}
> -
> *user* 
> *pwd* 
> *protocol*  (only "imaps" supported now)
> *host* 
> {color:green}Optional{color}
> -
> *folders* - comma seperated list of folders. 
> If not specified, default folder is used. Nested folders can be specified 
> like a/b/c
> *recurse* - index subfolders. Defaults to true.
> *exclude* - comma seperated list of patterns. 
> *include* - comma seperated list of patterns.
> *batchSize* - mails to fetch at once in a given folder. 
> Only headers can be prefetched in Javamail IMAP.
> *readTimeout* - defaults to 6ms
> *conectTimeout* - defaults to 3ms
> *fetchSize* - IMAP config. 32KB default
> *fetchMailsSince* -
> date/time in miliiseconds, mails received after which will be fetched. Useful 
> for delta import.
> *customFilter* - class name.  
> {code}
> import javax.mail.Folder;
> import javax.mail.SearchTerm;
> clz implements MailEntityProcessor.CustomFilter() {
> public SearchTerm getCustomSearch(Folder folder);
> }
> {code}
> *processAttachement* - defaults to true
> The below are the indexed fields.
> {code}
>   // Fields To Index
>   // single valued
>   private static final String SUBJECT = "subject";
>   private static final String FROM = "from";
>   private static final String SENT_DATE = "sentDate";
>   private static final String XMAILER = "xMailer";
>   // multi valued
>   private static final String TO_CC_BCC = "allTo";
>   private static final String FLAGS = "flags";
>   private static final String CONTENT = "content";
>   private static final String ATTACHMENT = "attachement";
>   private static final String ATTACHMENT_NAMES = "attachementNames";
>   // flag values
>   private static final String FLAG_ANSWERED = "answered";
>   private static final String FLAG_DELETED = "deleted";
>   private static final String FLAG_DRAFT = "draft";
>   private static final String FLAG_FLAGGED = "flagged";
>   private static final String FLAG_RECENT = "recent";
>   private static final String FLAG_SEEN = "seen";
> {code}
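
To illustrate how the *include* and *exclude* options above might interact, here is a minimal sketch in plain Java. It is not taken from the SOLR-934 patch: the class name FolderFilter is hypothetical, and it assumes that the patterns are regular expressions, that exclude wins over include, and that an empty include list means "all folders".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the SOLR-934 patch: decides whether a
// folder should be fetched, given comma separated include/exclude patterns.
class FolderFilter {
    private final List<Pattern> include = new ArrayList<>();
    private final List<Pattern> exclude = new ArrayList<>();

    FolderFilter(String includeCsv, String excludeCsv) {
        compileInto(includeCsv, include);
        compileInto(excludeCsv, exclude);
    }

    // Splits the comma separated value and compiles each non-empty entry.
    private static void compileInto(String csv, List<Pattern> target) {
        if (csv == null || csv.trim().isEmpty()) return;
        for (String p : csv.split(",")) {
            String t = p.trim();
            if (!t.isEmpty()) target.add(Pattern.compile(t));
        }
    }

    // Assumption: exclude wins over include; no include patterns means "everything".
    boolean accepts(String folderName) {
        for (Pattern p : exclude) {
            if (p.matcher(folderName).matches()) return false;
        }
        if (include.isEmpty()) return true;
        for (Pattern p : include) {
            if (p.matcher(folderName).matches()) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        FolderFilter f = new FolderFilter("inbox, work/.*", ".*spam.*");
        for (String name : new String[] {"inbox", "work/solr", "work/spam", "personal"}) {
            System.out.println(name + " -> " + f.accepts(name));
        }
    }
}
```

The real processor may resolve conflicts between the two lists differently; the sketch only shows one reasonable reading of the option descriptions.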




[jira] Commented: (SOLR-773) Incorporate Local Lucene/Solr

2009-01-01 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660219#action_12660219
 ] 

Ryan McKinley commented on SOLR-773:


Thanks, Patrick!

Two things stick out to me:

1. LocalSolrQueryComponent duplicates most of the code from SolrQueryComponent. 
 Perhaps a better solution would be to have a custom 
[QParser|http://lucene.apache.org/solr/api/org/apache/solr/search/QParser.html] 
that builds the query and then add a SearchComponent to the chain to augment 
the results with the calculated distance.

2. (related)  If the query is implemented as a QParser, we would just need to 
implement:
{code:java}
  public SortSpec getSort(boolean useGlobalParams) throws ParseException 
{code}
rather than use the LocalSolrSortParser.
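
As a rough illustration of the second half of point 1 (a component that augments results with the calculated distance), here is a self-contained sketch in plain Java. Nothing in it is a Solr or Local Lucene class: DistanceAugmenter and Hit are stand-in names, the haversine formula is simply one common way to compute great-circle distance, and miles as the unit is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Stand-in types only; none of these are Solr or Local Lucene classes.
class DistanceAugmenter {
    static final double EARTH_RADIUS_MILES = 3958.8;

    static class Hit {
        final String id;
        final double lat;
        final double lon;
        double distance = Double.NaN; // filled in by augment()

        Hit(String id, double lat, double lon) {
            this.id = id;
            this.lat = lat;
            this.lon = lon;
        }
    }

    // Great-circle distance between two points given in degrees (haversine formula).
    static double haversineMiles(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(a));
    }

    // Sets the distance on each hit, drops hits outside the radius,
    // and returns the rest sorted nearest first.
    static List<Hit> augment(List<Hit> hits, double lat, double lon, double radiusMiles) {
        List<Hit> out = new ArrayList<>();
        for (Hit h : hits) {
            h.distance = haversineMiles(lat, lon, h.lat, h.lon);
            if (h.distance <= radiusMiles) out.add(h);
        }
        out.sort(Comparator.comparingDouble((Hit h) -> h.distance));
        return out;
    }
}
```

In an actual Solr setup the query itself would come from the QParser and only the augmentation step would live in a SearchComponent; the sketch folds filtering and sorting into one method just to stay short.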

> Incorporate Local Lucene/Solr
> -
>
> Key: SOLR-773
> URL: https://issues.apache.org/jira/browse/SOLR-773
> Project: Solr
>  Issue Type: New Feature
>Reporter: Grant Ingersoll
>Priority: Minor
> Attachments: spatial-solr.tar.gz
>
>
> Local Lucene has been donated to the Lucene project.  It has some Solr 
> components, but we should evaluate how best to incorporate it into Solr.
> See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene




[jira] Commented: (SOLR-773) Incorporate Local Lucene/Solr

2009-01-01 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660220#action_12660220
 ] 

Ryan McKinley commented on SOLR-773:


Not a big deal, but it looks like the List plotters could 
be initialized once for the Factory and then reused rather than initializing them for 
each request.
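
A minimal sketch of that idea, with made-up names (PlotterFactory and Plotter are not classes from the patch, and the tier range is only an example): the list is built once in the factory constructor and the same read-only instance is handed to every request.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins: "Plotter" and "PlotterFactory" are illustration names only.
class PlotterFactory {
    static class Plotter {
        final int tier;

        Plotter(int tier) {
            this.tier = tier;
        }
    }

    // Built once when the factory is created, then shared read-only by every request.
    private final List<Plotter> plotters;

    PlotterFactory(int minTier, int maxTier) {
        List<Plotter> list = new ArrayList<>();
        for (int t = minTier; t <= maxTier; t++) {
            list.add(new Plotter(t));
        }
        this.plotters = Collections.unmodifiableList(list);
    }

    // Called per request: returns the shared list instead of rebuilding it.
    List<Plotter> plottersForRequest() {
        return plotters;
    }
}
```

Sharing is only safe if the elements themselves hold no per-request state, which would need checking against the real plotter class.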

> Incorporate Local Lucene/Solr
> -
>
> Key: SOLR-773
> URL: https://issues.apache.org/jira/browse/SOLR-773
> Project: Solr
>  Issue Type: New Feature
>Reporter: Grant Ingersoll
>Priority: Minor
> Attachments: spatial-solr.tar.gz
>
>
> Local Lucene has been donated to the Lucene project.  It has some Solr 
> components, but we should evaluate how best to incorporate it into Solr.
> See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene




Re: [jira] Commented: (SOLR-934) Enable importing of mails into a solr index through DIH.

2009-01-01 Thread Ben Johnson
I'm watching this issue with interest, but I'm having trouble understanding 
the bigger picture.  I am prototyping a system that uses Restlet to store 
and index objects (mainly MS Office and OpenOffice documents and emails), so 
I am planning to use Solr with Tika to index the objects.


I know nothing about DIH (Distributed Index Handler?), so I'm not sure what 
role it plays with Solr.  Is it a vendor-specific technology (from 
Autonomy)?  What does it do?  Do you give it objects to index and it handles 
them by passing it to one or more Solr/Tika indexing servers?  And are you 
thinking that this would therefore be a good place to not only index the 
objects, but also pass the information about the digital content to DROID?


Reading a bit about DROID (from TNA, The National Archives), it seems like 
it is used to capture information about the digital content of objects 
stored in a content repository.  How does this fit with Solr?  I thought 
Solr with Tika just did the indexing of text-based objects, but the actual 
storage of the objects would be elsewhere (probably in the file system). 
From what I can tell, DROID would operate on the file system objects, not 
the indexing information.  Have I got this right?

Ideally, I would also like to convert any suitable content into PDF/A format 
for long-term archival - probably not relevant to this issue, but I thought 
I'd mention it in case you see an application of this as part of email and 
attachment storage.


Sorry for all the questions, but hopefully someone could clarify this for 
me!


Thanks very much
Ben Johnson

--
From: "Grant Ingersoll (JIRA)" 
Sent: Thursday, January 01, 2009 7:07 PM
To: 
Subject: [jira] Commented: (SOLR-934) Enable importing of mails into a solr 
index through DIH.




   [ 
https://issues.apache.org/jira/browse/SOLR-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660210#action_12660210 ]


Grant Ingersoll commented on SOLR-934:
--

Would it make more sense for DIH to farm out its content acquisition to a 
library like Droids?  Then, we could have real crawling, etc. all through 
a pluggable connector framework.




[jira] Updated: (SOLR-773) Incorporate Local Lucene/Solr

2009-01-01 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-773:
---

Attachment: SOLR-773-local-lucene.patch

Here is a (totally untested) patch that uses QParser.

This requires some small tweaks to the QParser class to make the sort parsing 
extensible.

Take a look and see what you think...

> Incorporate Local Lucene/Solr
> -
>
> Key: SOLR-773
> URL: https://issues.apache.org/jira/browse/SOLR-773
> Project: Solr
>  Issue Type: New Feature
>Reporter: Grant Ingersoll
>Priority: Minor
> Attachments: SOLR-773-local-lucene.patch, spatial-solr.tar.gz
>
>
> Local Lucene has been donated to the Lucene project.  It has some Solr 
> components, but we should evaluate how best to incorporate it into Solr.
> See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene




Re: [jira] Commented: (SOLR-934) Enable importing of mails into a solr index through DIH.

2009-01-01 Thread Preetam Rao
Hi Ben,
DIH stands for Data Import Handler. Its main aim is to provide a way of
indexing data into Solr from different kinds of sources, mainly DBs, REST
APIs, files etc. This issue deals with adding one more data source (handled
by something called an EntityProcessor in DIH lingo), namely an IMAP
mailbox. Tika is used in this case for indexing attachments,
which can be of any MIME type.

I don't know much about Droid. But a quick read suggests that it's for getting
the metadata from digital content, similar to Tika for various MIME types.
One integration of Solr and Droid would be searching a digital library. We
would use Droid to get the metadata on content and store the data in Solr for
searching. The real digital content would still be somewhere else, and the Solr
documents would hold a pointer to that content.

Regarding storage, Lucene can store anything that a user supplies as
key-value pairs. One can store the extracted content itself. But since that is
not going to help much, usually one would just index the content, keep
the index in Solr, and have the indexed documents contain a pointer to the
real document.
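
The "index the content, store only a pointer" idea can be sketched in a few lines of plain Java. This is a toy inverted index, not Lucene or Solr code: PointerIndex is a made-up name, tokenization is a naive split on non-word characters, and the "pointer" is assumed to be a file path.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: the extracted text is indexed but not kept;
// only the path (the "pointer" to the real document) is stored.
class PointerIndex {
    // term -> paths of the documents that contain it
    private final Map<String, Set<String>> postings = new HashMap<>();

    void add(String path, String extractedText) {
        for (String token : extractedText.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) continue;
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(path);
        }
    }

    // Returns the pointers of all documents containing the term (case-insensitive).
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.<String>emptySet());
    }

    public static void main(String[] args) {
        PointerIndex idx = new PointerIndex();
        idx.add("/docs/report.doc", "Quarterly sales report");
        idx.add("/mail/42.eml", "Re: sales meeting");
        System.out.println(idx.search("sales"));
    }
}
```

A search returns paths, and the application then fetches the original files from wherever they actually live.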

Hope this helps.


[jira] Updated: (SOLR-773) Incorporate Local Lucene/Solr

2009-01-01 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-773:
---

Attachment: SOLR-773-local-lucene.patch

This version runs, but still no tests.

I added spatial stuff to the example configs, but I'm not sure I like that long 
term.  The examples are getting a bit over cluttered.

http://localhost:8983/solr/select?q=*:*&qt=geo&lat=40&long=-75&radius=99

> Incorporate Local Lucene/Solr
> -
>
> Key: SOLR-773
> URL: https://issues.apache.org/jira/browse/SOLR-773
> Project: Solr
>  Issue Type: New Feature
>Reporter: Grant Ingersoll
>Priority: Minor
> Attachments: SOLR-773-local-lucene.patch, 
> SOLR-773-local-lucene.patch, spatial-solr.tar.gz
>
>
> Local Lucene has been donated to the Lucene project.  It has some Solr 
> components, but we should evaluate how best to incorporate it into Solr.
> See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene




Re: [jira] Commented: (SOLR-934) Enable importing of mails into a solr index through DIH.

2009-01-01 Thread Noble Paul നോബിള്‍ नोब्ळ्
Hi Ben,
You can take a look at the wiki page for DIH
http://wiki.apache.org/solr/DataImportHandler

It helps you index mostly structured data into Solr from DBs, XML etc.
It can be considered an ETL tool
(http://en.wikipedia.org/wiki/Extract,_transform,_load ) for Solr.

Adding mail support means you can index your emails into Solr with a
few lines of configuration.
--Noble
