Lenient parsing of mail addresses should be a little more lenient
-----------------------------------------------------------------

                 Key: MIME4J-196
                 URL: https://issues.apache.org/jira/browse/MIME4J-196
             Project: JAMES Mime4j
          Issue Type: Wish
          Components: parser (core)
            Reporter: Jens Wilmer
            Priority: Trivial


Parsing a mail address like the one in 
https://issues.apache.org/jira/browse/MIME4J-31 results in a ParseException. 
Parsing a mail address starting with a dot (.) also results in a 
ParseException.
When parsing an address field with multiple addresses, the exception that 
occurs while parsing a single address is caught and null is returned as the 
resulting address list. (This breaks Tika, as it expects an empty list rather 
than null.)

It would be nice if invalid addresses were handled more gracefully in lenient 
mode. It would also be nice if at least the valid addresses were returned when 
parsing an address list that contains a corrupted address.
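
To illustrate the requested behaviour, here is a minimal sketch in plain Java. 
It does not use the Mime4j API: parseSingle is a hypothetical stand-in for a 
strict single-address parser, and splitting the field body on commas is a 
simplifying assumption that ignores quoted strings, comments and groups. The 
point is only that a failure on one address is skipped and that the result is 
never null.

```java
import java.util.ArrayList;
import java.util.List;

final class LenientAddressListSketch {

    // Hypothetical strict parser for one address (NOT a Mime4j method).
    // Rejects empty input, a leading dot, and anything without exactly one
    // '@' that has text on both sides.
    static String parseSingle(String raw) {
        String s = raw.trim();
        int at = s.indexOf('@');
        boolean invalid = s.isEmpty()
                || s.startsWith(".")
                || at <= 0
                || at != s.lastIndexOf('@')
                || at == s.length() - 1;
        if (invalid) {
            throw new IllegalArgumentException("Invalid address: " + raw);
        }
        return s;
    }

    // Lenient list parsing: keep every address that parses, skip the rest,
    // and return an empty list (never null) when nothing is usable.
    static List<String> parseLenient(String fieldBody) {
        List<String> result = new ArrayList<>();
        if (fieldBody == null) {
            return result;
        }
        // Assumption: addresses are separated by plain commas.
        for (String part : fieldBody.split(",")) {
            if (part.trim().isEmpty()) {
                continue;
            }
            try {
                result.add(parseSingle(part));
            } catch (IllegalArgumentException e) {
                // Skip only the broken entry; the other addresses survive.
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [alice@example.org, carol@example.org]
        System.out.println(parseLenient(
                "alice@example.org, .bob@example.org, carol@example.org"));
        // prints []
        System.out.println(parseLenient("not-an-address"));
    }
}
```

With this shape, a caller such as Tika could iterate over the result without 
a null check, and one corrupted entry would no longer discard the whole list.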


I am using Mime4j via the Apache Tika project to extract text from emails for 
indexing in Lucene. The text stream from Tika is read directly by a Lucene 
field, and indexing fails if an exception is thrown by Mime4j. This currently 
happens every time a header field contains more than 1000 characters, because 
Tika uses the Mime4j default configuration, which is unusable here 
( https://issues.apache.org/jira/browse/TIKA-640 ), and every time a malformed 
email address is encountered ( https://issues.apache.org/jira/browse/TIKA-641 ). 

These problems can be taken care of in Tika, but there is no way for Tika to 
retrieve the valid mail addresses from a list if Mime4j returns only null; 
maybe this problem could be addressed in Mime4j.


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
