[
https://issues.apache.org/jira/browse/MIME4J-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022770#comment-13022770
]
Oleg Kalnichevski commented on MIME4J-116:
------------------------------------------
It appears that this issue can only be resolved by moving Field and FieldParser
interfaces from DOM to Core. If we want to keep a very strict separation of
responsibilities between Core and DOM (Core deals with RawFields only whereas
DOM is responsible for parsing raw fields into complex structured fields)
_some_ duplication of field parsing seems unavoidable. Core parser needs
content-type, content-transfer-encoding, charset and boundary bits in order to
be able to decode MIME entities. This can also lead to potential
inconsistencies in the handling of malformed fields (such as the one recently
reported by Stefano): the default message builder may succeed in building an object
model for a particular message, but the default message formatter may fail when
serialising the very same model, because some fields get re-parsed using a
stricter routine.
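
To make the inconsistency concrete, here is a minimal sketch of two parsers
disagreeing on the same raw field body. The class and method names are
hypothetical and are not taken from the Mime4j code base; the token check is
deliberately simplified and does not follow the RFC grammar exactly.

```java
import java.util.Locale;

// Hypothetical illustration only; not the actual Mime4j parsers.
class DuplicateParsingSketch {

    // Simplified notion of a token character (assumption, not the RFC 2045 set).
    static boolean isTokenChar(char c) {
        return Character.isLetterOrDigit(c) || c == '-';
    }

    // Lenient routine: takes the leading run of token characters and
    // silently ignores whatever follows.
    static String parseLenient(String body) {
        String s = body.trim();
        int end = 0;
        while (end < s.length() && isTokenChar(s.charAt(end))) {
            end++;
        }
        return s.substring(0, end).toLowerCase(Locale.ROOT);
    }

    // Strict routine: the whole body must be exactly one token.
    static String parseStrict(String body) {
        String s = body.trim();
        if (s.isEmpty()) {
            throw new IllegalArgumentException("empty field body");
        }
        for (int i = 0; i < s.length(); i++) {
            if (!isTokenChar(s.charAt(i))) {
                throw new IllegalArgumentException(
                        "unexpected character '" + s.charAt(i) + "' at index " + i);
            }
        }
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String raw = "Base64; oops"; // malformed Content-Transfer-Encoding body

        // First pass (e.g. while building the model) succeeds.
        System.out.println("lenient: " + parseLenient(raw));

        // Second pass over the very same raw text (e.g. while serialising) fails.
        try {
            parseStrict(raw);
        } catch (IllegalArgumentException e) {
            System.out.println("strict parser rejected the field: " + e.getMessage());
        }
    }
}
```

Parsing each field exactly once with a single routine would remove this class
of disagreement by construction.
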
If we did move the Field and FieldParser interfaces to Core, however, not only could
we avoid duplicate parsing of some headers, but we could also potentially
simplify the API by getting rid of the RawField class and possibly the
Maximal/DefaultBodyDescriptors. All fields would get a parser assigned to them
as soon as they are read from the MIME stream and would only need to be parsed
once when accessed (if at all). Body descriptors could also be built lazily
from properties of individual fields. There would no longer be a reason for
having separate reduced (default) and maximal body descriptors.
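
A rough sketch of what "parser assigned on read, parsed once on access" could
look like follows. All names here (LazyField, parseMimeType, parseCount) are
made up for illustration and are not the existing or proposed Mime4j API; the
toy parser only extracts the media type before the first ';'.

```java
import java.util.Locale;
import java.util.function.Function;

// Hypothetical illustration only; not the actual Mime4j Field/FieldParser types.
class LazyFieldSketch {

    // A header field that keeps its raw body and parses it on first access only.
    static final class LazyField<T> {
        private final String name;
        private final String rawBody;
        private final Function<String, T> parser; // assigned when the field is read
        private T parsed;                          // cached result of the single parse
        private boolean done;
        int parseCount;                            // demonstration counter only

        LazyField(String name, String rawBody, Function<String, T> parser) {
            this.name = name;
            this.rawBody = rawBody;
            this.parser = parser;
        }

        String getName() { return name; }

        String getRawBody() { return rawBody; }

        // Parses at most once; later calls return the cached value.
        T getParsed() {
            if (!done) {
                parsed = parser.apply(rawBody);
                parseCount++;
                done = true;
            }
            return parsed;
        }
    }

    // Toy parser: the media type is whatever precedes the first ';'.
    static String parseMimeType(String body) {
        int semi = body.indexOf(';');
        String type = semi < 0 ? body : body.substring(0, semi);
        return type.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        LazyField<String> contentType = new LazyField<>(
                "Content-Type", " Text/Plain; charset=us-ascii",
                LazyFieldSketch::parseMimeType);

        System.out.println(contentType.parseCount);  // 0: nothing parsed yet
        System.out.println(contentType.getParsed()); // text/plain
        System.out.println(contentType.getParsed()); // text/plain (cached)
        System.out.println(contentType.parseCount);  // 1: parsed exactly once
    }
}
```

A body descriptor built on top of such fields could read getParsed() from
Content-Type and Content-Transfer-Encoding on demand, so fields that are never
accessed would never be parsed.
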
If I hear no objections, I'll go ahead and experiment with the idea of moving
field parsing interfaces to Core.
Oleg
> Avoid duplicate parsing of header fields
> ----------------------------------------
>
> Key: MIME4J-116
> URL: https://issues.apache.org/jira/browse/MIME4J-116
> Project: JAMES Mime4j
> Issue Type: Improvement
> Affects Versions: 0.6
> Reporter: Markus Wiederkehr
> Fix For: 0.7
>
>
> Currently some header fields are parsed twice when building a DOM. Once by
> DefaultBodyDescriptor or MaximalBodyDescriptor and a second time by
> MessageBuilder using Field.parse().
> Also different parsers are used in both stages. The body descriptors use
> handcrafted parsers whereas Field.parse uses JavaCC generated parsers. The
> handcrafted version does not seem to handle comments in a header correctly.
> The situation should be improved by parsing a header field only once and
> passing that already parsed field to a content handler. Also only one sort of
> field parser should be used; either handcrafted or generated. My personal
> opinion is that it might be easier for a handcrafted parser to be more
> tolerant against malformed header fields.