Re: [jira] Commented: (MIME4J-118) MIME stream parser handles non-ASCII fields incorrectly

Robert Burrell Donkin Tue, 24 Feb 2009 13:24:15 -0800

On Tue, Feb 24, 2009 at 7:59 PM, Markus Wiederkehr
<[email protected]> wrote:
> On Tue, Feb 24, 2009 at 2:46 PM, Robert Burrell Donkin (JIRA)
> <[email protected]> wrote:
>>
>>    [ https://issues.apache.org/jira/browse/MIME4J-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676270#action_12676270 ]
>>
>> Robert Burrell Donkin commented on MIME4J-118:
>> ----------------------------------------------
>>
>> I suspect that there may be longer term issues with this general approach 
>> but i think we should accept that the current proposal is good enough for 
>> this release. release early, release often.
>
> +1 on the release part but I need a few days to clean up that patch.


fine

>> I think that the best way to approach is to preserve the original document 
>> together with boundary meta-data. In other words, that a 'Content-Type' 
>> header starts at byte 99 in the document rather than trying to slice up the 
>> document and re-assemble from lots of small byte buffers. But this is 
>> related to other issues which should wait until after this release so I 
>> think we should patch and look to ship.
>
> We can cross that bridge when we come to it but I don't particularly
> like the idea of having to open a file, seek to position 99 and read
> 50 bytes just to obtain the raw value of a Content-Type field, for
> example.

nio manages this quite adequately ;-)
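
a minimal sketch of the kind of positioned read meant here, written
against the current java.nio API. the class name RawFieldReader, the
method readSlice and the sample offsets are made up for illustration
and are not part of mime4j:

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not a mime4j class.
class RawFieldReader {

    /** Reads exactly {@code length} bytes starting at {@code offset} in the file. */
    static byte[] readSlice(Path file, long offset, int length) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(length);
            long position = offset;
            // A positioned read does not move the channel's own position,
            // so no explicit seek is needed.
            while (buffer.hasRemaining()) {
                int n = channel.read(buffer, position);
                if (n < 0) {
                    throw new EOFException("slice extends past end of file");
                }
                position += n;
            }
            return buffer.array();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample document; the field offsets would normally come
        // from meta-data recorded by the parser.
        Path file = Files.createTempFile("message", ".eml");
        String message = "Subject: hi\r\nContent-Type: text/plain\r\n\r\nbody";
        Files.write(file, message.getBytes(StandardCharsets.US_ASCII));

        byte[] raw = readSlice(file, 13, 24);
        System.out.println(new String(raw, StandardCharsets.US_ASCII));
        Files.delete(file);
    }
}
```

only the bytes of the requested field are copied; the rest of the
document stays on disk.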

i worry about the quantity of copying and new buffers that will need
to be created to store a single complex, large document when every
component has to be stored as a string and also as bytes to ensure
round tripping in non-compliant corner cases. i would much rather
encourage users to retain the original when absolute fidelity is
required.

> Also please mind that Field instances may be shared between multiple
> messages and they can be created from a constructor or factory without
> an original document to back them up.

the difficult problems with round tripping should not occur when
fields are created programmatically

> And last but not least with nested encodings there is no meaningful
> offset into a file..

i'm not sure i agree with that

IIRC in a multipart document, the mime headers must be encoded in
ASCII. so, the first level headers can all be accessed through byte
offsets. a part may contain a transfer encoded document. there are a
couple of distinct cases which are interesting: when the document is
an embedded message or an embedded multipart document. when this is
encoded in Base64 then a bytewise offset is not available in the
original stream but is available in the decoded stream. so, the
bytewise offset in the decoded stream can be used. the approach would
be slow in this case, but it is a rare use case.
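
a rough sketch of addressing a field by its offset in the decoded
stream, using the JDK's java.util.Base64 rather than any mime4j
decoder. DecodedOffsetReader, readDecodedSlice and the sample offsets
are hypothetical names and values chosen for illustration only:

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not a mime4j class.
class DecodedOffsetReader {

    /**
     * Reads {@code length} bytes starting at {@code offset}, where the offset
     * counts bytes of the decoded content, not of the Base64 text.
     */
    static byte[] readDecodedSlice(InputStream encodedBody, long offset, int length)
            throws IOException {
        InputStream decoded = Base64.getMimeDecoder().wrap(encodedBody);

        // There is no random access into the encoded text, so everything
        // before the offset has to be decoded and discarded. This is the
        // slow part.
        for (long skipped = 0; skipped < offset; skipped++) {
            if (decoded.read() < 0) {
                throw new EOFException("offset is past end of decoded content");
            }
        }

        byte[] out = new byte[length];
        int filled = 0;
        while (filled < length) {
            int n = decoded.read(out, filled, length - filled);
            if (n < 0) {
                throw new EOFException("slice extends past end of decoded content");
            }
            filled += n;
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Assumed embedded message, Base64 encoded as it would be inside a part.
        String inner = "Subject: inner\r\nContent-Type: text/plain\r\n\r\nhello";
        byte[] encoded = Base64.getMimeEncoder()
                .encode(inner.getBytes(StandardCharsets.US_ASCII));

        byte[] raw = readDecodedSlice(new ByteArrayInputStream(encoded), 16, 24);
        System.out.println(new String(raw, StandardCharsets.US_ASCII));
    }
}
```

the cost is linear in the offset, which is the slowness accepted above
for this case.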

- robert
