On Tue, Feb 24, 2009 at 7:59 PM, Markus Wiederkehr <[email protected]> wrote:
> On Tue, Feb 24, 2009 at 2:46 PM, Robert Burrell Donkin (JIRA)
> <[email protected]> wrote:
>>
>> [
>> https://issues.apache.org/jira/browse/MIME4J-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676270#action_12676270
>> ]
>>
>> Robert Burrell Donkin commented on MIME4J-118:
>> ----------------------------------------------
>>
>> I suspect that there may be longer term issues with this general approach
>> but i think we should accept that the current proposal is good enough for
>> this release. release early, release often.
>
> +1 on the release part but I need a few days to clean up that patch.

fine

>> I think that the best way to approach is to preserve the original document
>> together with boundary meta-data. In other words, that a 'Content-Type'
>> header starts at byte 99 in the document rather than trying to slice up the
>> document and re-assemble from lots of small byte buffers. But this is
>> related to other issues which should wait until after this release so I
>> think we should patch and look to ship.
>
> We can cross that bridge when we come to it but I don't particularly
> like the idea of having to open a file, seek to position 99 and read
> 50 bytes just to obtain the raw value of a Content-Type field, for
> example.

nio manages this quite adequately ;-)

i worry about the quantity of copying and new buffers that will need to
be created to store a single complex, large document when every
component has to be stored both as a string and as bytes to ensure
round tripping in non-compliant corner cases. i would much rather
encourage users to retain the original when absolute fidelity is
required.

> Also please mind that Field instances may be shared between multiple
> messages and they can be created from a constructor or factory without
> an original document to back them up.

the difficult problems with round tripping should not occur when fields
are created programmatically

> And last but not least with nested encodings there is no meaningful
> offset into a file..

i'm not sure i agree with that

IIRC in a multipart document, the MIME headers must be encoded in ASCII.
so, the first level headers can all be accessed through byte offsets.

a part may contain a transfer encoded document. there are a couple of
distinct cases which are interesting: when the document is an embedded
message or an embedded multipart document. when this is encoded in
Base64, a bytewise offset is not available in the original stream but
one is available in the decoded stream. so, the bytewise offset in the
decoded stream can be used. this is a rare use case, so although the
approach would be slow here, the cost would seldom be paid.

- robert
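
to make the offset idea above concrete, here is a rough sketch. it is not
mime4j code: OffsetSketch, Span and view are hypothetical names made up for
illustration, and it assumes the whole document has been retained in a
single ByteBuffer. it only shows that java.nio can expose a raw field as a
view over the original bytes without copying, and that for a Base64 encoded
part the same trick works against the decoded bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration only; none of these names exist in mime4j.
final class OffsetSketch {

    /** Boundary meta-data: where a raw field sits in the retained document. */
    static final class Span {
        final int offset;
        final int length;

        Span(int offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Returns a read-only view of the span; no bytes are copied. */
    static ByteBuffer view(ByteBuffer document, Span span) {
        ByteBuffer dup = document.asReadOnlyBuffer();
        dup.clear(); // position 0, limit = capacity; content is untouched
        dup.position(span.offset);
        dup.limit(span.offset + span.length);
        return dup.slice();
    }

    /** Decodes a raw header view as ASCII. */
    static String ascii(ByteBuffer raw) {
        return StandardCharsets.US_ASCII.decode(raw).toString();
    }

    public static void main(String[] args) {
        // First level header: the offset refers to the original document.
        byte[] doc = "Subject: hi\r\nContent-Type: text/plain\r\n\r\nbody"
                .getBytes(StandardCharsets.US_ASCII);
        ByteBuffer original = ByteBuffer.wrap(doc);
        Span contentType = new Span(13, 24); // recorded by the parser in a real design
        System.out.println(ascii(view(original, contentType)));

        // Embedded message sent as Base64: the offset refers to the decoded bytes.
        byte[] inner = "Content-Type: text/html\r\n\r\n<p>hi</p>"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] encoded = Base64.getMimeEncoder().encode(inner);
        ByteBuffer decoded = ByteBuffer.wrap(Base64.getMimeDecoder().decode(encoded));
        System.out.println(ascii(view(decoded, new Span(0, 23))));
    }
}
```

the decoded buffer in the second half is the slow path mentioned above: it
costs one decode of the part, but only for embedded, transfer encoded
documents.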
