Thomas Fricker created MIME4J-316:
-------------------------------------
Summary: Parts missing in case of a specific combination of
boundaries
Key: MIME4J-316
URL: https://issues.apache.org/jira/browse/MIME4J-316
Project: James Mime4j
Issue Type: Bug
Components: parser (core)
Affects Versions: 0.7.2
Reporter: Thomas Fricker
The problem can be reproduced by parsing a very specific email structure in
which the inner boundary is the outer boundary followed by a "-" character and
further text (here, the suffix "-1").
In the following example, the attached PDF file is ignored by the parser.
{code:java}
Content-Type: multipart/mixed;
 boundary="--boundary.1652331600846930886"

----boundary.1652331600846930886
Content-Type: multipart/alternative;
 boundary="--boundary.1652331600846930886-1"

----boundary.1652331600846930886-1
Content-Type: text/plain; charset=utf-8

sometext
----boundary.1652331600846930886-1
Content-Type: text/html; charset=utf-8

<html lang="en">
<body>
</body>
</html>
----boundary.1652331600846930886-1--
----boundary.1652331600846930886
Content-Type: application/pdf;
 name="test.pdf"
Content-Transfer-Encoding: base64
Content-Disposition: Attachment;
 filename="test.pdf"

JVBERi0xLj4Kc3RhcnR4cmVmCjUzNjEwCiUlRU9GCgshortened==
----boundary.1652331600846930886--
{code}
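The collision in the message above can be illustrated with a small, self-contained sketch. It is not the mime4j parser code and not the proposed fix; the class and method names (BoundaryMatchSketch, prefixMatch, strictMatch) are hypothetical. It only contrasts a prefix-based delimiter check, which mistakes the inner delimiter line for a closing delimiter of the outer boundary, with a check along the lines of RFC 2046, where only an optional "--" and trailing whitespace may follow the boundary.

```java
/** Sketch only: illustrates the boundary collision, not mime4j's actual parser. */
class BoundaryMatchSketch {

    enum Kind { NONE, DELIMITER, CLOSE_DELIMITER }

    /** Naive (hypothetical) check: any line starting with "--" + boundary counts. */
    static Kind prefixMatch(String line, String boundary) {
        String delimiter = "--" + boundary;
        if (!line.startsWith(delimiter)) {
            return Kind.NONE;
        }
        String rest = line.substring(delimiter.length());
        // A single trailing "-" is already treated as the start of a closing "--".
        return rest.startsWith("-") ? Kind.CLOSE_DELIMITER : Kind.DELIMITER;
    }

    /** Stricter check: after the boundary only "--" and spaces/tabs may follow. */
    static Kind strictMatch(String line, String boundary) {
        String delimiter = "--" + boundary;
        if (!line.startsWith(delimiter)) {
            return Kind.NONE;
        }
        String rest = line.substring(delimiter.length());
        boolean close = rest.startsWith("--");
        if (close) {
            rest = rest.substring(2);
        }
        // Anything other than trailing padding means this is a different boundary.
        if (!rest.chars().allMatch(c -> c == ' ' || c == '\t')) {
            return Kind.NONE;
        }
        return close ? Kind.CLOSE_DELIMITER : Kind.DELIMITER;
    }

    public static void main(String[] args) {
        String outer = "--boundary.1652331600846930886";
        String[] lines = {
            "----boundary.1652331600846930886",
            "----boundary.1652331600846930886-1",
            "----boundary.1652331600846930886-1--",
            "----boundary.1652331600846930886--"
        };
        // Classify every delimiter-like line of the sample against the OUTER boundary.
        for (String line : lines) {
            System.out.println(line
                    + "  prefix=" + prefixMatch(line, outer)
                    + "  strict=" + strictMatch(line, outer));
        }
    }
}
```

With the prefix check, the first inner delimiter line is classified as a closing delimiter of the outer multipart, which would be consistent with the early T_END_MULTIPART reported for this message; whether the parser fails for exactly this reason is an assumption.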
Dumping the EntityState during parsing produces:
{code:java}
State: T_START_MULTIPART
State: T_PREAMBLE
State: T_END_MULTIPART
State: T_END_BODYPART
State: T_START_BODYPART
State: T_START_HEADER
Header field detected: Content-Type: text/plain; charset=utf-8
State: T_END_HEADER
Body detected, contents = [LineReaderInputStreamAdaptor: [pos: 43][limit:
103]....]], header data = [mimeType=text/plain, mediaType=text, subType=plain,
boundary=null, charset=utf-8]
State: T_END_BODYPART
State: T_START_BODYPART
State: T_START_HEADER
Header field detected: Content-Type: text/html; charset=utf-8
State: T_END_HEADER
Body detected, contents = [LineReaderInputStreamAdaptor: [pos: 42][limit:
313]], header data = [mimeType=text/html, mediaType=text, subType=html,
boundary=null, charset=utf-8]
State: T_END_BODYPART
State: T_EPILOGUE
State: T_END_MULTIPART
State: T_END_MESSAGE
{code}
The PDF attachment is missing.
I proposed the following fix: [https://github.com/apache/james-mime4j/pull/71],
which produces the following structure:
{code:java}
State: T_START_MULTIPART
State: T_START_BODYPART
State: T_START_HEADER
Header field detected: Content-Type: multipart/alternative;
boundary="--boundary.1652331600846930886-1"
State: T_END_HEADER
Multipart message detexted, header data = [mimeType=multipart/alternative,
mediaType=multipart, subType=alternative,
boundary=--boundary.1652331600846930886-1, charset=null]
State: T_START_MULTIPART
State: T_START_BODYPART
State: T_START_HEADER
Header field detected: Content-Type: text/plain; charset=utf-8
State: T_END_HEADER
Body detected, contents = [LineReaderInputStreamAdaptor: [pos: 43][limit:
103]], header data = [mimeType=text/plain, mediaType=text, subType=plain,
boundary=null, charset=utf-8]
State: T_END_BODYPART
State: T_START_BODYPART
State: T_START_HEADER
Header field detected: Content-Type: text/html; charset=utf-8
State: T_END_HEADER
Body detected, contents = [LineReaderInputStreamAdaptor: [pos: 42][limit:
313]]], header data = [mimeType=text/html, mediaType=text, subType=html,
boundary=null, charset=utf-8]
State: T_END_BODYPART
State: T_END_MULTIPART
State: T_END_BODYPART
State: T_START_BODYPART
State: T_START_HEADER
Header field detected: Content-Type: application/pdf;
name="Daily_Stats-2022-05-12-0700.pdf"
Header field detected: Content-Transfer-Encoding: base64
Header field detected: Content-Disposition: Attachment;
filename="Daily_Stats-2022-05-12-0700.pdf"
State: T_END_HEADER
Body detected, contents = [LineReaderInputStreamAdaptor: [pos: 189][limit:
235][JVBERi0xLj4Kc3RhcnR4cmVmCjUzNjEwCiUlRU9GCg==
]], header data = [mimeType=application/pdf, mediaType=application,
subType=pdf, boundary=null, charset=null]
State: T_END_BODYPART
State: T_END_MULTIPART
State: T_END_MESSAGE
{code}
I shortened the output of the body parts.