[ https://issues.apache.org/jira/browse/TIKA-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960943#comment-15960943 ]
Wing-Hong Andrew Ko commented on TIKA-2044:
-------------------------------------------

Hello Luis!

Thanks for the explanation! Is there an easy way to attach a different EmbeddedDocumentExtractor for mbox files than for pst files, or am I supposed to register a single EmbeddedDocumentExtractor instance and branch inside its parseEmbedded method based on, e.g., metadata.get(Metadata.CONTENT_TYPE)?

I have submitted the [PR|https://github.com/apache/tika/pull/166] with a refactor and unit tests.

Cheers,
Andrew

> MboxParser wrongly concatenates multiple text lines into single header line
> ---------------------------------------------------------------------------
>
>                 Key: TIKA-2044
>                 URL: https://issues.apache.org/jira/browse/TIKA-2044
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.13
>         Environment: Tika 1.13, and 1.14 nightly build at the time of this writing
>            Reporter: Vjeran Marcinko
>
> MboxParser combines multiple text lines into a single header value by
> (supposedly) using a LIFO structure (stack - java deque), but instead it uses
> a FIFO (queue) to fetch the last inserted line and to extend it with the
> current line in an incorrect way:
> Current code:
>   Queue<String> multiline = new LinkedList<String>();
>   ... few lines below...
>   String latestLine = multiline.poll();
> Whereas it should be:
>   Deque<String> multiline = new LinkedList<String>();
>   ... few lines below...
>   String latestLine = multiline.pollLast();
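
To illustrate the Queue versus Deque difference described in the quoted issue, here is a minimal, self-contained sketch. It is not the actual MboxParser code: the class name HeaderFoldingSketch, the helper methods, and the rule that a continuation line starts with a space or tab are assumptions made for the example. Only the poll() and pollLast() calls are taken from the report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

// Hypothetical illustration only; names and folding rule are assumptions,
// not code copied from Tika's MboxParser.
public class HeaderFoldingSketch {

    // Assumed rule: a folded (continuation) header line starts with whitespace.
    static boolean isContinuation(String line) {
        return line.startsWith(" ") || line.startsWith("\t");
    }

    // Behaviour described as the bug: poll() removes the HEAD of the queue,
    // i.e. the oldest header, so the continuation is glued to the wrong line.
    static List<String> foldWithQueue(List<String> lines) {
        Queue<String> multiline = new LinkedList<String>();
        for (String line : lines) {
            if (isContinuation(line) && !multiline.isEmpty()) {
                String latestLine = multiline.poll();
                multiline.add(latestLine + " " + line.trim());
            } else {
                multiline.add(line);
            }
        }
        return new ArrayList<String>(multiline);
    }

    // Behaviour proposed in the report: pollLast() removes the TAIL of the
    // deque, i.e. the most recently added header, which is then extended.
    static List<String> foldWithDeque(List<String> lines) {
        Deque<String> multiline = new LinkedList<String>();
        for (String line : lines) {
            if (isContinuation(line) && !multiline.isEmpty()) {
                String latestLine = multiline.pollLast();
                multiline.add(latestLine + " " + line.trim());
            } else {
                multiline.add(line);
            }
        }
        return new ArrayList<String>(multiline);
    }

    public static void main(String[] args) {
        List<String> headers = Arrays.asList(
                "From: a@example.com",
                "Subject: first part",
                " second part");
        System.out.println("Queue.poll():     " + foldWithQueue(headers));
        System.out.println("Deque.pollLast(): " + foldWithDeque(headers));
    }
}
```

With the sample input, the queue variant appends the continuation to the From header and moves it behind Subject, while the deque variant extends the Subject header and keeps the original order.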