[ https://issues.apache.org/jira/browse/TIKA-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16074303#comment-16074303 ]
Boopathi commented on TIKA-2403:
--------------------------------

Thank you so much for the help. I am just curious why it was designed to parse bookmark names too; I would like to understand the business use case. Otherwise, this issue can be closed. Thanks again for the help.

> Elasticsearch 5.2.2 - Ingest Node - PDF - Parsing Issue
> -------------------------------------------------------
>
>                 Key: TIKA-2403
>                 URL: https://issues.apache.org/jira/browse/TIKA-2403
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Boopathi
>         Attachments: SampleDocument.pdf
>
>
> We are using Elasticsearch 5.2.2 for full-text search. With the help of the
> ingest node we are able to parse the content of files that Tika supports. We
> are facing an issue while parsing the content of some PDF files. The content
> of the file is parsed successfully, but the output also contains additional
> terms that are not part of the document's content. [sample screen
> shot|https://www.screencast.com/t/AQWK9Rzvrdo8]. Kindly let me know the
> reason for this and how it can be fixed.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)