[jira] [Commented] (TIKA-2403) Elasticsearch 5.2.2 - Ingest Node - PDF - Parsing Issue

2017-06-29 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068557#comment-16068557
 ] 

Tim Allison commented on TIKA-2403:
---

Thank you for the ping.  Are you able to share the triggering document with us? 
 If not publicly, can you send it to me privately?  If that won't work, we'll 
try to find some other way to diagnose this.

> Elasticsearch 5.2.2 - Ingest Node - PDF - Parsing Issue
> ---
>
> Key: TIKA-2403
> URL: https://issues.apache.org/jira/browse/TIKA-2403
> Project: Tika
> Issue Type: Bug
> Reporter: Boopathi
>
> We are using Elasticsearch 5.2.2 for full-text search. With the help of the 
> ingest node, we are able to parse the content of the file types that Tika 
> supports. We are facing an issue while parsing some PDF files. The content 
> of the file is parsed successfully, but the output also includes additional 
> terms that are not part of the document's content. [sample screen 
> shot|https://www.screencast.com/t/AQWK9Rzvrdo8]. Kindly let me know the 
> reason for this and how it can be fixed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (TIKA-2403) Elasticsearch 5.2.2 - Ingest Node - PDF - Parsing Issue

2017-07-03 Thread Boopathi (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072355#comment-16072355
 ] 

Boopathi commented on TIKA-2403:


I hope you have received the file.

> Elasticsearch 5.2.2 - Ingest Node - PDF - Parsing Issue
> ---
>
> Key: TIKA-2403
> URL: https://issues.apache.org/jira/browse/TIKA-2403
> Project: Tika
> Issue Type: Bug
> Reporter: Boopathi
> Attachments: SampleDocument.pdf


[jira] [Commented] (TIKA-2403) Elasticsearch 5.2.2 - Ingest Node - PDF - Parsing Issue

2017-07-03 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072483#comment-16072483
 ] 

Tim Allison commented on TIKA-2403:
---

Yes, thank you.  Sorry for the delay.  The text you don't want comes from the 
PDF's bookmarks.  You can turn this off with a tika-config.xml; see: 
https://wiki.apache.org/tika/TikaConfig

I'm not sure how to specify the tika-config.xml in ES, but I would hope that 
it is straightforward.
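
For illustration only, a tika-config.xml along these lines might disable 
bookmark extraction.  This is a sketch under assumptions, not a tested 
configuration: it assumes a Tika version whose PDFParser accepts an 
"extractBookmarksText" parameter, which may be newer than the Tika bundled 
with ES 5.2.2, and it does not address how ES would load the file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- Keep the default parsers for every format except PDF. -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.pdf.PDFParser"/>
    </parser>
    <!-- Configure the PDF parser separately. The parameter name below is an
         assumption; check it against the PDFParserConfig of your Tika version. -->
    <parser class="org.apache.tika.parser.pdf.PDFParser">
      <params>
        <param name="extractBookmarksText" type="bool">false</param>
      </params>
    </parser>
  </parsers>
</properties>
```

With a setting like this in effect, the extracted text should contain only 
the page content, without the bookmark (outline) titles.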

Let us know if you have any other questions.





[jira] [Commented] (TIKA-2403) Elasticsearch 5.2.2 - Ingest Node - PDF - Parsing Issue

2017-07-04 Thread Boopathi (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16074303#comment-16074303
 ] 

Boopathi commented on TIKA-2403:


Thank you so much for the help. I am just curious why it was designed to 
parse bookmark names too; I would like to understand the business use 
case. Otherwise this issue can be closed. Thanks again for the help.



[jira] [Commented] (TIKA-2403) Elasticsearch 5.2.2 - Ingest Node - PDF - Parsing Issue

2017-07-05 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16074584#comment-16074584
 ] 

Tim Allison commented on TIKA-2403:
---

Some users want everything.  Some want only the visible parts.

Let us know if you have any other questions/issues.
