[jira] [Commented] (TIKA-2249) Tika not able to parse tables from pdf

Tim Allison (JIRA) Tue, 24 Jan 2017 12:40:48 -0800

    [ 
https://issues.apache.org/jira/browse/TIKA-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15836601#comment-15836601
 ]


Tim Allison commented on TIKA-2249:
-----------------------------------

bq. Is there a place where I can find any facts about how to identify different 
elements in PDF so that they can then be converted into html format, sort of 
how to implement it, how PDF stores data internally etc

Well...there's the 
[PDF spec|http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/pdf_reference_1-7.pdf],
all 1310 pages of it.  You could also take a look at our {{PDFParser}} and 
{{PDF2XHTML}} (including {{AbstractPDF2XHTML}}), and of course PDFBox's 
{{PDFTextStripper}}.
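
To give a rough feel for how a PDF stores text internally: a page's content 
stream is mostly a sequence of drawing operators that place runs of text at 
coordinates, and an untagged PDF typically carries no explicit table markup, 
so rows and columns have to be guessed from positions.  The sketch below is a 
toy, stdlib-only illustration of that idea and is not how Tika or PDFBox 
implement it.  The class and method names ({{ContentStreamSketch}}, 
{{extractRuns}}, {{groupIntoRows}}) are hypothetical, and it assumes an 
uncompressed stream that uses only the {{BT}}, {{Td}} and {{Tj}} operators 
with unescaped literal strings (no {{Tm}}, {{TJ}}, font encodings, or 
rotation).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ContentStreamSketch {

    /** One run of text plus the position it was drawn at (hypothetical type). */
    public static final class TextRun {
        final double x, y;
        final String text;
        TextRun(double x, double y, String text) { this.x = x; this.y = y; this.text = text; }
    }

    // Matches "BT" (begin text), "tx ty Td" (move text position),
    // or "(string) Tj" (show text). Everything else is skipped.
    private static final Pattern OPS = Pattern.compile(
        "\\bBT\\b"
        + "|(-?\\d+(?:\\.\\d+)?)\\s+(-?\\d+(?:\\.\\d+)?)\\s+Td\\b"
        + "|\\(([^)]*)\\)\\s*Tj\\b");

    /** Walks the operators and records where each string is shown. */
    public static List<TextRun> extractRuns(String contentStream) {
        List<TextRun> runs = new ArrayList<>();
        double x = 0, y = 0;
        Matcher m = OPS.matcher(contentStream);
        while (m.find()) {
            if (m.group(3) != null) {          // (string) Tj
                runs.add(new TextRun(x, y, m.group(3)));
            } else if (m.group(1) != null) {   // tx ty Td: offset from current line start
                x += Double.parseDouble(m.group(1));
                y += Double.parseDouble(m.group(2));
            } else {                           // BT resets the text position
                x = 0;
                y = 0;
            }
        }
        return runs;
    }

    /**
     * Guesses table rows: runs sharing a y coordinate form a row, ordered
     * left to right. Rows are returned top to bottom (PDF y grows upward).
     * Assumption: exact y equality; real documents need a tolerance.
     */
    public static List<List<String>> groupIntoRows(List<TextRun> runs) {
        Map<Double, List<TextRun>> byY = new TreeMap<>(Comparator.reverseOrder());
        for (TextRun r : runs) {
            byY.computeIfAbsent(r.y, k -> new ArrayList<>()).add(r);
        }
        List<List<String>> rows = new ArrayList<>();
        for (List<TextRun> row : byY.values()) {
            row.sort(Comparator.comparingDouble(r -> r.x));
            rows.add(row.stream().map(r -> r.text).collect(Collectors.toList()));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample: four cells, deliberately drawn out of reading order.
        String stream =
              "BT /F1 12 Tf 72 700 Td (Name) Tj ET\n"
            + "BT /F1 12 Tf 200 700 Td (Qty) Tj ET\n"
            + "BT /F1 12 Tf 200 680 Td (3) Tj ET\n"
            + "BT /F1 12 Tf 72 680 Td (Apple) Tj ET\n";
        System.out.println(groupIntoRows(extractRuns(stream)));
        // prints [[Name, Qty], [Apple, 3]]
    }
}
```

The drawing order in the sample does not match the reading order, which is 
the usual reason table extraction from PDF ends up being heuristic rather 
than exact.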

What, specifically, are you trying to pull out?

> Tika not able to parse tables from pdf 
> ---------------------------------------
>
>                 Key: TIKA-2249
>                 URL: https://issues.apache.org/jira/browse/TIKA-2249
>             Project: Tika
>          Issue Type: Bug
>          Components: handler
>            Reporter: Amit Kumar
>         Attachments: Japanese.pdf
>
>
> Tika not able to parse tables from pdf. I want to attach sample pdf which I 
> tried but attachment/browse link is not visible to me.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
