[ https://issues.apache.org/jira/browse/TIKA-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15836601#comment-15836601 ]
Tim Allison commented on TIKA-2249:
-----------------------------------
bq. Is there a place where I can find facts about how to identify the different
elements in a PDF so that they can be converted into HTML, i.e. how to
implement it, how PDF stores data internally, etc.?
Well...there's the [PDF
spec|http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/pdf_reference_1-7.pdf],
all 1310 pages of it. You could take a look at our {{PDFParser}} and
{{PDF2XHTML}} (including {{AbstractPDF2XHTML}}), and of course PDFBox's
{{PDFTextStripper}}.
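For tables in particular, keep in mind that a PDF page generally stores text as positioned runs of glyphs rather than as rows and cells, so any table structure has to be inferred from coordinates. Below is a minimal sketch of that idea in plain Java, not what Tika or PDFBox actually do. {{TableSketch}}, {{Chunk}}, {{groupIntoRows}}, {{toHtml}} and the {{yTolerance}} parameter are hypothetical names made up for illustration, and the sketch assumes you have already obtained text fragments with x/y positions (with y growing down the page) from some extractor.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: infers table rows from positioned text fragments. */
final class TableSketch {

    /** Hypothetical positioned fragment: its text and the x/y of its start. */
    static final class Chunk {
        final String text;
        final float x;
        final float y;

        Chunk(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Groups chunks into rows: chunks whose y lies within yTolerance of the
     * first chunk of the current row share that row. Assumes y grows downward.
     */
    static List<List<String>> groupIntoRows(List<Chunk> chunks, float yTolerance) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        sorted.sort(Comparator.comparingDouble((Chunk c) -> c.y));

        List<List<Chunk>> rows = new ArrayList<>();
        float rowY = 0f;
        for (Chunk c : sorted) {
            if (rows.isEmpty() || Math.abs(c.y - rowY) > yTolerance) {
                rows.add(new ArrayList<>()); // start a new row
                rowY = c.y;
            }
            rows.get(rows.size() - 1).add(c);
        }

        // Within each row, order cells left to right by x.
        List<List<String>> result = new ArrayList<>();
        for (List<Chunk> row : rows) {
            row.sort(Comparator.comparingDouble((Chunk c) -> c.x));
            List<String> cells = new ArrayList<>();
            for (Chunk c : row) {
                cells.add(c.text);
            }
            result.add(cells);
        }
        return result;
    }

    /** Renders the rows as a bare HTML table, escaping &, < and >. */
    static String toHtml(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder("<table>");
        for (List<String> row : rows) {
            sb.append("<tr>");
            for (String cell : row) {
                String escaped = cell.replace("&", "&amp;")
                        .replace("<", "&lt;")
                        .replace(">", "&gt;");
                sb.append("<td>").append(escaped).append("</td>");
            }
            sb.append("</tr>");
        }
        return sb.append("</table>").toString();
    }
}
```

Real documents are much messier than this: cells wrap onto several lines, columns are not always aligned, and ruling lines are drawn separately from the text, so a single y tolerance is only a starting point.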
What, specifically, are you trying to pull out?
> Tika not able to parse tables from pdf
> ---------------------------------------
>
> Key: TIKA-2249
> URL: https://issues.apache.org/jira/browse/TIKA-2249
> Project: Tika
> Issue Type: Bug
> Components: handler
> Reporter: Amit Kumar
> Attachments: Japanese.pdf
>
>
> Tika is not able to parse tables from PDF files. I want to attach the sample
> PDF that I tried, but the attachment/browse link is not visible to me.