[ https://issues.apache.org/jira/browse/TIKA-100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tyler Palsulich closed TIKA-100.
--------------------------------
    Resolution: Fixed

> Structured PDF parsing
> ----------------------
>
>                 Key: TIKA-100
>                 URL: https://issues.apache.org/jira/browse/TIKA-100
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Jukka Zitting
>            Priority: Minor
>
> The PDF parser currently extracts and outputs document content as a single
> string. PDFBox could be used to support structuring at least down to page and
> paragraph (not sure how accurate) level.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)