[ https://issues.apache.org/jira/browse/TIKA-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045184#comment-13045184 ]
Gregory Kanevsky commented on TIKA-100:
---------------------------------------

The issue with 'sortByPosition' is addressed by TIKA-612.

> Structured PDF parsing
> ----------------------
>
> Key: TIKA-100
> URL: https://issues.apache.org/jira/browse/TIKA-100
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Reporter: Jukka Zitting
> Priority: Minor
>
> The PDF parser currently extracts and outputs document content as a single
> string. PDFBox could be used to support structuring at least down to page and
> paragraph (not sure how accurate) level.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira