[jira] [Commented] (TIKA-2303) PDFParser with optional bookmarks text extraction

ASF GitHub Bot (JIRA) Thu, 16 Mar 2017 10:27:11 -0700

    [ 
https://issues.apache.org/jira/browse/TIKA-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928470#comment-15928470
 ]


ASF GitHub Bot commented on TIKA-2303:
--------------------------------------

ppalazon opened a new pull request #157: Fix for TIKA-2303 contributed by 
ppalazon.
URL: https://github.com/apache/tika/pull/157
 
 
   Added a new option to PDFParserConfig that controls whether
   bookmark text is extracted from a PDF. Its name is extractBookmarksText.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> PDFParser with optional bookmarks text extraction
> -------------------------------------------------
>
>                 Key: TIKA-2303
>                 URL: https://issues.apache.org/jira/browse/TIKA-2303
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.14
>            Reporter: Pablo Palazon
>              Labels: option, parser, pdf
>   Original Estimate: 10m
>  Remaining Estimate: 10m
>
> I would like to parse a PDF without extracting its bookmarks and outlines.
> I was thinking of adding a new PDFParser parameter to PDFParserConfig, with
> an option such as 'ExtractBookmarks', and checking it in 'AbstractPDF2XHTML'.
> I can do this myself, and I would like to submit a patch with the change.
> Thanks in advance.
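
The idea described above, a boolean option on the config object that the
extraction code consults before emitting bookmark text, could be sketched
roughly as follows. This is only an illustration and not the code from the
pull request: SketchPdfConfig, SketchPdfToText and BookmarkOptionDemo are
hypothetical stand-ins that do not depend on Tika, and the default of true
(bookmarks extracted unless the option is switched off) is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Small demo driver; all class names in this sketch are hypothetical.
class BookmarkOptionDemo {
    public static void main(String[] args) {
        List<String> pages = Arrays.asList("Page 1 text");
        List<String> bookmarks = Arrays.asList("Chapter 1");

        SketchPdfConfig config = new SketchPdfConfig();
        // Default (assumed): bookmark text is included in the output.
        System.out.println(SketchPdfToText.extract(pages, bookmarks, config));

        // Option switched off: only the page text is returned.
        config.setExtractBookmarksText(false);
        System.out.println(SketchPdfToText.extract(pages, bookmarks, config));
    }
}

// Hypothetical stand-in for a parser config; only the one flag is modelled.
class SketchPdfConfig {
    // Assumed default of true, so behaviour is unchanged unless the flag is cleared.
    private boolean extractBookmarksText = true;

    boolean getExtractBookmarksText() {
        return extractBookmarksText;
    }

    void setExtractBookmarksText(boolean extractBookmarksText) {
        this.extractBookmarksText = extractBookmarksText;
    }
}

// Hypothetical stand-in for the handler that turns a parsed PDF into text.
class SketchPdfToText {
    static List<String> extract(List<String> pageText,
                                List<String> bookmarks,
                                SketchPdfConfig config) {
        List<String> out = new ArrayList<>(pageText);
        // The flag is checked at the single point where bookmarks would be emitted.
        if (config.getExtractBookmarksText()) {
            out.addAll(bookmarks);
        }
        return out;
    }
}
```

Checking the flag at the one place where bookmarks are written keeps the
change small and leaves the rest of the extraction path untouched.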



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
