[jira] [Commented] (TIKA-2493) Allow Extraction of Javascript from PDFs

Rahul Veeramalla (JIRA) Mon, 06 Nov 2017 07:08:58 -0800

    [ 
https://issues.apache.org/jira/browse/TIKA-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16240420#comment-16240420
 ]


Rahul Veeramalla commented on TIKA-2493:
----------------------------------------

Thanks for the swift response.
Based on the changes in TIKA-2090, I configured PDFParserConfig and called 
setExtractActions(true). This gives me the JavaScript elements along with 
the other data in the PDF, such as text and fields.
However, I need to extract only the JavaScript elements, or at least separate 
them from the rest of the data in the PDF.
Is there another method, API call, or separate handler that I can use for 
this?
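
One way to separate the JavaScript from the rest of the output is a custom SAX 
ContentHandler. The following is a minimal sketch, assuming that with 
setExtractActions(true) the parser writes each JavaScript action as an XHTML 
element whose class attribute contains "javascript". That attribute value is an 
assumption and should be checked against the XHTML Tika actually produces. The 
class name JavaScriptCollector and its collect() helper are hypothetical, and the 
sketch uses only the JDK's SAX classes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical handler: collects the text of every element whose class
 * attribute contains "javascript" (assumed marker, verify against real output)
 * and ignores all other content.
 */
class JavaScriptCollector extends DefaultHandler {
    private final List<String> scripts = new ArrayList<>();
    private final StringBuilder current = new StringBuilder();
    private int depth = 0; // > 0 while inside a matching element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (depth > 0) {
            depth++; // nested element inside a matching element
            return;
        }
        String cls = atts.getValue("class");
        if (cls != null && cls.contains("javascript")) {
            depth = 1;
            current.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (depth > 0) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (depth > 0 && --depth == 0) {
            scripts.add(current.toString().trim());
        }
    }

    public List<String> getScripts() {
        return scripts;
    }

    /** Convenience for trying the handler on a hand-written XHTML string. */
    static List<String> collect(String xhtml) throws Exception {
        JavaScriptCollector handler = new JavaScriptCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.getScripts();
    }
}
```

Because this is an ordinary org.xml.sax.ContentHandler, it could be passed as the 
handler argument of Parser.parse(stream, handler, metadata, context), with the 
PDFParserConfig registered on the ParseContext, and getScripts() read afterwards. 
A non-empty list would then indicate that the PDF contains JavaScript.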


> Allow Extraction of Javascript from PDFs
> ----------------------------------------
>
>                 Key: TIKA-2493
>                 URL: https://issues.apache.org/jira/browse/TIKA-2493
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Rahul Veeramalla
>            Priority: Blocker
>
> I have a use case in which PDFs are uploaded through a File Upload 
> feature that I am currently building for my application. Based on our 
> security team's recommendation, I need to scan the PDFs for any embedded 
> JavaScript, attachments, and links, and block any PDFs that contain them.
> I was able to work out how to extract hyperlinks and attachments 
> from a PDF using Tika.
> However, I cannot find a way to extract JavaScript from PDFs.
> I need help determining whether a PDF contains JavaScript elements/code.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
