[ 
https://issues.apache.org/jira/browse/PDFBOX-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1555:
--------------------------------

    Component/s:     (was: Swing GUI)
                 Parsing

> Javascript at the end of the PDF document fails parsing
> -------------------------------------------------------
>
>                 Key: PDFBOX-1555
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1555
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.8.0
>            Reporter: Jinder Aujla
>         Attachments: 
> 0001-MA-1981-Analyzer-Production-heitman.com-PDF-attachme.patch, 
> 0002-MA-1981-Analyzer-Production-heitman.com-PDF-attachme.patch
>
>
> Hi,
> While investigating a parse failure and debugging the PDFBox code, I noticed 
> that the PDF document (which I can't forward) has this at the end of the file:
> %%EOF^M
> ^M
> ^M
> <script type="text/javascript">^M
> var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : 
> "http://www.");^M
> document.write(unescape("%3Cscript src='" + gaJsHost + 
> "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));^M
> </script>^M
> <script type="text/javascript">^M
> try {^M
> var pageTracker = _gat._getTracker("UA-7429935-1");^M
> pageTracker._trackPageview();^M
> } catch(err) {}</script>^M
> ^M
> ^M
> So the document ends, but more content follows: some JavaScript. 
> The parser gets to line 492 in org.apache.pdfbox.pdfparser.PDFParser, where 
> isEndOfFile gets set to true. But because that is not the end of the actual 
> stream, it continues; this behaviour was a fix in PDFBOX-979.
> Next time around the loop it reads
> <script type="text/javascript">
> which I think it ignores. It then tries to read 
> var
> twice as a number, then blows up. So I've been playing around, thinking of a 
> sensible thing to do, but I'm worried that I might introduce some other issue. I 
> assume this is a legal structure for a PDF document; it opens fine in a viewer.
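
One possible workaround for the behaviour described above is to drop everything after the last %%EOF marker before the bytes reach the parser. The following is only a minimal sketch of that idea, not the fix adopted in PDFBox: the class TrailingContentTrimmer and its methods are hypothetical names, it assumes the whole file fits in memory, and it assumes the trailing junk does not itself contain the string %%EOF. It searches for the last marker because incrementally updated PDFs legitimately contain several.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of PDFBox: trims bytes that follow the final %%EOF.
final class TrailingContentTrimmer {
    private static final byte[] EOF_MARKER =
            "%%EOF".getBytes(StandardCharsets.ISO_8859_1);

    // Returns the index of the last occurrence of pattern in data, or -1 if absent.
    static int lastIndexOf(byte[] data, byte[] pattern) {
        for (int i = data.length - pattern.length; i >= 0; i--) {
            boolean match = true;
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }

    // Returns a copy of data that ends right after the last %%EOF marker.
    // If no marker is found, the input is returned unchanged.
    static byte[] trimAfterLastEof(byte[] data) {
        int idx = lastIndexOf(data, EOF_MARKER);
        if (idx < 0) {
            return data;
        }
        return Arrays.copyOf(data, idx + EOF_MARKER.length);
    }
}
```

The trimmed array could then be wrapped in a java.io.ByteArrayInputStream and handed to the parser. Whether silently discarding trailing bytes is acceptable depends on the application, so treat this as an illustration of the idea rather than a recommendation.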



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
