[
https://issues.apache.org/jira/browse/PDFBOX-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934744#comment-13934744
]
Maruan Sahyoun commented on PDFBOX-1987:
----------------------------------------
I attached a version of a PDF lexer together with a set of tests and some
helper classes which extend RandomAccessRead to be able to read test data from
strings for easier testing.
The purpose is that people who are interested - and have a better programming
background - can inspect and comment on the code.
An area which I left out is how to handle malformed tokens, such as strings which
have unbalanced parentheses. For relaxed processing such errors
should be fixed. For strict processing such errors should be reported and
potentially fixed, as the process shouldn’t stop at the first error.
The current idea I have in mind is that the lexer raises events in such cases,
which a parser could listen for and react to. Again, I am looking for comments and
ideas on this.
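To make the event idea easier to comment on, here is a minimal sketch of how it
could look for literal strings. It is not taken from the attached src.zip: the
names LiteralStringLexerSketch, LexerEventListener, malformedToken and
readLiteralString are all hypothetical, and it reads from a plain String rather
than from RandomAccessRead to stay self-contained. The lexer always repairs the
token by returning what it has read so far, and only notifies the listener; a
strict parser would collect the reports, a relaxed one would ignore them.

```java
import java.util.ArrayList;
import java.util.List;

public class LiteralStringLexerSketch {

    /** Hypothetical callback a parser could register; not a PDFBox API. */
    interface LexerEventListener {
        void malformedToken(String description, int offset);
    }

    /**
     * Reads a PDF literal string whose opening '(' is at index start.
     * If the input ends before the parentheses balance, the listener is
     * notified and the text read so far is returned as the repaired token.
     */
    static String readLiteralString(String input, int start, LexerEventListener listener) {
        if (start >= input.length() || input.charAt(start) != '(') {
            throw new IllegalArgumentException("no literal string at offset " + start);
        }
        StringBuilder token = new StringBuilder();
        int depth = 1;          // the opening parenthesis is already consumed
        int pos = start + 1;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (c == '\\' && pos + 1 < input.length()) {
                // Keep the escape sequence verbatim; escaped parentheses
                // do not take part in the balance count.
                token.append(c).append(input.charAt(pos + 1));
                pos += 2;
                continue;
            }
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth == 0) {
                    return token.toString();   // balanced: normal end of token
                }
            }
            token.append(c);
            pos++;
        }
        // End of input reached with parentheses still open: report, then recover.
        listener.malformedToken(
                "unbalanced parentheses in literal string, " + depth + " still open", start);
        return token.toString();
    }

    public static void main(String[] args) {
        List<String> errors = new ArrayList<>();
        // Strict mode: collect every report instead of stopping at the first one.
        LexerEventListener strict = (description, offset) -> errors.add(offset + ": " + description);

        System.out.println(readLiteralString("(a (nested) string)", 0, strict));
        System.out.println(readLiteralString("(never (closed)", 0, strict));
        System.out.println(errors);
    }
}
```

A relaxed parser could pass a listener that does nothing, for example
(description, offset) -> {}, and would get the same repaired token back. Whether
repair belongs in the lexer or in the listener is exactly the open question.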
> Provide a PDF Lexer as a base for PDF parsing
> ---------------------------------------------
>
> Key: PDFBOX-1987
> URL: https://issues.apache.org/jira/browse/PDFBOX-1987
> Project: PDFBox
> Issue Type: Improvement
> Components: Parsing
> Reporter: Maruan Sahyoun
> Priority: Minor
> Fix For: 2.0.0
>
> Attachments: src.zip
>
>
> In order to enhance the parsing process and as a foundation for a combination
> of the different parsers, a PDF lexer should be provided.