[ 
https://issues.apache.org/jira/browse/PDFBOX-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15075254#comment-15075254
 ] 

Tilman Hausherr commented on PDFBOX-3177:
-----------------------------------------

Why not override writeString instead?
{code}
protected void writeString(String text, List<TextPosition> textPositions)
        throws IOException
{code}
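
To illustrate why a hook that runs after sorting would help here, below is a minimal, self-contained sketch of the pattern. It does not use PDFBox: Glyph, MiniStripper, RecordingStripper and SortedHookDemo are hypothetical stand-ins invented for this example, and the model assumes y is measured from the top of the page, lines are grouped by exact y equality, and writeString is called once per line. The real PDFTextStripper may group and call differently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Stand-in for a positioned glyph; NOT PDFBox's TextPosition.
final class Glyph {
    final String unicode;
    final float x;
    final float y; // assumed: distance from the top of the page

    Glyph(String unicode, float x, float y) {
        this.unicode = unicode;
        this.x = x;
        this.y = y;
    }
}

// Stand-in for a stripper with sort-by-position enabled; NOT PDFTextStripper.
class MiniStripper {
    private final List<Glyph> collected = new ArrayList<>();

    // Called once per glyph in content-stream order, before any sorting.
    protected void processTextPosition(Glyph g) {
        collected.add(g);
    }

    // Called after sorting, with the glyphs of one line in reading order.
    protected void writeString(String text, List<Glyph> glyphs) {
        // Default: do nothing. Subclasses override this hook.
    }

    final void feed(Glyph g) {
        processTextPosition(g);
    }

    // Sorts top-to-bottom, then left-to-right, and emits one call per line.
    final void finish() {
        List<Glyph> sorted = new ArrayList<>(collected);
        sorted.sort(Comparator.comparingDouble((Glyph g) -> g.y)
                .thenComparingDouble(g -> g.x));
        List<Glyph> line = new ArrayList<>();
        for (Glyph g : sorted) {
            if (!line.isEmpty() && line.get(0).y != g.y) {
                flush(line);
                line = new ArrayList<>();
            }
            line.add(g);
        }
        if (!line.isEmpty()) {
            flush(line);
        }
    }

    private void flush(List<Glyph> line) {
        StringBuilder text = new StringBuilder();
        for (Glyph g : line) {
            text.append(g.unicode);
        }
        writeString(text.toString(), line);
    }
}

// Overrides only the post-sort hook, so it sees glyphs in reading order.
class RecordingStripper extends MiniStripper {
    final List<String> lines = new ArrayList<>();

    @Override
    protected void writeString(String text, List<Glyph> glyphs) {
        lines.add(text);
    }
}

class SortedHookDemo {
    static List<String> run() {
        RecordingStripper s = new RecordingStripper();
        // Fed out of reading order, as a content stream might deliver them.
        s.feed(new Glyph("b", 20f, 10f));
        s.feed(new Glyph("c", 10f, 30f));
        s.feed(new Glyph("a", 10f, 10f));
        s.finish();
        return s.lines;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [ab, c]
    }
}
```

In this model an override of processTextPosition would have seen "b", "c", "a", while the writeString override receives "ab" and then "c", which is the behaviour the reporter is after without changing any modifiers.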

> Change some modifiers from private to protected in PDFTextStripper Class
> ------------------------------------------------------------------------
>
>                 Key: PDFBOX-3177
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3177
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Text extraction
>    Affects Versions: 1.8.10
>         Environment: All
>            Reporter: Praveer
>             Fix For: 1.8.10
>
>
> Hi,
> I am parsing a very complicated PDF whose text is not extracted in the
> proper sequence, so I had to enable sorting with setSortByPosition(true).
> Now I want to access each TextPosition element and do some processing on
> it. Normally I would override the processTextPosition method and do my work
> there, but since sort-by-position is enabled, the code that sorts the text
> before extraction runs after processTextPosition, so overriding
> processTextPosition does not give me the text in positional order.
> I did some research and found that overriding the writeLine method of
> PDFTextStripper would work for me, because it processes each TextPosition
> after they have been sorted by position.
> So I did a proof of concept on my own computer by making the following
> changes to the PDFTextStripper class:
> 1 - 'private' void writeLine() changed to 'protected'
> 2 - 'private' static final class WordWithTextPositions changed to
> 'protected'
> After this, everything works as I expect. I think these changes would also
> help other people who use this library.
> I can contribute this code myself if you like. Let me know. Thanks and
> regards,
> Praveer



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

