192902649 commented on PR #352:
URL: https://github.com/apache/pdfbox/pull/352#issuecomment-3616079640

   Hey team,
   
   We’ve been building advanced table recognition and structured content
   extraction on top of PDFTextStripper. However, in version 3.0.6 the
   writeLine() method is declared private, so subclasses cannot override it,
   which makes it impossible to customize the line-level extraction logic.
   
   Because of this restriction, we currently have no choice but to copy a large
   part of the PDFTextStripper implementation into our own code just to change
   its internal behavior. That copy has to be kept in sync with upstream changes
   by hand, which significantly increases our maintenance cost and reduces the
   value of relying on PDFBox as a stable dependency.
   
   Would it be possible, in the next release, to make writeLine() (and the
   related internal hooks) protected or public, so that developers can extend
   the class cleanly without copying its internal source?
   
   This change would make custom text-layout logic, especially table parsing,
   much easier to implement, and it would greatly reduce the need to fork or
   replicate PDFBox code.
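
   To make the request concrete, here is a minimal sketch of the pattern we are
   asking for. The classes below (TextStripperBase, TableAwareStripper, HookDemo)
   are hypothetical stand-ins, not PDFBox types, and a plain List<String> per
   line is a simplification of the per-line data PDFTextStripper actually uses.
   The base class owns the line loop once; a subclass overrides only the
   protected writeLine() hook instead of copying that loop.

   ```java
   import java.util.List;

   // Runs the demo: the same input rendered by the default and the overridden hook.
   class HookDemo {
       public static void main(String[] args) {
           List<List<String>> page = List.of(
                   List.of("Name", "Qty"),
                   List.of("Apple", "3"));
           System.out.print(new TextStripperBase().process(page));   // space-separated rows
           System.out.print(new TableAwareStripper().process(page)); // tab-separated rows
       }
   }

   // Hypothetical stand-in for a text stripper (NOT the PDFBox API).
   class TextStripperBase {
       protected final StringBuilder output = new StringBuilder();

       // The line loop lives here once; subclasses never need to copy it.
       public final String process(List<List<String>> lines) {
           for (List<String> line : lines) {
               writeLine(line);
               output.append('\n');
           }
           return output.toString();
       }

       // Default line writer: words joined by a single space.
       // Declared protected (not private), so subclasses may override it.
       protected void writeLine(List<String> words) {
           output.append(String.join(" ", words));
       }
   }

   // Hypothetical subclass: emits each line as tab-separated table cells.
   class TableAwareStripper extends TextStripperBase {
       @Override
       protected void writeLine(List<String> words) {
           output.append(String.join("\t", words));
       }
   }
   ```

   In PDFTextStripper itself the equivalent override is not possible today,
   because writeLine() is private; that is the gap described above.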
   
   Thanks for considering this improvement.

