192902649 commented on PR #352:
URL: https://github.com/apache/pdfbox/pull/352#issuecomment-3616079640
Hey team,
We’ve been working on advanced table recognition and structured content
extraction based on PDFTextStripper. However, in version 3.0.6, the writeLine()
method is declared private, which makes it impossible to customize the
line-level extraction logic.
Because of this restriction, we currently have no choice but to copy a large
amount of code from the original PDFTextStripper implementation just to
override internal behavior. This significantly increases maintenance cost and
reduces the value of relying on PDFBox as a stable dependency.
Would it be possible for the next release to change writeLine() (and related
internal hooks) to protected or public, so developers can extend the class
cleanly without copying internal source?
This change would make custom text-layout logic, especially table parsing,
much easier to implement, and it would greatly reduce the need to fork or
replicate code from PDFBox.
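To illustrate the kind of extension point we have in mind, here is a minimal,
self-contained sketch. It does not use PDFBox at all: SketchStripper,
TableStripper, process() and emit() are hypothetical names made up for this
example, and the real writeLine() signature in PDFTextStripper may differ. The
only point is that a protected line-level hook lets a subclass change how a
line is written without copying the base class.

```java
import java.util.List;

// Demo entry point; all classes in this file are illustrative only.
class WriteLineHookDemo {
    public static void main(String[] args) {
        List<List<String>> lines = List.of(
                List.of("Name", "Qty"),
                List.of("Bolt", "4"));
        // Default behavior: words joined by a single space.
        System.out.print(new SketchStripper().process(lines));
        // Customized behavior: words joined as table cells.
        System.out.print(new TableStripper().process(lines));
    }
}

// Hypothetical stand-in for a text stripper; NOT the PDFBox class.
class SketchStripper {
    private final StringBuilder out = new StringBuilder();

    // Fixed driver loop: walks the lines and delegates each one to the hook.
    public final String process(List<List<String>> lines) {
        out.setLength(0);
        for (List<String> line : lines) {
            writeLine(line);
        }
        return out.toString();
    }

    // The hook. Because it is protected rather than private, a subclass
    // can replace the line-level logic without touching process().
    protected void writeLine(List<String> words) {
        emit(String.join(" ", words));
        emit("\n");
    }

    // Helper that subclasses use to append to the shared output buffer.
    protected final void emit(String text) {
        out.append(text);
    }
}

// Hypothetical subclass that renders each line as a row of table cells.
class TableStripper extends SketchStripper {
    @Override
    protected void writeLine(List<String> words) {
        emit(String.join(" | ", words));
        emit("\n");
    }
}
```

If writeLine() in the sketch were private, TableStripper would have to
duplicate process() and the output buffer as well, which is the situation we
are in today with the real class.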
Thanks for considering this improvement.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]