[
https://issues.apache.org/jira/browse/PDFBOX-5486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17576372#comment-17576372
]
Andreas Lehmkühler commented on PDFBOX-5486:
--------------------------------------------
[~tilman] That is still correct. The original idea behind parsing glyph data
on demand was to minimize the time needed to load huge external fonts when only
some of the glyphs are needed for rendering, see PDFBOX-2303.
PDFBox 2.0.x doesn't close TrueType fonts until the corresponding PDF document
is closed or the finalizer closes them. Furthermore, some external fonts are
cached, and PDFBox keeps them open until the JVM terminates. Some of the font
data is copied to ScratchFileBuffers and, depending on the configuration, ends
up in memory.
In the current trunk some of the caches have been removed, and the font data is
copied to memory before it is parsed. In many cases the parser's input data is
closed after parsing so that the memory is released. Only the glyph data is
cached in memory so that the on-demand creation of glyphs still works.
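As a rough illustration of that last point, here is a minimal sketch. It is not
the FontBox code; the class LazyGlyphTable, its offsets parameter, and the
method names are made up for this example. It copies the glyph bytes into
memory, closes the source right away, and still decodes glyphs on demand:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the FontBox implementation.
final class LazyGlyphTable
{
    // In-memory copy of the glyph bytes; stays valid after the source is closed.
    private final byte[] glyphData;
    // offsets[i] .. offsets[i + 1] delimit the bytes of glyph i (assumed layout).
    private final int[] offsets;
    // Glyphs that have already been extracted.
    private final Map<Integer, byte[]> glyphs = new HashMap<>();

    LazyGlyphTable(InputStream source, int[] offsets) throws IOException
    {
        byte[] data;
        // Copy everything, then close the source immediately.
        try (InputStream in = source)
        {
            data = in.readAllBytes();
        }
        this.glyphData = data;
        this.offsets = offsets.clone();
    }

    // Extracts a glyph on first access and reuses it afterwards.
    synchronized byte[] getGlyph(int gid)
    {
        if (gid < 0 || gid >= offsets.length - 1)
        {
            throw new IllegalArgumentException("no such glyph: " + gid);
        }
        return glyphs.computeIfAbsent(gid,
                id -> Arrays.copyOfRange(glyphData, offsets[id], offsets[id + 1]));
    }

    // Number of glyphs extracted so far.
    synchronized int parsedCount()
    {
        return glyphs.size();
    }
}
```

Because getGlyph() only touches the in-memory copy, it cannot run into an
"already closed" source. A lazy lookup that still reads from the closed input
would fail in the way the stack trace in this issue shows.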
TrueType fonts keep the original font data if they are embedded, and they
aren't closed until the corresponding PDF is closed. The font embedding code
needs the original data. That is on my TODO list for 4.0.x or later as well.
Some parts of the code have concurrent caching mechanisms that don't
complement one another and may sometimes even be counterproductive. At the very
least, the code is hard to maintain.
I guess everybody knows my opinion about that ;-) Let us remove such
constructs, simplify, and see where it ends. If there really is a need for some
caching, we might implement something new that better suits the structures
we have. It is neither new nor my own finding that a refactoring is needed
every now and then to break up old structures.
> "RandomAccessBuffer already closed" when opening smaller fonts
> --------------------------------------------------------------
>
> Key: PDFBOX-5486
> URL: https://issues.apache.org/jira/browse/PDFBOX-5486
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 3.0.0 PDFBox
> Reporter: Tilman Hausherr
> Assignee: Andreas Lehmkühler
> Priority: Major
> Fix For: 3.0.0 PDFBox
>
>
> I wonder if this is related to one of the memory management / input stream
> changes: PDTrueTypeFont.load() can't load smaller TTF fonts (I discovered
> this while working with the font from PDFBOX-5484):
> {code}
> public static void main(String[] args) throws IOException
> {
>     File fontDir = new File("C:/windows/fonts");
>     File[] files = fontDir.listFiles((File dir, String name) ->
>             name.toLowerCase().endsWith(".ttf"));
>     for (File file : files)
>     {
>         PDDocument doc = new PDDocument();
>         PDTrueTypeFont ttf = PDTrueTypeFont.load(doc, file,
>                 WinAnsiEncoding.INSTANCE);
>         if (ttf.hasGlyph("A"))
>         {
>             try
>             {
>                 ttf.getPath("A");
>             }
>             catch (IOException ex)
>             {
>                 System.out.println("font " + ttf.getName() + " failed, size: " +
>                         file.length() + ", glyphs: " +
>                         ttf.getTrueTypeFont().getNumberOfGlyphs() + ": " +
>                         ex.getMessage());
>                 ex.printStackTrace();
>             }
>         }
>     }
> }
> {code}
> {noformat}
> font BookAntiqua-Bold failed, size: 151000, glyphs: 669: RandomAccessBuffer already closed
> java.io.IOException: RandomAccessBuffer already closed
>     at org.apache.pdfbox.io.RandomAccessReadBuffer.checkClosed(RandomAccessReadBuffer.java:337)
>     at org.apache.pdfbox.io.RandomAccessReadBuffer.getPosition(RandomAccessReadBuffer.java:188)
>     at org.apache.fontbox.ttf.RandomAccessReadDataStream.getCurrentPosition(RandomAccessReadDataStream.java:80)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:135)
>     at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.getPath(PDTrueTypeFont.java:498)
>
> {noformat}
> It does not happen with larger fonts, e.g. Arial.