Hello Toshiaki,

Thank you, I've committed that code as well.

Tilman


On 07.05.2024 16:15, Toshiaki Ito wrote:
Hi,

Additional suggestions.

throw new IllegalStateException(
"could not find the glyphId for the character: " + codePoint);
Before the fix, this message contained the character that caused the error.
After the fix, it contains the numeric code point value instead, which makes
the cause harder to identify.
I have therefore changed it to convert the code point back to the actual
character and output that.

I also wrote tests (intended to be added to TestFontEmbedding.java).
LiberationSans-Regular.ttf does not contain Japanese characters, so the
tests check that the exception is thrown with the expected message.


"あ" -> Character.isBmpCodePoint() == true
"𩸽" -> Character.isValidCodePoint() == true
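
The distinction between the two cases above can be checked in isolation. This is only a minimal sketch, not part of the patch: the class name CodePointKinds and the method kindOf are hypothetical, and it uses only java.lang.Character and String calls. It shows that "あ" (U+3042) is a single char in the BMP, while "𩸽" (U+29E3D) is one code point stored as a surrogate pair of two chars.

```java
final class CodePointKinds
{
    // Hypothetical helper (not a PDFBox API): classifies a code point
    // with the same two checks used in the proposed change.
    static String kindOf(int codePoint)
    {
        if (Character.isBmpCodePoint(codePoint))
        {
            return "BMP";
        }
        if (Character.isValidCodePoint(codePoint))
        {
            return "supplementary";
        }
        return "invalid";
    }

    public static void main(String[] args)
    {
        String hiragana = "\u3042";      // "あ", U+3042
        String kanji = "\uD867\uDE3D";   // "𩸽", U+29E3D as a surrogate pair

        // length() counts chars (UTF-16 units), codePointCount() counts code points
        System.out.println(kindOf(hiragana.codePointAt(0)) + " chars=" + hiragana.length());
        System.out.println(kindOf(kanji.codePointAt(0)) + " chars=" + kanji.length()
                + " codePoints=" + kanji.codePointCount(0, kanji.length()));
    }
}
```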


**** update code  PDAbstractContentStream.java  applyGSUBRules ****

             int glyphId = cmapLookup.getGlyphId(codePoint);
             if (glyphId <= 0)
             {
                 String source;
                 if (Character.isBmpCodePoint(codePoint))
                 {
                    source = String.valueOf((char) codePoint);
                 }
                 else if (Character.isValidCodePoint(codePoint))
                 {
                    source = new String(new int[]{codePoint},0,1);
                 }
                 else
                 {
                     source = "?";
                 }
                 throw new IllegalStateException(
                         "could not find the glyphId for the character: " + source);
             }
             originalGlyphIds.add(glyphId);
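
For comparison, the same conversion can be sketched as a standalone helper. This is an illustration under stated assumptions, not the committed code: CodePointText and toDisplayString are hypothetical names, and it relies on Character.toChars, which returns one char for a BMP code point and a surrogate pair for a supplementary one, so the two valid branches collapse into a single call.

```java
final class CodePointText
{
    // Hypothetical helper (not a PDFBox API): turns a code point into text
    // suitable for an error message, or "?" if the value is not a valid code point.
    static String toDisplayString(int codePoint)
    {
        if (!Character.isValidCodePoint(codePoint))
        {
            return "?";
        }
        // toChars yields 1 char (BMP) or 2 chars (surrogate pair)
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args)
    {
        System.out.println("could not find the glyphId for the character: "
                + toDisplayString(0x29E3D));
    }
}
```

Whether the explicit three-way branch or this shorter form is preferable is a matter of taste; the resulting message should be the same for valid code points.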


**** Unit Test ****

     @Test
     void testSurrogatePairCharacterExceptionIsBmpCodePoint() throws IOException
     {
         final String message = "あ";

         try (PDDocument doc = new PDDocument())
         {
             PDPage page = new PDPage();
             doc.addPage(page);
             PDFont font = PDType0Font.load(doc,
                     this.getClass().getResourceAsStream("/org/apache/pdfbox/resources/ttf/LiberationSans-Regular.ttf"));

             try (PDPageContentStream contents = new PDPageContentStream(doc, page))
             {
                 contents.beginText();
                 contents.setFont(font, 64);
                 contents.newLineAtOffset(100, 700);
                 contents.showText(message);
                 contents.endText();
             }

             fail();
         }
         catch (IllegalStateException e)
         {
             assertEquals("could not find the glyphId for the character: あ", e.getMessage());
         }
         catch (Exception e)
         {
             fail();
         }
     }

     @Test
     void testSurrogatePairCharacterExceptionIsValidCodePoint() throws IOException
     {
         final String message = "𩸽";
         try (PDDocument doc = new PDDocument())
         {
             PDPage page = new PDPage();
             doc.addPage(page);
             PDFont font = PDType0Font.load(doc,
                     this.getClass().getResourceAsStream("/org/apache/pdfbox/resources/ttf/LiberationSans-Regular.ttf"));

             try (PDPageContentStream contents = new PDPageContentStream(doc, page))
             {
                 contents.beginText();
                 contents.setFont(font, 64);
                 contents.newLineAtOffset(100, 700);
                 contents.showText(message);
                 contents.endText();
             }

             fail();
         }
         catch (IllegalStateException e)
         {
             assertEquals("could not find the glyphId for the character: 𩸽", e.getMessage());
         }
         catch (Exception e)
         {
             fail();
         }
     }

On Sun, May 5, 2024 at 18:00, Toshiaki Ito <evolut...@1024kb.cx> wrote:
Hi, Tilman.

I used the snapshot "3.0.3-20240505.072852-59" and got the expected results!
I also tried a few other Kanji characters besides "𩸽" and none of
them had any problems!

I am glad I could contribute :)

On Sun, May 5, 2024 at 16:32, Tilman Hausherr <thaush...@t-online.de> wrote:
Hello Toshiaki,

It's been committed and available as a snapshot:
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.3-SNAPSHOT/

I've also added a test for the 2.0 version so that we don't break this in
the future.

Thanks again
Tilman

On 04.05.2024 22:06, Toshiaki Ito wrote:
Hi, Tilman.

Thank you for checking and correcting the attached code.
I look forward to it being committed!

On Sun, May 5, 2024 at 2:05, Tilman Hausherr <thaush...@t-online.de> wrote:
Hello,

I can confirm that your proposed change works; it also passes the
"private" tests that aren't in the repository. Thank you so much for
solving this! I'll commit these soon (probably tomorrow) and will report
it here. Another (smaller) piece of good news is that one of the fonts we
use for tests (ipafont) has the glyph, so I have prepared a small test also
based on your code.

Tilman

On 04.05.2024 16:39, Tilman Hausherr wrote:
On 04.05.2024 15:21, Toshiaki Ito wrote:
By the way, with PDFBox 2.0.31, the same code produces the expected
output.
Ouch, I can confirm that. I have created a new ticket:

https://issues.apache.org/jira/browse/PDFBOX-5812

Tilman

---------------------------------------------------------------------
To unsubscribe, e-mail:users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail:users-h...@pdfbox.apache.org



--
Toshiaki Ito
Mail: evolut...@1024kb.cx



