[ https://issues.apache.org/jira/browse/PDFBOX-3550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15634125#comment-15634125 ]

John Hewson edited comment on PDFBOX-3550 at 11/3/16 8:29 PM:
--------------------------------------------------------------

> We had issues in the past with using AWT fonts for rendering, as you know. I
> hope that won't hit us here again. I don't think it will be an issue if the
> font comes from a TTF (or similar) file, but with embedded font programs it
> might be. Embedded fonts are very typical for form filling.

The issues with AWT's font renderer shouldn't have any bearing on shaping. We 
can sanitize the font files as much as we like before passing them to AWT, e.g. 
extract just the tables we need, clean them up, and put them in a new minimal 
font file. We don't even need the glyph outlines. Hopefully we won't need to do 
any of that, but the option is available to us. One thing that makes me feel 
good about AWT is that Java 9 will use HarfBuzz, so we know that the future is 
bright and won't be filled with languishing JDK bugs.
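For illustration only, a rough sketch of the "put them in a new minimal font file" step, assuming the needed tables (for example `cmap`, `GDEF`, `GSUB`) have already been extracted and cleaned up as raw byte arrays. `MinimalSfnt`, `build` and `checksum` are hypothetical names. The sketch writes the standard sfnt header and table directory described in the OpenType spec (tags sorted, tables padded to 4 bytes, per-table checksums) but does not recompute `head.checkSumAdjustment`, and it has not been checked which subset of tables AWT actually accepts.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: packs already-extracted tables into a new, minimal sfnt (TTF) file. */
class MinimalSfnt
{
    /** OpenType table checksum: sum of big-endian uint32 words, zero-padded to 4 bytes. */
    static long checksum(byte[] data)
    {
        long sum = 0;
        for (int i = 0; i < data.length; i += 4)
        {
            long word = 0;
            for (int j = 0; j < 4; j++)
            {
                int b = i + j < data.length ? data[i + j] & 0xFF : 0;
                word = (word << 8) | b;
            }
            sum = (sum + word) & 0xFFFFFFFFL;
        }
        return sum;
    }

    /** Tags are assumed to be exactly four ASCII characters, e.g. "cmap" or "OS/2". */
    static byte[] build(Map<String, byte[]> tables)
    {
        // Table records must be sorted by tag; for ASCII tags String order matches byte order
        Map<String, byte[]> sorted = new TreeMap<>(tables);
        int numTables = sorted.size();
        int pow2 = Integer.highestOneBit(numTables); // largest power of two <= numTables
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes))
        {
            out.writeInt(0x00010000);                            // sfntVersion for TrueType outlines
            out.writeShort(numTables);
            out.writeShort(pow2 * 16);                           // searchRange
            out.writeShort(Integer.numberOfTrailingZeros(pow2)); // entrySelector
            out.writeShort(numTables * 16 - pow2 * 16);          // rangeShift

            // Table data starts after the 12-byte header and one 16-byte record per table
            int offset = 12 + 16 * numTables;
            for (Map.Entry<String, byte[]> e : sorted.entrySet())
            {
                byte[] data = e.getValue();
                out.write(e.getKey().getBytes(StandardCharsets.US_ASCII), 0, 4);
                out.writeInt((int) checksum(data));
                out.writeInt(offset);
                out.writeInt(data.length);
                offset += (data.length + 3) & ~3; // each table is padded to a 4-byte boundary
            }
            for (byte[] data : sorted.values())
            {
                out.write(data);
                for (int n = data.length; n % 4 != 0; n++)
                {
                    out.write(0);
                }
            }
        }
        catch (IOException e)
        {
            // Cannot happen with an in-memory stream
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```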

> If we can keep the implementation out of the public API, let's start with
> Java's built-in functions (which will get even better with HarfBuzz).

Sounds good.
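As a starting point, a minimal sketch of shaping one run with the JDK's built-in layout through `Font.layoutGlyphVector`, which (unlike `createGlyphVector`) performs complex-text layout and yields glyph IDs. `AwtShaper` is a hypothetical name. It assumes the `java.awt.Font` was created with `Font.createFont` from the same TrueType data that gets embedded in the PDF, so that the glyph IDs refer to the same font; logical fonts such as `Font.SERIF` are composites whose glyph codes may not correspond to any single font file. Glyph positions (`GlyphVector.getGlyphPositions`) would also be needed for mark placement and are left out here.

```java
import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.awt.font.GlyphVector;

/** Hypothetical helper: shapes one directional run with the JDK's layout engine. */
class AwtShaper
{
    static int[] shape(Font font, String run, boolean rightToLeft)
    {
        // A null transform means identity; no Graphics2D is needed just to shape
        FontRenderContext frc = new FontRenderContext(null, true, true);
        char[] chars = run.toCharArray();
        int flags = rightToLeft ? Font.LAYOUT_RIGHT_TO_LEFT : Font.LAYOUT_LEFT_TO_RIGHT;
        // layoutGlyphVector applies the font's layout tables (joining, ligatures)
        GlyphVector gv = font.layoutGlyphVector(frc, chars, 0, chars.length, flags);
        // Glyph IDs as laid out by AWT; passing null lets the method allocate the array
        return gv.getGlyphCodes(0, gv.getNumGlyphs(), null);
    }
}
```

Called on a `Font` created from BYekan.ttf with `"سلام"` and `rightToLeft = true`, this should return the contextual (joined) glyphs, assuming AWT honors the font's GSUB table; getting those glyph IDs into a content stream is the part that would need a new low-level API in PDFBox.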

> I do suggest that we start with a layout model and the definition of its
> properties in line with the PDF spec (Rich Text, for example) and other
> established properties and terms, which should allow for layout of rectangular
> text blocks consisting of multiple paragraphs.

That's the top layer of my architecture from the bullet points above and, while 
relevant, it isn't really part of shaping. Shaping is the middle layer, and it's 
possible to build anything you like on top of it. There's all sorts of stuff 
we'll need, like Bidi, but those are separate concerns.
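To illustrate that bidi is a separate step from shaping, a sketch assuming `java.text.Bidi` is used to split a paragraph into directional runs (in logical order), each of which could then be handed to a shaper on its own. `BidiRuns` and the `L:`/`R:` labels are hypothetical and only there for readability.

```java
import java.text.Bidi;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a paragraph into directional runs, in logical order. */
class BidiRuns
{
    static List<String> runs(String paragraph)
    {
        // Base direction is taken from the first strong character, defaulting to LTR
        Bidi bidi = new Bidi(paragraph, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < bidi.getRunCount(); i++)
        {
            String text = paragraph.substring(bidi.getRunStart(i), bidi.getRunLimit(i));
            // Odd embedding levels are right-to-left
            boolean rtl = (bidi.getRunLevel(i) & 1) == 1;
            result.add((rtl ? "R:" : "L:") + text);
        }
        return result;
    }
}
```

For `"abc \u0633\u0644\u0627\u0645"` this yields `[L:abc , R:سلام]`; the space between the two scripts resolves to the paragraph's left-to-right direction. Putting the runs into visual order (e.g. with `Bidi.reorderVisually`) would be a further, equally separate step.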

> That should be the public API and the rest is hidden. 

But that would prevent advanced users from doing their own text layout. We need 
to support basic string drawing as we always have, and also allow basic drawing 
of complex-script strings. Then on top of that we can expose whatever high-level 
stuff we like, but the low-level APIs need to be there. Obviously any 
implementation details should be kept private (which will be most of the code), 
but we will still need both high- and low-level APIs.
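A sketch of the layering argued for here, with all names hypothetical (none of them are existing PDFBox APIs): the high-level `drawString` is only a convenience over a public low-level `showGlyphs` call that advanced users doing their own layout could call directly, while the shaper behind it stays an implementation detail.

```java
/** Hypothetical low-level API: draws already-shaped glyphs; public for advanced users. */
interface GlyphSink
{
    void showGlyphs(int[] glyphIds);
}

/** Hypothetical shaping step; an implementation detail (e.g. backed by AWT). */
interface Shaper
{
    int[] shape(String text);
}

/** Hypothetical high-level API built only on the two interfaces above. */
final class TextDrawer
{
    private final Shaper shaper;
    private final GlyphSink sink;

    TextDrawer(Shaper shaper, GlyphSink sink)
    {
        this.shaper = shaper;
        this.sink = sink;
    }

    /** Simple string drawing: shape, then hand off to the same low-level call. */
    void drawString(String text)
    {
        sink.showGlyphs(shaper.shape(text));
    }
}
```

Keeping `Shaper` behind an interface is also what would let an AWT-based implementation be swapped out later without changing either public layer.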

> Although openhtmltopdf is LGPL-licensed, we might consider getting in touch
> with the project, as they depend on PDFBox and might be open to helping out.
> The same goes for other current projects that use PDFBox for text formatting.

I don't see how they'd help, really: they just use ICU for text layout, and 
ICU's text layout is long dead and deprecated. If you want to know how to do 
text layout, I can tell you :). Java has some pretty good APIs and documentation 
if you look at 
[TextLayout|https://docs.oracle.com/javase/tutorial/2d/text/textlayoutconcepts.html].
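Along the lines of the TextLayout documentation linked above, a minimal sketch of breaking one paragraph into lines with `LineBreakMeasurer`, where each returned `TextLayout` is a shaped, bidi-ordered line. `ParagraphLayout` and `wrap` are hypothetical names. The widths come from AWT's font metrics, so they would only match the PDF if the same font program is used on both sides.

```java
import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.awt.font.LineBreakMeasurer;
import java.awt.font.TextAttribute;
import java.awt.font.TextLayout;
import java.text.AttributedCharacterIterator;
import java.text.AttributedString;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: wraps one paragraph into lines no wider than wrapWidth (in points). */
class ParagraphLayout
{
    static List<TextLayout> wrap(String paragraph, Font font, float wrapWidth)
    {
        // Note: addAttribute throws IllegalArgumentException for an empty string
        AttributedString styled = new AttributedString(paragraph);
        styled.addAttribute(TextAttribute.FONT, font);
        AttributedCharacterIterator text = styled.getIterator();
        FontRenderContext frc = new FontRenderContext(null, true, true);
        // LineBreakMeasurer finds break opportunities and lays out each line as a TextLayout
        LineBreakMeasurer measurer = new LineBreakMeasurer(text, frc);
        List<TextLayout> lines = new ArrayList<>();
        while (measurer.getPosition() < text.getEndIndex())
        {
            lines.add(measurer.nextLayout(wrapWidth));
        }
        return lines;
    }
}
```

Each line's `getAdvance()` and `getAscent()` would then give the width and baseline offset needed to place it inside a rectangular text block.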


> OpenType Shaping
> ----------------
>
>                 Key: PDFBOX-3550
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3550
>             Project: PDFBox
>          Issue Type: New Feature
>          Components: FontBox, PDModel
>         Environment: All
>            Reporter: Omid Pourhadi
>              Labels: unicode
>         Attachments: BYekan.ttf
>
>
> The problem is that in some languages letters need to be joined together. For
> example, consider this word:
> {color:red}
> سلام 
> {color}
> but after creating a PDF it turns into
> {color:red}
> س‌ل‌ام
> {color}
> with extra semi-spaces. I think this is a bug in PDFBox and definitely not
> related to the font.
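> {code:title=SampleCode.java|borderStyle=solid}
> import java.io.File;
> import java.io.IOException;
>
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.pdmodel.PDPage;
> import org.apache.pdfbox.pdmodel.PDPageContentStream;
> import org.apache.pdfbox.pdmodel.common.PDRectangle;
> import org.apache.pdfbox.pdmodel.font.PDFont;
> import org.apache.pdfbox.pdmodel.font.PDType0Font;
>
> public class SampleCode
> {
>     public static void main(String[] args) throws IOException
>     {
>         PDDocument document = new PDDocument();
>         // This font works perfectly in iText and JasperReports with the same text
>         PDFont titleFont = PDType0Font.load(document,
>                 SampleCode.class.getResourceAsStream("/BYekan.ttf"));
>         PDPage page = new PDPage(PDRectangle.A4);
>         document.addPage(page);
>         PDPageContentStream contentStream = new PDPageContentStream(document, page);
>         contentStream.beginText();
>         contentStream.setFont(titleFont, 12);
>         contentStream.newLineAtOffset(0, 100);
>         // showText() does not apply OpenType shaping, so the letters come out unjoined
>         contentStream.showText("سلام");
>         contentStream.endText();
>         contentStream.close();
>
>         document.save(new File("/home/omidp/temp/htmltopdf/output.pdf"));
>         document.close();
>     }
> }
> {code}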
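To make the reported symptom concrete: before any glyph substitution, an Arabic shaper decides which positional form (the OpenType feature tags `isol`, `init`, `medi`, `fina`) each letter needs, based on whether its neighbours can join. Below is a rough sketch for the reporter's word, assuming a hypothetical joining table that only knows its four letters; real shapers read joining types from Unicode's ArabicShaping.txt, skip transparent marks, and then apply the font's GSUB lookups. Drawing every letter in its isolated form, as in the reported output, is what makes the word look broken apart.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: picks the positional form of each letter in an Arabic word. */
class ArabicJoining
{
    // Hypothetical joining table: seen, lam and meem are dual-joining, alef is
    // right-joining; every other code point (including ZWNJ) is treated as non-joining
    static boolean isDualJoining(int cp)
    {
        return cp == 0x0633 || cp == 0x0644 || cp == 0x0645;
    }

    static boolean isRightJoining(int cp)
    {
        return cp == 0x0627;
    }

    /** Can this letter connect to the letter after it (to its left, visually)? */
    static boolean joinsToNext(int cp)
    {
        return isDualJoining(cp);
    }

    /** Can this letter connect to the letter before it (to its right, visually)? */
    static boolean joinsToPrevious(int cp)
    {
        return isDualJoining(cp) || isRightJoining(cp);
    }

    /** Returns the OpenType positional feature tag for each code point, in logical order. */
    static List<String> forms(String word)
    {
        int[] cps = word.codePoints().toArray();
        List<String> result = new ArrayList<>();
        for (int i = 0; i < cps.length; i++)
        {
            boolean before = i > 0 && joinsToNext(cps[i - 1]) && joinsToPrevious(cps[i]);
            boolean after = i + 1 < cps.length && joinsToNext(cps[i]) && joinsToPrevious(cps[i + 1]);
            if (before && after)
            {
                result.add("medi");
            }
            else if (before)
            {
                result.add("fina");
            }
            else if (after)
            {
                result.add("init");
            }
            else
            {
                result.add("isol");
            }
        }
        return result;
    }
}
```

For سلام (seen, lam, alef, meem) this gives `init, medi, fina, isol`: alef joins only to the letter before it, so the final meem stands alone.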



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
