[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438914#comment-16438914 ]

John Hewson edited comment on PDFBOX-4189 at 4/16/18 1:45 AM:

{quote}
For correct text positioning with mixed languages, information from the following tables might be useful:
- GPOS: to adjust the glyph position
- BASE: baseline offsets on a script-by-script basis.
- JSTF: justification information, including whitespace and Kashida adjustments.
- BIDI Mirroring: https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt
{quote}

It's probably worth noting that BASE, JSTF and BiDi are concerned with _paragraph-level_ layout, which happens at a higher level than the proposed layout() - which would be concerned with only a single script in a single direction (i.e. only OpenType _shaping_). BASE and BiDi are related to changes between different scripts, while JSTF is to aid in making good line break choices. So all of that functionality will happen somewhere else (this fits very closely with the layout code we have for forms, for example).

So in layout we're really only going to be concerned with GPOS and GSUB features. That way the only options that one might want to pass to layout would be the list of which [feature flags|https://docs.microsoft.com/en-us/typography/opentype/spec/featurelist] to apply. Maybe layout() should be called shapeText() to emphasize this distinction?

was (Author: jahewson):
{quote}
For correct text positioning with mixed languages, information from the following tables might be useful:
- GPOS: to adjust the glyph position
- BASE: baseline offsets on a script-by-script basis.
- JSTF: justification information, including whitespace and Kashida adjustments.
- BIDI Mirroring: https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt
{quote}

It's probably worth noting that BASE, JSTF and BiDi are concerned with _paragraph-level_ layout, which happens at a higher level than the proposed layout() - which would be concerned with only a single script in a single direction (i.e. only OpenType _shaping_). BASE and BiDi are related to changes between different scripts, while JSTF is to aid in making good line break choices. So all of that functionality will happen somewhere else (this fits very closely with the layout code we have for forms, for example).

So in layout we're really only going to be concerned with GPOS and GSUB features. That way the only options that one might want to pass to layout would be this list of which [feature flags|https://docs.microsoft.com/en-us/typography/opentype/spec/featurelist] to apply. Maybe layout() should be called shapeText() to emphasize this distinction?

> Enable rendering of Indian languages, by reading and utilizing the GSUB table
>
> Key: PDFBOX-4189
> URL: https://issues.apache.org/jira/browse/PDFBOX-4189
> Project: PDFBox
> Issue Type: New Feature
> Components: FontBox, PDModel
> Reporter: Palash Ray
> Priority: Major
> Attachments: Bengali-text-after.pdf, Bengali-text-before.pdf
>
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> Implemented proper rendering of Indian languages, which need extensive glyph
> substitution. The GSUB table has been read and used effectively to replace
> some compound words with their respective glyphs. All tests are passing. I
> have tested this for the Bengali font. Please review these changes and let me
> know if it makes sense to incorporate these.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
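To make the proposed shapeText() contract concrete (glyphs in, glyphs out, with the caller choosing which feature flags apply), here is a minimal toy sketch. Everything in it is a hypothetical assumption, not FontBox or PDFBox API: ShapeTextSketch, addRule and the glyph IDs are invented, only GSUB-style ligature substitution (many glyphs to one) is modeled, and features are applied in registration order, whereas real GSUB processing follows the order of the font's LookupList. The "akhn" tag is the OpenType feature used for Akhand conjuncts such as the Bengali KA + VIRAMA + SSA.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model, NOT the FontBox/PDFBox API: glyph IDs are plain ints and each
// feature holds only ligature rules (a sequence of glyphs replaced by one glyph).
class ShapeTextSketch {

    /** One ligature rule: a sequence of input glyph IDs replaced by a single output glyph ID. */
    static final class Ligature {
        final int[] input;
        final int output;

        Ligature(int[] input, int output) {
            this.input = input;
            this.output = output;
        }
    }

    // Feature tag -> rules; LinkedHashMap keeps features in the order they were registered.
    private final Map<String, List<Ligature>> featureRules = new LinkedHashMap<>();

    void addRule(String featureTag, int[] input, int output) {
        if (input.length == 0) {
            throw new IllegalArgumentException("A ligature needs at least one input glyph");
        }
        featureRules.computeIfAbsent(featureTag, k -> new ArrayList<>()).add(new Ligature(input, output));
    }

    /** Applies the rules of each enabled feature, one feature at a time, in registration order. */
    List<Integer> shapeText(List<Integer> glyphs, Set<String> enabledFeatures) {
        List<Integer> current = new ArrayList<>(glyphs);
        for (Map.Entry<String, List<Ligature>> entry : featureRules.entrySet()) {
            if (enabledFeatures.contains(entry.getKey())) {
                current = applyLigatures(current, entry.getValue());
            }
        }
        return current;
    }

    /** One left-to-right pass; at each position the longest matching rule wins. */
    private static List<Integer> applyLigatures(List<Integer> in, List<Ligature> rules) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < in.size()) {
            Ligature best = null;
            for (Ligature rule : rules) {
                if (matchesAt(in, i, rule.input)
                        && (best == null || rule.input.length > best.input.length)) {
                    best = rule;
                }
            }
            if (best != null) {
                out.add(best.output);
                i += best.input.length;
            } else {
                out.add(in.get(i));
                i++;
            }
        }
        return out;
    }

    private static boolean matchesAt(List<Integer> in, int pos, int[] seq) {
        if (pos + seq.length > in.size()) {
            return false;
        }
        for (int k = 0; k < seq.length; k++) {
            if (in.get(pos + k) != seq[k]) { // Integer is unboxed, so this compares values
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ShapeTextSketch shaper = new ShapeTextSketch();
        // Hypothetical glyph IDs: 10 = KA, 11 = VIRAMA, 12 = SSA, 99 = the K.SSA conjunct.
        shaper.addRule("akhn", new int[] {10, 11, 12}, 99);
        List<Integer> glyphs = List.of(10, 11, 12, 5);
        System.out.println(shaper.shapeText(glyphs, Set.of("akhn"))); // [99, 5]
        System.out.println(shaper.shapeText(glyphs, Set.of()));       // [10, 11, 12, 5]
    }
}
```

Passing the enabled feature tags as a plain set keeps the entry point free of paragraph-level concerns, which matches the split described in the comment: direction, baselines and justification would be decided before this step.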
[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438914#comment-16438914 ]

John Hewson edited comment on PDFBOX-4189 at 4/16/18 1:44 AM:

{quote}
For correct text positioning with mixed languages, information from the following tables might be useful:
- GPOS: to adjust the glyph position
- BASE: baseline offsets on a script-by-script basis.
- JSTF: justification information, including whitespace and Kashida adjustments.
- BIDI Mirroring: https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt
{quote}

It's probably worth noting that BASE, JSTF and BiDi are concerned with _paragraph-level_ layout, which happens at a higher level than the proposed layout() - which would be concerned with only a single script in a single direction (i.e. only OpenType _shaping_). BASE and BiDi are related to changes between different scripts, while JSTF is to aid in making good line break choices. So all of that functionality will happen somewhere else (this fits very closely with the layout code we have for forms, for example).

So in layout we're really only going to be concerned with GPOS and GSUB features. That way the only options that one might want to pass to layout would be this list of which [feature flags|https://docs.microsoft.com/en-us/typography/opentype/spec/featurelist] to apply. Maybe layout() should be called shapeText() to emphasize this distinction?

was (Author: jahewson):
{quote}
For correct text positioning with mixed languages, information from the following tables might be useful:
- GPOS: to adjust the glyph position
- BASE: baseline offsets on a script-by-script basis.
- JSTF: justification information, including whitespace and Kashida adjustments.
- BIDI Mirroring: https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt
{quote}

It's probably worth noting that BASE, JSTF and BiDi are concerned with _paragraph-level_ layout, which happens at a higher level than the proposed layout() - which would be concerned with only a single script in a single direction (i.e. only OpenType _shaping_). BASE and BiDi are related to changes between different scripts, while JSTF is to aid in making good line break choices. So all of that functionality will happen somewhere else (this fits very closely with the layout code for forms, for example).

So in layout we're really only going to be concerned with GPOS and GSUB features. That way the only options that one might want to pass to layout would be this list of which [feature flags|https://docs.microsoft.com/en-us/typography/opentype/spec/featurelist] to apply. Maybe layout() should be called shapeText() to emphasize this distinction?
[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438914#comment-16438914 ]

John Hewson edited comment on PDFBOX-4189 at 4/16/18 1:41 AM:

{quote}
For correct text positioning with mixed languages, information from the following tables might be useful:
- GPOS: to adjust the glyph position
- BASE: baseline offsets on a script-by-script basis.
- JSTF: justification information, including whitespace and Kashida adjustments.
- BIDI Mirroring: https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt
{quote}

It's probably worth noting that BASE, JSTF and BiDi are concerned with _paragraph-level_ layout, which happens at a higher level than the proposed layout() - which would be concerned with only a single script in a single direction (i.e. only OpenType _shaping_). BASE and BiDi are related to changes between different scripts, while JSTF is to aid in making good line break choices. So all of that functionality will happen somewhere else (this fits very closely with the layout code for forms, for example).

So in layout we're really only going to be concerned with GPOS and GSUB features. That way the only options that one might want to pass to layout would be this list of which [feature flags|https://docs.microsoft.com/en-us/typography/opentype/spec/featurelist] to apply. Maybe layout() should be called shapeText() to emphasize this distinction?

was (Author: jahewson):
{quote}
For correct text positioning with mixed languages, information from the following tables might be useful:
- GPOS: to adjust the glyph position
- BASE: baseline offsets on a script-by-script basis.
- JSTF: justification information, including whitespace and Kashida adjustments.
- BIDI Mirroring: https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt
{quote}

It's probably worth noting that BASE, JSTF and BiDi are concerned with _paragraph-level_ layout, which happens at a higher level than the proposed layout() - which would be concerned with only a single script in a single direction (i.e. only OpenType _shaping_). BASE and BiDi are related to changes between different scripts, while JSTF is to aid in making good line break choices. So all of that functionality will happen somewhere else (this fits very closely with the layout code for forms, for example).

So in layout we're really only going to be concerned with GPOS and GSUB features. That way the only options that one might want to pass to layout would be this list of which [feature flags|https://docs.microsoft.com/en-us/typography/opentype/spec/featurelist] to apply.
[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438914#comment-16438914 ]

John Hewson edited comment on PDFBOX-4189 at 4/16/18 1:40 AM:

{quote}
For correct text positioning with mixed languages, information from the following tables might be useful:
- GPOS: to adjust the glyph position
- BASE: baseline offsets on a script-by-script basis.
- JSTF: justification information, including whitespace and Kashida adjustments.
- BIDI Mirroring: https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt
{quote}

It's probably worth noting that BASE, JSTF and BiDi are concerned with _paragraph-level_ layout, which happens at a higher level than the proposed layout() - which would be concerned with only a single script in a single direction (i.e. only OpenType _shaping_). BASE and BiDi are related to changes between different scripts, while JSTF is to aid in making good line break choices. So all of that functionality will happen somewhere else (this fits very closely with the layout code for forms, for example).

So in layout we're really only going to be concerned with GPOS and GSUB features. That way the only options that one might want to pass to layout would be this list of which [feature flags|https://docs.microsoft.com/en-us/typography/opentype/spec/featurelist] to apply.

was (Author: jahewson):
For correct text positioning with mixed languages, information from the following tables might be useful:
- GPOS: to adjust the glyph position
- BASE: baseline offsets on a script-by-script basis.
- JSTF: justification information, including whitespace and Kashida adjustments.
- BIDI Mirroring: https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt

bq. here BASE, JSTF and BiDi are concerned with _paragraph-level_ layout, which happens at a higher level than the proposed layout() - which would be concerned with only a single script in a single direction (i.e. only OpenType _shaping_). BASE and BiDi are related to changes between different scripts, while JSTF is to aid in making good line break choices. So all of that functionality will happen somewhere else (this fits very closely with the layout code for forms, for example).

So in layout we're really only going to be concerned with GPOS and GSUB features. That way the only options that one might want to pass to layout would be this list of which [feature flags|https://docs.microsoft.com/en-us/typography/opentype/spec/featurelist] to apply.
[jira] [Commented] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438914#comment-16438914 ]

John Hewson commented on PDFBOX-4189:

For correct text positioning with mixed languages, information from the following tables might be useful:
- GPOS: to adjust the glyph position
- BASE: baseline offsets on a script-by-script basis.
- JSTF: justification information, including whitespace and Kashida adjustments.
- BIDI Mirroring: https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt

bq. here BASE, JSTF and BiDi are concerned with _paragraph-level_ layout, which happens at a higher level than the proposed layout() - which would be concerned with only a single script in a single direction (i.e. only OpenType _shaping_). BASE and BiDi are related to changes between different scripts, while JSTF is to aid in making good line break choices. So all of that functionality will happen somewhere else (this fits very closely with the layout code for forms, for example).

So in layout we're really only going to be concerned with GPOS and GSUB features. That way the only options that one might want to pass to layout would be this list of which [feature flags|https://docs.microsoft.com/en-us/typography/opentype/spec/featurelist] to apply.
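The paragraph-level step that John places above layout() can be sketched with the JDK alone. This is an illustration under stated assumptions, not PDFBox code: BidiRunsSketch, Run and splitRuns are hypothetical names, java.text.Bidi resolves the embedding levels, and the paragraph is cut into runs that each have a single direction, so a shaping method would only ever see one direction at a time. ICU4J's com.ibm.icu.text.Bidi would be an alternative; glyph mirroring and visual reordering of runs are not shown.

```java
import java.text.Bidi;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split a paragraph into single-direction runs with java.text.Bidi,
// so that each run could then be passed to a shaping step on its own.
class BidiRunsSketch {

    /** A maximal run of text at one embedding level (even level = LTR, odd level = RTL). */
    static final class Run {
        final String text;
        final boolean rightToLeft;

        Run(String text, boolean rightToLeft) {
            this.text = text;
            this.rightToLeft = rightToLeft;
        }

        @Override
        public String toString() {
            return (rightToLeft ? "RTL: " : "LTR: ") + text;
        }
    }

    /** Returns the directional runs of one paragraph in logical (storage) order. */
    static List<Run> splitRuns(String paragraph) {
        // Base direction comes from the first strong character, defaulting to LTR.
        Bidi bidi = new Bidi(paragraph, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        List<Run> runs = new ArrayList<>();
        for (int i = 0; i < bidi.getRunCount(); i++) {
            int start = bidi.getRunStart(i);
            int limit = bidi.getRunLimit(i);
            boolean rtl = (bidi.getRunLevel(i) & 1) == 1;
            runs.add(new Run(paragraph.substring(start, limit), rtl));
        }
        return runs;
    }

    public static void main(String[] args) {
        // Latin "abc " followed by Hebrew "shalom" (U+05E9 U+05DC U+05D5 U+05DD).
        for (Run run : splitRuns("abc \u05E9\u05DC\u05D5\u05DD")) {
            System.out.println(run); // each run would be handed to the shaping step separately
        }
    }
}
```

Here the space between the Latin and Hebrew words resolves to the paragraph's left-to-right level, so it stays with the first run.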
[jira] [Commented] (PDFBOX-4188) "Maximum allowed scratch file memory exceeded." Exception when merging large number of small PDFs
[ https://issues.apache.org/jira/browse/PDFBOX-4188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438699#comment-16438699 ]

Maruan Sahyoun commented on PDFBOX-4188:

[~gary.potagal] I've taken a quick look at the patch and would like to discuss some topics:
- PDFMergerUtility was using {{MemoryUsageSetting.getPartitionedCopy}}, whereas now the setting is passed on for each PDDocument and is no longer partitioned. So although the value used for {{MemoryUsageSetting}} is much lower now, isn't the end result the same?
- I haven't understood the main benefit of the changes made to {{MemoryUsageSetting}} and {{ScratchFile}}. What is the reason for these?
- I think the patch should be divided into two parts, the changes to {{MemoryUsageSetting}} / {{ScratchFile}} and the changes to PDFMerger, with test cases to show the improvements for each.
- Do you see a benefit in using {{MappedByteBuffer}}?
- The handling of openAction doesn't belong in this patch. It should be part of a new issue.
- The code doesn't follow the coding conventions https://pdfbox.apache.org/codingconventions.html, so some effort is needed to bring it in line with them. (I think that this section might be difficult to find on our website; any suggestions to make the information easier to find are highly appreciated.)

Many of the questions are because this part of PDFBox is something I rarely touch, so I hope you're a little patient with me.

> "Maximum allowed scratch file memory exceeded." Exception when merging large
> number of small PDFs
>
> Key: PDFBOX-4188
> URL: https://issues.apache.org/jira/browse/PDFBOX-4188
> Project: PDFBox
> Issue Type: Improvement
> Affects Versions: 2.0.9, 3.0.0 PDFBox
> Reporter: Gary Potagal
> Priority: Major
> Attachments: PDFBOX-4188-MemoryManagerPatch.zip,
> PDFBOX-4188-breakingTest.zip, PDFMergerUtility.java-20180412.patch
>
>
> On 06.04.2018 at 23:10, Gary Potagal wrote:
>
> We wanted to address one more merge issue in
> org.apache.pdfbox.multipdf.PDFMergerUtility#mergeDocuments(org.apache.pdfbox.io.MemoryUsageSetting).
> We need to merge a large number of small files. We use mixed mode, memory
> and disk for cache. Initially, we would often get "Maximum allowed scratch
> file memory exceeded.", unless we turned off the check by passing "-1" to
> org.apache.pdfbox.io.MemoryUsageSetting#MemoryUsageSetting. I believe this
> is what the users who opened PDFBOX-3721 were running into.
> Our research indicates that the core issue with the memory model is that
> instead of sharing a single cache, it breaks it up into equal-sized fixed
> partitions based on the number of input + output files being merged. This
> means that each partition must be big enough to hold the final output file.
> When 400 1-page files are merged, this creates 401 partitions, each of
> which needs to be big enough to hold the final 400 pages. Even worse, the
> merge algorithm needs to keep all files open until the end.
> Given this, near the end of the merge, we're actually caching 400 x 1-page
> input files and 1 x 400-page output file, or 800 pages.
> However, with the partitioned cache, we need to declare room for 401 x
> 400 pages, or 160,400 pages in total, when specifying "maxStorageBytes". This
> would be a very high number, usually in the gigabytes.
>
> Given the current limitation that we need to keep all the input files open
> until the output file is written (HUGE), we came up with 2 options. (See
> PDFBOX-4182)
>
> 1. Good: Split the cache in half, give ½ to the output file, and segment the
> other ½ across the input files. (Still keeping them open until the end.)
> 2. Better: Dynamically allocate in 16 page (64K) chunks from memory or disk
> on demand, and release cache as documents are closed after the merge. This is our
> current implementation until PDFBOX-3999, PDFBOX-4003 and PDFBOX-4004 are
> addressed.
>
> We would like to submit our current implementation as a Patch to 2.0.10 and
> 3.0.0, unless this is already addressed.
>
> Thank you
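The arithmetic in the description and the "Better" option can be sketched as follows. This is a hypothetical model, not the patch and not PDFBox's ScratchFile. The first two methods count PDF pages: n input files of p pages each need (n + 1) x n·p pages reserved when every partition must hold the whole output, but only n·p + n·p = 2·n·p pages actually cached (160,400 versus 800 for 400 one-page files). ChunkPool is an invented, memory-only stand-in for allocating one shared budget in 64 KB chunks on demand, assuming 4 KB scratch pages as the "16 page (64K)" figure implies; the disk half of mixed mode is not modeled.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model (not PDFBox code): page counts for the arithmetic, plus a shared,
// memory-only chunk pool standing in for on-demand allocation.
class ScratchCacheSketch {

    /** Reservation when each of (inputs + 1) partitions must be able to hold the whole output. */
    static long partitionedReservation(int inputFiles, int pagesPerInput) {
        long outputPages = (long) inputFiles * pagesPerInput;
        return (inputFiles + 1L) * outputPages;
    }

    /** What is actually cached near the end of the merge: all inputs plus the full output. */
    static long sharedRequirement(int inputFiles, int pagesPerInput) {
        long inputPages = (long) inputFiles * pagesPerInput;
        long outputPages = inputPages; // the output contains every input page
        return inputPages + outputPages;
    }

    /** Shared pool handing out fixed-size chunks on demand, up to one global limit. */
    static final class ChunkPool {
        static final int CHUNK_SIZE = 16 * 4096; // 64 KB, assuming 4 KB scratch pages
        private final long maxBytes;
        private final Deque<byte[]> freeChunks = new ArrayDeque<>();
        private long allocatedBytes;

        ChunkPool(long maxBytes) {
            this.maxBytes = maxBytes;
        }

        /** Reuses a released chunk if possible, otherwise allocates a new one within the limit. */
        byte[] acquire() {
            if (!freeChunks.isEmpty()) {
                return freeChunks.pop();
            }
            if (allocatedBytes + CHUNK_SIZE > maxBytes) {
                throw new IllegalStateException("Maximum allowed scratch memory exceeded");
            }
            allocatedBytes += CHUNK_SIZE;
            return new byte[CHUNK_SIZE];
        }

        /** Returns a chunk to the pool, e.g. when a source document is closed after merging. */
        void release(byte[] chunk) {
            freeChunks.push(chunk);
        }

        long allocatedBytes() {
            return allocatedBytes;
        }
    }

    public static void main(String[] args) {
        System.out.println(partitionedReservation(400, 1)); // 160400
        System.out.println(sharedRequirement(400, 1));      // 800
    }
}
```

The design point is that a single budget is shared by all open documents, so memory released by a closed input is immediately available to the output instead of being locked into a fixed partition.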
[jira] [Commented] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438640#comment-16438640 ]

Palash Ray commented on PDFBOX-4189:

Thanks a lot, guys, for the detailed comments. It seems that there is some more work for me to ensure that this patch fits into the broader scheme of things. I am ready to work with you to make this happen. I think PDFBox is a great piece of software, and I am committed to making it more feature-rich. This particular feature is important for supporting any Indian or Southeast Asian language, so from my perspective, I would like to make it happen.

John, thanks especially for taking the time to explain the architecture. Let me do a bit of refactoring and incorporate your suggestions. I will let you know how that goes. I plan to handle subsetting.

Thanks,
Palash.
[jira] [Commented] (PDFBOX-2618) Add an Example to create paragraphs with PDFBox
[ https://issues.apache.org/jira/browse/PDFBOX-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438638#comment-16438638 ]

Matthew Broadhead commented on PDFBOX-2618:

I am evaluating [http://dhorions.github.io/boxable/]. A table cell cannot contain anything other than text or an image. It allows HTML to be passed as a string in limited cases, which is not a good idea. But none of these libraries are more than proofs of concept, and it is not advisable to build apps that depend on these projects. I am surprised that you are seriously recommending them. They all have disclaimers on their README pages. Also, they are not projects under the Apache umbrella. So the question is: what does Apache recommend as the way forward for people who need a high-level PDF API, now that FOP is out of the picture because XSLT is no longer supported in Tomcat or TomEE?

> Add an Example to create paragraphs with PDFBox
>
> Key: PDFBOX-2618
> URL: https://issues.apache.org/jira/browse/PDFBOX-2618
> Project: PDFBox
> Issue Type: Improvement
> Components: Writing
> Affects Versions: 2.0.0
> Reporter: Tilman Hausherr
> Priority: Major
>
> [~mkl] wrote this morning on Stack Overflow on the topic of creating tables
> with PDFBox:
> {quote}I'm afraid all those samples IMO merely are proofs of concept, probably
> of use in limited use cases but by far not for generic use. PDFBox has its
> strengths, e.g. a quite versatile content extraction framework and a content
> rendering capability, but the absence of a proper layout API is a serious
> weakness.{quote}
> To which I answered:
> {quote}I know... I just don't want to create another iText. We're not the
> Samwer brothers.{quote}
> But he's right. We could of course look at what iText offers and implement
> that on our own, that wouldn't even be illegal, but it wouldn't be nice. I've
> never looked at or used iText, except once when answering this:
> http://stackoverflow.com/a/26820598/535646
> IMO what we need to start with is a method to write a paragraph to a PDF. Such a
> method would have these parameters:
> - text
> - rectangle (or width and height from current position)
> Such a method would then output the text and break the lines at the end of
> the rectangle, and throw an exception if the space isn't enough.
> *UPDATE*: This will be implemented as an example, using either Java's
> built-in TextLayout or ICU4J.
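A minimal sketch of the paragraph method described in the issue (text plus a rectangle, breaking lines at the rectangle's edge and throwing if the space isn't enough). Assumptions: ParagraphSketch and breakLines are hypothetical names; line-break opportunities come from the JDK's java.text.BreakIterator rather than TextLayout or ICU4J; breaking is greedy; and widths come from a caller-supplied function. With PDFBox that function could wrap something like font.getStringWidth(s) / 1000 * fontSize, but since getStringWidth throws IOException it would need wrapping inside the lambda. Hard line breaks and complex scripts are not handled.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.ToDoubleFunction;

// Hypothetical helper, not a PDFBox API: greedy line breaking of one paragraph into a
// rectangle of maxWidth x maxHeight, using the JDK's line BreakIterator.
class ParagraphSketch {

    static List<String> breakLines(String text, double maxWidth, double maxHeight,
                                   double lineHeight, ToDoubleFunction<String> widthOf) {
        BreakIterator breaks = BreakIterator.getLineInstance(Locale.ROOT);
        breaks.setText(text);
        List<String> lines = new ArrayList<>();
        String current = "";
        int start = breaks.first();
        for (int end = breaks.next(); end != BreakIterator.DONE; start = end, end = breaks.next()) {
            String segment = text.substring(start, end); // a word plus its trailing spaces
            String candidate = current + segment;
            if (widthOf.applyAsDouble(trimEnd(candidate)) <= maxWidth) {
                current = candidate; // the segment still fits on this line
                continue;
            }
            if (!trimEnd(current).isEmpty()) {
                lines.add(trimEnd(current)); // close the line before this segment
            }
            if (widthOf.applyAsDouble(trimEnd(segment)) > maxWidth) {
                throw new IllegalArgumentException("Word wider than the rectangle: " + segment);
            }
            current = segment;
        }
        if (!trimEnd(current).isEmpty()) {
            lines.add(trimEnd(current));
        }
        if (lines.size() * lineHeight > maxHeight) {
            throw new IllegalArgumentException(
                    "Text needs " + lines.size() + " lines, which do not fit in the rectangle");
        }
        return lines;
    }

    /** Trailing whitespace does not count towards a line's width. */
    private static String trimEnd(String s) {
        int end = s.length();
        while (end > 0 && Character.isWhitespace(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        // Monospaced stand-in for a real font metric: every character is 1 unit wide.
        ToDoubleFunction<String> widthOf = s -> s.length();
        System.out.println(breakLines("the quick brown fox jumps", 10, 10, 1, widthOf));
        // [the quick, brown fox, jumps]
    }
}
```

The caller would then emit each returned line at successive baselines, lineHeight apart, inside the rectangle.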
[jira] [Commented] (PDFBOX-2618) Add an Example to create paragraphs with PDFBox
[ https://issues.apache.org/jira/browse/PDFBOX-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438634#comment-16438634 ]

Maruan Sahyoun commented on PDFBOX-2618:

Hi,

there are already some projects. This is not a complete list, though:
- https://github.com/GlenKPeterson/PdfLayoutManager
- http://dhorions.github.io/boxable/
- https://github.com/vandeseer/easytable

As for the FOP-related issues: did you ask the FOP people how they are thinking about these issues?

To replace FOP, there would be a lot to add to PDFBox:
- a layout model
- complex script support
- hyphenation
[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438609#comment-16438609 ]

Maruan Sahyoun edited comment on PDFBOX-4189 at 4/15/18 8:34 AM:

The patch is a great and - given several questions we had in the past - important addition to PDFBox. In the long run I'd see some additions we might conceptually already think about and/or start introducing in the public API. As I haven't reviewed the patch, the list below is meant as a hint for possible additions; they may already be included.

For correct text positioning with mixed languages, information from the following tables might be useful:
- GPOS: to adjust the glyph position
- BASE: baseline offsets on a script-by-script basis.
- JSTF: justification information, including whitespace and Kashida adjustments.
- BIDI Mirroring: https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt

To allow the user to override the language system identified by the script being used, we might want to add {{setLanguage/getLanguage}} so that it can be called prior to {{showText}} if an override needs to be done. Putting that into an internal {{layout}} method as John suggested would also allow us to put it behind a feature flag where one could enable/disable the processing. We might also mark that feature as **experimental** and specify which languages it has been tested with (to some extent).

This is mainly meant to understand which capabilities belong where, as I'm looking to add the processing to the layout of interactive form field values.

was (Author: msahyoun):
The patch is a great and - given several questions we had in the past - important addition to PDFBox. In the long run I'd see some additions we might conceptually already think about and/or start introducing in the public API. As I haven't reviewed the patch, the list below is meant as a hint for possible additions; they may already be included.

For correct text positioning with mixed languages, information from the following tables might be useful:
- GPOS: to adjust the glyph position
- BASE: baseline offsets on a script-by-script basis.
- JSTF: justification information, including whitespace and Kashida adjustments.
- BIDI Mirroring: https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt

To allow the user to override the language system identified by the script being used, we might want to add {{setLanguage/getLanguage}} so that it can be called prior to {{showText}} if an override needs to be done. Putting that into an internal {{layout}} method as John suggested would also allow us to put it behind a feature flag where one could enable/disable the processing. We might also mark that feature as **experimental** and specify which languages it has been tested with (to some extent).
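The language override and the opt-in flag could look roughly like this. It is a sketch under assumptions, not PDFBox API: LanguageSystemSketch and its methods are hypothetical, the script is guessed from the first character that belongs to a specific Unicode script (via Character.UnicodeScript), and only a tiny subset of OpenType script tags is mapped ("bng2", "dev2", "latn", falling back to "DFLT"). A caller would set an OpenType language tag, for example "BEN " for Bengali, before calling showText.

```java
import java.util.Map;

// Hypothetical options holder, not a PDFBox API: picks an OpenType script tag from the text,
// lets the caller override the language system, and keeps shaping behind an opt-in flag.
class LanguageSystemSketch {

    // Small illustrative subset of Unicode script -> OpenType script tag (Indic "v2" tags).
    private static final Map<Character.UnicodeScript, String> SCRIPT_TAGS = Map.of(
            Character.UnicodeScript.BENGALI, "bng2",
            Character.UnicodeScript.DEVANAGARI, "dev2",
            Character.UnicodeScript.LATIN, "latn");

    private String languageOverride;        // e.g. "BEN "; null = use the font's default LangSys
    private boolean shapingEnabled = false; // experimental, so off unless explicitly enabled

    void setLanguage(String openTypeLanguageTag) {
        this.languageOverride = openTypeLanguageTag;
    }

    String getLanguage() {
        return languageOverride;
    }

    void setShapingEnabled(boolean enabled) {
        this.shapingEnabled = enabled;
    }

    boolean isShapingEnabled() {
        return shapingEnabled;
    }

    /** Script tag of the first character with a specific script, or "DFLT" if none is mapped. */
    static String detectScriptTag(String text) {
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
            // Digits, spaces and punctuation (COMMON) or combining marks (INHERITED) say nothing.
            if (script != Character.UnicodeScript.COMMON && script != Character.UnicodeScript.INHERITED) {
                return SCRIPT_TAGS.getOrDefault(script, "DFLT");
            }
            i += Character.charCount(codePoint);
        }
        return "DFLT";
    }

    public static void main(String[] args) {
        LanguageSystemSketch options = new LanguageSystemSketch();
        options.setLanguage("BEN ");     // override before showing text
        options.setShapingEnabled(true); // opt in to the experimental processing
        // Bengali KA + VIRAMA + SSA
        System.out.println(detectScriptTag("\u0995\u09CD\u09B7") + " / " + options.getLanguage());
    }
}
```

Keeping the flag off by default mirrors the suggestion to mark the feature as experimental until it has been tested with more languages.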
[jira] [Commented] (PDFBOX-2618) Add an Example to create paragraphs with PDFBox
[ https://issues.apache.org/jira/browse/PDFBOX-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438621#comment-16438621 ] Matthew Broadhead commented on PDFBOX-2618: ---
I am trying to migrate from FOP to PDFBox because XSLT support is broken in Tomcat and TomEE due to JSP taglibs ([https://bz.apache.org/bugzilla/show_bug.cgi?id=61875] and https://bz.apache.org/bugzilla/show_bug.cgi?id=27717). This in turn is caused by a bug in Xalan (https://issues.apache.org/jira/browse/XALANJ-2540), and since Xalan is now abandoned, nobody is going to fix it. Am I right to assume, then, that nobody using Tomcat or TomEE should use FOP any more?
I agree that PDFBox itself should not have a high-level API, but a new project depending on PDFBox should be started that at first offers some basic features and then slowly migrates everything from FOP. It could be called pdfbox-fop or something similar. I think it needs to follow the structure of FOP, because a new project that starts without learning the lessons of FOP will almost certainly run into trouble. It could replicate all the FO structures, such as fo:block, as POJOs.
> Add an Example to create paragraphs with PDFBox > --- > > Key: PDFBOX-2618 > URL: https://issues.apache.org/jira/browse/PDFBOX-2618 > Project: PDFBox > Issue Type: Improvement > Components: Writing >Affects Versions: 2.0.0 >Reporter: Tilman Hausherr >Priority: Major > > [~mkl] wrote this morning on stackoverflow on the topic about creating tables > with PDFBox: > {quote}I'm afraid all those samples IMO merely are proofs of concept, probably > of use in limited use cases but by far not for generic use. PDFBox has its > strengths, e.g. a quite versatile content extraction framework and a content > rendering capability, but the absence of a proper layouting API is a serious > weakness.{quote} > To which I answered: > {quote}I know... I just don't want to create another iText. We're not the > Samwer brothers.{quote} > But he's right.
We could of course look at what iText offers and implement > that on our own, that wouldn't even be illegal, but it wouldn't be nice. I've > never looked at or used iText, except once when answering this: > http://stackoverflow.com/a/26820598/535646 > IMO what we need to start with is a method to write a paragraph to a PDF. Such a > method would have these parameters: > - text > - rectangle (or width and height from current position) > Such a method would then output the text and break the lines at the end of > the rectangle, and throw an exception if the space isn't enough. > *UPDATE*: This will be implemented as an example, using either Java's > built-in TextLayout or ICU4J.
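The paragraph method proposed in the issue (text plus a rectangle, break lines at the rectangle's edge, throw if the space is not enough) can be sketched as a greedy word wrap. This is an illustration under assumptions, not the example that was eventually committed: `ParagraphBreaker` and `breakLines` are hypothetical names, lines break only at whitespace (no hyphenation, no Unicode line-breaking rules as TextLayout or ICU4J would provide), and string width comes from a pluggable `measure` function so the sketch runs without PDFBox.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

/**
 * Hypothetical helper (not PDFBox API): greedy word wrap of a paragraph into
 * a rectangle, throwing when the text does not fit, as proposed in the issue.
 */
final class ParagraphBreaker {
    private ParagraphBreaker() {}

    /**
     * @param width   available line width, in the same units as {@code measure}
     * @param height  available height; each line consumes {@code leading}
     * @param measure width of a string, e.g. derived from the font and font size
     */
    static List<String> breakLines(String text, double width, double height,
                                   double leading, ToDoubleFunction<String> measure) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // blank input splits into a single empty token
            }
            if (measure.applyAsDouble(word) > width) {
                // No hyphenation or character-level breaking in this sketch.
                throw new IllegalArgumentException("Word wider than rectangle: " + word);
            }
            String candidate = line.length() == 0 ? word : line + " " + word;
            if (measure.applyAsDouble(candidate) <= width) {
                line.setLength(0);
                line.append(candidate);
            } else {
                lines.add(line.toString()); // current line is full, start a new one
                line.setLength(0);
                line.append(word);
            }
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        if (lines.size() * leading > height) {
            throw new IllegalArgumentException("Text needs " + lines.size()
                    + " lines but the rectangle only has room for "
                    + (int) Math.floor(height / leading));
        }
        return lines;
    }
}
```

With PDFBox 2.0, `measure` could wrap `PDFont.getStringWidth(s) / 1000 * fontSize` (that method declares `IOException`, so the lambda would have to rethrow it unchecked), and each returned line could then be drawn with `PDPageContentStream.showText` followed by `newLineAtOffset(0, -leading)`.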
[jira] [Commented] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438609#comment-16438609 ] Maruan Sahyoun commented on PDFBOX-4189:
The patch is a great and, given several questions we had in the past, important addition to PDFBox. In the longer run I see some additions we might already think about conceptually and/or start introducing in the public API. As I haven't reviewed the patch, the list below is only meant as a hint for possible additions; some of them may already be included.
For correct text positioning with mixed languages, information from the following tables might be useful:
- GPOS: to adjust the glyph position.
- BASE: baseline offsets on a script-by-script basis.
- JSTF: justification information, including whitespace and Kashida adjustments.
- BIDI Mirroring: https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt
To allow the user to override the language system derived from the script being used, we might want to add {{setLanguage}}/{{getLanguage}}, which could be called before {{showText}} when an override is needed. Putting that into an internal {{layout}} method, as John suggested, would also allow us to put it behind a feature flag so that the processing can be enabled or disabled. We might also mark the feature as **experimental** and specify which languages it has been tested with (to some extent).
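For readers new to GSUB: Indic conjuncts are often produced by ligature substitutions (GSUB LookupType 4), which replace a sequence of glyphs with a single glyph. The following is a simplified, hypothetical model of such a lookup, not the code from the attached patch or FontBox's GSUB classes. The glyph IDs in the usage are made up, the coverage table is modelled as a map keyed by the first glyph, and lookup flags (for example skipping marks), as well as script, language and feature selection, are ignored.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Simplified model of a GSUB LookupType 4 (ligature substitution) subtable.
 * Glyph IDs are illustrative; a real font's values come from its GSUB table.
 * Lookup flags (e.g. ignoring marks) are not handled here.
 */
final class LigatureLookup {
    /** One ligature: the replacement glyph and the components after the first. */
    static final class Ligature {
        final int glyph;
        final int[] rest;
        Ligature(int glyph, int... rest) { this.glyph = glyph; this.rest = rest; }
    }

    // Keyed by first component, standing in for coverage table + LigatureSet array.
    private final Map<Integer, List<Ligature>> sets = new HashMap<>();

    /** Ligatures are tried in insertion order, like the preference order in a LigatureSet. */
    void add(int first, Ligature lig) {
        sets.computeIfAbsent(first, k -> new ArrayList<>()).add(lig);
    }

    int[] apply(int[] glyphs) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < glyphs.length) {
            Ligature match = null;
            for (Ligature lig : sets.getOrDefault(glyphs[i], Collections.emptyList())) {
                if (matches(glyphs, i + 1, lig.rest)) {
                    match = lig;
                    break; // first matching ligature wins
                }
            }
            if (match != null) {
                out.add(match.glyph);
                i += 1 + match.rest.length; // consume every component
            } else {
                out.add(glyphs[i]);
                i++;
            }
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    private static boolean matches(int[] glyphs, int from, int[] rest) {
        if (from + rest.length > glyphs.length) {
            return false;
        }
        for (int k = 0; k < rest.length; k++) {
            if (glyphs[from + k] != rest[k]) {
                return false;
            }
        }
        return true;
    }
}
```

For example, with a three-component ligature registered before a two-component one for the same first glyph, the input `{10, 20, 30, 40}` becomes `{100, 40}`, while `{10, 20, 40}` falls through to the shorter ligature and becomes `{101, 40}`. Ordering longer ligatures first is what makes the longest match win.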