[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-15 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438914#comment-16438914
 ] 

John Hewson edited comment on PDFBOX-4189 at 4/16/18 1:45 AM:
--

{quote}
For correct text positioning when using mixed languages, information from the 
following tables might be useful:
- GPOS: to adjust the glyph position
- BASE: baseline offsets on a script-by-script basis.
- JSTF: justification information, including whitespace and Kashida adjustments.
- BIDI Mirroring: 
https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt{quote}

It's probably worth noting that BASE, JSTF and BiDi are concerned with 
_paragraph-level_ layout, which happens at a higher level than the proposed 
layout() - which would be concerned with only a single script in a single 
direction (i.e. only OpenType _shaping_). BASE and BiDi are related to changes 
between different scripts, while JSTF is to aid in making good line break 
choices. So all of that functionality will happen somewhere else (this fits 
very closely with the layout code we have for forms, for example). So in layout 
we're really only going to be concerned with GPOS and GSUB features. That way 
the only options that one might want to pass to layout would be the list of 
which [feature 
flags|https://docs.microsoft.com/en-us/typography/opentype/spec/featurelist] to 
apply.

Maybe layout() should be called shapeText() to emphasize this distinction?
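
A minimal sketch of what a shaping-only entry point taking a list of feature flags could look like. Everything here is hypothetical: {{ShapingSketch}}, {{Ligature}} and this {{shapeText}} signature are illustrative names and not part of PDFBox or FontBox, and the single ligature rule stands in for a real GSUB lookup. It only shows that the caller's options reduce to the set of enabled feature tags, for one run in one script and one direction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; none of these names exist in PDFBox or FontBox. */
final class ShapingSketch {

    /** Toy stand-in for a GSUB ligature rule: a glyph sequence replaced by one glyph. */
    static final class Ligature {
        final String featureTag;
        final List<Integer> input;
        final int output;

        Ligature(String featureTag, List<Integer> input, int output) {
            if (input.isEmpty()) {
                throw new IllegalArgumentException("rule input must not be empty");
            }
            this.featureTag = featureTag;
            this.input = input;
            this.output = output;
        }
    }

    /**
     * Shapes a single run (one script, one direction) by scanning left to right
     * and applying only the rules whose feature tag the caller enabled.
     */
    static List<Integer> shapeText(List<Integer> glyphs, Set<String> enabledFeatures,
                                   List<Ligature> rules) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < glyphs.size()) {
            Ligature match = null;
            for (Ligature rule : rules) {
                int end = i + rule.input.size();
                if (enabledFeatures.contains(rule.featureTag)
                        && end <= glyphs.size()
                        && glyphs.subList(i, end).equals(rule.input)) {
                    match = rule;
                    break;
                }
            }
            if (match != null) {
                out.add(match.output);      // substitute the whole sequence
                i += match.input.size();
            } else {
                out.add(glyphs.get(i));     // no rule applies, keep the glyph
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Ligature> rules = List.of(new Ligature("liga", List.of(10, 11), 99));
        System.out.println(shapeText(List.of(10, 11, 12), Set.of("liga"), rules)); // [99, 12]
        System.out.println(shapeText(List.of(10, 11, 12), Set.of(), rules));       // [10, 11, 12]
    }
}
```

Paragraph-level concerns (BiDi, BASE, JSTF) are deliberately absent; in this sketch they would live in whatever code splits text into runs before calling {{shapeText}}.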



> Enable rendering of Indian languages, by reading and utilizing the GSUB table
> -
>
> Key: PDFBOX-4189
> URL: https://issues.apache.org/jira/browse/PDFBOX-4189
> Project: PDFBox
> Issue Type: New Feature
> Components: FontBox, PDModel
> Reporter: Palash Ray
> Priority: Major
> Attachments: Bengali-text-after.pdf, Bengali-text-before.pdf
>
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> Implemented proper rendering of Indian languages, which need extensive Glyph 
> substitution. The GSUB table has been read and used effectively to replace 
> some compound words with their respective Glyphs. All tests are passing. I 
> have tested this for the Bengali font. Please review these changes and let me 
> know if it makes sense to incorporate these.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-15 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438914#comment-16438914
 ] 

John Hewson edited comment on PDFBOX-4189 at 4/16/18 1:44 AM:
--

{quote}
For correct text positioning using mixed language information from the 
following tables might be useful:
- GPOS: to adjust the glyph position
- BASE: baseline offsets on a script-by-script basis.
- JSTF: justification information, including whitespace and Kashida adjustments.
- BIDI Mirroring: 
https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt{quote}

It's probably worth noting that BASE, JSTF and BiDi are concerned with 
_paragraph-level_ layout, which happens at a higher level than the proposed 
layout() - which would be concerned with only a single script in a single 
direction (i.e. only OpenType _shaping_). BASE and BiDi are related to changes 
between different scripts, while JSTF is to aid in making good line break 
choices. So all of that functionality will happen somewhere else (this fits 
very closely with the layout code we have for forms, for example). So in layout 
we're really only going to be concerned with GPOS and GSUB features. That way 
the only options that one might want to pass to layout would be this list of 
which [feature 
flags|https://docs.microsoft.com/en-us/typography/opentype/spec/featurelist] to 
apply.

Maybe layout() should be called shapeText() to emphasize this distinction?






[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-15 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438914#comment-16438914
 ] 

John Hewson edited comment on PDFBOX-4189 at 4/16/18 1:41 AM:
--

{quote}
For correct text positioning using mixed language information from the 
following tables might be useful:
- GPOS: to adjust the glyph position
- BASE: baseline offsets on a script-by-script basis.
- JSTF: justification information, including whitespace and Kashida adjustments.
- BIDI Mirroring: 
https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt{quote}

It's probably worth noting that BASE, JSTF and BiDi are concerned with 
_paragraph-level_ layout, which happens at a higher level than the proposed 
layout() - which would be concerned with only a single script in a single 
direction (i.e. only OpenType _shaping_). BASE and BiDi are related to changes 
between different scripts, while JSTF is to aid in making good line break 
choices. So all of that functionality will happen somewhere else (this fits 
very closely with the layout code for forms, for example). So in layout we're 
really only going to be concerned with GPOS and GSUB features. That way the 
only options that one might want to pass to layout would be this list of which 
[feature 
flags|https://docs.microsoft.com/en-us/typography/opentype/spec/featurelist] to 
apply.

Maybe layout() should be called shapeText() to emphasize this distinction?






[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-15 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438914#comment-16438914
 ] 

John Hewson edited comment on PDFBOX-4189 at 4/16/18 1:40 AM:
--

{quote}
For correct text positioning using mixed language information from the 
following tables might be useful:
- GPOS: to adjust the glyph position
- BASE: baseline offsets on a script-by-script basis.
- JSTF: justification information, including whitespace and Kashida adjustments.
- BIDI Mirroring: 
https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt{quote}

It's probably worth noting that BASE, JSTF and BiDi are concerned with 
_paragraph-level_ layout, which happens at a higher level than the proposed 
layout() - which would be concerned with only a single script in a single 
direction (i.e. only OpenType _shaping_). BASE and BiDi are related to changes 
between different scripts, while JSTF is to aid in making good line break 
choices. So all of that functionality will happen somewhere else (this fits 
very closely with the layout code for forms, for example). So in layout we're 
really only going to be concerned with GPOS and GSUB features. That way the 
only options that one might want to pass to layout would be this list of which 
[feature 
flags|https://docs.microsoft.com/en-us/typography/opentype/spec/featurelist] to 
apply.






[jira] [Commented] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-15 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438914#comment-16438914
 ] 

John Hewson commented on PDFBOX-4189:
-

For correct text positioning using mixed language information from the 
following tables might be useful:
- GPOS: to adjust the glyph position
- BASE: baseline offsets on a script-by-script basis.
- JSTF: justification information, including whitespace and Kashida adjustments.
- BIDI Mirroring: https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt

BASE, JSTF and BiDi are concerned with _paragraph-level_ layout, which happens 
at a higher level than the proposed layout() - which would be concerned with 
only a single script in a single direction (i.e. only OpenType _shaping_). BASE 
and BiDi are related to changes between different scripts, while JSTF is to aid 
in making good line break choices. So all of that functionality will happen 
somewhere else (this fits very closely with the layout code for forms, for 
example). So in layout we're really only going to be concerned with GPOS and 
GSUB features. That way the only options that one might want to pass to layout 
would be this list of which [feature 
flags|https://docs.microsoft.com/en-us/typography/opentype/spec/featurelist] to 
apply.




[jira] [Commented] (PDFBOX-4188) "Maximum allowed scratch file memory exceeded." Exception when merging large number of small PDFs

2018-04-15 Thread Maruan Sahyoun (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438699#comment-16438699
 ] 

Maruan Sahyoun commented on PDFBOX-4188:


[~gary.potagal] I've taken a quick look at the patch and would like to discuss 
some topics:

- PDFMergerUtility was using {{MemoryUsageSetting getPartitionedCopy}}, whereas 
now the setting is passed on to each PDDocument and is no longer partitioned. 
So although the value used for {{MemoryUsageSetting}} is much lower now, isn't 
the end result the same?
- I haven't understood the main benefit of the changes done to 
{{MemoryUsageSetting}} and {{ScratchFile}}. What is the reason for these?
- I think the patch should be divided into two parts - the changes to 
{{MemoryUsageSetting}} / {{ScratchFile}} and the changes to PDFMerger - with 
test cases to show the improvements for each.
- Do you see a benefit in using {{MappedByteBuffer}}?
- The handling of openAction doesn't belong in this patch. It should be part 
of a new issue.
- The code doesn't follow the coding conventions 
https://pdfbox.apache.org/codingconventions.html so there is some effort to 
bring it in line with these. (I think that this section might be difficult to 
find on our website - any suggestions to make the information easier to find 
are highly appreciated.)

Many of the questions are because this part of PDFBox is something I rarely 
touch - so I hope you're a little patient with me.


>  "Maximum allowed scratch file memory exceeded." Exception when merging large 
> number of small PDFs
> --
>
> Key: PDFBOX-4188
> URL: https://issues.apache.org/jira/browse/PDFBOX-4188
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 2.0.9, 3.0.0 PDFBox
>Reporter: Gary Potagal
>Priority: Major
> Attachments: PDFBOX-4188-MemoryManagerPatch.zip, 
> PDFBOX-4188-breakingTest.zip, PDFMergerUtility.java-20180412.patch
>
>
>  
> Am 06.04.2018 um 23:10 schrieb Gary Potagal:
>  
> We wanted to address one more merge issue in 
> org.apache.pdfbox.multipdf.PDFMergerUtility#mergeDocuments(org.apache.pdfbox.io.MemoryUsageSetting).
> We need to merge a large number of small files.  We use mixed mode, memory 
> and disk for cache.  Initially, we would often get "Maximum allowed scratch 
> file memory exceeded.", unless we turned off the check by passing "-1" to 
> org.apache.pdfbox.io.MemoryUsageSetting#MemoryUsageSetting.  I believe this 
> is what the users that opened PDFBOX-3721 were running into.
> Our research indicates that the core issue with the memory model is that 
> instead of sharing a single cache, it breaks it up into equal sized fixed 
> partitions based on the number of input + output files being merged.  This 
> means that each partition must be big enough to hold the final output file.  
> When 400 1-page files are merged, this creates 401 partitions, but each of 
> which needs to be big enough to hold the final 400 pages.  Even worse, the 
> merge algorithm needs to keep all files open until the end.
> Given this, near the end of the merge, we're actually caching 400 x 1-page 
> input files, and 1 x 400-page output file, or 800 pages.
> However, with the partitioned cache, we need to declare room for 401 x 
> 400-pages, or 160,400 pages in total when specifying "maxStorageBytes".  This 
> would be a very high number, usually in gigabytes.
>  
> Given the current limitation that we need to keep all the input files open 
> until the output file is written (HUGE), we came up with 2 options.  (See 
> PDFBOX-4182)  
>  
> 1.  Good: Split the cache in ½, give ½ to the output file, and segment the 
> other ½ across the input files. (Still keeping them open until the end).
> 2.  Better: Dynamically allocate in 16 page (64K) chunks from memory or disk 
> on demand, release cache as documents are closed after merge.  This is our 
> current implementation till PDFBOX-3999, PDFBOX-4003 and PDFBOX-4004 are 
> addressed.
>  
> We would like to submit our current implementation as a Patch to 2.0.10 and 
> 3.0.0, unless this is already addressed.
>  
>  Thank you
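
The page counts in the description above can be reproduced with a few lines of arithmetic. This is only a sketch of the reasoning, counting input and output pages; {{MergeCacheEstimate}} and its method names are made up for illustration and are not PDFBox API.

```java
/** Hypothetical helper; it only reproduces the arithmetic from the issue description. */
final class MergeCacheEstimate {

    /** Pages held near the end of a merge: all input pages plus the merged output. */
    static long pagesActuallyCached(int inputFiles, int pagesPerInput) {
        long inputPages = (long) inputFiles * pagesPerInput;
        long outputPages = inputPages; // the output contains every input page
        return inputPages + outputPages;
    }

    /**
     * Pages that must be declared when the cache is split into equal fixed
     * partitions (one per input plus one for the output), each of which has
     * to be big enough to hold the final output.
     */
    static long pagesDeclaredWithPartitions(int inputFiles, int pagesPerInput) {
        long partitions = inputFiles + 1L;
        long outputPages = (long) inputFiles * pagesPerInput;
        return partitions * outputPages;
    }

    public static void main(String[] args) {
        System.out.println(pagesActuallyCached(400, 1));         // 800
        System.out.println(pagesDeclaredWithPartitions(400, 1)); // 160400
    }
}
```

For 400 one-page inputs the partitioned model asks for roughly 200 times more room than is ever in use, which is the gap the proposed on-demand allocation is meant to close.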






[jira] [Commented] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-15 Thread Palash Ray (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438640#comment-16438640
 ] 

Palash Ray commented on PDFBOX-4189:


Thanks a lot guys, for the detailed comments. It seems that there is some more 
work for me to ensure that this patch fits into the broader scheme of 
things. I am ready to work with you to make this happen. I think PDFBox is a 
great piece of software, and I am committed to making it more feature-rich. This 
particular feature is important to support any Indian or South East Asian 
language. So, from my perspective, I would like to make it happen.

 

John, special thanks for taking the time to explain the architecture. Let 
me do a bit of refactoring, and incorporate your suggestions. I will let you 
know how that goes. I plan to handle subsetting.

 

Thanks,

Palash.




[jira] [Commented] (PDFBOX-2618) Add an Example to create paragraphs with PDFBox

2018-04-15 Thread Matthew Broadhead (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438638#comment-16438638
 ] 

Matthew Broadhead commented on PDFBOX-2618:
---

I am evaluating [http://dhorions.github.io/boxable/].  A table cell cannot 
contain anything other than text or an image.  It allows HTML to be passed as 
a string in limited cases, which is not a good idea.  But none of these 
libraries are more than proofs of concept, and it is not advisable to build 
apps depending on these projects.  I am surprised that you are seriously 
recommending them. They all have disclaimers on their readme pages.  Also, 
they are not projects under the Apache umbrella.  So the question is: what is 
Apache recommending as the way forward for people who need a high-level PDF 
API, now that FOP is out of the picture due to XSLT no longer being supported 
in Tomcat or TomEE?

> Add an Example to create paragraphs with PDFBox
> ---
>
> Key: PDFBOX-2618
> URL: https://issues.apache.org/jira/browse/PDFBOX-2618
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Writing
>Affects Versions: 2.0.0
>Reporter: Tilman Hausherr
>Priority: Major
>
> [~mkl] wrote this morning on stackoverflow on the topic about creating tables 
> with PDFBox: 
> {quote}I'm afraid all those samples IMO merely are proofs of concept, probably 
> of use in limited use cases but by far not for generic use. PDFBox has its 
> strengths, e.g. a quite versatile content extraction framework and a content 
> rendering capability, but the absence of a proper layouting API is a serious 
> weakness.{quote}
> To which I answered:
> {quote}I know... I just don't want to create another iText. We're not the 
> Samwer brothers.{quote}
> But he's right. We could of course look at what iText offers and implement 
> that on our own, that wouldn't even be illegal, but it wouldn't be nice. I've 
> never looked at or used iText, except once when answering this: 
> http://stackoverflow.com/a/26820598/535646
> IMO what we need to start, is a method to write a paragraph to a PDF. Such a 
> method would have these parameters:
> - text
> - rectangle (or width and height from current position)
> Such a method would then output the text and break the lines at the end of 
> the rectangle, and throw an exception if the space isn't enough.
> *UPDATE*: This will be implemented as an example, using either Java's 
> built-in TextLayout or ICU4J.
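
A rough sketch of the paragraph method described above: it takes the text and a rectangle, breaks lines greedily at the rectangle's width, and throws if the space isn't enough. The names {{ParagraphSketch}} and {{breakLines}} are hypothetical, and the width function is passed in as an assumption; with PDFBox it would presumably be derived from {{PDFont.getStringWidth}} scaled by the font size. It does no hyphenation, BiDi or complex-script handling, which is where TextLayout or ICU4J would come in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch of greedy line breaking inside a rectangle; not PDFBox API. */
final class ParagraphSketch {

    /**
     * Splits text into lines no wider than boxWidth, breaking at whitespace.
     * Throws IllegalArgumentException if a single word is wider than the box
     * or if the resulting lines need more height than boxHeight.
     */
    static List<String> breakLines(String text, double boxWidth, double boxHeight,
                                   double lineHeight, ToDoubleFunction<String> widthOf) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only happens for empty input
            }
            if (widthOf.applyAsDouble(word) > boxWidth) {
                throw new IllegalArgumentException("word wider than rectangle: " + word);
            }
            String candidate = current.length() == 0 ? word : current + " " + word;
            if (widthOf.applyAsDouble(candidate) <= boxWidth) {
                current.setLength(0);
                current.append(candidate);   // word still fits on the current line
            } else {
                lines.add(current.toString());
                current.setLength(0);
                current.append(word);        // start a new line with this word
            }
        }
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        if (lines.size() * lineHeight > boxHeight) {
            throw new IllegalArgumentException("text does not fit into the rectangle");
        }
        return lines;
    }

    public static void main(String[] args) {
        // Toy metric: every character is one unit wide.
        System.out.println(breakLines("the quick brown fox", 10, 2, 1, s -> s.length()));
        // [the quick, brown fox]
    }
}
```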






[jira] [Commented] (PDFBOX-2618) Add an Example to create paragraphs with PDFBox

2018-04-15 Thread Maruan Sahyoun (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438634#comment-16438634
 ] 

Maruan Sahyoun commented on PDFBOX-2618:


Hi,

There are already some projects. This is not a complete list, though:

- https://github.com/GlenKPeterson/PdfLayoutManager
- http://dhorions.github.io/boxable/
- https://github.com/vandeseer/easytable

As for the FOP-related issues - did you ask what the FOP people think 
about these issues?

To replace FOP there would be a lot to add to PDFBox:
- a layout model
- complex script support
- hyphenation
.






[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-15 Thread Maruan Sahyoun (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438609#comment-16438609
 ] 

Maruan Sahyoun edited comment on PDFBOX-4189 at 4/15/18 8:34 AM:
-

The patch is a great and - given several questions we had in the past - 
important addition to PDFBox.

In the longer run I'd see some additions we might conceptually already think 
about and/or start introducing in the public API. As I haven't reviewed the 
patch, the list below is meant as a hint for possible additions. They may 
already be included.

For correct text positioning when using mixed languages, information from the 
following tables might be useful:
- GPOS: to adjust the glyph position
- BASE: baseline offsets on a script-by-script basis.
- JSTF: justification information, including whitespace and Kashida adjustments.
- BIDI Mirroring: https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt

To allow the user to override the language system identified by the script 
being used we might want to add {{setLanguage/getLanguage}} so that it can be 
called prior to {{showText}} if an override needs to be done.

Putting that into an internal {{layout}} method as John suggested would also 
allow us to put it behind a feature flag where one could enable/disable the 
processing. We might also mark that feature as **experimental** and specify 
which languages it has been tested with (to some extent).

This is mainly meant to understand which capabilities belong where as I'm 
looking to add the processing to layout of interactive form field values.


was (Author: msahyoun):
The patch is a great and - given several questions we had in the past - 
important addition to PDFBox.

On the longer run I'd see some additions we might conceptually already think 
about and/or start introducing in the public API. As I haven't reviewed the 
patch the below list is meant to be a hint for possible addition. They may 
already be included

For correct text positioning using mixed language information from the 
following tables might be useful:
- GPOS: to adjust the glyph position
- BASE: baseline offsets on a script-by-script basis.
- JSTF: justification information, including whitespace and Kashida adjustments.
- BIDI Mirroring: https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt

To allow the user to override the language system identified by the script 
being used we might want to add {{setLanguage/getLanguage}} so that can be 
called prior to {{showText}} if an override needs to be done.

Putting that into an internal {{layout}} method as John suggested would also 
allow us to put it behind a feature flag where one could enable/disable the 
processing. We might also mark that feature as **experimental** and specify 
which languages it has been tested with (to some extent).

> Enable rendering of Indian languages, by reading and utilizing the GSUB table
> -
>
> Key: PDFBOX-4189
> URL: https://issues.apache.org/jira/browse/PDFBOX-4189
> Project: PDFBox
> Issue Type: New Feature
> Components: FontBox, PDModel
> Reporter: Palash Ray
> Priority: Major
> Attachments: Bengali-text-after.pdf, Bengali-text-before.pdf
>
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> Implemented proper rendering of Indian languages, which need extensive Glyph 
> substitution. The GSUB table has been read and used effectively to replace 
> some compound words with their respective Glyphs. All tests are passing. I 
> have tested this for the Bengali font. Please review these changes and let me 
> know if it makes sense to incorporate these.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Commented] (PDFBOX-2618) Add an Example to create paragraphs with PDFBox

2018-04-15 Thread Matthew Broadhead (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438621#comment-16438621
 ] 

Matthew Broadhead commented on PDFBOX-2618:
---

I am trying to migrate from FOP to PDFBox because XSLT support is broken in 
Tomcat and TomEE due to JSP taglibs 
([https://bz.apache.org/bugzilla/show_bug.cgi?id=61875] and 
[https://bz.apache.org/bugzilla/show_bug.cgi?id=27717]).  This is due to Xalan 
being broken ([https://issues.apache.org/jira/browse/XALANJ-2540]).  Xalan is 
now abandoned and nobody is going to fix it.  So am I right to assume that 
nobody using Tomcat or TomEE should use FOP any more?

I agree that PDFBox should not have a high-level API, but a new project that 
depends on PDFBox should be started. It could at first offer some basic 
features and slowly migrate everything from FOP.  It could be called 
pdfbox-fop or something.  I think it needs to follow the structure of FOP, 
because if a new project starts without learning the lessons of FOP then it 
will almost certainly run into trouble.

It could replicate all the structures, such as fo:block, as POJOs.
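
A very small sketch of what such a POJO might look like, purely as an 
illustration: the class {{Block}} and its methods are invented here and 
belong to neither FOP nor PDFBox. The two properties mirror the XSL-FO 
attributes font-size and space-after; nothing in it renders to PDF.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical POJO mirroring an XSL-FO fo:block; not part of FOP or PDFBox. */
final class Block {
    // Children are either text (String) or nested Block instances.
    private final List<Object> children = new ArrayList<>();
    private float fontSize = 12f;   // corresponds to the font-size property
    private float spaceAfter = 0f;  // corresponds to the space-after property

    Block add(String text) {
        children.add(text);
        return this;
    }

    Block add(Block child) {
        children.add(child);
        return this;
    }

    Block fontSize(float size) {
        this.fontSize = size;
        return this;
    }

    Block spaceAfter(float space) {
        this.spaceAfter = space;
        return this;
    }

    float getFontSize() {
        return fontSize;
    }

    float getSpaceAfter() {
        return spaceAfter;
    }

    /** Concatenates all text content depth-first, ignoring formatting. */
    String plainText() {
        StringBuilder sb = new StringBuilder();
        for (Object child : children) {
            sb.append(child instanceof Block ? ((Block) child).plainText() : (String) child);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Block paragraph = new Block().fontSize(10f).spaceAfter(6f)
                .add("Hello ")
                .add(new Block().add("world"));
        System.out.println(paragraph.plainText());
    }
}
```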

> Add an Example to create paragraphs with PDFBox
> ---
>
> Key: PDFBOX-2618
> URL: https://issues.apache.org/jira/browse/PDFBOX-2618
> Project: PDFBox
> Issue Type: Improvement
> Components: Writing
> Affects Versions: 2.0.0
> Reporter: Tilman Hausherr
> Priority: Major
>
> [~mkl] wrote this morning on stackoverflow on the topic of creating tables 
> with PDFBox: 
> {quote}I'm afraid all those samples IMO merely are proofs of concept, probably 
> of use in limited use cases but by far not for generic use. PDFBox has its 
> strengths, e.g. a quite versatile content extraction framework and a content 
> rendering capability, but the absence of a proper layouting API is a serious 
> weakness.{quote}
> To which I answered:
> {quote}I know... I just don't want to create another iText. We're not the 
> Samwer brothers.{quote}
> But he's right. We could of course look at what iText offers and implement 
> that on our own, that wouldn't even be illegal, but it wouldn't be nice. I've 
> never looked at or used iText, except once when answering this: 
> http://stackoverflow.com/a/26820598/535646
> IMO what we need to start with is a method to write a paragraph to a PDF. 
> Such a method would have these parameters:
> - text
> - rectangle (or width and height from current position)
> Such a method would then output the text and break the lines at the end of 
> the rectangle, and throw an exception if the space isn't enough.
> *UPDATE*: This will be implemented as an example, using either Java's 
> built-in TextLayout or ICU4J.
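
The paragraph method described in the issue could be sketched roughly as 
below. This is not the planned TextLayout or ICU4J example: it is a plain 
greedy word wrap, and the class {{ParagraphBreaker}}, the {{leading}} 
parameter and the {{measure}} callback are assumptions made for the sketch. 
With PDFBox the callback could be backed by 
{{PDFont.getStringWidth(text) / 1000 * fontSize}}; here a fixed width per 
character stands in for a real font.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Sketch of a greedy paragraph breaker; names are illustrative, not PDFBox API. */
final class ParagraphBreaker {

    /**
     * Breaks text into lines that fit the rectangle width and throws if the
     * resulting lines need more vertical space than the rectangle height.
     *
     * @param leading distance between baselines, in the same unit as width/height
     * @param measure returns the rendered width of a string (assumed callback)
     */
    static List<String> breakLines(String text, double width, double height,
                                   double leading, ToDoubleFunction<String> measure) {
        List<String> lines = new ArrayList<>();
        String line = "";
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only for empty input
            }
            if (measure.applyAsDouble(word) > width) {
                throw new IllegalArgumentException("word wider than rectangle: " + word);
            }
            String candidate = line.isEmpty() ? word : line + " " + word;
            if (!line.isEmpty() && measure.applyAsDouble(candidate) > width) {
                lines.add(line);   // current line is full, start a new one
                line = word;
            } else {
                line = candidate;
            }
        }
        if (!line.isEmpty()) {
            lines.add(line);
        }
        if (lines.size() * leading > height) {
            throw new IllegalStateException("text needs " + lines.size()
                    + " lines, rectangle is too small");
        }
        return lines;
    }

    public static void main(String[] args) {
        // Assumption: every character is 6 units wide.
        ToDoubleFunction<String> fixedWidth = s -> s.length() * 6.0;
        for (String l : breakLines("the quick brown fox", 60, 100, 12, fixedWidth)) {
            System.out.println(l);
        }
    }
}
```

A greedy break is the simplest choice and ignores hyphenation, bidi and 
script-specific break rules, which is where TextLayout or ICU4J would come in.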






[jira] [Commented] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-15 Thread Maruan Sahyoun (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438609#comment-16438609
 ] 

Maruan Sahyoun commented on PDFBOX-4189:


The patch is a great and - given several questions we had in the past - 
important addition to PDFBox.

In the longer run I see some additions we might already think about 
conceptually and/or start introducing in the public API. As I haven't reviewed 
the patch, the list below is meant as a hint for possible additions. They may 
already be included.

For correct text positioning with mixed-language text, information from the 
following tables might be useful:
- GPOS: to adjust the glyph position
- BASE: baseline offsets on a script-by-script basis.
- JSTF: justification information, including whitespace and Kashida adjustments.
- BIDI Mirroring: https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt

To allow the user to override the language system identified by the script 
being used, we might want to add {{setLanguage/getLanguage}} so that they can 
be called prior to {{showText}} if an override is needed.

Putting that into an internal {{layout}} method as John suggested would also 
allow us to put it behind a feature flag where one could enable/disable the 
processing. We might also mark that feature as **experimental** and specify 
which languages it has been tested with (to some extent).

> Enable rendering of Indian languages, by reading and utilizing the GSUB table
> -
>
> Key: PDFBOX-4189
> URL: https://issues.apache.org/jira/browse/PDFBOX-4189
> Project: PDFBox
> Issue Type: New Feature
> Components: FontBox, PDModel
> Reporter: Palash Ray
> Priority: Major
> Attachments: Bengali-text-after.pdf, Bengali-text-before.pdf
>
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> Implemented proper rendering of Indian languages, which need extensive Glyph 
> substitution. The GSUB table has been read and used effectively to replace 
> some compound words with their respective Glyphs. All tests are passing. I 
> have tested this for the Bengali font. Please review these changes and let me 
> know if it makes sense to incorporate these.


